Introduction

The data were obtained from a survey of students taking math courses in two Portuguese secondary schools[1]. The data set contains 395 observations (students) and 33 variables describing the students, including their gender, family background, study habits, and alcohol consumption. Among these attributes are the following:

  • Variables such as school, gender, home address, family size, parents’ cohabitation status, whether they receive extra educational support from the school, whether they receive extra educational support from their family, whether they take extra paid classes, whether they take part in extra-curricular activities, whether they attended nursery school, whether they want to pursue higher education, whether they have internet access at home, and whether they are in a romantic relationship are binary variables.

  • Variables such as father’s job, mother’s job, the reason for choosing the school, and guardian are categorical (nominal) variables.

  • Variables such as mother’s education level, father’s education level, travel time between home and school, weekly study time, quality of family relationships, free time after school, frequency of going out with friends, workday alcohol consumption, weekend alcohol consumption, and current health status are ordinal variables coded as integers.

  • Variables such as age, number of past class failures, grades from the first and second periods, and the final grade are numeric variables.

In this project, we analyze students’ development in terms of their alcohol consumption, school grades, health status, and family relationships by addressing four research questions:

  • Is students’ alcohol consumption related to the quality of their family relationships and to whether they are in a romantic relationship?
  • How are extra-curricular activities and the quality of family relationships associated with students’ current health?
  • Do parents’ jobs and education levels relate to students’ grades? If so, in what way?
  • Which variables are most strongly correlated with the quality of students’ family relationships, and how are they related to it?

More Data Description

Since our project focuses on these four research questions, we use only a subset of the variables. To make the interpretation easier, here is a more detailed guide to the variables we use and how they are coded:

  • famrel: quality of family relationship (numeric: from 1-very bad to 5-excellent)
  • romantic: whether the student is involved in a romantic relationship (binary: yes or no)
  • Dalc: workday alcohol consumption (numeric: from 1-very low to 5-very high)
  • Walc: weekend alcohol consumption (numeric: from 1-very low to 5-very high)
  • health: current health status (numeric: from 1-very bad to 5-very good)
  • activities: extra-curricular activities (binary: yes or no)
  • Fjob: father’s job (nominal: ‘teacher’, ‘health’ care related, civil ‘services’ (e.g. administrative or police), ‘at_home’ or ‘other’)
  • Mjob: mother’s job (nominal: ‘teacher’, ‘health’ care related, civil ‘services’ (e.g. administrative or police), ‘at_home’ or ‘other’)
  • Fedu: father’s education (numeric: 0 - none, 1 - primary education (4th grade), 2 - 5th to 9th grade, 3 - secondary education, or 4 - higher education)
  • Medu: mother’s education (numeric: 0 - none, 1 - primary education (4th grade), 2 - 5th to 9th grade, 3 - secondary education, or 4 - higher education)
  • G3: final grade (numeric: from 0 to 20)

Question 2: How are extra-curricular activities and the quality of family relationships associated with students’ current health?

Existing research suggests that involvement in extra-curricular activities and the quality of family relationships may influence people’s health. Some researchers state that “family relationships are enduring and consequential for well-being across the life course”[4], while others report that “teens who participate in extracurriculars have better mental health”[3]. These studies motivate us to ask whether extra-curricular activities and/or the quality of family relationships are associated with students’ health, and if so, how.

For this research question, we use health as the response variable and famrel and activities as the two explanatory variables. To get a first look at these relationships, we draw side-by-side boxplots of health for each level of famrel, faceted by whether or not the student takes part in extra-curricular activities.

The faceted boxplots show some association between students’ family relationships and their health. Students with very poor family relationships (famrel = 1) tend to report worse health, since the median health level for famrel = 1 is lower than for the other famrel values. Students with better family relationships tend to report better health: the purple and blue boxes (famrel = 4 and famrel = 5) have higher median health levels. However, it is hard to tell from the plot whether taking part in extra-curricular activities is associated with health, because the distribution of health shows no obvious difference between the two activity panels. The boxplots alone therefore do not let us determine the relationship between activities and health.
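Each box in such a plot summarizes one (activities, famrel) cell by its median and quartiles. The grouping can be sketched in Python as below, assuming each student is a dict keyed by the dataset’s column names; `health_summary` is a hypothetical helper, not code from this project. The quartiles use Python’s "inclusive" method, so they may differ slightly from the hinges that R’s `boxplot` draws.

```python
from collections import defaultdict
from statistics import median, quantiles


def health_summary(rows):
    """Median and quartiles of health for each (activities, famrel) cell.

    rows: iterable of dicts with keys 'activities' ('yes'/'no'),
    'famrel' (1-5) and 'health' (1-5), one dict per student (assumed layout).
    """
    cells = defaultdict(list)
    for r in rows:
        cells[(r["activities"], r["famrel"])].append(r["health"])

    summary = {}
    for key, values in sorted(cells.items()):
        if len(values) >= 2:
            # 'inclusive' keeps the quartiles within the observed range
            q1, _, q3 = quantiles(values, n=4, method="inclusive")
        else:
            # a single student: every summary equals that one value
            q1 = q3 = values[0]
        summary[key] = {"n": len(values), "q1": q1,
                        "median": median(values), "q3": q3}
    return summary
```

Comparing the "median" entries across famrel levels within each activities panel mirrors the visual comparison described above.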

To understand these associations more clearly, we use mosaic plots to assess whether the data provide evidence against the following null hypotheses:

  • Students’ health is independent of their family relationship quality.
  • Students’ health is independent of whether they take part in extra-curricular activities.

In the left mosaic plot (Health vs. Family Relationship), there appear to be substantially more students in very poor health (health = 1) among those with very poor family relationships (famrel = 1) than we would expect under the null hypothesis of independence. This suggests that we should reject the null hypothesis that students’ health is independent of their family relationships.

The right mosaic plot (Health vs. Extra-curricular Activities) shows no substantially large Pearson residuals, so it provides no evidence against the null hypothesis that extra-curricular activities and health are independent in this dataset.
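For reference, the shading in such plots is usually based on Pearson residuals, (O - E) / sqrt(E), where E = row total × column total / n is the expected count under independence; the squared residuals sum to the chi-square statistic. A minimal stdlib sketch, assuming the contingency table is given as a list of rows of counts (for example, famrel levels by health levels); `pearson_residuals` is a hypothetical name:

```python
from math import sqrt


def pearson_residuals(table):
    """Pearson residuals (O - E) / sqrt(E) for a two-way table of counts.

    table: list of rows of counts (e.g. rows = famrel levels, columns = health levels).
    Returns (residuals, chi_square_statistic, degrees_of_freedom).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    residuals = []
    chi_sq = 0.0
    for i, row in enumerate(table):
        res_row = []
        for j, observed in enumerate(row):
            # expected count if the row and column variables were independent
            expected = row_totals[i] * col_totals[j] / n
            if expected == 0:
                raise ValueError("empty row or column: expected count is zero")
            r = (observed - expected) / sqrt(expected)
            res_row.append(r)
            chi_sq += r * r
        residuals.append(res_row)

    df = (len(table) - 1) * (len(col_totals) - 1)
    return residuals, chi_sq, df
```

A p-value could then be obtained with, for example, `scipy.stats.chi2.sf(chi_sq, df)`. Shaded mosaic plots such as those from R’s vcd package typically highlight cells whose residual exceeds 2 or 4 in absolute value. When cells are sparse, as they may be for famrel = 1, small expected counts make the chi-square approximation less reliable.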

To sum up, we reach the following conclusions:

  • Students’ health and family relationship quality appear to be dependent.
  • We find no evidence of dependence between students’ health and extra-curricular activities.

Question 3: Do parents’ jobs and education levels relate to students’ grades? If so, in what way?

Whether parents’ educational background and profession are associated with a student’s academic performance is a long-debated question. Motivated by this debate, we explore whether parents’ jobs and education levels are associated with students’ grades in this dataset. To answer this question, we use Fjob, Mjob, Fedu, and Medu as explanatory variables and the final grade G3 as the response variable.

Before exploring the question, we note that the distribution of the quantitative response variable G3 is approximately normal, with some outliers on the left. This suggests that we do not need to transform this variable.

We start with four side-by-side boxplots showing the distribution of the final grade for each level of Fjob, Mjob, Fedu, and Medu, respectively. In these plots we treat Fedu and Medu as categorical; for both, a higher value means a higher education level.

From these boxplots, we observe that students whose father is a teacher tend to have the highest grades, while students whose mother stays at home (Mjob = ‘at_home’) tend to have lower grades than other students. We also observe that students whose father has no formal education (Fedu = 0) tend to score highest. Similarly, students whose mother has no formal education (Medu = 0) tend to score about as high as those whose mother has higher education (Medu = 4).

To see how each parent’s education and job together relate to a student’s final grade, we draw a side-by-side boxplot of G3 by father’s job and education, and another by mother’s job and education.

From these boxplots, we observe that the students with the highest grades are those whose father is a teacher or whose mother works in health care. We also notice a weak, monotonically increasing trend in final grades as the parents’ (both mother’s and father’s) education level increases. In other words, there appears to be a positive association between parents’ education level (from level 1 to level 4) and the student’s final grade: the higher the parents’ education level, the better the student’s grades tend to be. Despite these observations, our dataset is limited: some combinations of job and education level have no observations. We must also keep in mind that parents’ professions and education levels are strongly correlated when drawing our conclusions.
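One way to make the "monotonically increasing" observation concrete is to compare mean G3 across education levels and check that it never drops from level 1 to level 4. A minimal sketch, assuming the data are (education level, G3) pairs for one parent; both helper names are hypothetical, and a non-decreasing sequence of sample means is only descriptive, not a test:

```python
from collections import defaultdict
from statistics import mean


def mean_grade_by_level(pairs):
    """Mean final grade for each education level.

    pairs: iterable of (edu_level, G3) tuples, one per student (assumed layout).
    """
    groups = defaultdict(list)
    for level, grade in pairs:
        groups[level].append(grade)
    return {level: mean(grades) for level, grades in sorted(groups.items())}


def is_non_decreasing(means, levels):
    """True if the mean grade never drops as the level increases across `levels`.

    Levels with no students are skipped.
    """
    values = [means[lvl] for lvl in levels if lvl in means]
    return all(a <= b for a, b in zip(values, values[1:]))
```

Calling `is_non_decreasing(means, [1, 2, 3, 4])` checks the trend described above, while including level 0 would test whether it extends to parents with no formal education.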

To test formally whether the mean final grade differs across the levels of each explanatory variable, we performed four one-way analyses of means (one-way ANOVA F-tests), one for each variable.

## 
##  One-way analysis of means
## 
## data:  G3 and Fjob
## F = 1.3029, num df = 4, denom df = 390, p-value = 0.2683
## 
##  One-way analysis of means
## 
## data:  G3 and Mjob
## F = 3.7545, num df = 4, denom df = 390, p-value = 0.005195
## 
##  One-way analysis of means
## 
## data:  G3 and Fedu
## F = 2.8906, num df = 4, denom df = 390, p-value = 0.0222
## 
##  One-way analysis of means
## 
## data:  G3 and Medu
## F = 6.0884, num df = 4, denom df = 390, p-value = 9.242e-05

The p-values for Mjob, Fedu, and Medu are 0.005195, 0.0222, and 9.242e-05, respectively. Since each is below 0.05, we reject, at the 5% significance level, the null hypothesis that the mean final grade is the same across all levels of that variable. In other words, Mjob, Fedu, and Medu are each significantly associated with G3. For Fjob, however, the p-value of 0.2683 gives no evidence that the mean final grade differs across fathers’ job categories.
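The F statistic in the output above compares between-group to within-group variability: F = (SS_between / (k - 1)) / (SS_within / (n - k)), where k is the number of levels and n the number of students. With five levels per variable and 395 students, the degrees of freedom are 4 and 390, matching the output. The integer denominator df suggests the equal-variance form (for example, R’s `oneway.test(..., var.equal = TRUE)`), since the Welch version generally reports fractional df. A stdlib sketch of the statistic; `one_way_anova_f` is a hypothetical name:

```python
def one_way_anova_f(groups):
    """Classical one-way ANOVA F statistic (equal variances assumed).

    groups: list of lists, one list of G3 values per category level.
    Returns (F, df_between, df_within).
    """
    groups = [list(g) for g in groups if g]  # drop empty levels
    k = len(groups)
    n = sum(len(g) for g in groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two non-empty groups and more observations than groups")

    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n

    # variability of group means around the grand mean, weighted by group size
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # variability of individual grades around their own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within
```

A p-value could then be computed with `scipy.stats.f.sf(f_stat, df_between, df_within)`.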

In summary, we draw the following conclusions:

  • Descriptively, some parental professions are associated with higher grades: students whose father is a teacher or whose mother works in health care tend to perform better academically, although the ANOVA for father’s job as a whole is not significant.

  • A student’s mother’s job and education, as well as father’s education, have statistically significant correlations with G3.

  • Although students’ grades seem to increase with the mother’s or father’s education level, this pattern does not hold for students whose parents have the lowest level of education (no formal education). These students tend to score as high as those whose parents are highly educated.

Conclusion

Our study of students’ development in terms of their alcohol consumption, school grades, health status, and family relationships leads to the following conclusions. First, whether a student is in a romantic relationship does not seem to be associated with their alcohol consumption, whereas family relationships play a larger role: family relationship quality has a significant negative correlation with weekend alcohol consumption. Second, students’ health and family relationship quality appear to be dependent, while we find no evidence of dependence between students’ health and extra-curricular activities. Third, students whose father is a teacher or whose mother works in health care tend to have higher grades, although the father’s job as a whole is not significantly associated with the final grade; the mother’s job and education, as well as the father’s education level, are significantly associated with the student’s final grade. However, although parents’ education level is generally positively associated with students’ final grades, this does not hold for students whose parents have the lowest level of education. Finally, a student’s family relationship quality is most strongly correlated with their age, sex, romantic relationship status, and guardian. In particular, younger male students who are single and whose guardian is their mother tend to report poorer family relationships.

However, our study has several limitations. First, our sample may not be representative, since all students come from only two Portuguese secondary schools. Second, we measure grades using math grades only, and it is debatable whether math grades alone adequately reflect a student’s academic performance. Third, some variables in this study are highly subjective, such as the quality of family relationships; our study would be more accurate if such variables were measured on a common, validated scale or by objective methods. Fourth, because many of our variables are categorical or ordinal, the range of suitable visualizations is relatively limited.

We believe there is much more to explore in this dataset. For example, to better understand how a student’s relationship with their family relates to alcohol consumption, we could account for other factors in the dataset, such as family size or parents’ cohabitation status. Since family relationship quality is a subjective variable reported from the student’s perspective, these more objective measurements could help check whether our conclusion of a weak negative correlation between family relationships and weekend alcohol consumption holds. Similarly, we could explore how factors other than family relationship quality and extra-curricular activities relate to students’ health. Moreover, we could connect our first three research questions and explore how family relationship quality, alcohol consumption, health, and grades are correlated with one another. Exploring these questions would give us a more comprehensive understanding of this dataset and of the many factors that influence students’ development.

References

[1] UCI Machine Learning. (2016, October 19). Student Alcohol Consumption. Kaggle. Retrieved December 6, 2021, from https://www.kaggle.com/uciml/student-alcohol-consumption

[2] McCrady, B. S., & Flanagan, J. C. (2021, May 6). The role of the family in alcohol use disorder recovery for adults. Alcohol Research: Current Reviews. Retrieved December 6, 2021, from https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8104924/

[3] University of British Columbia. (2020, November 2). Teens who participate in extracurriculars, get less screen time, have better mental health. ScienceDaily. Retrieved December 5, 2021, from www.sciencedaily.com/releases/2020/11/201102124849.htm

[4] Thomas, P. A., Liu, H., & Umberson, D. (2017, November). Family relationships and well-being. Innovation in Aging, 1(3), igx025. https://doi.org/10.1093/geroni/igx025