This data is a collection of attributes describing student achievement at 2 Portuguese schools. We have 2 datasets to explore: one with 395 math students and another with 649 students studying Portuguese. Both datasets include student grades along with demographic, social, and school-related features collected through the school. There are 33 columns in each dataset, but apart from the grades only 3 variables are quantitative: age, number of absences, and number of failed classes. The categorical variables include gender, internet access, alcohol consumption (1 to 5 scale), health status (1 to 5 scale), and more.
We are interested in the relationship between student alcohol consumption and academic performance. However, when analyzing these variables, we recognize the presence of potential confounding variables. In essence, we want to see which confounding variable has the biggest impact on student grades. Therefore, we will investigate the following research questions: How do external factors such as family affect academics? Are unstable familial relationships detrimental to school performance? Does distance from school affect a student’s performance? Does the reason the student chose the school affect their study time for courses? Furthermore, we are interested in whether these trends hold regardless of subject (i.e., do they hold for both math and Portuguese?).
From these graphs, we see that the median final grade generally decreases as weekday alcohol consumption increases. However, we note that the median final grades are relatively similar across consumption levels. As such, we hypothesize that final grades are also affected by other variables, specifically how often students skip class (the ‘absences’ variable in the dataset) and how much time they spend studying (the ‘studytime’ variable in the dataset). We examine this with scatterplots of absences against final grades and boxplots of study time against final grades.
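The median comparison described above can be sketched in a few lines. This is a minimal illustration, not the code behind the report's graphs: the records are made-up (alcohol level, final grade) pairs rather than rows from the dataset, and the function name is hypothetical.

```python
from collections import defaultdict
from statistics import median

# Hypothetical (made-up) records: (weekday alcohol level 1-5, final grade).
# These are NOT rows from the actual dataset.
records = [
    (1, 12), (1, 14), (1, 11), (1, 10),
    (2, 11), (2, 13), (2, 10),
    (3, 10), (3, 12),
    (4, 9), (4, 11),
    (5, 10),
]


def median_grade_by_level(rows):
    """Group final grades by alcohol level and return the median of each group."""
    groups = defaultdict(list)
    for level, grade in rows:
        groups[level].append(grade)
    # Sort by level so the result reads from lowest to highest consumption.
    return {level: median(grades) for level, grades in sorted(groups.items())}


medians = median_grade_by_level(records)
for level, med in medians.items():
    print(f"alcohol level {level}: median final grade {med}")
```

Comparing the per-level medians side by side makes it easy to see both the direction of the trend and how small the differences between levels are.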
Further analysis of these graphs seems to agree with this hypothesis. In the two scatterplots, we see that the higher the number of absences, the lower the grade. From the boxplots, we also see that, on average, the final grade increases as study time increases. Therefore, we will investigate which variables affect the number of absences and study time, since these two variables are correlated with the final grade.
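One way to put a number on the scatterplot trend is the Pearson correlation coefficient, where a value near -1 indicates that grades fall as absences rise. The sketch below is an assumption about how such a check could be done, not something the report itself computed, and the two lists are made-up values rather than data from either class.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical (made-up) values, NOT taken from the dataset.
absences = [0, 2, 4, 6, 10, 14, 20]
final_grades = [15, 14, 13, 12, 11, 9, 8]

r = pearson_r(absences, final_grades)
print(f"correlation between absences and final grade: {r:.2f}")
```

A correlation coefficient only measures linear association, so it complements the scatterplots rather than replacing them.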
The following 2 dendrograms focus on the absences variable and weekday alcohol consumption. The graphs show students clustered by absences, with labels along the bottom colored by weekday alcohol consumption. The top dendrogram is for math students and the bottom one is for Portuguese students.
Each graph is clustered by absences, and the label colors correspond to the different levels of weekday alcohol consumption. Within each cluster, there doesn’t seem to be any noticeable pattern in the colors. Since all colors are spread throughout the dendrogram, we conclude that variables other than alcohol consumption are likely to be driving absences.
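The visual check of the dendrogram colors can also be done by counting. The sketch below is a simplified stand-in, not the method used for the dendrograms above (whose linkage method is not stated): it uses single linkage on one variable, where cutting the tree into k clusters amounts to splitting the sorted values at the k-1 widest gaps (ties broken arbitrarily). The data and function names are hypothetical.

```python
from collections import Counter


def single_linkage_1d(values, k):
    """Return a cluster label (0..k-1) for each value.

    For one-dimensional data, cutting a single-linkage tree into k clusters
    is the same as splitting the sorted values at the k-1 widest gaps.
    """
    if not 1 <= k <= len(values):
        raise ValueError("k must be between 1 and the number of values")
    # Indices of the values in ascending order of value.
    order = sorted(range(len(values)), key=lambda i: values[i])
    # Gap after each sorted position, paired with that position.
    gaps = [(values[order[j + 1]] - values[order[j]], j)
            for j in range(len(order) - 1)]
    # Positions after which a new cluster starts (the k-1 widest gaps).
    cut_positions = {j for _, j in sorted(gaps, reverse=True)[:k - 1]}
    labels = [0] * len(values)
    current = 0
    for pos, idx in enumerate(order):
        labels[idx] = current
        if pos in cut_positions:
            current += 1
    return labels


def alcohol_mix(labels, alcohol_levels):
    """Count how often each alcohol level appears in each cluster."""
    mix = {}
    for label, level in zip(labels, alcohol_levels):
        mix.setdefault(label, Counter())[level] += 1
    return mix


# Hypothetical (made-up) students, NOT taken from the dataset.
absences = [0, 1, 2, 2, 3, 10, 11, 12, 30, 32]
alcohol = [1, 3, 1, 5, 2, 1, 4, 2, 1, 3]

labels = single_linkage_1d(absences, k=3)
mix = alcohol_mix(labels, alcohol)
for cluster, counts in sorted(mix.items()):
    print(f"cluster {cluster}: {dict(sorted(counts.items()))}")
```

If alcohol consumption were strongly tied to absences, some clusters would be dominated by one or two alcohol levels; a similar mix of levels in every cluster corresponds to colors being spread throughout the dendrogram.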
We found in our data that absences from school are indeed related to academic performance as measured by grades. Therefore, we will investigate what factors may have a relationship with absences. One factor worth considering is the quality of family relationships, denoted by the variable “famrel” in our math class data set.
We can see a trend: students reporting a family relationship quality of 1 have the fewest absences and those reporting a quality of 4 have the most, in both panels of the facet. However, when we look at family educational support, those who do receive it actually have more absences, which is counterintuitive. Therefore, we conclude that family relationships are related to student absences, but in a manner we wouldn’t expect.
We are now interested in the relationship between school travel time and absences. The travel time variable has 4 categories: 1 (<15 min.), 2 (15 to 30 min.), 3 (30 min. to 1 hour), and 4 (>1 hour).
We can see a negative trend between absences and travel time: the longer the travel time, the fewer absences students have. While we do not know the exact explanation for this, it is possible that students who live farther from the school and choose to commute are more motivated to attend class.
When we consider whether a student lives in an urban or rural area, we also notice a trend: for shorter commute times, students who live in urban areas account for more of the absences. For longer commute times, the urban and rural absences are close to an even split. Therefore, as with the quality of family relationships, travel time does seem to have a noticeable relationship with the number of absences.
We wanted to learn how external factors such as family affect a student’s academics, which suggests we should examine famsup and studytime.
From the math boxplot, we see that students whose parents support their education spend more time studying. This finding does not carry over to students in the Portuguese class, where the boxplots look identical. The difference may arise because math demands more study time than Portuguese, although we have no data to confirm this. Therefore, we cannot immediately conclude that students whose parents support their education tend to commit more time to studying and preparing for school.
We want to explore the relationship between the reason variable and the study time variable. To do this, we created a mosaic plot of the variables ‘reason’ and ‘studytime’.
We can see that the reason why the student chose the school actually does have an effect on study time for both the math and Portuguese subjects.
About 90% of students who chose the school for its reputation studied more than 2 hours a week. This group also had the highest percentage of students studying more than 5 hours a week. Since increased study time is correlated with higher grades, the students who chose the school for its reputation may also be the ones earning higher final grades.
Therefore, we conclude that a student’s reason for attending school is correlated with the amount of time and effort they spend studying.
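The percentages read off the mosaic plot are conditional proportions: within each reason, the share of students at or above a given study time level. The sketch below shows that calculation on made-up pairs, not the dataset. It assumes studytime is coded 1 to 4 with level 1 meaning under 2 hours a week and levels 3 and 4 meaning more than 5 hours a week; the reason labels and function name are hypothetical.

```python
from collections import Counter

# Hypothetical (made-up) pairs of (reason, studytime level), NOT real data.
# Assumed coding: 1 = under 2 hours a week, 2 = 2 to 5 hours, 3-4 = more.
pairs = [
    ("reputation", 2), ("reputation", 3), ("reputation", 3), ("reputation", 1),
    ("course", 1), ("course", 2), ("course", 1), ("course", 2),
    ("home", 1), ("home", 1), ("home", 2),
]


def share_at_or_above(rows, threshold):
    """For each reason, the fraction of students with studytime >= threshold."""
    totals = Counter()
    hits = Counter()
    for reason, studytime in rows:
        totals[reason] += 1
        if studytime >= threshold:
            hits[reason] += 1
    return {reason: hits[reason] / totals[reason] for reason in totals}


# Share studying at least 2 hours a week (level 2 or higher).
shares = share_at_or_above(pairs, threshold=2)
# Share studying more than 5 hours a week (level 3 or higher).
heavy_shares = share_at_or_above(pairs, threshold=3)

for reason in shares:
    print(f"{reason}: {shares[reason]:.0%} at level 2+, "
          f"{heavy_shares[reason]:.0%} at level 3+")
```

Computing the shares within each reason, rather than across all students, matches how a mosaic plot is read: the width of a column reflects how many students gave that reason, and the heights within it reflect the study time breakdown.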
In our research, we investigated what factors influence the final grade for students in 2 subjects: Math and Portuguese. We looked into factors such as absences, studytime, quality of family relationships, travel time, and reason for attending school.
From our research, we found that absences seem to influence final grades somewhat more than study time does. In particular, more absences mean more missed lessons, and students who miss or don’t understand important concepts will do worse on tests, which is reflected in a lower final grade.
Additionally, we found that travel time from home to school, quality of family relationships, and whether the student lives in an urban vs. rural neighborhood all affect the number of student absences. However, weekday alcohol consumption doesn’t seem to have a relationship with absences, which was a very interesting takeaway.
In the last section, we analyzed whether there were any key factors influencing a student’s study time. From the side-by-side boxplots, we noticed that family support did affect study time for the math class, but not for the Portuguese class. Our mosaic plots demonstrated that the reason why the student chose the school (for courses, reputation, proximity, etc.) was influential in determining study time for both Math and Portuguese.
Given these analyses and findings on the relationship between final grades and absences and study time, and on the factors that affect the latter two, we believe this report could be useful when deciding whether to build a new school or renovate an existing school in Portugal. Student demographics can be taken into account in order to reduce student absences and motivate students to study for more hours, so that their knowledge of course material and their final grades improve.