Transitioning to university life presents a long list of challenges to first-year students, among which is the delicate balance between academic demands and physical/mental well-being. While academic success is often a top priority, maintaining healthy sleep habits is crucial for cognitive function, mood regulation, and overall health. As college students at a notoriously rigorous university, our group has lots of experience with this struggle. So, it’s no surprise that we gravitated towards data that may give us insight into our own lives.
The chosen dataset comes from the Carnegie Mellon Repository, which collected students’ sleep and academic information from three different universities. It is presented in a wide format, with each row representing data from a distinct student. Multiple variables describing students’ demographics, sleep patterns, and academic performance are recorded in different columns:
demo_race: subject’s race, where non-underrepresented was coded as 1 and underrepresented (Black, Hispanic or Latino) was coded as 0
demo_gender: subject’s gender, where female was coded as 1 and male coded as 0
demo_firstgen: whether the subject was a first-generation college student (neither parent completed any college), where firstgen was coded as 1 and non-firstgen was coded as 0
bedtime_mssd: mean successive squared difference of bedtime, i.e., the average squared difference in bedtime (in hours) between consecutive nights; a measure of bedtime variability
TotalSleepTime: average time in bed (at night) in minutes
midpoint_sleep: average midpoint of bedtime and wake time, in minutes after 11 pm (for example, 364 is 5:04 am)
frac_nights_with_data: proportion of nights with captured data
daytime_sleep: average sleep time in minutes outside of the range of the main sleep episode
cum_gpa: cumulative GPA (out of 4.0) of semesters before study
term_gpa: end-of-term GPA (out of 4.0) of the semester when the study was conducted
term_units: number of course units of the semester being studied
Zterm_units_ZofZ: standardized workload, where 0 represents an average load; positive values represent above-average loads and negative values represent below-average loads
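The time-based variables above are stored in minutes, which can be hard to read at a glance. The following is a minimal Python sketch (the original analysis was run in R) of how midpoint_sleep values could be turned into clock times and minute totals into hours. The helper names `midpoint_to_clock` and `minutes_to_hours` are our own illustrative choices, not part of the dataset or any library.

```python
def midpoint_to_clock(minutes_after_11pm: float) -> str:
    """Convert minutes after 11:00 pm into a 12-hour clock string."""
    # 11:00 pm is minute 23 * 60 of the day; wrap around midnight.
    total = (23 * 60 + round(minutes_after_11pm)) % (24 * 60)
    hour24, minute = divmod(total, 60)
    suffix = "am" if hour24 < 12 else "pm"
    hour12 = hour24 % 12 or 12  # 0 and 12 both display as 12
    return f"{hour12}:{minute:02d} {suffix}"


def minutes_to_hours(minutes: float) -> float:
    """Express a duration in minutes as hours, rounded to two decimals."""
    return round(minutes / 60, 2)


print(midpoint_to_clock(364))  # the example from the variable list: 5:04 am
print(minutes_to_hours(400))   # 400 minutes of sleep is 6.67 hours
```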
The reported quantitative and categorical variables are rich enough to support our research interests. Because sleep and academic performance are intuitively related, we would like to delve into the factors associated with students’ sleep patterns, as well as how sleep relates to academic performance.
While it may seem like common knowledge that you should sleep well if you’re looking for better performance in any respect, many college students forgo sleep to keep up with their workload. So, are those late nights worth it in terms of a student’s semester GPA? Before diving into this question, we would like to visualize the distribution of semester GPA to see whether it differs across schools. To that end, density curves of semester GPA for each university are displayed on the following ridge plot.
Note, from here on out we will use the following abbreviations for each university:
CMU = Carnegie Mellon University, UW = University of Washington, NDU = University of Notre Dame
From a brief glance, it can be seen that the distributions of semester GPA for CMU and UW have greater dispersion than that for NDU. UW’s distribution has three modes, at approximately 3, 3.5, and 3.7. There are also three modes in the distribution for CMU, at approximately 2.5, 3.5, and 3.9. There is only one mode in the distribution for NDU, at approximately 3.7. The distributions of semester GPA for all three universities are skewed left, and the center for NDU has the greatest value. Let’s compare these distributions further with a…
Two-Sample KS Test
##
## Asymptotic two-sample Kolmogorov-Smirnov test
##
## data: CMU_gpa and UW_gpa
## D = 0.13243, p-value = 0.03061
## alternative hypothesis: two-sided
##
## Asymptotic two-sample Kolmogorov-Smirnov test
##
## data: CMU_gpa and NDU_gpa
## D = 0.37457, p-value = 6.376e-11
## alternative hypothesis: two-sided
##
## Asymptotic two-sample Kolmogorov-Smirnov test
##
## data: UW_gpa and NDU_gpa
## D = 0.32141, p-value = 4.596e-09
## alternative hypothesis: two-sided
Since the p-values for all three pairwise comparisons are below alpha = 0.05, we conclude that the distribution of semester GPA differs between each pair of universities. The evidence is weakest for CMU versus UW (p = 0.031) and much stronger for the two comparisons involving NDU. Thus, we will split up our data by university for subsequent GPA analyses to avoid mixing distinct distributions.
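Because three pairwise tests were run, one additional check (not part of the original R analysis) is to compare each p-value against a Bonferroni-adjusted threshold of alpha divided by the number of tests. The sketch below uses only the p-values reported above; the function name `bonferroni` is our own.

```python
# p-values copied from the three KS tests reported above
p_values = {
    ("CMU", "UW"): 0.03061,
    ("CMU", "NDU"): 6.376e-11,
    ("UW", "NDU"): 4.596e-09,
}
alpha = 0.05


def bonferroni(p_values, alpha):
    """Flag which tests stay significant at alpha / (number of tests)."""
    threshold = alpha / len(p_values)
    return {pair: p < threshold for pair, p in p_values.items()}


decisions = bonferroni(p_values, alpha)
for pair, rejected in decisions.items():
    print(pair, "reject" if rejected else "fail to reject")
```

Under this stricter threshold (0.05 / 3, about 0.0167), the two comparisons involving NDU remain significant while CMU versus UW does not, so the CMU and UW distributions may be more alike than the unadjusted test suggests.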
Now that we know to differentiate semester GPAs by university, let’s look at how sleep relates to these GPAs by plotting them against students’ total sleep time.
The fact that all of the linear regression lines have positive slopes indicates that there may be a positive linear relationship between total sleep time and semester GPA, suggesting that more sleep is associated with a higher GPA. The points and regression lines are colored by university, allowing us to compare total sleep time and GPA across universities as well as explore possible interactions between university and total sleep time. Since the slopes of the regression lines do not vary much across schools, it is reasonable to hypothesize that there is no interaction between university and total sleep time.
To formally test the above hypotheses, we would like to fit a multiple linear regression model.
Multiple Linear Regression Model
##
## Call:
## lm(formula = term_gpa ~ TotalSleepTime * school, data = sleep_data)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -2.83131 -0.19456  0.07326  0.31226  0.82589
##
## Coefficients:
##                            Estimate Std. Error t value Pr(>|t|)
## (Intercept)               2.5834460  0.2581645  10.007  < 2e-16 ***
## TotalSleepTime            0.0019891  0.0006619   3.005  0.00276 **
## schoolNDU                 0.4461754  0.3994227   1.117  0.26440
## schoolUW                 -0.4002610  0.3529491  -1.134  0.25721
## TotalSleepTime:schoolNDU -0.0003405  0.0010266  -0.332  0.74028
## TotalSleepTime:schoolUW   0.0009881  0.0008806   1.122  0.26224
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4729 on 628 degrees of freedom
## Multiple R-squared: 0.1141, Adjusted R-squared: 0.1071
## F-statistic: 16.18 on 5 and 628 DF, p-value: 5.04e-15
This multiple linear regression model takes CMU as the reference level. The p-values for the intercept and for TotalSleepTime are both below the significance level of 0.05, indicating that the true values of these coefficients are unlikely to be 0. The intercept is the estimated semester GPA (2.58) for a CMU student whose total sleep time equals 0; this is pure extrapolation and has no practical meaning, because nobody averages 0 minutes of sleep per night. For CMU students, each additional minute of average total sleep time is associated with an estimated increase of about 0.0020 in semester GPA, or roughly 0.12 GPA points per additional hour.
However, the p-values for schoolNDU and schoolUW are both greater than the significance level, so we have no evidence that the intercepts for NDU or UW differ from the intercept for CMU. The p-values for TotalSleepTime:schoolNDU and TotalSleepTime:schoolUW are also above the significance level. This means we have no evidence that the change in semester GPA associated with a one-minute increase in average total sleep time differs between CMU and NDU, or between CMU and UW.
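To make the coefficient table more concrete, the following Python sketch (the original model was fit in R) combines the reported estimates into one fitted line per school. The coefficient values are copied from the output above; the helper names `fitted_line` and `predict_gpa` and the 400-minute example are our own illustrative choices.

```python
# Point estimates copied from the regression output (CMU is the reference level)
coef = {
    "(Intercept)": 2.5834460,
    "TotalSleepTime": 0.0019891,
    "schoolNDU": 0.4461754,
    "schoolUW": -0.4002610,
    "TotalSleepTime:schoolNDU": -0.0003405,
    "TotalSleepTime:schoolUW": 0.0009881,
}


def fitted_line(school):
    """Return (intercept, slope per minute) of the fitted line for a school."""
    intercept = coef["(Intercept)"]
    slope = coef["TotalSleepTime"]
    if school != "CMU":  # non-reference schools add their offsets
        intercept += coef[f"school{school}"]
        slope += coef[f"TotalSleepTime:school{school}"]
    return intercept, slope


def predict_gpa(school, total_sleep_minutes):
    """Estimated term GPA for a school at a given average total sleep time."""
    intercept, slope = fitted_line(school)
    return intercept + slope * total_sleep_minutes


for school in ("CMU", "NDU", "UW"):
    _, slope = fitted_line(school)
    print(f"{school}: {slope * 60:.3f} GPA points per extra hour, "
          f"estimated GPA at 400 min = {predict_gpa(school, 400):.2f}")
```

These are point estimates only. Since the school and interaction terms are not statistically significant, the apparent differences between the three fitted lines should not be over-interpreted.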
Although we have shown that the distributions of GPA differ across schools, we found no evidence that the relationship between total sleep time and semester GPA differs by school, so a single regression line is a reasonable description for all three. This suggests that our conclusions about CMU students plausibly hold for students at NDU and UW as well: sleeping more is, on average, associated with a higher term GPA. Our model reported a multiple R-squared value of 0.11, meaning that about 11% of the variation in term GPA is explained by the model (total sleep time, school, and their interaction). Considering the multitude of factors that determine a student’s GPA, we would argue that 11% is quite substantial. College students should aim to get a good night’s sleep if they want to give themselves the best chance at strong academic performance.
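As a sanity check on the reported fit statistics, the adjusted R-squared and the F-statistic can be recomputed from the multiple R-squared and the degrees of freedom using the standard formulas. This is a minimal Python sketch built only from numbers in the regression output; the sample size of 634 is inferred as 628 residual degrees of freedom plus 6 estimated coefficients.

```python
n_obs = 634         # 628 residual df + 6 estimated coefficients
n_predictors = 5    # slope, two school terms, two interaction terms
r_squared = 0.1141  # multiple R-squared as reported (rounded)

resid_df = n_obs - n_predictors - 1  # 628, matching the output

# Adjusted R-squared penalizes R-squared for the number of predictors
adj_r_squared = 1 - (1 - r_squared) * (n_obs - 1) / resid_df

# Overall F-statistic: explained variance per predictor over residual variance per df
f_stat = (r_squared / n_predictors) / ((1 - r_squared) / resid_df)

print(f"adjusted R-squared ~ {adj_r_squared:.4f}")
print(f"F-statistic ~ {f_stat:.2f} on {n_predictors} and {resid_df} DF")
```

Both values agree with the reported 0.1071 and 16.18 up to the rounding of the R-squared input.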
Now that we have shown a relationship between sleep and GPA, we wanted to investigate sleep across different demographics to see how well our analysis generalizes to different groups of students. We started with violin plots to get a sense for the conditional distributions of total sleep time given race and gender.
The violin plots show that the median and IQR are around the same for both genders, suggesting similar distributions. Between underrepresented and non-underrepresented races, we see that the median and IQR are also close together. Thus, the overall distributions are similar. Although there are possible outliers among non-underrepresented male and underrepresented female students, they don’t seem to cause significant differences. So far we see almost no influence on sleep from demographic factors.
To compare total sleep time more directly between underrepresented and non-underrepresented students, and between female and male students, we proceed to make the following stacked histogram.
Two Sample T-Test
##
## Welch Two Sample t-test
##
## data: TotalSleepTime by factor(demo_race)
## t = -0.96638, df = 118.76, p-value = 0.3358
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
## -19.074733 6.562707
## sample estimates:
## mean in group 0 mean in group 1
## 395.8048 402.0608
##
## Welch Two Sample t-test
##
## data: TotalSleepTime by factor(demo_gender)
## t = -0.42329, df = 406.69, p-value = 0.6723
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
## -11.193802 7.227296
## sample estimates:
## mean in group 0 mean in group 1
## 399.6999 401.6832
The above graphs show that total sleep time for underrepresented and non-underrepresented races is similarly distributed, with nearly identical peaks and spread. We see the same for total sleep time and gender, with peaks at around 400 minutes and similar spread for both genders. Overall, there do not seem to be meaningful differences in the distribution of total sleep time between underrepresented and non-underrepresented races, or between male and female students. This is further supported by the t-tests, which compare mean total sleep time between groups. The t-test comparing students of underrepresented and non-underrepresented races has a p-value of 0.3358, which is much larger than the significance level of 0.05. Thus there is not sufficient evidence to conclude that mean total sleep time differs between these two groups. The t-test comparing male and female students has a p-value of 0.6723, which is also much larger than 0.05, so there is likewise not sufficient evidence that mean total sleep time differs between male and female students.
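For readers unfamiliar with the Welch test used above, the statistic and its degrees of freedom can be computed by hand. The sketch below is a minimal Python illustration (the original tests were run in R); the function name `welch_t` and the two small samples are hypothetical and are not taken from the dataset. The last lines use only reported numbers to confirm that the confidence interval for race is centered on the difference in group means.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Welch t statistic and Welch-Satterthwaite degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error contributed by each group (sample variance / n)
    se_sq_a = variance(sample_a) / n_a
    se_sq_b = variance(sample_b) / n_b
    t = (mean(sample_a) - mean(sample_b)) / sqrt(se_sq_a + se_sq_b)
    df = (se_sq_a + se_sq_b) ** 2 / (
        se_sq_a ** 2 / (n_a - 1) + se_sq_b ** 2 / (n_b - 1)
    )
    return t, df


# Hypothetical total sleep times in minutes, for illustration only
group_0 = [380, 395, 410, 402, 388]
group_1 = [405, 398, 420, 391, 412, 400]
t_stat, df = welch_t(group_0, group_1)
print(f"t = {t_stat:.3f}, df = {df:.2f}")

# Consistency check on the reported race comparison
reported_diff = 395.8048 - 402.0608           # mean in group 0 minus group 1
ci_center = (-19.074733 + 6.562707) / 2       # midpoint of the reported 95% CI
print(f"difference = {reported_diff:.4f}, CI midpoint = {ci_center:.4f}")
```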
Our analysis found no significant differences in total sleep time across these demographic groups, so demographics need not be accounted for when making inferences about sleep in this dataset.
When you’re over-worked and tired, sometimes a nap can re-energize you for the rest of the day. However, on other occasions, you lie down for an hour-long nap and then wake up three hours later, ruining your sleep schedule for the next few days. Both situations are common, and some people can manage naps very well, while others never take them at all. Is napping good or bad for one’s holistic sleeping habits? We set out to answer this question by first seeing how frequent napping is at each university. Below is a stacked bar chart showing the proportion of students who reported a high amount of daytime sleep, more than 41 minutes per day, at each university.
About 45% of NDU students and about 45% of UW students were high daytime sleepers, whereas CMU students slept much less often during the day, with fewer than one quarter of them being high daytime sleepers. Does this indicate anything about CMU students’ sleep in general compared with students at the other universities? We looked at the distribution of total sleep time for each university to find out.
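The proportions above come from flagging each student whose average daytime sleep exceeds 41 minutes and then taking the share of flagged students per university. A minimal Python sketch of that computation follows (the original chart was produced in R). The records are hypothetical, and the `school` field name is an assumption borrowed from the regression formula.

```python
from collections import defaultdict

HIGH_NAP_THRESHOLD = 41  # minutes of daytime sleep per day, from the text

# Hypothetical student records, for illustration only
students = [
    {"school": "CMU", "daytime_sleep": 12.0},
    {"school": "CMU", "daytime_sleep": 50.0},
    {"school": "CMU", "daytime_sleep": 30.0},
    {"school": "CMU", "daytime_sleep": 5.0},
    {"school": "UW", "daytime_sleep": 45.0},
    {"school": "UW", "daytime_sleep": 60.0},
    {"school": "UW", "daytime_sleep": 20.0},
    {"school": "UW", "daytime_sleep": 10.0},
    {"school": "NDU", "daytime_sleep": 41.0},  # exactly 41 is not "more than 41"
    {"school": "NDU", "daytime_sleep": 80.0},
]


def high_nap_share(records, threshold=HIGH_NAP_THRESHOLD):
    """Share of students per school whose daytime sleep exceeds the threshold."""
    totals = defaultdict(int)
    high = defaultdict(int)
    for record in records:
        totals[record["school"]] += 1
        if record["daytime_sleep"] > threshold:
            high[record["school"]] += 1
    return {school: high[school] / totals[school] for school in totals}


shares = high_nap_share(students)
print(shares)
```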
Although CMU students seem to nap less than students at other universities, the graph above shows that they actually sleep around the same amount as NDU students. Both CMU and NDU students, on average, sleep slightly less in total than UW students; however, all three distributions peak at a little over 400 minutes (6.67 hours) of total sleep time. Very few students are getting the recommended 8 hours of sleep.
However, since daytime napping is also a component of students’ overall sleep, we would like to see whether daytime sleep compensates for a lack of sleep at night. Prompted by this question, we made the following heat map, which depicts the joint distribution of total sleep time and daytime sleep.
From the above graph we can see that most students have around 20 minutes of daytime sleep and around 400 minutes of total sleep time, which comes to approximately 7 hours of sleep a day in total. We did not observe a large density at the combination of low total sleep time and high daytime sleep, implying that longer daytime sleep may not be a strategy students use to make up for a lack of night-time sleep.
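A heat map of a joint distribution can be approximated by counting how many students fall into each cell of a two-dimensional grid. The Python sketch below shows that idea with simple rectangular bins; it is an illustration under our own assumptions, not the method used to draw the original figure (which may have used a smoothed density). The observations and bin widths are hypothetical.

```python
from collections import Counter


def joint_counts(pairs, x_width, y_width):
    """Count observations per (x_bin, y_bin) cell; bins are labelled by lower edge."""
    counts = Counter()
    for x, y in pairs:
        cell = (int(x // x_width) * x_width, int(y // y_width) * y_width)
        counts[cell] += 1
    return counts


# Hypothetical (TotalSleepTime, daytime_sleep) pairs in minutes
observations = [(395, 18), (410, 22), (402, 25), (330, 70), (450, 5), (405, 15)]

grid = joint_counts(observations, x_width=50, y_width=20)
densest_cell, n_students = grid.most_common(1)[0]
print(f"densest cell starts at {densest_cell} with {n_students} students")
```

In a plot, each cell would be shaded by its count, so a cluster of low total sleep time with high daytime sleep would appear as a dark region in the upper-left of the grid.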
We aimed to investigate several key questions regarding sleep patterns and their impact on academic performance, as well as the influence of demographic factors on sleep patterns, and the effect of daytime sleep on total sleep time.
Regarding the relationship between sleep duration and academic performance, our analysis revealed a significant positive association. Students who reported sleeping more in total tended to have better semester GPAs. This finding underscores the importance of adequate sleep in supporting cognitive functions essential for learning and academic success.
Our investigation into the influence of demographic factors on sleep patterns yielded interesting results. We found no significant differences in total sleep time by gender or race among the study participants. This suggests that while demographic factors may play a role in other aspects of students’ lives, we have no evidence that they affect how much students sleep.
Lastly, our study examined the relationship between daytime sleep and total sleep time. CMU students were high daytime sleepers far less often than NDU and UW students, yet their total sleep time was similar to that of NDU students. The joint distribution also showed few students combining low total sleep time with high daytime sleep, so we found little evidence that naps are used to make up for lost night-time sleep. This finding highlights the importance of maintaining a consistent night-time sleep schedule rather than relying on naps to catch up.
In summary, our study provides evidence supporting the notion that longer sleep durations are associated with better academic performance. Additionally, we found no significant differences in total sleep time across demographic groups, and little evidence that daytime sleep compensates for short night-time sleep. These findings have implications for both academic institutions and individuals seeking to optimize their sleep habits for improved academic performance and overall well-being.
Because the data were collected observationally from students rather than through a carefully controlled experiment with distinct control and treatment groups, the results of the above tests can only suggest correlations between variables; they cannot establish causal relationships. In other words, there may be confounding variables affecting the variables we studied; for example, a lower stress level may result in both a higher GPA and a longer sleep time. Since deliberately controlling students’ sleep raises ethical questions, future research will need to explore alternative study designs. Furthermore, as our data include samples from only three universities, the conclusions of our study might not generalize to all university students. Further research might include data from more diverse universities (e.g., from different countries) to increase the applicability of the findings.