Motivation

Transitioning to university life presents a long list of challenges to first-year students, among which is the delicate balance between academic demands and physical/mental well-being. While academic success is often a top priority, maintaining healthy sleep habits is crucial for cognitive function, mood regulation, and overall health. As college students at a notoriously rigorous university, our group has lots of experience with this struggle. So, it’s no surprise that we gravitated towards data that may give us insight into our own lives.

The Data

The chosen dataset was reported by the Carnegie Mellon Repository, which collected students’ sleep and academic information from three different universities. It is presented in a wide format, with each row representing data from a distinct student. Multiple variables describing each student’s demographics, sleep patterns, and academic performance are recorded in separate columns.

The reported quantitative and categorical variables are rich enough to support our research interests. Sleep and academic performance seem intuitively related, so we would like to examine the factors that may influence students’ sleep patterns, as well as how sleep affects academic performance.

Does More Sleep Lead to Better Academic Performance?

While it may seem like common knowledge that you should sleep well if you’re looking for better performance in any respect, many college students forgo sleep to keep up with their workload. So, are those late nights worth it in terms of a student’s semester GPA? Before diving into this question, we would like to visualize the distribution of semester GPA to see whether it differs across schools. To that end, density curves of semester GPA for each university are displayed in the following ridge plot.

Note, from here on out we will use the following abbreviations for each university:

CMU = Carnegie Mellon University, UW = University of Washington, NDU = University of Notre Dame

**Figure 1:** Ridgeline plot with semester GPA on the x-axis and university/density on the y-axis. The relative density curves of GPA for each school are displayed on top of each other.

From a brief glance, it can be seen that the distributions of semester GPA for CMU and UW have greater dispersion than that for NDU. UW’s distribution has three modes, at approximately 3, 3.5, and 3.7. There are also three modes in the distribution for CMU, at approximately 2.5, 3.5, and 3.9. There is only one mode in the distribution for NDU, at approximately 3.7. The distributions of semester GPA for all three universities are skewed left, and the center for NDU has the greatest value. Let’s compare these distributions further with a…

Two-Sample KS Test

## 
##  Asymptotic two-sample Kolmogorov-Smirnov test
## 
## data:  CMU_gpa and UW_gpa
## D = 0.13243, p-value = 0.03061
## alternative hypothesis: two-sided
## 
##  Asymptotic two-sample Kolmogorov-Smirnov test
## 
## data:  CMU_gpa and NDU_gpa
## D = 0.37457, p-value = 6.376e-11
## alternative hypothesis: two-sided
## 
##  Asymptotic two-sample Kolmogorov-Smirnov test
## 
## data:  UW_gpa and NDU_gpa
## D = 0.32141, p-value = 4.596e-09
## alternative hypothesis: two-sided

Since the p-values for all three pairwise comparisons are below alpha = 0.05, we conclude that the distribution of semester GPA differs between each pair of universities. Thus, we will split up our data by university for subsequent GPA analyses to avoid pooling dissimilar distributions.
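
For readers unfamiliar with the test, the D values reported above are the largest vertical gap between the two empirical cumulative distribution functions being compared. The following is a minimal Python sketch of that statistic, not the code used to produce the output above. The function name `ks_statistic` and the sample GPAs are hypothetical and chosen only for illustration.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Return D, the largest vertical gap between two empirical CDFs."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    d = 0.0
    # The gap can only change at observed values, so checking those is enough.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)  # share of sample a that is <= x
        cdf_b = bisect_right(b, x) / len(b)  # share of sample b that is <= x
        d = max(d, abs(cdf_a - cdf_b))
    return d


# Hypothetical GPAs for illustration only; these are not values from the dataset.
group_1 = [2.5, 3.0, 3.4, 3.5, 3.9]
group_2 = [3.5, 3.6, 3.7, 3.8, 3.9]
print(ks_statistic(group_1, group_2))
```

A D near 0 means the two samples have nearly identical empirical distributions, while a D near 1 means they barely overlap. The p-value then measures how surprising the observed D would be if both samples came from the same distribution.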

Now that we know to differentiate semester GPAs by university, let’s look at how sleep affects these GPAs by plotting them alongside students’ total sleep time.

**Figure 2:** Scatter plot with total sleep time measured in minutes on the x-axis and semester GPA on the y-axis. Plot has overlaid linear regression lines for each university.

The fact that all of the linear regression lines have positive slopes indicates that there may exist a positive linear relationship between total sleep time and semester GPA, suggesting that more sleep goes along with a higher GPA. The points and regression lines are colored by university, allowing us to compare total sleep time and GPA across universities as well as explore possible interactions between university and total sleep time. Since the slopes of the regression lines do not vary much across schools, it is reasonable to hypothesize that there is no interaction between university and total sleep time.

To formally test the above hypotheses, we fit a multiple linear regression model.

Multiple Linear Regression Model

## 
## Call:
## lm(formula = term_gpa ~ TotalSleepTime * school, data = sleep_data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.83131 -0.19456  0.07326  0.31226  0.82589 
## 
## Coefficients:
##                            Estimate Std. Error t value Pr(>|t|)    
## (Intercept)               2.5834460  0.2581645  10.007  < 2e-16 ***
## TotalSleepTime            0.0019891  0.0006619   3.005  0.00276 ** 
## schoolNDU                 0.4461754  0.3994227   1.117  0.26440    
## schoolUW                 -0.4002610  0.3529491  -1.134  0.25721    
## TotalSleepTime:schoolNDU -0.0003405  0.0010266  -0.332  0.74028    
## TotalSleepTime:schoolUW   0.0009881  0.0008806   1.122  0.26224    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4729 on 628 degrees of freedom
## Multiple R-squared:  0.1141, Adjusted R-squared:  0.1071 
## F-statistic: 16.18 on 5 and 628 DF,  p-value: 5.04e-15

This multiple linear regression model takes CMU as the reference level. The p-values for the intercept and TotalSleepTime are below the significance level of 0.05, indicating that the true values of these coefficients are unlikely to be 0. The intercept is the estimated semester GPA for a CMU student whose total sleep time equals 0, namely 2.58; this is an extrapolation with no practical meaning, because no one sleeps 0 minutes every day. For CMU students, it is estimated that, on average, each one-minute increase in total sleep time is associated with an increase of 0.0020 in semester GPA.

However, the p-values for schoolNDU and schoolUW are both greater than the significance level, so we cannot conclude that the intercepts for NDU and UW differ from the intercept for CMU. The p-values for TotalSleepTime:schoolNDU and TotalSleepTime:schoolUW are also higher than the significance level. This means we have no evidence that the increase in semester GPA associated with a one-minute increase in average total sleep time differs between CMU and NDU, or between CMU and UW.
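
Because CMU is the reference level, the fitted line for NDU or UW is obtained by adding that school’s offsets to the CMU intercept and slope. The Python sketch below works through this arithmetic using the coefficients copied from the summary above. The helper names `fitted_line` and `predicted_gpa` and the illustrative sleep time of 400 minutes are assumptions made for this example, not part of the original analysis.

```python
# Coefficients copied from the lm() summary above; CMU is the reference level.
coef = {
    "(Intercept)": 2.5834460,
    "TotalSleepTime": 0.0019891,
    "schoolNDU": 0.4461754,
    "schoolUW": -0.4002610,
    "TotalSleepTime:schoolNDU": -0.0003405,
    "TotalSleepTime:schoolUW": 0.0009881,
}


def fitted_line(school):
    """Return (intercept, slope) of the fitted line for one school."""
    intercept = coef["(Intercept)"]
    slope = coef["TotalSleepTime"]
    if school != "CMU":
        # Non-reference schools add their offsets to the CMU values.
        intercept += coef[f"school{school}"]
        slope += coef[f"TotalSleepTime:school{school}"]
    return intercept, slope


def predicted_gpa(school, minutes):
    """Fitted semester GPA for a given average total sleep time in minutes."""
    intercept, slope = fitted_line(school)
    return intercept + slope * minutes


for school in ("CMU", "NDU", "UW"):
    intercept, slope = fitted_line(school)
    # Slope per hour of sleep, and fitted GPA at an illustrative 400 minutes.
    print(school, round(slope * 60, 3), round(predicted_gpa(school, 400), 2))
```

Multiplying each slope by 60 expresses it per hour of sleep, which is easier to interpret than a per-minute change. The point estimates differ between schools, but the interaction p-values above indicate that these differences are not statistically distinguishable from zero.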

Although we have shown that the distributions of GPA differ between schools, we found no evidence that the relationship between total sleep time and semester GPA differs between them, so a single linear regression line is a reasonable model for all three. This suggests that our conclusions about CMU students hold for college students at NDU and UW as well: sleeping more is, on average, associated with a higher term GPA. Our model reported a multiple R-squared value of 0.11, meaning that 11% of the variation in a student’s term GPA can be explained by total sleep time, school, and their interaction. Considering the multitude of factors that end up determining a student’s GPA, we would argue that 11% is quite substantial. College students should aim to get a good night’s sleep if they want to maximize their academic performance.
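
As a sanity check on the model summary, the adjusted R-squared and the F-statistic can both be recovered from the multiple R-squared and the degrees of freedom using the standard formulas. The Python sketch below does this with the rounded values printed above, so small rounding differences from the reported figures are expected. The variable names are chosen for this example only.

```python
# Values copied from the lm() summary above.
r_squared = 0.1141
n_predictors = 5      # numerator degrees of freedom of the F-statistic
residual_df = 628     # residual degrees of freedom
n_obs = residual_df + n_predictors + 1  # observations used in the fit

# Adjusted R-squared penalizes R-squared for the number of predictors.
adjusted = 1 - (1 - r_squared) * (n_obs - 1) / residual_df

# F compares explained variance per predictor with residual variance.
f_stat = (r_squared / n_predictors) / ((1 - r_squared) / residual_df)

print(n_obs, round(adjusted, 3), round(f_stat, 1))
```

Both recovered values agree with the reported adjusted R-squared of 0.1071 and F-statistic of 16.18 up to rounding.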

How Do Demographic Factors Influence Sleep Patterns?

Now that we have shown a relationship between sleep and GPA, we wanted to investigate sleep across different demographics to see how well our analysis generalizes to different groups of students. We started with violin plots to get a sense for the conditional distributions of total sleep time given race and gender.

**Figure 3:** Violin plots of total sleep time by race and faceted by gender.

The violin plots show that the median and IQR are around the same for both genders, suggesting similar distributions. Between underrepresented and non-underrepresented races, we see that the median and IQR are also close together. Thus, the overall distributions are similar. Although there are possible outliers among non-underrepresented male and underrepresented female students, they don’t seem to cause significant differences. So far we see almost no influence on sleep from demographic factors.

For a more direct comparison of total sleep time between underrepresented and non-underrepresented students, and between female and male students, we made the following stacked histograms.

**Figure 4:** Histogram of total sleep time by race representation. Total sleep time in minutes is on the x-axis and count is on the y-axis.

**Figure 5:** Histogram of total sleep time by gender. Total sleep time in minutes is on the x-axis and count is on the y-axis.

Two-Sample T-Test

## 
##  Welch Two Sample t-test
## 
## data:  TotalSleepTime by factor(demo_race)
## t = -0.96638, df = 118.76, p-value = 0.3358
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  -19.074733   6.562707
## sample estimates:
## mean in group 0 mean in group 1 
##        395.8048        402.0608
## 
##  Welch Two Sample t-test
## 
## data:  TotalSleepTime by factor(demo_gender)
## t = -0.42329, df = 406.69, p-value = 0.6723
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  -11.193802   7.227296
## sample estimates:
## mean in group 0 mean in group 1 
##        399.6999        401.6832

The above graphs show that total sleep time is similarly distributed for underrepresented and non-underrepresented races, with nearly identical peaks and spread. We see the same for gender, with peaks at around 400 minutes and similar spread for both genders. Overall, there do not seem to be notable differences in the distribution of total sleep time between underrepresented and non-underrepresented races, or between male and female students. This is further supported by the t-tests, which compare mean total sleep time between groups. The t-test between students of underrepresented and non-underrepresented races has a p-value of 0.3358, which is much larger than a significance level of 0.05. Thus there is not sufficient evidence to conclude that mean total sleep time differs between the two groups. The t-test between male and female students has a p-value of 0.6723, which is also much larger than 0.05, so there is likewise not sufficient evidence that mean total sleep time differs by gender.
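
To put the two test results on a more tangible scale, the Python sketch below recomputes the difference in group means in minutes, recovers the implied standard error from the relation t = difference / standard error, and checks whether each 95% confidence interval contains zero. All numbers are copied from the printouts above. The dictionary layout and variable names are assumptions made for this example.

```python
# Numbers copied from the two Welch t-test printouts above.
tests = {
    "race": {"means": (395.8048, 402.0608), "t": -0.96638,
             "ci": (-19.074733, 6.562707)},
    "gender": {"means": (399.6999, 401.6832), "t": -0.42329,
               "ci": (-11.193802, 7.227296)},
}

summary = {}
for name, res in tests.items():
    diff = res["means"][0] - res["means"][1]  # group 0 minus group 1, in minutes
    std_error = diff / res["t"]               # because t = diff / standard error
    low, high = res["ci"]
    summary[name] = {
        "diff": diff,
        "std_error": std_error,
        "ci_contains_zero": low <= 0 <= high,
    }
    print(name, round(diff, 2), round(std_error, 2), low <= 0 <= high)
```

In both cases the groups differ by only a few minutes of sleep on average, and the confidence interval spans zero, which is consistent with the large p-values.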

Our analysis suggests that race and gender need not be accounted for when drawing inferences about total sleep time, because we found no significant differences between these groups.

How Does Daytime Sleep Affect Total Sleep Time?

When you’re overworked and tired, sometimes a nap can re-energize you for the rest of the day. However, on other occasions, you lie down for an hour-long nap and then wake up three hours later, ruining your sleep schedule for the next few days. Both situations are common, and some people can manage naps very well, while others never take them at all. Is napping good or bad for one’s overall sleeping habits? We set out to answer this question by first seeing how frequent napping is at each university. Below is a stacked density bar chart showing the proportion of students at each university who reported a high amount of daytime sleep, defined as more than 41 minutes per day.

**Figure 6:** Stacked density bar chart of high vs. low daytime sleep at each university. University is on the x-axis and the proportion within each university is on the y-axis. High daytime sleep is defined as averaging more than 41 minutes daily.

While approximately 45% of students at both NDU and UW were high daytime sleepers, CMU students slept much less often during the day, with fewer than one quarter of them being high daytime sleepers. Does this indicate anything about CMU students’ sleep in general compared with students at the other universities? We looked at the distribution of total sleep time for each university to find out.
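
The proportions discussed here come from classifying each student as a high or low daytime sleeper and then taking the share of high daytime sleepers within each university. The Python sketch below illustrates that calculation. Only the 41-minute cutoff comes from the text; the function name `high_nap_share` and the records are hypothetical and are not taken from the dataset.

```python
from collections import defaultdict

HIGH_NAP_THRESHOLD = 41  # minutes per day, the cutoff stated in the text


def high_nap_share(records):
    """Map each university to the share of students above the threshold."""
    counts = defaultdict(lambda: [0, 0])  # university -> [high, total]
    for university, daytime_minutes in records:
        counts[university][1] += 1
        if daytime_minutes > HIGH_NAP_THRESHOLD:
            counts[university][0] += 1
    return {u: high / total for u, (high, total) in counts.items()}


# Hypothetical (university, average daytime sleep in minutes) pairs.
records = [
    ("CMU", 10), ("CMU", 25), ("CMU", 50), ("CMU", 41),
    ("UW", 60), ("UW", 30), ("NDU", 45), ("NDU", 12),
]
print(high_nap_share(records))
```

Note that a student averaging exactly 41 minutes counts as a low daytime sleeper under this reading of the cutoff, since the definition is "more than 41 minutes".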

**Figure 7:** Density plot of total sleep time conditional on university. Total sleep time in minutes is on the x-axis and density is on the y-axis.

Although CMU students seem to nap less than students at other universities, you can see in the graph above that they actually sleep around the same amount as NDU students. Both CMU and NDU students, on average, sleep slightly less in total than UW students. However, all three distributions peak at a little over 400 minutes (6.67 hours) of total sleep time. Very few students are getting the recommended 8 hours of sleep.

However, since daytime napping is also a component of students’ overall sleep, we would like to see whether daytime sleep compensates for a lack of sleep at night. Prompted by this question, we made the following heat map, which depicts the joint distribution of total sleep time and daytime sleep.

**Figure 8:** Heat map of total and daytime sleep, both measured in minutes. Daytime sleep is on the x-axis and total sleep is on the y-axis.

From the above graph we can see that most students have around 20 minutes of daytime sleep and around 400 total minutes of sleep, which is approximately 6.7 hours of sleep a day in total. We did not observe a large density at the combination of low total sleep time and high daytime sleep, implying that longer daytime sleep may not be a strategy students use to make up for a lack of nighttime sleep.
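
A heat map of this kind is built by dividing both axes into bins and counting how many students fall into each cell, with darker cells holding more students. The Python sketch below shows that binning step. The bin widths of 20 and 50 minutes, the function name `bin_counts`, and the sample pairs are all assumptions made for illustration; they are not the settings or data behind Figure 8.

```python
from collections import Counter


def bin_counts(pairs, daytime_width=20, total_width=50):
    """Count students per (daytime, total) cell; keys are lower bin edges."""
    cells = Counter()
    for daytime, total in pairs:
        # Floor each value to the lower edge of the bin that contains it.
        cell = (int(daytime // daytime_width) * daytime_width,
                int(total // total_width) * total_width)
        cells[cell] += 1
    return cells


# Hypothetical (daytime minutes, total minutes) pairs, not taken from the dataset.
pairs = [(15, 410), (22, 395), (25, 405), (30, 420), (70, 300), (5, 380)]
cells = bin_counts(pairs)
print(cells.most_common(1))
```

The cell with the highest count corresponds to the darkest region of the heat map. A cluster of students with high daytime sleep and low total sleep would show up as a dense cell in the lower right of the plot, which is the pattern the text reports not seeing.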

Conclusions

We aimed to investigate several key questions regarding sleep patterns and their impact on academic performance, as well as the influence of demographic factors on sleep patterns, and the effect of daytime sleep on total sleep time.

Regarding the relationship between sleep duration and academic performance, our analysis revealed a significant positive association. Students who reported sleeping more in total tended to have better semester GPAs. This finding underscores the importance of adequate sleep in supporting cognitive functions essential for learning and academic success.

Our investigation into the influence of demographic factors on sleep patterns yielded interesting results. Demographic factors such as gender and race did not show a significant influence on total sleep time among the study participants. This suggests that while demographic factors may play a role in other aspects of students’ lives, we found no evidence that they affect how much students sleep.

Lastly, our study examined the relationship between daytime sleep and total sleep time. CMU students napped noticeably less than students at NDU and UW, yet their total sleep time was similar to that of NDU students. In the joint distribution of daytime and total sleep, we did not observe many students who combined long daytime sleep with low total sleep, which suggests that napping is not widely used to make up for lost sleep at night.

In summary, our study provides evidence supporting the notion that longer sleep durations are associated with better academic performance. Additionally, we found no significant differences in total sleep time across the demographic groups we examined, and no sign that daytime sleep is used to compensate for short nights. These findings have implications for both academic institutions and individuals seeking to optimize their sleep habits for improved academic and overall well-being.

Discussion

Because the data were collected from students rather than through a carefully controlled experiment with distinct control and treatment groups, the results of the above tests can only suggest correlations between variables, rather than establish causal relationships. In other words, there may be confounding variables affecting our variables; for example, a lower stress level may result in both a higher GPA and longer sleep time. Since controlling students’ sleep raises ethical concerns, future research will need to explore alternative study designs. Furthermore, as our data include samples from only three universities, the conclusions of our study might not generalize to all university students. Further research might want to include data from more diverse universities (e.g., from different countries) to increase the applicability of the findings.