The importance of sleep has been a research focus for many decades, and sleep has proven to be a crucial factor in success; college students are no exception. Sleep is often overlooked and sacrificed because of students’ busy schedules. We want to look at how sleep habits affect educational achievement. We will investigate several research questions about sleep using a dataset from Carnegie Mellon University and two other institutions. The dataset will help us investigate three main research questions:
1. How does total sleep time affect academic performance?
2. Does the timing of sleep affect student performance?
3. Does gender or first-generation status play a role in sleep patterns or student success?
There are a few key details to note about these research questions. The academic performance measures in this study are cumulative GPA and end-of-term GPA. Timing refers to when a student goes to bed and whether they sleep in the daytime as well as at night. Gender and first-generation status are the two demographic variables explored in this study. In addition to the three research questions, we will look at a linear regression model that captures the variables we found to be significant in the data.
The data used in this report come from a study conducted at Carnegie Mellon University, the University of Washington, and the University of Notre Dame. Using sleep trackers, researchers collected data on the sleep habits and academic performance of 634 first year students, one per row of the dataset. They measured a number of variables (the 15 columns), some of which include:
- TotalSleepTime: The average amount of time a first year student spent in bed minus the total time spent awake or restless.
- cum_gpa: A first year student’s cumulative GPA on a 4.0 scale for all prior semesters. Because the data set is predominantly first years, it is just their fall GPA.
- term_gpa: A first year student’s GPA out of 4.0 for the semester being studied.
- demo_race: Binary classification for underrepresented (0) and non-underrepresented (1) students. To be classified as underrepresented, a student needed to have at least one parent who was Black, Hispanic or Latino, Native American, or Pacific Islander. To be considered non-underrepresented, a student could not have either parent fall into the underrepresented category.
- demo_firstgen: Binary classification for first generation college students. A student would have a 0 if they weren’t first generation and a 1 if they were. To be a first generation college student, neither parent could have completed any college.
- cohort: Designation given to each first year student signifying which cohort the student belonged to.
- bedtime_mssd: Mean successive squared difference of bedtime. This is a measure of bedtime variability between successive nights.
- midpoint_sleep: A first year student’s average midpoint between bedtime and wake time, measured in minutes after 11 pm.
- daytime_sleep: A first year student’s average amount of sleep outside the main sleep window in minutes.
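To make the definition of bedtime_mssd concrete, here is a minimal sketch of a mean successive squared difference. The function name `mssd` and the two bedtime series are hypothetical illustrations, not values or units taken from the dataset.

```python
def mssd(values):
    """Mean of the squared differences between successive observations."""
    if len(values) < 2:
        raise ValueError("need at least two nights of data")
    diffs = [later - earlier for earlier, later in zip(values, values[1:])]
    return sum(d * d for d in diffs) / len(diffs)


# Hypothetical bedtimes for five nights (units are illustrative only).
steady = [0.5, 0.75, 0.5, 0.75, 0.5]   # bedtime barely moves night to night
erratic = [0.0, 2.0, 0.5, 3.0, 1.0]    # bedtime jumps around

print(mssd(steady))   # small value: consistent bedtime
print(mssd(erratic))  # large value: variable bedtime
```

Because the differences are squared, a few large night-to-night swings raise the measure much more than many small ones.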
The histogram above shows the distribution of the main response variable, Average Total Sleep. We see a unimodal distribution with very little skewness. The mean is slightly under 400 minutes of sleep.
Our research will be guided by three main questions meant to explore the relationship between sleep patterns and academic performance, analyze the impact of sleep timing on student performance, and examine the role of first-generation status in sleep and academic success.
How do total sleep time, bedtime variability (bedtime MSSD), and term units relate to students’ cumulative GPA (cum_gpa) and end-of-term GPA (term_gpa)? Does sleep duration differ between males and females? Additionally, does the impact of sleep on GPA differ between males and females?
Does the average midpoint of bedtime influence academic performance, as reflected in cumulative and end-of-term GPAs? Are there significant variations in daytime sleep duration among different cohorts, and how do they affect academic outcomes for first year students?
How does being a first-generation student (demo_firstgen) influence sleep patterns, such as bedtime MSSD, total sleep time, and daytime sleep?
The scatterplots show no strong relationship between Total Sleep Time and cumulative GPA or between Daytime Sleep and cumulative GPA. The goal of these plots is to help us analyze the relationship between sleep duration and cumulative GPA, and specifically to see whether that relationship differs by gender. In the Daytime Sleep plot, there are two modes very close to each other, marking the regions with the highest concentration of males and females, respectively. The female mode falls at a log(Daytime Sleep) of approximately 3.1-3.8, or about 20-44 minutes of daytime sleep, and a cumulative GPA of 3.1-3.85. The male mode falls at a log(Daytime Sleep) of approximately 3.2-4.1, or about 33-58 minutes of daytime sleep, and a GPA of 3-3.9. We can interpret this as males in the study spending, on average, slightly more time napping or otherwise sleeping during the day. The plot gives no visual indication that the effect of daytime sleep on cumulative GPA differs between the two genders. As for the overall trend, there appears to be a very slight negative correlation between daytime sleep duration and GPA.
If we analyze the relationship between cumulative GPA and log(Total Sleep Time), we can see the male and female modes are once again very similar. The males appear to have slightly more total sleep time, but the plot gives no visual indication that the effect of main sleep time on cumulative GPA differs between the two genders. As for the overall trend, there appears to be a slight positive correlation between total sleep time and GPA. To conclude, we think this plot is particularly informative for the question at hand because it gives a visual understanding of potential correlations between sleep patterns and academic performance while also considering the impact gender might have. By examining the relationship between sleep duration and cumulative GPA for males and females, we can see whether sleep habits appear to have different effects on academic success based on gender.
##
## Welch Two Sample t-test
##
## data: sleep2$TotalSleepTime by sleep2$demo_gender
## t = -1.0155, df = 577.36, p-value = 0.3103
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
## -12.130126 3.861773
## sample estimates:
## mean in group 0 mean in group 1
## 394.9577 399.0918
The violin plot of TotalSleepTime vs. cumulative GPA by gender likewise shows no apparent difference between genders in the effect of Total Sleep Time on GPA. To check whether the genders differ in Total Sleep Time itself, we ran a Welch two-sample t-test. The p-value was 0.310, meaning there is no evidence of a statistically significant difference in mean Total Sleep Time between the two groups.
As for the hexagonal heatmap, there is a slight pattern of students with a log(bedtime mssd) of around -3 to -2 having a high cumulative GPA (3.5+). A log(bedtime mssd) value in this range indicates a relatively stable bedtime routine, which we would expect to lead to improved concentration and overall academic engagement. On the other hand, students with higher bedtime variability tend to have lower GPAs. This is what we would expect, as irregular sleep patterns may lead to low concentration in class.
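The report's tests were run in R; as a rough illustration of what a Welch two-sample t-test computes, the sketch below implements the statistic and the Welch-Satterthwaite degrees of freedom in plain Python. The function name `welch_t` and the toy samples are hypothetical. The second half only re-derives quantities from the numbers printed in the output above, and it uses a normal approximation to the p-value (an assumption that is reasonable here because the degrees of freedom are large).

```python
import math
from statistics import NormalDist, mean, variance


def welch_t(x, y):
    """Return (t, df) for Welch's unequal-variance two-sample t-test."""
    nx, ny = len(x), len(y)
    vx = variance(x) / nx          # squared standard error of mean(x)
    vy = variance(y) / ny          # squared standard error of mean(y)
    t = (mean(x) - mean(y)) / math.sqrt(vx + vy)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (vx + vy) ** 2 / (vx ** 2 / (nx - 1) + vy ** 2 / (ny - 1))
    return t, df


# Hypothetical toy samples, not data from the study.
t, df = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(t, df)

# Consistency check on the reported output for total sleep time by gender.
reported_t = -1.0155
mean_diff = 394.9577 - 399.0918            # group 0 mean minus group 1 mean
std_error = mean_diff / reported_t         # implied standard error of the difference
p_normal = 2 * (1 - NormalDist().cdf(abs(reported_t)))
print(std_error, p_normal)
```

The implied standard error of about 4.07 minutes is large relative to the roughly 4-minute gap in group means, which is why the test does not reject the null hypothesis.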
In this section we will be analyzing how the timing of sleep affects student performance. We will be answering this question by answering two sub-questions: Does the average midpoint of bedtime influence academic performance, as reflected in cumulative and end-of-term GPAs? And are there significant variations in daytime sleep duration among different cohorts, and how do they affect academic outcomes for first year students?
This scatterplot depicts first year students’ bedtime and wake-up midpoints against their cumulative GPAs, colored by cohort and fit with a regression line for each cohort. A majority of the data points are concentrated at higher GPA values. There appear to be two main takeaways from this graph. First, the data points from the cohorts appear to be spread equally along the x and y axes, meaning that you are about equally likely to find a data point from any cohort at any place on the graph. Second, all of the cohorts exhibit a negative association between bedtime and wake-up midpoint and cumulative GPA, meaning that a later midpoint is associated with a lower cumulative GPA regardless of which cohort a student is in.
This scatterplot depicts first year students’ bedtime and wake-up midpoints against their term GPAs, colored by cohort and fit with a regression line for each cohort. Similar to the previous plot, a majority of the data points are concentrated at higher GPA values. There appear to be two main takeaways from this graph. First, the data points from the cohorts appear to be spread equally along the x axis and sit slightly higher along the y axis, meaning that you are about equally likely to find a data point from any cohort at any place on the graph. Second, all of the cohorts exhibit a negative association between bedtime and wake-up midpoint and term GPA, meaning that a later midpoint is associated with a lower term GPA regardless of which cohort a student is in.
Next, to test whether bedtime and wake-up midpoint is a significant predictor of term and cumulative GPA, we will construct two linear models and conduct hypothesis tests. Starting with the model predicting cumulative GPA, we will run a t-test on the coefficient of a first year student’s average midpoint of bedtime and wake time after 11 pm (midpoint_sleep). The null hypothesis is that this coefficient is zero, meaning midpoint_sleep is not a significant predictor of cumulative GPA; the alternative hypothesis is that the coefficient is not zero, meaning midpoint_sleep is a significant predictor of cumulative GPA. We will conduct this test at the 5% significance level (\(\alpha = 0.05\)).
From our t-test, we see that the p-value for the \(\beta\) coefficient associated with midpoint_sleep (1.2e-06) is less than alpha (0.05), meaning we have statistically significant evidence to reject the null hypothesis in favor of the alternative. In addition, our model tells us how a change in midpoint_sleep relates to a first year student’s predicted cumulative GPA. The model equation is \(Y = 3.924867 - 0.001152 X_{1}\), where \(X_{1}\) is the midpoint_sleep value for an individual. In context, every one-minute increase in midpoint_sleep is associated with a decrease of 0.001152 points in a first year student’s predicted cumulative GPA, so sleeping later and waking later is associated with a lower cumulative GPA. Because the data are observational, this is an association rather than evidence that later sleep causes lower grades.
Regarding the model predicting term GPA, we will run the same t-test on the coefficient of midpoint_sleep. The null hypothesis is that this coefficient is zero, meaning midpoint_sleep is not a significant predictor of term GPA; the alternative hypothesis is that the coefficient is not zero, meaning midpoint_sleep is a significant predictor of term GPA. We will again use the 5% significance level (\(\alpha = 0.05\)).
From our t-test, we see that the p-value for the \(\beta\) coefficient associated with midpoint_sleep (7.97e-07) is less than alpha (0.05), meaning we have statistically significant evidence to reject the null hypothesis in favor of the alternative. As with the cumulative model, we also get information about the magnitude of the predicted change in term GPA for every one-minute increase in midpoint_sleep. The model’s equation is \(Y = 3.9834443 - 0.0013390X_{1}\), where \(X_{1}\) represents a first year student’s midpoint_sleep value. In context, every one-minute increase in midpoint_sleep is associated with an expected decrease of 0.0013390 points in a first year student’s term GPA, so sleeping later and waking later is associated with a lower term GPA. Through our analysis, we have found statistically significant evidence that the average midpoint of a first year student’s sleep is related to academic performance as measured by cumulative and term GPAs.
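The per-minute slopes are hard to read at a glance, so the sketch below plugs values into the two fitted equations reported above and rescales the slopes to a per-hour change. The coefficients are copied from the text; the helper names and the example midpoint of 300 minutes are hypothetical, and the sketch assumes midpoint_sleep is measured in minutes after 11 pm as the variable description suggests.

```python
# (intercept, slope per minute of midpoint_sleep), as reported in the text
MODELS = {
    "cum_gpa": (3.924867, -0.001152),
    "term_gpa": (3.9834443, -0.0013390),
}


def predict_gpa(response, midpoint_minutes):
    """Predicted GPA from the simple linear model for the given response."""
    intercept, slope = MODELS[response]
    return intercept + slope * midpoint_minutes


def change_per_hour(response):
    """Predicted GPA change for a midpoint that is 60 minutes later."""
    return 60 * MODELS[response][1]


# Hypothetical student whose sleep midpoint is 300 minutes after 11 pm.
for response in MODELS:
    print(response, predict_gpa(response, 300), change_per_hour(response))
```

On this scale, a midpoint one hour later corresponds to a predicted drop of roughly 0.07 cumulative GPA points and 0.08 term GPA points.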
The contour plot above depicts information about the amount of daytime sleep first year students get and their cumulative GPAs by their racial groups and cohort. The plot shows that a majority of students across all cohorts have a high cumulative GPA (above 3.0) with a high concentration of students having a GPA above 3.75 and closer to 4.0. This plot also shows us that a majority of students across all cohorts get less than 100 minutes of daytime sleep, indicating that the students do take naps but not very long ones. It is important to note that there are many students who sleep longer than 100 minutes during the day and even those that sleep longer than 150 minutes during the day but the majority of students sleep for less than 100 minutes on average during the daytime. Another thing to note is that it appears that the data points seem to follow the same trend across all cohorts, meaning it doesn’t appear that one cohort is more likely to have a certain cumulative GPA or daytime sleep value than any other cohort.
This contour plot depicts information about the amount of daytime sleep first year students get and their term GPAs by their racial groups and cohorts. Similar to the previous contour plot, a majority of students across all cohorts have a high term GPA (above 3.0) with a high concentration of students having a GPA above a 3.5 and closer to 4.0. The plot also shows us that a majority of students across all cohorts get less than 100 minutes of daytime sleep. This indicates that the students do take naps but not very long ones during the day. There are some students that sleep for more than 100 minutes during the daytime, however there just aren’t that many of them. Another thing to note is that it appears that the data points seem to follow the same trend across all cohorts, meaning it doesn’t appear that one cohort is more likely to have a certain term GPA or daytime sleep value than any other cohort.
From these two contour plots, it appears that the cohorts get about the same amount of daytime sleep. To check this claim, we will run a one-way ANOVA test for means. Our null hypothesis is that there is no difference in mean daytime sleep between cohorts. Our alternative hypothesis is that mean daytime sleep differs between at least two of the cohorts. We will use the 5% significance level (\(\alpha = 0.05\)) for this test.
## Df Sum Sq Mean Sq F value Pr(>F)
## cohort 4 31732 7933 11.26 7.88e-09 ***
## Residuals 629 443132 705
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Because the p-value from the test (7.88e-09) is less than alpha (0.05), we have statistically significant evidence to reject the null hypothesis in favor of the alternative hypothesis. This means that mean daytime sleep among first year students is not the same across cohorts, contrary to the visual impression from the contour plots. Knowing this, we will test whether cohort is significant in predicting first year students’ academic performance.
To test whether an individual’s cohort is significant in predicting their cumulative GPA given the amount of daytime sleep they get, we will run a partial F-test. We will construct two models: the full model and the partial model. The full model predicts a first year student’s cumulative GPA (cum_gpa) using the amount of sleep they get during the day (daytime_sleep) and their cohort (cohort). The partial model predicts cumulative GPA (cum_gpa) using daytime sleep (daytime_sleep) alone. Our null hypothesis is that all of the \(\beta\) coefficients associated with cohort are equal to zero. Our alternative hypothesis is that at least one of the \(\beta\) coefficients associated with cohort is not equal to zero. We will conduct this test at the 5% significance level (\(\alpha = 0.05\)).
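As a quick check on how the ANOVA table fits together, the sketch below recomputes the mean squares and the F statistic from the sums of squares and degrees of freedom printed in the output. Only the helper name `anova_f` is introduced here; the four input numbers come from the table.

```python
def anova_f(ss_between, df_between, ss_within, df_within):
    """Return (mean square between, mean square within, F statistic)."""
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return ms_between, ms_within, ms_between / ms_within


# Sum Sq and Df for the cohort and Residuals rows of the table above
ms_b, ms_w, f = anova_f(31732, 4, 443132, 629)
print(round(ms_b), round(ms_w), round(f, 2))  # matches the Mean Sq and F value columns

# 4 between-group degrees of freedom imply 5 cohorts, and 629 + 5 = 634 students.
n_cohorts = 4 + 1
n_students = 629 + n_cohorts
print(n_cohorts, n_students)
```

The between-cohort mean square is about eleven times the within-cohort mean square, which is what drives the very small p-value.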
After conducting the test we see that the p value (1.198e-11) is less than alpha (0.05) meaning we have statistically significant evidence to reject the null hypothesis in favor of the alternative. In context this means that cohort is significant when predicting a first year student’s cumulative GPA when given the amount of sleep they get during the day. For the rest of this analysis we will be building models with the cohort variable included when predicting a first year student’s cumulative GPA.
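For readers unfamiliar with the partial F-test, the statistic compares how much the residual sum of squares drops when the cohort terms are added against the residual variance of the full model. The sketch below shows that formula only. The residual sums of squares are hypothetical placeholders, since the report does not print them, and the degrees of freedom assume 634 complete rows and a full model with six coefficients (intercept, daytime_sleep, and four cohort indicators).

```python
def partial_f(rss_reduced, rss_full, n_dropped, df_resid_full):
    """F statistic for testing that the dropped coefficients are all zero."""
    numerator = (rss_reduced - rss_full) / n_dropped
    denominator = rss_full / df_resid_full
    return numerator / denominator


# Hypothetical residual sums of squares, for illustration only.
rss_reduced = 120.0      # model with daytime_sleep only
rss_full = 110.0         # model with daytime_sleep and cohort
n_dropped = 4            # four cohort indicator coefficients
df_resid_full = 634 - 6  # assumes no rows are lost to missing values

f_stat = partial_f(rss_reduced, rss_full, n_dropped, df_resid_full)
print(f_stat)
```

Under the null hypothesis this statistic follows an F distribution with `n_dropped` and `df_resid_full` degrees of freedom, which is where the reported p-value comes from.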
In the same way, to test whether an individual’s cohort is significant in predicting their term GPA given the amount of daytime sleep they get, we will run a partial F-test. We will again construct two models: the full model and the partial model. The full model predicts a first year student’s term GPA (term_gpa) using the amount of sleep they get during the day (daytime_sleep) and their cohort (cohort). The partial model predicts term GPA (term_gpa) using daytime sleep (daytime_sleep) alone. Our null hypothesis is that all of the \(\beta\) coefficients associated with cohort are equal to zero. Our alternative hypothesis is that at least one of the \(\beta\) coefficients associated with cohort is not equal to zero. We will conduct this test at the 5% significance level (\(\alpha = 0.05\)).
After conducting the test we see that the p value (3.771e-14) is less than alpha (0.05) meaning we have statistically significant evidence to reject the null hypothesis in favor of the alternative. In context this means that cohort is significant when predicting a first year student’s term GPA when given the amount of sleep they get during the day. For the rest of this analysis we will be building models with the cohort variable included when predicting a first year student’s term GPA.
After establishing that daytime sleep differs between cohorts and that cohort is significant when predicting academic outcomes, we will now see how daytime sleep and cohort relate to academic outcomes for first year students. To do this we will create two final linear regression models: one predicting cumulative GPA and one predicting term GPA, both using daytime sleep and cohort as predictors. We will treat cohort as a categorical factor so that each cohort can shift the predicted response by its own amount. We will start with the model that predicts cumulative GPA (cum_gpa) for first year students.
The model tells us that different cohorts provide different intercepts for our model. The model equation is:
\(Y = 3.4141129 - 0.0031580X_{1} + 0.2254999\,I(X_{2} = \text{lac2}) + 0.3660645\,I(X_{2} = \text{nh}) + 0.2175691\,I(X_{2} = \text{uw1}) + 0.0965668\,I(X_{2} = \text{uw2})\) for \(X_{1} = \text{daytime\_sleep}\) and \(X_{2} = \text{cohort}\), where \(I(\cdot)\) equals 1 when its condition holds and 0 otherwise.
In context, this means that each cohort shifts the intercept \(\beta_{0}\) of our model while the slope stays the same. The slope of our model is -0.0031580, meaning that for every one-minute increase in a student’s average daytime sleep, there is an expected decrease of 0.0031580 points in their cumulative GPA, holding cohort fixed. Now we will repeat this process, this time predicting term GPA.
This model also tells us that different cohorts provide different intercepts for our model. This model’s equation is:
\(Y = 3.3865808 - 0.0038968X_{1} + 0.2510138\,I(X_{2} = \text{lac2}) + 0.4539069\,I(X_{2} = \text{nh}) + 0.2853727\,I(X_{2} = \text{uw1}) + 0.1125779\,I(X_{2} = \text{uw2})\) for \(X_{1} = \text{daytime\_sleep}\) and \(X_{2} = \text{cohort}\), where \(I(\cdot)\) equals 1 when its condition holds and 0 otherwise.
In context, this also means that each cohort shifts the intercept \(\beta_{0}\) of our model while the slope stays the same. The slope of our model is -0.0038968, meaning that for every one-minute increase in a student’s average daytime sleep, there is an expected decrease of 0.0038968 points in their term GPA, holding cohort fixed.
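To illustrate the "different intercepts, same slope" structure, the sketch below evaluates both fitted equations for a given cohort. The coefficients are copied from the two equations above; the function name and the 30-minute example are hypothetical. The reference cohort is not named in the text, so it is represented here by `cohort=None`.

```python
CUM_GPA = {
    "intercept": 3.4141129,
    "slope": -0.0031580,  # per minute of daytime_sleep
    "offsets": {"lac2": 0.2254999, "nh": 0.3660645, "uw1": 0.2175691, "uw2": 0.0965668},
}
TERM_GPA = {
    "intercept": 3.3865808,
    "slope": -0.0038968,
    "offsets": {"lac2": 0.2510138, "nh": 0.4539069, "uw1": 0.2853727, "uw2": 0.1125779},
}


def predict(model, daytime_sleep, cohort=None):
    """Predicted GPA; cohort=None means the (unnamed) reference cohort."""
    offset = 0.0 if cohort is None else model["offsets"][cohort]
    return model["intercept"] + model["slope"] * daytime_sleep + offset


# Hypothetical student averaging 30 minutes of daytime sleep.
print(predict(CUM_GPA, 30, "nh"))
print(predict(TERM_GPA, 30))

# The effect of 60 extra minutes of daytime sleep is the same in every cohort.
for cohort in [None, "lac2", "nh", "uw1", "uw2"]:
    print(cohort, predict(CUM_GPA, 60, cohort) - predict(CUM_GPA, 0, cohort))
```

Each cohort offset moves the whole line up or down, so two students with the same daytime sleep can have different predicted GPAs purely because of cohort.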
In conclusion, we have discovered that there are significant variations in daytime sleep duration amounts among different cohorts and that these variations are significant when it comes to predicting academic outcomes for first year students.
The side-by-side boxplot illustrates that there does not appear to be a significant difference between the sleep variability of non-first-gen students and first-gen students. The range is slightly larger for non-first-gen students, and there are a handful of outliers (mostly high outliers). This is interesting, as it might mean that while non-first-gen students overall have similar sleep variability to first-gen students, the non-first-gen students who depart from that pattern show considerably more variation than first-gen students do.
##
## Welch Two Sample t-test
##
## data: sleep4$TotalSleepTime by sleep4$demo_firstgen
## t = -0.30257, df = 143.23, p-value = 0.7627
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
## -12.65525 9.29522
## sample estimates:
## mean in group 0 mean in group 1
## 397.0508 398.7308
When analyzing whether the effect of Total Sleep Time on cumulative GPA differs between first-gen and non-first-gen students, we can see that the standard error bands of their respective regression lines overlap. Overlapping standard error bands suggest, but do not prove, that the difference is not statistically significant. To be more certain, we performed a Welch two-sample t-test on the difference in Total Sleep Time between first-gen and non-first-gen students. The p-value from the t-test was 0.763, meaning there is no evidence of a statistically significant difference in Total Sleep Time between the two groups.
Finally, we can use a side-by-side bar plot to see whether Daytime Sleep appears to differ depending on whether a student is first-generation. We can see here that the non-first-gen students do, in fact, have over twice as many minutes of Daytime Sleep, on average, as first-generation students. It could be interesting to explore the reasons behind this in further work; one untested possibility is that first-gen students are more internally motivated than non-first-gen students.
We ran a linear regression model with variables we thought might have a significant relationship with Total Sleep Time. The model is represented by the blue line on our scatterplot and we see a slight positive correlation between Total Sleep Time and Cumulative GPA.
An additional plot we chose to include is a bubble plot showing Midpoint of Sleep and Total Sleep Time. The size of each bubble represents Term GPA, so a student with a higher GPA has a larger bubble. Additionally, the color represents the Cohort variable, which lets us visualize sleep trends by cohort.
A potential question we identified as an area for future research is to investigate temporal trends in sleep patterns. This would involve looking at sleep patterns more closely to see if the amount of sleep for students changes significantly across different weeks of the semester such as midterms, fall/spring break, and finals. Additionally, we could look at sleep patterns of students on weekdays vs weekends and see if that shows any trends.
Another future study that could be done with this data is comparing how each cohort’s sleep averages change over a longer period of time. If the study were extended over multiple years, we could investigate whether students are getting more or less sleep than previous graduating classes. Another variable that could be looked into is total units, to see how many students are overloading and whether students take more units in the future compared to the current study.
These questions are left for future research, as we need more data, over a longer period of time. We are excited to see the potential implications of future analysis of this subject.
Our report has provided valuable insights regarding sleep patterns, demographic variables, and academic performance. The three main research questions looked into the effects of total sleep time, sleep timing, and demographic factors (gender and first-generation status) on academic success. For the first question regarding total sleep time, our analyses revealed that there is no significant difference in the effect of total sleep time on cumulative GPA between male and female students. The violin plot and Welch two-sample t-test confirmed this finding. Additionally, scatterplots illustrated a small negative correlation between daytime sleep duration and GPA, with slight variations between genders. The second question investigated the impact of sleep timing on student performance, specifically examining the average midpoint of bedtime. Scatterplots demonstrated a negative association between bedtime/wake up midpoint and cumulative/term GPAs for various cohorts. Linear regression models supported our observations, indicating that a later midpoint is associated with lower GPAs. The t-tests for the midpoint sleep variable confirmed its significance in predicting both GPA response variables. Finally, the third question explored the role of first-generation status in sleep patterns and academic success. Our analysis did not find a statistically significant difference in total sleep time between first-gen and non-first-gen students. However, our linear regression model showed that bedtime variability, daytime sleep, and midpoint of sleep were significant predictors of cumulative GPA. By addressing our research questions, we contribute to ongoing sleep studies that emphasize the need to consider the crucial role of sleep in the college experience.