Introduction

Sleep is a fundamental aspect of human health and well-being, influencing physical, mental, and cognitive functions. The interplay between sleep patterns and lifestyle factors is of significant interest in understanding overall health outcomes. This report delves into a comprehensive dataset collected from individuals, exploring diverse dimensions of sleep health and associated lifestyle behaviors.

The data used in our research come from a Kaggle dataset updated by Lakiska Tharmalingam. For each of 374 people, the dataset records the Person ID, Gender, Age, Occupation, Sleep Duration (hours), Quality of Sleep (rating from 1 to 10), Physical Activity Level (minutes of physical activity per day), Stress Level (rating from 1 to 10), BMI Category (Underweight, Normal or Overweight), Blood Pressure (systolic and diastolic), Heart Rate (bpm), Daily Steps (number of steps walked per day), and Sleep Disorder (None, Insomnia or Sleep Apnea).

In our study, we focus on the sleep situation (Sleep Duration, Sleep Disorder and Sleep Quality) of different people. Because people often claim that some jobs lead to an unhealthy lifestyle, such as getting very little exercise, we first investigate the effect of Occupation on Physical Activity Level and then how Physical Activity Level is associated with Sleep Duration. Next, we ask which factors might indicate a potential sleep disorder. Finally, we study how Sleep Quality relates to several factors that common sense says should matter, and check whether these common-sense expectations hold.

Research Question 1: What Effect Does Occupation Have on Physical Activity Level and How Does This, in turn, Affect Sleep Duration?

We are interested in examining the distribution of physical activity levels across occupations and examining how different physical activity levels affect sleep durations. It is well-known that physical activity and sleep are crucial for good health. Thus, examining which occupations lead to more sedentary lifestyles and the impact this has on sleep duration can help us gain an understanding of these patterns.

We used a box plot to view the distribution of physical activity levels by occupation.

It’s noteworthy that the boxes for doctors and engineers are especially long, meaning that physical activity levels vary widely within these occupations. The median physical activity (in minutes/day) also seems to vary greatly between occupations, and the boxes sit at clearly different levels, suggesting that the type of work people do may influence their daily activity levels. To check whether the difference in physical activity between occupations is statistically significant, we ran a one-way ANOVA.

##              Df Sum Sq Mean Sq F value Pr(>F)    
## Occupation   10  55231    5523    18.8 <2e-16 ***
## Residuals   363 106622     294                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

With F = 18.8 on 10 and 363 degrees of freedom and p < 2e-16, we reject the hypothesis that all occupations share the same mean physical activity level. At least one occupation group has a mean that differs from the others, so the difference among the groups is statistically significant.
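As a check on the table above, the F value can be recomputed from the reported sums of squares and degrees of freedom. The sketch below is a minimal standard-library Python illustration, not the code used to produce the table; the function name `one_way_anova` and the toy groups are assumptions introduced here, and the sketch does not compute the p-value.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a dict of group name -> values."""
    values = [v for g in groups.values() for v in g]
    grand_mean = mean(values)
    group_means = {name: mean(g) for name, g in groups.items()}
    # Between-group variation: how far each group mean sits from the grand mean.
    ss_between = sum(
        len(g) * (group_means[name] - grand_mean) ** 2
        for name, g in groups.items()
    )
    # Within-group variation: spread of observations around their own group mean.
    ss_within = sum(
        (v - group_means[name]) ** 2 for name, g in groups.items() for v in g
    )
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# F recomputed from the Sum Sq and Df columns of the table above.
f_reported = (55231 / 10) / (106622 / 363)

# Toy values invented for illustration only (not taken from the dataset).
toy_groups = {"A": [1, 2, 3], "B": [2, 3, 4], "C": [5, 6, 7]}
f_toy, df1, df2 = one_way_anova(toy_groups)
```

The degrees of freedom in the table (10 and 363) are consistent with 11 occupation groups and 374 observations.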

Next, we examined the relationship between physical activity level and sleep duration. To do this, we created a scatter plot with physical activity level on the x-axis and sleep duration on the y-axis, with the points colored by occupation so we could see whether certain occupations have unique patterns of physical activity and sleep duration.

A general positive linear trend is observed between physical activity level and sleep duration. This indicates that, on average, higher physical activity levels are associated with longer sleep durations. However, there is variability in this relationship, suggesting that other factors may also play a role.
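The strength and direction of a linear trend like this one can be summarized with a least-squares slope and a Pearson correlation coefficient. The following is a minimal sketch using only the Python standard library; the function name `linear_fit` and the toy numbers are illustrative assumptions, not values from the dataset or the method used for the plot.

```python
import math


def linear_fit(x, y):
    """Return (slope, intercept, pearson_r) for paired observations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx                   # change in y per unit of x
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)      # strength of the linear association
    return slope, intercept, r


# Toy values invented for illustration: activity in minutes/day, sleep in hours.
activity = [30, 45, 60, 75, 90]
sleep = [6.0, 6.5, 7.0, 7.5, 8.0]
slope, intercept, r = linear_fit(activity, sleep)
```

A positive slope with r well below 1 would match the description above: a positive association with considerable scatter around the line.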

Two notable clusters were identified in the scatter plot:

High Physical Activity, Low Sleep Duration: In the bottom-right corner of the scatter plot, there’s a cluster of individuals with high physical activity levels but low sleep duration. Most of these individuals are nurses. This could suggest that despite being physically active, certain demands in the nursing profession may contribute to reduced sleep.

Low Physical Activity, High Sleep Duration: In the top-left corner, there’s a cluster with low physical activity but high sleep duration. This group primarily consists of engineers. This might indicate that some engineers, who typically have more sedentary roles, tend to sleep longer.

The observed relationship between physical activity level and sleep duration suggests that promoting physical activity could positively impact sleep. The identified clusters could indicate occupation-specific factors influencing these trends. For instance, high-stress roles like nursing might require further study to understand the impact on sleep, while engineering might benefit from initiatives promoting physical activity.

Research Question 2: What are some factors that might indicate the presence of a sleep disorder in an individual?

To investigate whether people with sleep apnea, insomnia, or no sleep disorder share similar quantitative traits (e.g., age, stress level, sleep duration), we performed hierarchical clustering and created a dendrogram to visually represent the clusters. We cut the dendrogram at a dissimilarity of approximately four, which yields six distinct clusters before they merge into larger groups. Using six clusters rather than fewer gave a finer subdivision and made the patterns within each cluster clearer. Notably, the three rightmost clusters consist almost exclusively of observations with Sleep Apnea, Insomnia, and no diagnosed sleep disorder, respectively. The two leftmost clusters, on the other hand, are more mixed and include observations from all three sleep disorder categories. The third cluster from the left contains a blend of observations with Sleep Apnea and with no diagnosed sleep disorder. Overall, some of the clusters in the dendrogram align with the sleep disorder categories in the dataset, suggesting that individuals in the same category tend to share similar quantitative traits. However, the overlap in the mixed clusters indicates that some characteristics are shared across the categories.
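To make the clustering step concrete, here is a minimal agglomerative clustering sketch in standard-library Python. It assumes complete linkage and Euclidean distance, which the report does not specify, and it stops at a target number of clusters rather than cutting at a dissimilarity threshold. The names `complete_linkage` and `crosstab`, and the toy points and labels, are hypothetical.

```python
import math
from collections import Counter


def complete_linkage(points, n_clusters):
    """Merge the closest clusters until n_clusters remain; returns index lists."""
    clusters = [[i] for i in range(len(points))]

    def cluster_distance(a, b):
        # Complete linkage: distance between the two farthest members.
        return max(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > n_clusters:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = cluster_distance(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters


def crosstab(clusters, labels):
    """Count how many observations of each label fall in each cluster."""
    return [Counter(labels[i] for i in members) for members in clusters]


# Toy values invented for illustration only (not taken from the dataset).
toy_points = [(0, 0), (0, 1), (10, 10), (10, 11), (20, 0), (20, 1)]
toy_labels = ["None", "None", "Insomnia", "Insomnia", "Sleep Apnea", "Sleep Apnea"]
toy_clusters = complete_linkage(toy_points, 3)
toy_table = crosstab(toy_clusters, toy_labels)
```

With real data, the variables would normally be standardized first so that no single scale (for example, daily steps) dominates the distances. The cross-tabulation shows how cleanly each cluster maps onto one sleep disorder category.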

To delve deeper, we explored the relationship between age and stress level among individuals with different sleep disorders.

We used a jitter plot instead of a scatter plot to visualize the relationship between age and stress level because stress level is a discrete quantitative variable. Many data points therefore share the same y value and overlap, which hides patterns. Jittering spreads the points out and makes their density more visible. Based on the plot, we can see several patterns. Insomnia tends to occur in individuals aged 40-45 and in those with stress levels of 4 or 7. Sleep apnea is more prevalent in individuals aged 50-60, as well as in those with stress levels of 3 or 8. Conversely, there appears to be no clear pattern for those with no diagnosed sleep disorder, as the points are generally spread out across stress level and age, although most data points fall below a stress level of 6. Additionally, the regression lines show a negative relationship between age and stress level for all three groups. Notably, however, the slope of the regression line for individuals with insomnia is less steep, indicating that stress levels decrease more slowly with age than in the other groups. This could imply that the factors contributing to stress in individuals with insomnia are more persistent or less influenced by age-related factors.
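The jittering idea can be illustrated in a few lines: each discrete value is shifted by a small random offset so that overlapping points separate when plotted. This is a sketch using Python's standard `random` module; the function name `jitter`, the width of 0.2, and the toy stress values are assumptions chosen for illustration, not the settings used for the plot described above.

```python
import random


def jitter(values, width=0.2, seed=None):
    """Shift each value by a uniform random offset in [-width, width]."""
    rng = random.Random(seed)  # a fixed seed makes the plot reproducible
    return [v + rng.uniform(-width, width) for v in values]


# Toy stress ratings invented for illustration only (not taken from the dataset).
stress = [4, 4, 4, 7, 7]
stress_jittered = jitter(stress, width=0.2, seed=1)
```

Keeping the width well below half the gap between adjacent ratings ensures that a jittered point can still be read as belonging to its original stress level.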

Research Question 3: How is the Quality of Sleep Affected by Age, Physical Activity Level, Stress Level, Heart Rate and Daily Steps?

We’re interested in exploring how Age, Physical Activity Level, Stress Level, Heart Rate and Daily Steps influence one’s quality of sleep. Common sense suggests that these factors are closely associated with quality of sleep: people tend to sleep more lightly as they get older, sleep more deeply when they exercise more (spending more time on physical activity and walking more), sleep badly when they are stressed, and find it harder to fall asleep when their heart rate is fast. We would like to analyze the dataset and check whether these common-sense expectations hold.

To analyze the relationships among this set of variables across 374 observations, we use Principal Component Analysis (PCA) to reduce the dimensionality. We first look at the scree plot (elbow plot) of the PCA to determine how many principal components to retain.

From the elbow plot, we can tell that 2 principal components should be used, as the additional variation explained decreases dramatically after the second principal component. We then used a biplot to show how sleep quality is associated with the variables.
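The quantities plotted in a scree plot are the eigenvalues of the correlation matrix, each divided by their sum. The sketch below computes them with a small Jacobi eigenvalue routine in standard-library Python. It assumes PCA on standardized variables (that is, on the correlation matrix), which the report does not state explicitly, and all function names and toy values are introduced here for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(columns):
    """Correlation matrix for a list of variables (one list per variable)."""
    return [[pearson(ci, cj) for cj in columns] for ci in columns]


def jacobi_eigenvalues(matrix, tol=1e-12, max_sweeps=100):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations."""
    a = [row[:] for row in matrix]
    n = len(a)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle that zeroes the (p, q) entry.
                diff = a[q][q] - a[p][p]
                theta = math.pi / 4 if diff == 0 else 0.5 * math.atan(2 * a[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)


def explained_variance(columns):
    """Proportion of total variance explained by each principal component."""
    eigenvalues = jacobi_eigenvalues(correlation_matrix(columns))
    total = sum(eigenvalues)
    return [ev / total for ev in eigenvalues]


# Toy values invented for illustration only (not taken from the dataset).
toy_columns = [[1, 2, 3, 4], [1, 3, 2, 4]]
toy_proportions = explained_variance(toy_columns)
```

An elbow after the second component corresponds to the first two proportions being large and the remaining ones dropping off sharply.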

From the biplot, we can tell that as stress level and heart rate increase, sleep quality decreases, falling from 7 to 4 when the stress level or heart rate is extremely high. We also notice that, among people with similar values of the other variables, those with more daily steps or a higher physical activity level have better quality of sleep. For example, for people with a moderate stress level and heart rate, sleep quality rises from 7 to 8 as daily steps or physical activity level increases. These findings align with common sense and are as expected. For age, however, the biplot indicates that older people tend to have better sleep quality, which is quite surprising. To test these relationships formally, we use the PCAtest package by Arley Camargo, which provides the following statistical tests for PCA: Psi (Vieira 2012), Phi (Gleason and Staelin 1975), the rank-of-roots (ter Braak 1988), the index of the loadings (Vieira 2012), and the correlations of the PCs with the variables (Jackson 1991).

Here we use 100 bootstrap replicates and 100 random permutations. The observed Psi value falls outside its null distribution, and the Phi test is likewise significant. Hence, we conclude that our principal component analysis is significant, that is, the variables are more strongly correlated than would be expected by chance. The percentage of total variation explained also indicates that the first two PCs are significant, which agrees with the earlier elbow plot. We then look more closely at the index loadings for the first and second principal components. The graphs show that the third and fourth variables in our dataset (Stress Level and Daily Steps) have loadings that differ significantly from the null distribution, meaning that they load significantly on PC1. Similarly, the second and fifth variables (Physical Activity Level and Heart Rate) have significant loadings on PC2. Age therefore has no significant loading on either principal component. As a result, the positive relationship between age and quality of sleep seen in the biplot is not statistically supported, the common-sense expectation is not contradicted in this case, and the true relationship needs further study.
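The logic of the Psi and Phi permutation tests can be sketched without any eigen-decomposition. Assuming the usual definitions, Psi is the sum of (eigenvalue - 1)^2 over the eigenvalues of the correlation matrix and Phi is sqrt((sum of squared eigenvalues - p) / (p(p - 1))) for p variables. Because the eigenvalues sum to p and their squares sum to the squared entries of the matrix, Psi equals the sum of squared off-diagonal correlations. The code below is a standard-library Python illustration of that idea, not the PCAtest implementation; the function names, the permutation scheme (shuffling each variable independently), and the toy data are assumptions.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def psi(columns):
    """Sum of squared off-diagonal correlations (equals sum of (eigenvalue - 1)^2)."""
    p = len(columns)
    return sum(
        pearson(columns[i], columns[j]) ** 2
        for i in range(p)
        for j in range(p)
        if i != j
    )


def phi(columns):
    """Root mean squared off-diagonal correlation, between 0 and 1."""
    p = len(columns)
    return math.sqrt(psi(columns) / (p * (p - 1)))


def psi_permutation_test(columns, n_perm=100, seed=0):
    """Return (observed Psi, permutation p-value)."""
    rng = random.Random(seed)
    observed = psi(columns)
    at_least_as_large = 0
    for _ in range(n_perm):
        # Shuffling each variable independently breaks the correlations.
        shuffled = [rng.sample(col, len(col)) for col in columns]
        if psi(shuffled) >= observed:
            at_least_as_large += 1
    return observed, (at_least_as_large + 1) / (n_perm + 1)


# Toy values invented for illustration only: three perfectly correlated variables.
x = list(range(20))
toy_columns = [x, [2 * v + 1 for v in x], [-v for v in x]]
toy_psi, toy_p = psi_permutation_test(toy_columns, n_perm=100, seed=0)
toy_phi = phi(toy_columns)
```

An observed Psi far in the upper tail of the permuted values, as reported above, indicates that the variables are correlated enough for the PCA to be meaningful.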

Conclusion

In our research, we investigated how sleep duration, sleep quality and the presence of a sleep disorder are related to different factors.

For the first question, we conclude that different occupations are associated with very different Physical Activity Levels and sleep durations. For example, software engineers tend to have a very low Physical Activity Level and also a very short sleep duration. We also conclude that there is a positive relationship between Physical Activity Level and sleep duration. However, the variation in this relationship suggests that there may be confounding variables, and future research is needed to study it more closely. For instance, the relationship might also be affected by the stress level of different careers.

For the second question, our hierarchical clustering and jitter plot analysis revealed distinct patterns in the relationship between age, stress level, and sleep disorders. While certain clusters aligned closely with specific sleep disorders, there was also some overlap, suggesting shared characteristics among different disorders. One interesting finding was the decrease in stress levels with age. Future studies could investigate this relationship further, potentially using regression analysis, and explore the underlying mechanisms behind stress levels in individuals with insomnia compared to other sleep disorders. Another area for exploration could be the impact of other lifestyle factors, such as diet and exercise, on sleep disorders and stress levels.

For the third question, based on the PCA tests and the biplot, we conclude that the data support three common-sense expectations: more exercise (more Daily Steps and a higher Physical Activity Level) is associated with better sleep quality, more stress with worse sleep quality, and a higher heart rate with worse sleep quality. However, our analysis did not find significant evidence that older people have worse sleep quality. This might be because age has no relationship with sleep quality, or it could be due to chance, as the dataset is not very large and the variables may be correlated with one another. Further studies are needed to verify whether there is indeed no significant relationship.