A good night's sleep is essential for a good start to your day. Declining sleep quality, on the other hand, is associated with worse health outcomes, such as negative mood changes, worsened cardiac health, and a myriad of other negative health effects. In our project, we aim to dive into the factors that can impact sleep duration and the quality of sleep.
This data is sourced from the Sleep Health and Lifestyle Dataset created by Laksika Tharmalingam and found on Kaggle. The dataset includes 374 people and thirteen variables.
The variables are the following:
In our project we aimed to answer the following questions:
1. What is the impact on quality of sleep as we age and work?
2. What is the impact of occupation type on sleep duration?
3. How do various sleep disorders impact quality of sleep?
The saying goes that a good night leads to a good day. It does not specify a number of hours; it focuses on quality. Our objective is to see how people rate their own quality of sleep and to compare those ratings with other factors in their lives. Oftentimes, we focus on the number of hours of sleep rather than its quality. For example, eight hours of sleep can range from restless sleep to a very comfortable state of rest.
In our project, we aim to look at how age, occupation and sleep disorders affect quality of sleep. Additionally, we also explore how occupation will have an impact on sleep duration.
Many of us go to sleep without a clear mind, always focused on the next task and the plan for the next day. This question explores how people rate their quality of sleep and attempts to characterize the relationship between age and self-reported quality of sleep.
Exploring the relationship through occupation is important for seeing how people view their quality of sleep depending on their career. Although age and sleep quality can show a relationship on their own, considering occupation could offer insight into sleep at various stages of a career.
To that end, we made a jitter plot of quality of sleep against age, coloring the points by the person's career.
This is a linear regression plot showing the overall relationship between age and self-rated quality of sleep. The regression line shows that, overall, self-rated quality of sleep increases with age. The lowest rating given was a 4 and the highest was a 9. The upward trend is visible in the data: most people in their thirties did not rate their sleep above an 8, people over forty reported ratings from 5 to 9, most people over fifty rated their sleep above a 6, and people over 55 rated it a 9.
Based on the plot, we can see that certain careers show an increase in quality of sleep with age. Managers and sales representatives do not have enough data, so nine regression lines were drawn. Older accountants reported a worse quality of sleep, whereas older doctors and older engineers reported an increase. Lawyers did not span a large age range, but they showed a slight increase in quality of sleep. Older nurses show a better quality of sleep; however, the reports fall in clusters, with a group of younger nurses and then a sharp jump in age. Salespeople are a smaller group, and their ratings remained constant across the few ages represented. Scientists show a sharp decrease, but this likely reflects the small number of data points, which only cover people in their 30s and 40s. Software engineers are on the younger side and show an increase in quality of sleep. Teachers remain fairly constant at 7.
In short, many careers report an increase in quality of sleep with age, but the size and direction of the change differ by career. For this dataset, we can see that older people generally rate their quality of sleep higher than younger people.
Using the same variables, a contour plot can show where the points cluster. The graph shows that most of the points are concentrated between ratings of 6 and 8, and that most people surveyed are in their late twenties to early fifties.
The takeaway from the graph is that people in different occupations also rate their quality of sleep differently. We can see that doctors and engineers of similar ages rate their quality of sleep similarly. Aside from one person in their forties, everyone in their late forties and older rated their sleep higher. Decreases in quality of sleep did not occur in every career; a few careers show slow increases, and the ones noted earlier show a decrease. The graph does not exhibit a single upward or downward trend, but rather clusters of increases concentrated between 6 and 8.
Given the varying regressions presented in the first graph, it is important to test whether the differences are significant. We used ANOVA to examine the variance explained by each model.
## Analysis of Variance Table
##
## Response: sleep$Quality.of.Sleep
##            Df Sum Sq Mean Sq F value    Pr(>F)
## sleep$Age   1 119.93 119.932  107.64 < 2.2e-16 ***
## Residuals 372 414.47   1.114
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

## Analysis of Variance Table
##
## Response: sleep$Quality.of.Sleep
##                   Df Sum Sq Mean Sq F value    Pr(>F)
## sleep$Occupation  10 241.91 24.1907  30.022 < 2.2e-16 ***
## Residuals        363 292.49  0.8058
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The purpose of an ANOVA test is to see whether the mean response differs between groups. We ran a separate test for each of the two variables to check that each one shows an actual trend. In both tables the p-value is well below 0.05 (less than 2.2e-16), so we reject the null hypothesis in each case. For occupation, this means that at least one occupation has a different mean quality of sleep, and the difference is statistically significant. For age, which enters the model as a single numeric term (one degree of freedom), it means the slope of quality of sleep on age is significantly different from zero.
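The two tables above have the shape produced by calling `anova()` on a fitted `lm` object. The following is a minimal sketch on simulated data, not the study data; the data frame `toy` and its columns `age`, `occupation`, and `quality` are hypothetical names chosen for illustration.

```r
# Simulated stand-in for the sleep data; values are made up for illustration.
set.seed(1)
n <- 120
toy <- data.frame(
  age = sample(27:59, n, replace = TRUE),
  occupation = factor(sample(c("Doctor", "Nurse", "Teacher"), n, replace = TRUE))
)

# Quality rises slightly with age and shifts by occupation, plus noise.
shift <- c(Doctor = 0, Nurse = 0.8, Teacher = -0.5)
toy$quality <- 4 + 0.06 * toy$age +
  unname(shift[as.character(toy$occupation)]) +
  rnorm(n, sd = 0.8)

# Numeric predictor: one degree of freedom, F-test on the slope.
age_table <- anova(lm(quality ~ age, data = toy))

# Factor predictor: (number of levels - 1) degrees of freedom,
# F-test on whether at least one group mean differs.
occ_table <- anova(lm(quality ~ occupation, data = toy))

print(age_table)
print(occ_table)
```

In both tables the sums of squares add up to the same total, because each model partitions the same overall variation in the response.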
To continue our analysis, we will now determine whether there is any clustering in the data by occupation type. For this analysis, we divide our dataset into two categories: STEM and non-STEM careers. STEM careers include any careers involving the sciences, technology, engineering, and math. In this dataset, the STEM careers represented are doctor, nurse, scientist, engineer, and software engineer. Non-STEM careers encompass all careers outside of those fields, such as lawyers, accountants, and teachers. First, we create a dendrogram by hierarchical clustering on several health-related predictors, including the subject's age, physical activity level, stress level, heart rate, and number of daily steps, and color the individual leaves by occupation type, with STEM careers marked in purple and non-STEM careers marked in green.
For this analysis, we use complete-linkage hierarchical clustering and have highlighted the five most dissimilar clusters in distinct colors [from left to right: red, yellow, green, blue, and purple]. Comparing these clusters to the leaves, which again are colored purple for STEM occupations and green for non-STEM occupations, we see a fairly strong association between occupation type and cluster. For instance, the purple cluster [farthest right] is composed entirely of leaves from patients who work in a STEM career.
The two clusters with large amounts of variation, containing many patients from both STEM and non-STEM occupations, are the yellow cluster [2nd from left] and the blue cluster [4th from left]. However, we note that the yellow cluster already has a lot of inherent dissimilarity, with most of its lowest-level merges occurring at a greater height (hence greater dissimilarity) than those in any of the other four colored clusters. The blue cluster is the largest by number of members, containing close to half the data, so it is understandable that it includes a broader set of career types.
From this dendrogram, we conclude that individuals who are similar in the health-related quantitative predictor variables tend to share an occupation type. Data points that are close to each other with respect to age, sleep duration, physical activity, stress, heart rate, and daily steps are very likely to be from individuals with the same occupation type.
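The clustering steps described above can be sketched with base R's `dist()`, `hclust()`, and `cutree()`. This is an illustrative sketch on simulated data. The column names and the `stem` indicator are hypothetical, and standardizing the inputs with `scale()` is an assumption, since the text does not say whether the original analysis scaled its variables.

```r
# Simulated health measurements; names and values are illustrative only.
set.seed(2)
n <- 60
health <- data.frame(
  age = round(runif(n, 27, 59)),
  activity = round(runif(n, 30, 90)),
  stress = sample(3:8, n, replace = TRUE),
  heart_rate = round(rnorm(n, 70, 4)),
  steps = round(runif(n, 3000, 10000))
)
stem <- sample(c(TRUE, FALSE), n, replace = TRUE)

# Assumption: standardize so that daily steps (in the thousands)
# do not dominate the Euclidean distances.
d <- dist(scale(health))

# Complete linkage: the distance between two clusters is the largest
# pairwise distance between their members.
tree <- hclust(d, method = "complete")

# Cut the tree into five clusters and cross-tabulate with occupation type.
cluster <- cutree(tree, k = 5)
print(table(cluster, stem))
```

A cluster whose row in the table is concentrated in one column corresponds to a colored branch made up almost entirely of one occupation type. Calling `plot(tree)` would draw the dendrogram itself.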
Now, to see how each individual health factor correlates with duration of sleep, we will utilize a correlation matrix. In the correlation matrix, we plot the correlation (or similarity) between each pair of variables analyzed. In this case, this includes sleep duration as well as our quantitative health statistics, including stress level, daily steps, physical activity level, age, and heart rate. The first plot shows the correlation between each set of variables for STEM workers, where the second plot shows the correlation between each set of variables for non-STEM workers.
To interpret this plot, we focus on the bottom row of the plots, which show the correlation between sleep duration and each predictor variable. The other variables are valuable to see the relationship or lack thereof between our quantitative predictors, but it is not central to this analysis.
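Reading one row of a per-group correlation matrix, as described above, can be sketched with `cor()` applied to each subset. The data below are simulated, and the opposite age slopes are built in on purpose to mimic the kind of sign flip discussed in this section. All names (`toy`, `stem`, `sleep_duration`) are hypothetical.

```r
# Simulated data; the two groups are given opposite age effects by design.
set.seed(3)
n <- 80
toy <- data.frame(
  stem = rep(c(TRUE, FALSE), each = n / 2),
  age = round(runif(n, 27, 59)),
  stress = sample(3:8, n, replace = TRUE)
)
toy$sleep_duration <- 7 +
  ifelse(toy$stem, 0.05, -0.05) * (toy$age - 43) -
  0.1 * toy$stress +
  rnorm(n, sd = 0.3)

vars <- c("age", "stress", "sleep_duration")

# One Pearson correlation matrix per occupation type.
cor_by_group <- lapply(split(toy[vars], toy$stem), cor)

# The sleep_duration row holds the correlations of interest.
print(round(cor_by_group[["TRUE"]]["sleep_duration", ], 2))
print(round(cor_by_group[["FALSE"]]["sleep_duration", ], 2))
```

Computing the matrices separately per group is what allows the same pair of variables to show correlations of opposite sign in the two plots.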
Comparing the two plots, we see some pretty noticeable differences. For workers in the STEM fields, there is a correlation coefficient of 0.43 between sleep duration and age, implying there is a moderately strong relationship between sleep duration and age (hence an older patient is likely to sleep longer). However, the correlation for non-STEM workers between sleep duration and age is -0.38, which shows a moderately strong negative correlation (hence an older patient is likely to sleep less). We also notice for physical activity level, there is close to no correlation between sleep duration and physical activity level for STEM workers, but a very strong correlation for non-STEM workers.
In fact, for each relationship between sleep duration and a health-related predictor, there are fairly large differences between non-STEM and STEM workers. STEM workers have a noticeably stronger negative correlation between sleep duration and stress level than non-STEM workers, and also a noticeably stronger negative correlation between sleep duration and heart rate. Most surprisingly, STEM workers have a slight negative correlation between sleep duration and daily steps, whereas non-STEM workers have a fairly strong positive correlation instead.
These conclusions raise some concerns about the quality of our data. Intuitively, it is hard to see why the same health factors would relate so differently to sleep purely because of the occupation type a subject chose. One limitation of this analysis is the fairly small size of our dataset. We can be further suspicious of these conclusions because the correlations between the health factors themselves differ by occupation type. We would expect the correlation between two health predictors to be largely independent of lifestyle choices. However, we see clear differences in these correlations in the plots above. For example, the correlation between stress level and number of daily steps is 0.34 for STEM workers and -0.42 for non-STEM workers, which leads to completely contradictory conclusions.
In line with the research goals of this paper, an analysis specifically of sleep quality against our health-related predictor variables could also be valuable. The choice to study occupation against sleep duration for this research question rests on the premise that sleep duration can be studied more empirically and, in our opinion, produces a more generalizable conclusion.
Given our interest in sleep quality overall, our next question is: how do different documented sleep disorders correlate with sleep quality, and are there correlations with occupation as well? Answering the first part of this question, and viewing groups within the sleep disorders, may give us a deeper understanding of how someone might improve their quality of sleep despite their condition. The second part may give us deeper insight into sleep conditions and how lifestyle may be correlated with them.
To start, how do people with different sleep disorders rate their sleep quality?
Generally, those with insomnia or sleep apnea have lower sleep quality than those with no sleep disorders, which makes sense. However, there seems to be a group of people with sleep apnea who have very high sleep quality, better than either of the other groups. These comparisons can be seen by looking at the peaks of the graph, which represent high frequencies of data points. Most people with insomnia seemed to have a self-reported sleep quality of 6 or 7, most people with sleep apnea seem to have a sleep quality of either 6 or 9, and most people with no sleep condition tend to have a sleep quality of around 6-8. It would be beneficial to perhaps take a closer look at the discrepancies in sleep quality for sleep apnea patients, given the bimodal nature of the sleep apnea data, whereas the other groups have mostly unimodal distributions.
Next, we wanted to examine how these sleep disorders are distributed across the patients' occupations. If sleep disorders were unrelated to occupation, we would expect roughly equal proportions across the occupations.
Interestingly, this is not what we see at all. Nurses have a significantly higher proportion of people with sleep apnea, and salespeople and teachers have an unusually high proportion of insomnia. For managers, sales representatives, scientists, and software engineers, it's hard to see patterns in the data due to the very low number of people in these occupations, which can be seen from the very narrow columns here. To understand the statistical significance of these differences across occupations, we will use a Pearson residuals plot.
We can use a Pearson residuals plot to see how far the observed counts deviate from what we would expect if sleep disorder and occupation were independent. The strong blue cells for nurses, salespeople, and teachers indicate that there are significantly more people with sleep apnea among nurses, and more people with insomnia among salespeople and teachers, than expected. A chi-squared test would be a good way to verify this, although both approaches become much less reliable given the low number of observations in certain groups, which makes testing for statistical significance difficult. If more data were collected, it would likely be beneficial to run a chi-squared test here as well.
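As a rough illustration of how Pearson residuals relate to a chi-squared test, the sketch below uses a small table of made-up counts, not the study data. Base R's `chisq.test()` returns both the expected counts and the Pearson residuals, $(\text{observed} - \text{expected}) / \sqrt{\text{expected}}$.

```r
# Hypothetical counts (not the study data): rows are occupations,
# columns are sleep-disorder categories.
counts <- matrix(
  c(60,  5,  5,
    10,  5, 55,
    12, 25,  3),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    occupation = c("Doctor", "Nurse", "Teacher"),
    disorder = c("None", "Insomnia", "Sleep Apnea")
  )
)

test <- chisq.test(counts)

# Pearson residual = (observed - expected) / sqrt(expected).
manual <- (counts - test$expected) / sqrt(test$expected)
print(round(test$residuals, 2))

# Small expected counts make the chi-squared approximation unreliable;
# a common rule of thumb asks for at least 5 in every cell.
print(min(test$expected))
```

Large positive residuals mark cells with more observations than independence would predict, which is what the strongly colored cells in a residuals plot encode. Checking the smallest expected count is one simple way to see whether sparse occupations undermine the test.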
Through our research, we saw how age, occupation, and sleep disorders are correlated with a person’s quality of sleep.
To answer our first question, what is the impact on quality of sleep as we age and work: in general, self-rated quality of sleep tends to be higher at older ages. There was a statistically significant positive relationship between age and quality of sleep. By occupation, however, the change with age varies, ranging from a slow increase to a sharp decrease.
In the future, we should study people from these careers across a more varied set of ages. For example, the nurses ranged in age from their thirties to their late fifties, but the sales representatives were only a small group of people. The first question also does not take into account the individual stresses that come with a career. Some careers are associated with stress, but the graph does not show stress anywhere.
For our second research question, we can conclude that there are fairly substantial differences in the relationship between sleep duration and various health factors between individuals who work in STEM careers and those who work in non-STEM careers. A major limitation we ran into was the fairly small sample size, especially after subsetting the data into two groups, as well as concerns that repeated sampling of the same individual affected the reliability of our hierarchical clustering approach. For further analysis, we plan to look at the distribution of sleep duration against each health-related predictor separated by occupation type, so that we can investigate the extent to which the relationship between, say, stress and sleep duration varies between STEM and non-STEM employees.
For our third research question, we conclude that, overall, sleep disorders do seem to be correlated with sleep quality, and that there appears to be a statistically significant association between these sleep disorders and occupation. However, given the small amount of data for some occupations, determining the validity of these tests is not possible with this dataset. In the future, it would be beneficial to gather more data to further explore the relationship between certain jobs and sleep disorders, to examine the bimodal sleep quality of sleep apnea patients, and to explore how differences among these subjects may affect their sleep quality despite their condition.