Introduction

The COVID-19 pandemic has fundamentally reshaped how we work, accelerating the adoption of remote and hybrid work models across the globe. What was once considered a niche arrangement has now become the norm for millions of employees and organizations. This shift has brought about both opportunities and challenges, redefining work-life balance, productivity, and mental health in ways never seen before.

Remote work offers flexibility, reduces commuting time, and opens up job opportunities beyond geographical boundaries. However, it also presents potential downsides, such as feelings of isolation, difficulty in separating work from personal life, and increased stress for some individuals. Hybrid work models, which blend in-office and remote setups, have emerged as a popular compromise, allowing employees to enjoy flexibility while maintaining interpersonal connections.

In light of these trends, understanding the impact of different work arrangements—onsite, hybrid, and remote—on employee well-being and productivity has become crucial for organizations striving to create effective and sustainable workplace policies. This dataset, collected from 5,000 employees across diverse industries and regions, provides a comprehensive look into how these work arrangements affect key factors such as stress levels, work-life balance, access to mental health resources, and productivity changes.

This analysis aims to explore the intricate interplay between work arrangements and their effects on employees’ professional and personal lives. By identifying trends and patterns, the findings can guide organizations in designing workplace strategies that foster productivity, support mental health, and enhance employee satisfaction in a post-pandemic world.


Dataset

The data captures a wide range of variables, including demographic attributes such as age, gender, and region, as well as workplace factors like hours worked per week, number of virtual meetings, and job roles. Key metrics such as stress levels, work-life balance ratings, and mental health conditions offer insights into employee well-being, while variables like productivity change shed light on how employees are adapting to these new work norms.

The dataset comprises 5,000 observations and 20 variables, providing a rich source of information for analyzing the impact of remote work on employees. We can split the variables into three main categories: demographic attributes, workplace factors, and well-being/productivity metrics.

First, for demographic attributes, the dataset comprises the following variables:

Employee_ID: A unique identifier for each employee in the dataset. While it does not provide any analytical insights, it ensures data integrity.
Age: A continuous variable indicating the age of the employee.
Gender: A categorical variable (e.g., Male, Female, Non-binary, Prefer not to say) that enables us to explore whether the impact of work arrangements varies across genders.
Region: The geographic region of the employee (e.g., Europe, North America, Asia). This helps in understanding how cultural and regional differences influence the experience of remote work.

Second, for workplace factors, the dataset comprises the following variables:

Job_Role: A categorical variable specifying the employee’s role (e.g., Software Engineer, HR, Sales). This variable allows for the analysis of role-specific challenges and benefits in remote work settings.
Industry: The sector in which the employee works (e.g., IT, Healthcare, Finance). Industries may vary significantly in their ability to accommodate remote or hybrid work, and this variable helps capture those nuances.
Years_of_Experience: A continuous variable indicating the total years of professional experience. Employees with more experience might adapt differently to remote or hybrid work compared to those with less experience.
Work_Location: A key variable indicating whether the employee works onsite, hybrid, or remotely. This is the primary variable for understanding how work arrangements influence outcomes.
Hours_Worked_Per_Week: A continuous variable measuring the number of hours the employee works weekly. This allows us to identify overwork trends in different work arrangements.
Number_of_Virtual_Meetings: A continuous variable capturing how many virtual meetings an employee participates in weekly. High numbers may indicate the “Zoom fatigue” associated with remote work.
Company_Support_for_Remote_Work: A numerical scale (e.g., 1 to 5) indicating the level of support provided by the organization for remote work. This includes access to tools, resources, and flexibility.

Lastly, for well-being and productivity metrics, the dataset comprises the following variables:

Work_Life_Balance_Rating: A numerical scale (e.g., 1 to 5) indicating the employee’s perception of balance between work and personal life. This metric is crucial for evaluating the psychological impact of remote and hybrid work.
Stress_Level: A categorical variable (e.g., High, Medium, Low) reflecting the employee’s stress levels. This is a key outcome variable to assess the challenges of different work setups.
Mental_Health_Condition: A categorical variable (e.g., Anxiety, Depression, None) indicating whether the employee has any reported mental health conditions.
Access_to_Mental_Health_Resources: A binary variable (Yes/No) indicating whether the employee has access to support such as counseling or mental health programs. This variable explores how access to resources moderates the effects of work arrangements.
Social_Isolation_Rating: A numerical scale reflecting the employee’s level of social isolation. High ratings may highlight the challenges of remote work in maintaining social connections.
Satisfaction_with_Remote_Work: A categorical variable (e.g., Satisfied, Unsatisfied) reflecting the employee’s overall satisfaction with working remotely.
Productivity_Change: A categorical variable (e.g., Increase, Decrease, No Change) indicating how the employee’s productivity has been affected by their work arrangement.
Physical_Activity: A categorical variable (e.g., Weekly, None) measuring how often the employee engages in physical activity. This variable explores whether remote work affects health-related behaviors.
Sleep_Quality: A categorical variable (e.g., Good, Average, Poor) reflecting how well the employee sleeps. Sleep quality often correlates with stress and overall well-being.
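As a minimal sketch of how the variable types described above might be encoded in R, the snippet below builds a few hypothetical rows and converts the categorical columns to factors. The row values and the Employee_ID format are invented for illustration and are not taken from the dataset.

```r
# Hypothetical toy rows; the real dataset has 5,000 rows and 20 columns.
employees <- data.frame(
  Employee_ID = c("EMP0001", "EMP0002", "EMP0003", "EMP0004"),  # made-up IDs
  Age = c(28, 41, 35, 56),
  Work_Location = c("Remote", "Onsite", "Hybrid", "Remote"),
  Stress_Level = c("High", "Low", "Medium", "Medium"),
  Work_Life_Balance_Rating = c(2, 4, 3, 5),
  stringsAsFactors = FALSE
)

# Nominal categories become unordered factors.
employees$Work_Location <- factor(employees$Work_Location,
                                  levels = c("Onsite", "Hybrid", "Remote"))

# Ordinal categories become ordered factors so that Low < Medium < High.
employees$Stress_Level <- factor(employees$Stress_Level,
                                 levels = c("Low", "Medium", "High"),
                                 ordered = TRUE)

str(employees)
```

Encoding Stress_Level as an ordered factor keeps its natural ordering available for comparisons and for plots that sort by level.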

Research Questions

The main focus of this paper revolves around the impact of remote work arrangements on employees’ well-being and productivity. To delve deeper into this topic, we have formulated three research questions that will guide our analysis and exploration of the dataset.

How does work location affect employees’ productivity and well-being across various age groups?
Does the distribution of work-life balance ratings differ significantly across gender groups?
What effect does hours worked per week have on employees’ work-life balance ratings across different work locations?

Research Question 1:

How does work location affect employees’ productivity and well-being across various age groups?

First, we investigate the relationship between work location (onsite, hybrid, remote) and employees’ productivity change across all age groups. Productivity is a critical metric that reflects employees’ effectiveness and output in their roles. By analyzing how productivity change varies based on work location and age, we can uncover trends that shed light on the effectiveness of different work arrangements for different age groups.

The mosaic plot colored by Pearson residuals, obtained from a Chi-Squared test of independence between Work Location and Productivity Change, shows some patterns in the data despite a relatively weak association between the two variables. The plot shows higher than expected counts of ‘Increase’ in productivity for employees working remotely, ‘Decrease’ in productivity for employees with a hybrid working arrangement, and ‘No Change’ in productivity for employees working onsite. This could suggest that remote work is more conducive to productivity, but the overall trend is unclear. Hence, we further analyze this relationship by segmenting the data by age category (<30, 30-40, 40-50, 50+), with the aim of identifying how different generations perceive and experience remote work.
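The test and residuals behind such a plot can be sketched in base R as follows. This is an illustrative example on synthetic data, not the code used for the report: the data frame `toy`, the seed, and the use of base `mosaicplot()` rather than another plotting package are all assumptions.

```r
set.seed(315)  # arbitrary seed so the synthetic example is reproducible

# Synthetic stand-in for the real data (values are illustrative only).
n <- 600
toy <- data.frame(
  Work_Location = factor(sample(c("Onsite", "Hybrid", "Remote"), n, replace = TRUE)),
  Productivity_Change = factor(sample(c("Decrease", "No Change", "Increase"), n, replace = TRUE))
)

# Contingency table of observed counts.
tab <- table(toy$Work_Location, toy$Productivity_Change,
             dnn = c("Work_Location", "Productivity_Change"))

# Chi-squared test of independence between the two categorical variables.
test <- chisq.test(tab)
print(test)

# Pearson residuals: (observed - expected) / sqrt(expected).
# Positive values mark cells with more observations than independence predicts.
pearson <- (test$observed - test$expected) / sqrt(test$expected)
print(round(pearson, 2))

# Base R mosaic plot; shade = TRUE colors tiles by Pearson residuals.
mosaicplot(tab, shade = TRUE, main = "Work location vs. productivity change")
```

Cells with large positive residuals correspond to the "more than expected" counts discussed above, and large negative residuals to "fewer than expected" counts.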

For younger employees aged below 30, onsite work appears to benefit productivity the most, as seen in the higher proportion of ‘Increase’ and lower proportion of ‘Decrease’ responses compared to both hybrid and remote work. This could be because younger employees are still adjusting to the working world, and gaining in-person experience from more experienced colleagues helps increase their work output.

For employees aged between 30 and 40, the trend is the opposite, with remote work increasing productivity more than both hybrid and onsite work. For employees aged between 40 and 50, the trend is similar, although less apparent, with a high proportion of onsite workers indicating that their productivity has decreased. These trends could be due to employees feeling more confident in their ability to perform well and hence preferring to work from the comfort of their homes. The increased convenience and work-life balance this gives them could serve as additional motivation and drive productivity.

For employees aged 50 and above, there appears to be little difference between the categories of work location, as the ‘Increase’ and ‘Decrease’ proportions are very similar across all three. As they transition into the later stages of their careers, there could be a shift away from the pattern seen in the 30-40 and 40-50 age groups toward a slightly greater emphasis on working onsite, perhaps because they are networking more to gain managerial positions or simply want to spend more time with colleagues. However, they still appear to value the work-life balance and convenience of working remotely to some extent.

Overall, different age groups tend to have different preferences and priorities, which may explain the weak association and unclear overall trend between the two variables. Analyzing this relationship within each age group nonetheless provides valuable insights into how work location relates to productivity and how organizations can tailor their work arrangements to meet the diverse needs of their employees.
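The age segmentation used in this comparison could be sketched as below. The data are synthetic, and the handling of boundary ages (for example, whether an employee aged exactly 40 falls in 30-40 or 40-50) is an assumption, since the report does not state it.

```r
set.seed(315)  # arbitrary seed for a reproducible synthetic example

n <- 800
toy <- data.frame(
  Age = sample(22:60, n, replace = TRUE),
  Work_Location = sample(c("Onsite", "Hybrid", "Remote"), n, replace = TRUE),
  Productivity_Change = sample(c("Decrease", "No Change", "Increase"), n, replace = TRUE)
)

# right = FALSE makes intervals closed on the left, so [30, 40) is labeled "30-40"
# and an age of exactly 50 falls in "50+". This boundary rule is an assumption.
toy$Age_Group <- cut(toy$Age,
                     breaks = c(-Inf, 30, 40, 50, Inf),
                     labels = c("<30", "30-40", "40-50", "50+"),
                     right = FALSE)

# Three-way table: productivity change by work location within each age group.
tab3 <- table(toy$Productivity_Change, toy$Work_Location, toy$Age_Group)

# Proportion of each productivity outcome within every (location, age group) pair,
# so the three outcomes sum to 1 for each pair.
props <- prop.table(tab3, margin = c(2, 3))
print(round(props, 2))
```

Comparing proportions rather than raw counts keeps the age groups comparable even when they contain different numbers of employees.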

Next, we explore the relationship between work location and employees’ stress levels across all age groups. Stress is a critical factor that can impact employees’ well-being and overall job satisfaction. By examining how stress levels vary across different work arrangements and age groups, we can gain insights into the psychological effects of remote work on employees.

The mosaic plot colored by Pearson Residuals obtained from running a Chi-Squared test of independence between Work Location and Stress Level shows some patterns in the data, which could indicate some weak association between the two variables. The plot illustrates how there are more than expected counts of ‘High’ stress levels and lower than expected counts of ‘Low’ stress levels for employees working remotely. In contrast, employees working onsite have more than expected counts of ‘Low’ stress levels and lower than expected counts of ‘High’ stress levels. This suggests that remote work might be associated with higher stress levels. To further explore this relationship, we segment the data based on age categories to identify how different age groups experience stress levels across various work locations.

For employees aged 40 and below (both the <30 and 30-40 groups), there seem to be similar proportions of high and low stress levels across all work locations, which could mean that stress levels in these age groups are not strongly affected by work location. This could be because these age groups are more adaptable to different work environments and are more focused on building their careers.

However, for employees aged above 40 (both the 40-50 and 50+ groups), there is a higher proportion of high stress levels and a lower proportion of low stress levels among those working remotely or in a hybrid arrangement than among those working onsite. This could be due to the increased pressure and expectations placed on older employees, who may find it more challenging to adapt to remote work setups, as these arrangements could mean a large shift in the way they work. The lack of in-person interactions and support systems could contribute to higher stress levels among older employees, highlighting the importance of tailored strategies to support their well-being.

This corroborates the findings from the mosaic plot, indicating that remote work might be associated with higher stress levels, particularly for older employees. By understanding how stress levels vary across different age groups and work locations, organizations can implement targeted interventions to support employees’ mental health and well-being in remote and hybrid work environments.

Research Question 2:

Does the distribution of work-life balance ratings differ significantly across gender groups?

Understanding whether there are discrepancies in work-life balance ratings among gender groups is extremely important for ensuring equality in the workplace. If certain gender groups are having a harder time establishing a work-life balance, steps can be taken to make sure that those groups are not being overworked compared to others.

To get a good grasp of the relationship between work-life balance and gender, we first look at the proportion of each work-life balance rating (1 to 5) within each gender group.

##                    
##                             1         2         3         4         5
##   Female            0.1938776 0.1946625 0.1930926 0.2095761 0.2087912
##   Male              0.2244094 0.1826772 0.2007874 0.1952756 0.1968504
##   Non-binary        0.2009885 0.1968699 0.2191104 0.1828666 0.2001647
##   Prefer not to say 0.1988728 0.1996779 0.2302738 0.1956522 0.1755233

As shown in the table, the proportions of work-life balance ratings are broadly similar across the gender groups. The most noticeable difference is for the Male group at a rating of 1, where the proportion is about 0.22 while the next highest is about 0.20. The “Prefer not to say” group also stands out at a rating of 3, with a proportion of about 0.23 compared with roughly 0.19 to 0.22 for the other groups, and it has the lowest proportion of any group at a rating of 5 (about 0.18). Overall, the table does not point to a major difference between the gender groups. However, we can still look at a heat map and a density plot to see whether any other differences appear.
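A table of within-group proportions like the one discussed here can be produced with `table()` and `prop.table()`. The sketch below uses synthetic data, so its numbers will not match the report's.

```r
set.seed(315)  # arbitrary seed for a reproducible synthetic example

n <- 1000
toy <- data.frame(
  Gender = sample(c("Female", "Male", "Non-binary", "Prefer not to say"),
                  n, replace = TRUE),
  Work_Life_Balance_Rating = sample(1:5, n, replace = TRUE)
)

# Cross-tabulate gender (rows) against rating (columns).
counts <- table(toy$Gender, toy$Work_Life_Balance_Rating)

# margin = 1 divides each count by its row total, so every row sums to 1
# and each entry is the share of that gender group giving that rating.
row_props <- prop.table(counts, margin = 1)
print(round(row_props, 3))
```

Because each row sums to 1, a value near 0.2 in every column corresponds to ratings spread evenly across the five levels.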


Taken together, the heat map and density plot suggest that work-life balance ratings do not vary with gender, with most responses concentrated in the mid-range ratings of 3 and 4. The heat map shows that both male and female respondents account for these mid-range ratings, while the other two categories show lower color intensity because they have fewer responses. The density plot supports this view, showing nearly identical distributions across the groups, with the most noticeable peak around ratings 3 and 4 for all groups.

It is also notable that ratings of 1 and 5 are less frequent than the other values across all gender groups, which suggests that most respondents describe their work-life balance as average rather than poor or strong. However, the “Non-binary” and “Prefer not to say” groups have smaller sample sizes, which may produce wider distributions and lower density values and make it harder to reach a definitive conclusion for these groups. To sum up, the findings indicate a uniform pattern in how the sample perceives work-life balance irrespective of gender, with no significant anomalies or deviations.

To reach a conclusion about whether work-life balance ratings differ by gender, we run a Kruskal-Wallis test of Work_Life_Balance_Rating by Gender. This test was chosen because work-life balance is rated on a scale, making it ordinal data.

## 
##  Kruskal-Wallis rank sum test
## 
## data:  Work_Life_Balance_Rating by Gender
## Kruskal-Wallis chi-squared = 3.5584, df = 3, p-value = 0.3133

With a p-value of about 0.31, well above 0.05, we fail to reject the null hypothesis: there is no statistically significant difference in work-life balance ratings between the gender groups. This is an encouraging sign that no specific gender group in this dataset finds it harder to maintain a work-life balance than the others.
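For reference, a Kruskal-Wallis test of this form can be run with `kruskal.test()` from base R's stats package. The example below is a sketch on synthetic data in which ratings are drawn independently of gender, so it only illustrates the call and how to read the result.

```r
set.seed(315)  # arbitrary seed for a reproducible synthetic example

n <- 1000
toy <- data.frame(
  Gender = factor(sample(c("Female", "Male", "Non-binary", "Prefer not to say"),
                         n, replace = TRUE)),
  Work_Life_Balance_Rating = sample(1:5, n, replace = TRUE)
)

# Rank-based test of whether the rating distribution differs between groups.
# It does not assume normality, which suits an ordinal 1-to-5 scale.
kw <- kruskal.test(Work_Life_Balance_Rating ~ Gender, data = toy)
print(kw)

# Degrees of freedom equal the number of groups minus one (here 4 - 1 = 3).
df <- unname(kw$parameter)

# Decision at the conventional 5% level.
if (kw$p.value < 0.05) {
  message("Reject H0: at least one group differs.")
} else {
  message("Fail to reject H0: no significant difference detected.")
}
```

The degrees of freedom of 3 match the `df = 3` reported in the output above, since the dataset has four gender categories.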

Research Question 3:

What effect does hours worked per week have on employees’ work-life balance ratings across different work locations?

Lastly, we will investigate the relationship between the number of hours employees work per week and work-life balance ratings. We will also consider location (hybrid, onsite, or remote) as another factor in this analysis to see whether it adds to our understanding of the impact of different work arrangements. Generally, work-life balance is the ability to balance professional and personal lives so that individuals can meet both personal and work goals while maintaining a healthy mental state. It is very important for good employee performance, as a bad work-life balance can lead to burnout, a poor mental state, and lower-quality work. We will start off by looking at a box plot that displays work-life balance by hours worked per week (grouped by intervals of 5 hours), faceted by work location.

## [1] 45-50 50-55 45-50 30-35 35-40 35-40
## 12 Levels: 0-5 5-10 10-15 15-20 20-25 25-30 30-35 35-40 40-45 45-50 ... 55-60

From the above graph, we can see that the median, interquartile range, and total range of each box plot are for the most part similar. However, there are a few exceptions for each category of work location. In hybrid, the box plot for 50-55 hours has a lower value of work-life balance rating for its lower quartile than the majority of the other pictured box plots. In onsite, the box plot for 25-30 hours also has a lower value of work-life balance rating for its lower quartile. However, this lower quartile is not as low as the one previously mentioned in the hybrid facet. Lastly, in remote, the box plot for 40-45 hours has a lower quartile visually similar to that of 50-55 hours in hybrid. The interquartile range for these three box plots is greater than for the rest, meaning that there is higher variability in these groups. A lower value for the lower quartile may suggest a relatively higher number of low work-life balance ratings. Since the differing box plots in hybrid and remote have lower quartiles below that of the differing box plot in onsite, there may be more employees in these locations who reported lower work-life balance ratings compared to those working onsite.

To further understand the distribution of the number of hours employees work per week across the three defined work locations, we can create a stacked bar chart where data is represented as proportions for clarity. We are doing this to make sure our data has a roughly even number of observations of employees at each location and for the various numbers of hours worked per week.
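The 5-hour binning and the per-location box plots described here could be sketched as follows. The data are synthetic, and the choice of right-closed intervals (so that 45 hours falls in "40-45") is an assumption, as is the use of base graphics in place of a faceted plot.

```r
set.seed(315)  # arbitrary seed for a reproducible synthetic example

n <- 900
toy <- data.frame(
  Hours_Worked_Per_Week = sample(20:60, n, replace = TRUE),
  Work_Location = factor(sample(c("Hybrid", "Onsite", "Remote"), n, replace = TRUE)),
  Work_Life_Balance_Rating = sample(1:5, n, replace = TRUE)
)

# Twelve 5-hour bins from 0 to 60, labeled "0-5", "5-10", ..., "55-60".
breaks <- seq(0, 60, by = 5)
labels <- paste(head(breaks, -1), tail(breaks, -1), sep = "-")

# Assumption: intervals are closed on the right, so 45 falls in "40-45".
toy$Hours_Bin <- cut(toy$Hours_Worked_Per_Week, breaks = breaks,
                     labels = labels, include.lowest = TRUE)

# Lower quartile, median, and upper quartile of the rating per bin and location.
quartiles <- aggregate(Work_Life_Balance_Rating ~ Hours_Bin + Work_Location,
                       data = toy, FUN = quantile, probs = c(0.25, 0.5, 0.75))
print(head(quartiles))

# One box-plot panel per work location, with unused hour bins dropped.
op <- par(mfrow = c(1, 3))
for (loc in levels(toy$Work_Location)) {
  boxplot(Work_Life_Balance_Rating ~ Hours_Bin,
          data = droplevels(subset(toy, Work_Location == loc)),
          main = loc, xlab = "Hours worked per week",
          ylab = "Work-life balance rating", las = 2)
}
par(op)
```

Printing the quartiles alongside the plot makes it easier to check which bins have a lower first quartile, rather than relying on visual comparison alone.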


From the stacked bar chart, we can observe the proportion of employees working at each location by the number of hours they work per week. There appears to be a relatively even spread of observations for each work location, perhaps with slightly more for remote. We do not need to have much concern for our tests being affected by not having enough data for any given work location category. We can also run a Kolmogorov-Smirnov test on the data we have for the number of hours worked per week to see if it follows a Gaussian (normal) distribution.

## Warning in ks.test.default(x = data$Hours_Worked_Per_Week, y = "pnorm", : ties
## should not be present for the one-sample Kolmogorov-Smirnov test

## 
##  Asymptotic one-sample Kolmogorov-Smirnov test
## 
## data:  data$Hours_Worked_Per_Week
## D = 0.0712, p-value < 2.2e-16
## alternative hypothesis: two-sided

Using the Kolmogorov-Smirnov test, we get a p-value of < 2.2e-16. Since this is far below 0.05, we reject the null hypothesis and conclude that Hours_Worked_Per_Week does not follow a Gaussian distribution. We can visually confirm this by looking at a histogram displaying the count of employees that work a certain number of hours per week.

## [1] 39.6146

As we can see, the values are almost uniformly distributed across the interval from 20 to 60 hours. A histogram of a normal distribution typically shows many data points occurring near the mean. For this variable, the mean is 39.61 hours. Since the data does not appear to cluster around this value, we have visually confirmed what we drew from the K-S test: Hours_Worked_Per_Week does not follow a Gaussian distribution.
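A normality check of this kind can be sketched with `ks.test()` and `hist()`. The example uses synthetic hours drawn uniformly from 20 to 60, and it assumes the reference normal distribution takes its mean and standard deviation from the sample; the exact arguments used in the report are not shown in the output above.

```r
set.seed(315)  # arbitrary seed for a reproducible synthetic example

# Synthetic whole-number hours, roughly uniform on 20..60 (illustrative only).
hours <- sample(20:60, 2000, replace = TRUE)

# One-sample K-S test against a normal with the sample mean and sd (assumption).
# Whole-number hours contain ties, which triggers a warning; it is suppressed
# here only to keep the example output short.
ks <- suppressWarnings(
  ks.test(hours, "pnorm", mean = mean(hours), sd = sd(hours))
)
print(ks)

# Sample mean, for comparison with where a normal distribution would peak.
print(mean(hours))

# Histogram with 5-hour bins; a normal variable would pile up near the mean.
h <- hist(hours, breaks = seq(20, 60, by = 5),
          main = "Hours worked per week", xlab = "Hours")
```

One caveat worth keeping in mind is that estimating the mean and standard deviation from the same sample makes the standard K-S p-value only approximate, so the histogram is a useful complementary check.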

Conclusion

Our analysis aimed to investigate the impact of remote work on employees, specifically exploring productivity, stress levels, and work-life balance across various demographic and workplace factors. By examining a dataset of 5,000 employees, we sought to identify trends and patterns that could inform organizational strategies in the evolving landscape of work.

Our findings indicate that work location has nuanced effects on productivity, with remote work being associated with increased productivity for middle-aged employees (30-50 years old), whereas younger employees (<30 years) benefit more from onsite arrangements. Interestingly, older employees (50+) displayed no significant preference, highlighting the diversity of needs across age groups. Regarding stress levels, remote work appears to correlate with higher stress among older employees (40+), emphasizing the importance of tailored support systems for this demographic.

When analyzing work-life balance ratings across gender groups, our results show no significant differences, suggesting uniformity in how work-life balance is perceived regardless of gender. This uniformity could be indicative of equitable experiences across workplaces in the dataset, though smaller sample sizes for certain groups may require further exploration.

Finally, we examined how hours worked per week affect employees’ work-life balance ratings across different work locations. While employees working remotely or in hybrid setups reported slightly lower work-life balance ratings when working extended hours, onsite employees generally exhibited less variability in their responses. This suggests that while long working hours universally challenge work-life balance, remote and hybrid setups may introduce additional complexities in maintaining this balance. These findings highlight the need for organizations to consider work hours and location together when designing policies that promote employee well-being.

While this study provides valuable insights, it is not without limitations. The dataset, while comprehensive, does not capture cultural and social factors that could influence employees’ experiences. Additionally, metrics such as access to healthcare, family dynamics, or job satisfaction were not included, which could provide a fuller picture of employee well-being. Future research could delve deeper into how remote work affects team dynamics and innovation. Further, examining industries where remote work may not be as viable, or analyzing long-term impacts of hybrid models, could yield actionable insights. Investigating how employers can mitigate stress for older employees and support career growth through virtual mentoring programs would also be valuable. In conclusion, our analysis underscores the complexity of remote work’s impact and the importance of flexible, inclusive workplace policies that accommodate diverse needs. These findings can guide organizations in fostering productive and supportive environments for all employees.

36-315 Final Project: Impact of Remote Work on Employees

Smaran Alli, Katherine Weng, Anushka Iyer, Pi Rey Low

11 November 2024