Introduction

Employee attrition, the gradual reduction in employees as they leave and aren't replaced, is an issue that every company faces. It greatly impacts not only the company but the industry as well. In today's competitive business market, attrition results in negative impacts and complex challenges for the company. Some attrition is unavoidable but identifying people-related trends or patterns will allow companies, especially the HR department, to take the necessary steps to improve the organization and ensure the company runs smoothly in the future. We will explore how different factors affect employee attrition and job satisfaction.

Data Description

The dataset was obtained from Kaggle at the following link: https://www.kaggle.com/datasets/rishikeshkonapure/hr-analytics-prediction. It includes information about 1470 employees, each with 35 attributes ranging from demographics to work factors. We cleaned the data by removing several variables that we perceived as overlapping, such as DailyRate, HourlyRate, and MonthlyRate, as they are all related through calculations.

These are the variables we are focusing on in our analysis:
Age - age of employee
Attrition - employee leaving the company
Department - employee department
DistanceFromHome - distance from home to office in kilometers
Education - qualification of employee (1=Below College, 2=College, 3=Bachelor, 4=Master, 5=Doctor)
Gender - gender of employee
HourlyRate - hourly salary/rate
JobInvolvement - job involvement (1=Low, 2=Medium, 3=High, 4=Very High)
JobLevel - level of job (1=Entry-level, 2=Intermediate, 3=Mid-level, 4=Senior, 5=Executive)
JobRole - job role of employee
JobSatisfaction - satisfaction with the job (1=Low, 2=Medium, 3=High, 4=Very High)
MaritalStatus - marital status of employee (Divorced, Married, Single)
NumCompaniesWorked - number of companies employee worked for
PercentSalaryHike - percentage increase in salary
TotalWorkingYears - total years worked
TrainingTimesLastYear - hours spent training
YearsAtCompany - total number of years at the company
YearsInCurrentRole - total years in current role
YearsSinceLastPromotion - years since last promotion
YearsWithCurrManager - years worked under current manager

Research Questions

We will address these three research questions:
1. How do demographics affect job satisfaction?
2. How do work factors affect employee attrition and job satisfaction?
3. How do different factors affect each other?

Question 1: How do demographics affect job satisfaction?

We looked into the relationships between the demographic variables and JobSatisfaction. We specifically analyzed 3 categorical and 2 quantitative variables related to demographics, namely Gender, Education, Marital Status, Age, and TotalWorkingYears, to discover their relationship with JobSatisfaction. We also explored the effects Education and JobSatisfaction had on Age and TotalWorkingYears.

Graph 1: Dendrogram of Demographic Variables, Leaves Colored by Job Satisfaction

Above is a dendrogram depicting the relationship between employee demographic variables (Gender, Education, Marital Status) and JobSatisfaction. The dendrogram was constructed using hierarchical clustering and leaves are colored based on job satisfaction levels.

Upon inspection of the dendrogram, there does not appear to be a strong relationship between any of the demographic variables and JobSatisfaction, as leaves with different JobSatisfaction levels are intermixed rather than forming distinct clusters.
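The report does not state which distance measure or linkage was used to build the dendrogram. As a minimal sketch only, the code below assumes Gower dissimilarity (which handles categorical columns) and complete linkage, and it runs on a small synthetic stand-in rather than the real HR data. The data frame, its values, and both method choices are assumptions for illustration.

```r
library(cluster)  # recommended package shipped with R; provides daisy()

# Synthetic stand-in for the HR data (NOT the real dataset).
set.seed(1)
n <- 60
HR <- data.frame(
  Gender = factor(sample(c("Female", "Male"), n, replace = TRUE)),
  Education = factor(sample(1:5, n, replace = TRUE), levels = 1:5, ordered = TRUE),
  MaritalStatus = factor(sample(c("Divorced", "Married", "Single"), n, replace = TRUE)),
  JobSatisfaction = sample(1:4, n, replace = TRUE)
)

# Gower dissimilarity copes with nominal and ordinal columns (assumed choice).
d <- daisy(HR[, c("Gender", "Education", "MaritalStatus")], metric = "gower")

# Agglomerative hierarchical clustering; complete linkage is an assumption.
hc <- hclust(d, method = "complete")

# Label each leaf with that employee's JobSatisfaction level.
plot(hc, labels = as.character(HR$JobSatisfaction), cex = 0.6,
     main = "Demographic variables (synthetic data)")
```

If leaves with the same label do not gather under the same branches, the clustering gives no sign that the demographic variables separate employees by JobSatisfaction. Coloring the leaves, as in Graph 1, would need additional plotting code.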

We created several mosaic plots to check this finding, but have only included one for demonstration purposes.

Graph 2: Mosaic Plot of Gender and JobSatisfaction

The above is a mosaic plot for Gender and JobSatisfaction. The observed frequencies in each category are close to what would be expected under the null hypothesis of independence, so the plot gives no indication that the two variables are dependent. The mosaic plots for Education and Marital Status showed similar null results.

Thus, we have no evidence that JobSatisfaction depends on Gender, Education, or Marital Status.

We've examined three of the five demographic variables so far. We still need to examine the relationship between JobSatisfaction and Age. We're also curious about whether Education affects this relationship.

Thus, we created a stacked density plot with Age on the x-axis and Job Satisfaction represented by colored density curves. The plot was then faceted by Education, allowing for a more detailed examination of the relationship between Job Satisfaction, Education, and Age.

Graph 3: Density Curves of Job Satisfaction by Age, Faceted by Education

This graph shows the distribution of Job Satisfaction over Age for different levels of Education.

Overall, the density curves for Job Satisfaction levels, ranging from low to very high, share similar distributions and overlap substantially across the Education facets. This suggests that Education levels are not strongly associated with Job Satisfaction. Age does seem to have some effect on Job Satisfaction as the density curves of Job Satisfaction over Age differ somewhat.

However, we note that the plot shows some variation in the relationship between Age, Job Satisfaction, and Education level.

The “Below College” Education facet shows less of an overlap between Job Satisfaction density curves as the “Very High” Job Satisfaction density curve is slightly taller than the rest.

For the “College” Education level facet, the “Medium” Job Satisfaction density curve is slightly taller than the rest, while for both the “Bachelor” and “Master” facets the density curves almost completely overlap.

It is interesting to note that the “Doctor” level facet has an unusually tall density curve for the “Medium” Job Satisfaction level, peaking at around 0.13 (per unit of age) at around age 30, with a second, smaller peak of around 0.04 (per unit of age) at around age 55, while the remaining density curves overlap substantially with each other. It seems that employees who have a “Doctor” level of Education and “Medium” JobSatisfaction fall into two very specific Age ranges.

We explore these relationships further with statistical tests, assuming a significance level of 0.05.

First, we explore the relationships between Job Satisfaction, Age, and Education.

Let's examine the relationship between JobSatisfaction and Education.

## 
##  Pearson's Chi-squared test
## 
## data:  table(HR$Education, HR$JobSatisfaction)
## X-squared = 13.03, df = 12, p-value = 0.3669

The p-value for the chi-square test of independence is above 0.05, so we fail to reject the null hypothesis that the two variables are independent. We therefore have no evidence that JobSatisfaction and Education are dependent, which is consistent with our graph.
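The output above names the call `table(HR$Education, HR$JobSatisfaction)`. The sketch below repeats that call on a synthetic stand-in data frame (not the real HR data) to show how the degrees of freedom and the decision at the 0.05 level are obtained. The simulated values are assumptions and the printed numbers will not match the report.

```r
# Synthetic stand-in for the HR data (NOT the real dataset).
set.seed(1)
n <- 500
HR <- data.frame(
  Education = sample(1:5, n, replace = TRUE),
  JobSatisfaction = sample(1:4, n, replace = TRUE)
)

# 5 x 4 contingency table of counts.
tab <- table(HR$Education, HR$JobSatisfaction)

# Pearson's chi-squared test of independence.
res <- chisq.test(tab)

# Degrees of freedom: (rows - 1) * (columns - 1) = 4 * 3 = 12.
res$parameter

# Counts expected under independence: row total * column total / n.
round(res$expected, 1)

# Decision at the 0.05 significance level used in this report.
alpha <- 0.05
reject_independence <- res$p.value < alpha
```

The 12 degrees of freedom match the `df = 12` reported above, since Education has 5 levels and JobSatisfaction has 4.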

Now let's explore the relationship between Age and JobSatisfaction and then Age and Education.

## 
##  Bartlett test of homogeneity of variances
## 
## data:  Age by JobSatisfaction
## Bartlett's K-squared = 0.19396, df = 3, p-value = 0.9786
## 
##  One-way analysis of means
## 
## data:  Age and JobSatisfaction
## F = 0.051798, num df = 3, denom df = 1466, p-value = 0.9844

Both tests produce a p-value higher than 0.05, so we fail to reject the null hypothesis for each. This means that Age likely has similar means and variances for each group of JobSatisfaction. This implies that JobSatisfaction does not have an effect on Age, which agrees with our overall observations of the graph.
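The pair of tests above can be sketched with base R's `bartlett.test()` and `oneway.test()`. The example runs on a synthetic stand-in data frame (not the real HR data), so its values are assumptions. It also shows the `var.equal` switch, which determines whether the output is headed "One-way analysis of means" or "One-way analysis of means (not assuming equal variances)".

```r
# Synthetic stand-in for the HR data (NOT the real dataset).
set.seed(1)
n <- 400
HR <- data.frame(
  Age = round(rnorm(n, mean = 37, sd = 9)),
  JobSatisfaction = factor(sample(1:4, n, replace = TRUE))
)

# H0: the variance of Age is the same in every JobSatisfaction group.
bt <- bartlett.test(Age ~ JobSatisfaction, data = HR)

# H0: the mean of Age is the same in every group (classical one-way ANOVA).
aov_equal <- oneway.test(Age ~ JobSatisfaction, data = HR, var.equal = TRUE)

# Welch's version, which does not assume equal variances (the default).
aov_welch <- oneway.test(Age ~ JobSatisfaction, data = HR)

bt$parameter          # df = number of groups - 1 = 3
aov_equal$parameter   # num df = 3, denom df = n - 4
aov_welch$method      # heading printed for the Welch version
```

With equal variances assumed, the denominator degrees of freedom are the number of observations minus the number of groups, which is why the report shows 1470 - 4 = 1466.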

## 
##  Bartlett test of homogeneity of variances
## 
## data:  Age by Education
## Bartlett's K-squared = 4.5329, df = 4, p-value = 0.3387
## 
##  One-way analysis of means
## 
## data:  Age and Education
## F = 20.842, num df = 4, denom df = 1465, p-value < 2.2e-16

The test for differences in variances produces a p-value higher than 0.05, so we fail to reject its null hypothesis, but the test for differences in means produces a p-value lower than 0.05, so we reject its null hypothesis. This means that Age likely has different means and similar variances for each group of Education. This implies that Education does have an effect on Age, though the effect is likely small, which agrees with our overall observations of the graph.

Second, we explore the variations we noticed in the graph by conducting tests on Age and JobSatisfaction within the “Below College”, “College”, and “Doctor” levels of Education.

## 
##  Bartlett test of homogeneity of variances
## 
## data:  Age by JobSatisfaction
## Bartlett's K-squared = 4.4777, df = 3, p-value = 0.2143
## 
##  One-way analysis of means
## 
## data:  Age and JobSatisfaction
## F = 0.7328, num df = 3, denom df = 166, p-value = 0.5338

For the “Below College” education level, the p-values for both tests are well above 0.05, so we fail to reject the null hypotheses and do not have convincing evidence that the means and variances of Age differ across levels of JobSatisfaction, despite what we saw in the graph.

## 
##  Bartlett test of homogeneity of variances
## 
## data:  Age by JobSatisfaction
## Bartlett's K-squared = 7.5255, df = 3, p-value = 0.05691
## 
##  One-way analysis of means (not assuming equal variances)
## 
## data:  Age and JobSatisfaction
## F = 0.35087, num df = 3.00, denom df = 142.03, p-value = 0.7886

For the “College” education level facet, we had noted that there seemed to be differences in variances, though similarity in means, of Age across levels of JobSatisfaction. The tests show that the differences in variances we saw were not significant at the 0.05 level (though the p-value of 0.05691 is close to it), and that the means do not differ significantly.

## 
##  Bartlett test of homogeneity of variances
## 
## data:  Age by JobSatisfaction
## Bartlett's K-squared = 1.4349, df = 3, p-value = 0.6974
## 
##  One-way analysis of means (not assuming equal variances)
## 
## data:  Age and JobSatisfaction
## F = 0.35043, num df = 3.000, denom df = 15.168, p-value = 0.7894

For the “Doctor” education level facet, we had noted that there seemed to be differences in the variances and means of Age across levels of JobSatisfaction. The tests, however, show that the differences in variances and means that we saw were not significant.
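The three within-level analyses above can be sketched as a loop over Education levels. The code uses a synthetic stand-in data frame (not the real HR data), spells out Education labels that the real data codes as 1 to 5, and applies Welch's test to every subset for simplicity. All of these are assumptions for illustration.

```r
# Synthetic stand-in for the HR data (NOT the real dataset).
set.seed(1)
n <- 600
edu_labels <- c("Below College", "College", "Bachelor", "Master", "Doctor")
HR <- data.frame(
  Age = round(rnorm(n, mean = 37, sd = 9)),
  Education = sample(edu_labels, n, replace = TRUE),
  JobSatisfaction = factor(sample(1:4, n, replace = TRUE))
)

# Education levels that looked unusual in the faceted density plot.
levels_of_interest <- c("Below College", "College", "Doctor")

# Run both tests inside each Education subset and keep only the p-values.
results <- lapply(levels_of_interest, function(lv) {
  sub <- subset(HR, Education == lv)
  c(
    bartlett_p = bartlett.test(Age ~ JobSatisfaction, data = sub)$p.value,
    welch_p    = oneway.test(Age ~ JobSatisfaction, data = sub)$p.value
  )
})
names(results) <- levels_of_interest

# One row per Education level, one column per test.
p_table <- do.call(rbind, results)
round(p_table, 4)
```

Because several tests are run on subsets chosen after looking at the graph, a small p-value in one subset would deserve caution. Subsets with few employees also give the tests little power.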

In short, we found that Education has a significant effect on Age, but Age and JobSatisfaction are independent and Education and JobSatisfaction are independent.

We examined the relationship between Age, Education, and JobSatisfaction and became curious about the effect Education and JobSatisfaction would have on TotalWorkingYears. Thus, we produced a graph examining this.

Graph 4: Density Curves of Total Working Years by Education, Faceted by Job Satisfaction

We then created a faceted stacked density plot on the variables Total Working Years, Education, and Job Satisfaction. This graph shows the distribution of Total Working Years for different levels of Job Satisfaction where each facet represents a different level of Job Satisfaction, ranging from low to very high.

We note that the graph shows contrasting results for “Medium” JobSatisfaction, particularly for employees with a “Doctor” level of Education. It seems that employees who have “Medium” JobSatisfaction and a “Doctor” level of Education fall into three very specific ranges of Total Working Years.

From the graph, it appears that the distribution of Total Working Years is similar across all levels of Job Satisfaction. However, there is a slight trend of increasing Total Working Years as Job Satisfaction increases. This trend is most noticeable in the facets representing high and very high JobSatisfaction, with the peaks of employees with “Below College” and “Doctor” Education levels showing a slight shift to the right by about 4 years.

But overall, this graph suggests that Total Working Years may be a weak predictor of JobSatisfaction, as the right skewed pattern is shared in all four graphs based on JobSatisfaction levels and so there is not a clear pattern of association between these two variables.

In addition, the graph shows that Education seems to play a part in Total Working Years as the curves for each level of Education differ slightly.

Furthermore, the graph suggests that JobSatisfaction and Education are independent as these variables showed consistency across all Total Working Years, which makes sense given our previous statistical test over these two variables.

Let's conduct several statistical tests to examine the significance of our findings using 0.05 as our level of significance.

First, we will examine whether the mean and variance of Total Working Years differ across levels of Job Satisfaction.

## 
##  Bartlett test of homogeneity of variances
## 
## data:  TotalWorkingYears by JobSatisfaction
## Bartlett's K-squared = 6.9918, df = 3, p-value = 0.07216
## 
##  One-way analysis of means
## 
## data:  TotalWorkingYears and JobSatisfaction
## F = 0.27594, num df = 3, denom df = 1466, p-value = 0.8428

Both tests produce a p-value higher than 0.05, so we fail to reject the null hypothesis for each. This means that Total Working Years likely has similar means and variances for each group of JobSatisfaction. This implies that JobSatisfaction does not have an effect on Total Working Years, which agrees with our observations of the graph.

Second, we will examine whether the mean and variance of Total Working Years differ across levels of Education.

## 
##  Bartlett test of homogeneity of variances
## 
## data:  TotalWorkingYears by Education
## Bartlett's K-squared = 12.236, df = 4, p-value = 0.01568
## 
##  One-way analysis of means (not assuming equal variances)
## 
## data:  TotalWorkingYears and Education
## F = 8.6458, num df = 4.00, denom df = 275.54, p-value = 1.379e-06

These tests all produce a p-value lower than 0.05, so we reject the null hypothesis for each. This means that Total Working Years likely has different means and variances for each group of Education. This implies that Education does have an effect on Total Working Years. This is shown consistently in the graph as well.

We've already examined the relationship between JobSatisfaction and Education, which we found to be independent.

We conclude that our graphs and statistical tests show similar results, those being that JobSatisfaction and Total Working Years are independent, Education and Total Working Years are dependent, and JobSatisfaction and Education are independent.

Question 2: How do work factors affect employee attrition and job satisfaction?

We explored how work factors affect employee Attrition and JobSatisfaction.

We looked into the relationships between the quantitative variables and Attrition. The data set has 11 quantitative variables related to work factors, which are difficult to visualize simultaneously. Thus we performed a Principal Component Analysis on the quantitative variables to see which components account for the most variance.

Graph 5: Scree Elbow Plot of HR Quantitative Variables

Looking at the scree/elbow plot produced above, we see that after the 3rd component, the proportion of variance explained starts to flatten. This means that the first three principal components capture most of the variation in the data that can usefully be captured, so they are enough for our graphics and other analysis.

Graph 6: Biplot of HR Quantitative Variables
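The proportions plotted in a scree plot can be computed directly from a PCA fit. The sketch below uses `prcomp()` on a synthetic stand-in with only six of the quantitative variables (not the real HR data). The simulated values, the choice to standardize the variables, and the shared "tenure" factor that links some columns are assumptions for illustration.

```r
# Synthetic stand-in with a few quantitative columns (NOT the real dataset).
set.seed(1)
n <- 300
tenure <- rexp(n, rate = 1 / 7)  # made-up latent factor shared by some columns
HR_quant <- data.frame(
  YearsAtCompany       = tenure + rnorm(n),
  YearsInCurrentRole   = 0.6 * tenure + rnorm(n),
  YearsWithCurrManager = 0.6 * tenure + rnorm(n),
  Age                  = 30 + 0.5 * tenure + rnorm(n, sd = 6),
  DistanceFromHome     = runif(n, min = 1, max = 29),
  PercentSalaryHike    = runif(n, min = 11, max = 25)
)

# Center and scale so variables on larger scales do not dominate (assumed choice).
pca <- prcomp(HR_quant, center = TRUE, scale. = TRUE)

# Proportion of total variance explained by each principal component.
prop_var <- pca$sdev^2 / sum(pca$sdev^2)
round(prop_var, 3)
round(cumsum(prop_var), 3)

# Scree plot: look for the component after which the curve flattens.
plot(prop_var, type = "b",
     xlab = "Principal component", ylab = "Proportion of variance",
     main = "Scree plot (synthetic data)")
```

The cumulative proportion is a useful companion to the elbow, because it states how much of the total variance the retained components account for.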

To further see the trends and correlations among different job factors for employee attrition, we displayed a bi-plot using the first two principal components above, where the points are colored by Attrition. As the first principal component increases, employees staying at the company increasingly outnumber employees leaving the company. There doesn't seem to be any relationship between employee attrition and the second principal component. We can see that the variables YearsWithCurrManager, YearsInCurrentRole, YearsAtCompany, and YearsSinceLastPromotion all point in the same direction, indicating that they are positively correlated with one another. Since they point towards the upper right corner of the bi-plot, these variables are positively associated with the first and second principal components. The variables TotalWorkingYears, Age, and NumCompaniesWorked point downwards towards the right, indicating a positive association with the first principal component but a negative association with the second principal component. TrainingTimesLastYear, DistanceFromHome, HourlyRate, and PercentSalaryHike have short arrows, which indicates that they contribute little to the first two principal components and so appear less related to employee attrition in this view than the other variables.

We also wanted to see how employee Attrition differs by department, so we used a mosaic plot to see whether any Department has significantly more or less Attrition than we would expect under the null hypothesis of independence.

Graph 7: Mosaic Plot of Employee Attrition by Company Department

The mosaic plot above is shaded by Pearson residuals. We see that the residual for the Sales Department and “Yes” employee Attrition (those leaving the company) is significant. This means that the Sales Department has significantly more employee Attrition than expected under independence. Thus we are able to conclude that Department and employee Attrition are not independent of each other.

After exploring how the 11 quantitative variables relate to Attrition, we wanted to see how JobSatisfaction varies by JobRole and JobInvolvement.

Graph 8: Stacked Bar Chart of Job Satisfaction for each Job Role, Faceted by Company Department
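The residuals behind this kind of shading can be inspected numerically. The sketch below builds a Department by Attrition table from synthetic data (not the real HR data) in which Sales is deliberately given a higher attrition rate. The department names, the attrition rates, and the use of base R's `mosaicplot()` are assumptions for illustration.

```r
# Synthetic stand-in (NOT the real dataset); attrition rates are made up.
set.seed(1)
n <- 600
Department <- sample(c("Human Resources", "Research & Development", "Sales"),
                     n, replace = TRUE, prob = c(0.1, 0.6, 0.3))
p_leave <- ifelse(Department == "Sales", 0.30, 0.12)
Attrition <- ifelse(runif(n) < p_leave, "Yes", "No")

tab <- table(Department, Attrition)
res <- chisq.test(tab)

# Pearson residuals: (observed - expected) / sqrt(expected).
round(res$residuals, 2)

# Standardized residuals, which also adjust for the row and column margins.
round(res$stdres, 2)

# Base-R mosaic plot; shade = TRUE colors cells by their residuals.
mosaicplot(tab, shade = TRUE, main = "Attrition by Department (synthetic data)")
```

A cell with a large positive residual has more observations than independence would predict, and a large negative residual means fewer. Residuals beyond roughly 2 in absolute value are the ones shading normally highlights.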

We created a faceted stacked bar chart to display the conditional distribution of JobSatisfaction for each JobRole. It also displays the conditional distribution of JobInvolvement for each level of JobSatisfaction within each JobRole.

The faceted stacked bar chart indicates that JobSatisfaction is somewhat different for each JobRole. The levels of JobSatisfaction have different counts within each JobRole. The tendency seems to be that a JobRole either has high proportions of high and very high JobSatisfaction or has somewhat equal proportions of each JobSatisfaction level. These differences may mean that JobSatisfaction and JobRole are dependent. There also appears to be little to no relationship between JobInvolvement and JobSatisfaction, as the proportions of JobInvolvement seem to be similar for each level of JobSatisfaction, meaning that those two are likely independent of one another.

We checked if these observations were statistically significant by conducting statistical tests at a level of significance of 0.05.

First, we investigated the relationship between JobSatisfaction and JobRole.

## 
##  Pearson's Chi-squared test
## 
## data:  table(HR$JobSatisfaction, HR$JobRole)
## X-squared = 18.4, df = 24, p-value = 0.7832

We fail to reject our null hypothesis as our p-value is 0.7832, which is above the level of significance. We therefore have no evidence that JobSatisfaction and JobRole are dependent.

Next, we examined JobSatisfaction and JobInvolvement.

## 
##  Pearson's Chi-squared test
## 
## data:  table(HR$JobSatisfaction, HR$JobInvolvement)
## X-squared = 7.4214, df = 9, p-value = 0.5933

We again fail to reject our null hypothesis as our p-value is 0.5933, which is above the level of significance. We therefore have no evidence that JobSatisfaction and JobInvolvement are dependent either.

While the graph indicates that JobSatisfaction and JobRole may have a relationship, the chi-square test states that this is not statistically significant. The graph and the chi-square test of independence indicate that JobInvolvement and JobSatisfaction are independent variables.

Question 3: How do different factors affect each other?

We're particularly interested in the effect other variables have on MonthlyIncome. We investigated JobLevel, YearsAtCompany, and TotalWorkingYears.

Graph 9: Scatterplot of Monthly Income vs. Years at Company, with Linear Regression Models for each Job Level

Above is a scatterplot depicting the relationship between Monthly Income, Years at Company, and Job Level. We fit a linear regression line for each group of JobLevel.

There seems to be a somewhat positive relationship between MonthlyIncome and YearsAtCompany, though it is also common for an employee to have a relatively low number of YearsAtCompany and a high MonthlyIncome. There are no employees with a high number of YearsAtCompany and a low MonthlyIncome.

There is also a relationship between JobLevel and MonthlyIncome. The graph displays five groups, one for each JobLevel, and there seems to be a specific range of MonthlyIncome per JobLevel. Each group appears to have a different mean and variance, as indicated by the regression lines and the spread of points shown in the graph.

However, there does not seem to be much of a relationship between JobLevel and YearsAtCompany. They seem to be independent, as each JobLevel has a mostly horizontal regression line. Some groups of JobLevel do differ in their range of YearsAtCompany, though, and thus would likely have different means and variances.

We checked if our observations are statistically significant by conducting statistical tests at a level of significance of 0.05.

First, we investigated the relationship between MonthlyIncome and YearsAtCompany.

## 
## Call:
## lm(formula = MonthlyIncome ~ YearsAtCompany, data = HR)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
##  -9504  -2499  -1188   1393  15484 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      3733.3      160.1   23.32   <2e-16 ***
## YearsAtCompany    395.2       17.2   22.98   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4039 on 1468 degrees of freedom
## Multiple R-squared:  0.2645, Adjusted R-squared:  0.264 
## F-statistic: 527.9 on 1 and 1468 DF,  p-value: < 2.2e-16

The p-values for this regression model are less than 0.05, indicating that the model is statistically significant. MonthlyIncome and YearsAtCompany have a positive relationship: MonthlyIncome increases by about 395.2 on average with each additional year at the company. The R-squared of 0.2645 shows that YearsAtCompany alone explains only about 26% of the variation in MonthlyIncome.
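Using the coefficients reported above, the fitted line predicts a MonthlyIncome of about 3733.3 + 395.2 × 10 = 7685.3 for an employee with 10 YearsAtCompany. The sketch below shows how the slope, R-squared, and predictions are read off an `lm()` fit. It runs on a synthetic stand-in (not the real HR data), and the simulated intercept, slope, and noise level are assumptions, so its numbers will differ from the report.

```r
# Synthetic stand-in (NOT the real dataset); the generating values are made up.
set.seed(1)
n <- 300
HR <- data.frame(YearsAtCompany = rpois(n, lambda = 7))
HR$MonthlyIncome <- 3500 + 400 * HR$YearsAtCompany + rnorm(n, sd = 4000)

# Simple linear regression of income on tenure.
fit <- lm(MonthlyIncome ~ YearsAtCompany, data = HR)

# Estimated change in MonthlyIncome per additional year at the company.
slope <- coef(fit)[["YearsAtCompany"]]

# Share of the variance in MonthlyIncome explained by the model.
r2 <- summary(fit)$r.squared

# Predictions one year apart differ by exactly the slope.
pred <- predict(fit, newdata = data.frame(YearsAtCompany = c(5, 6)))
unname(diff(pred))
```

A significant slope with a modest R-squared, as in the report, means the trend is real but individual incomes vary widely around the line.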

Next we examined the relationship between MonthlyIncome and JobLevel. The graph showed that these seem to be dependent, but we wanted to see if this observation is statistically significant with our chosen significance level of 0.05.

## 
##  Bartlett test of homogeneity of variances
## 
## data:  MonthlyIncome by JobLevel
## Bartlett's K-squared = 400.18, df = 4, p-value < 2.2e-16
## 
##  One-way analysis of means (not assuming equal variances)
## 
## data:  MonthlyIncome and JobLevel
## F = 14732, num df = 4.00, denom df = 317.07, p-value < 2.2e-16

The p-values of both tests are below 0.05, indicating that the results are statistically significant and that we can reject the null hypothesis for each. We can conclude that the groups of JobLevel likely have different means and different variances of MonthlyIncome, which agrees with the graph.

Finally, we examined the relationship between YearsAtCompany and JobLevel.

## 
##  Bartlett test of homogeneity of variances
## 
## data:  YearsAtCompany by JobLevel
## Bartlett's K-squared = 497.95, df = 4, p-value < 2.2e-16
## 
##  One-way analysis of means (not assuming equal variances)
## 
## data:  YearsAtCompany and JobLevel
## F = 103.93, num df = 4.0, denom df = 275.8, p-value < 2.2e-16

The p-values of both tests are below 0.05, indicating that the results are statistically significant and that we can reject the null hypothesis for each. We can conclude that the groups of JobLevel likely have different means and variances of YearsAtCompany, which is consistent with the differing ranges of YearsAtCompany seen in the graph.

While the two one-way analysis of means tests show that MonthlyIncome and YearsAtCompany are likely affected by JobLevel, these tests do not show how much these variables are affected.

Because JobLevel is an ordinal variable, we would suggest constructing an ordinal regression model involving these variables to discover this information.
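One common form of ordinal regression is the proportional-odds logistic model, available as `polr()` in the MASS package. The sketch below is only an illustration of how such a model could be set up with JobLevel as the ordered response. It runs on synthetic data (not the real HR data), and the variable names `income_k` (income in thousands) and `years`, the generating coefficients, and the choice of the proportional-odds model are all assumptions rather than part of the report's analysis.

```r
library(MASS)  # recommended package shipped with R; provides polr()

# Synthetic stand-in (NOT the real dataset); all generating values are made up.
set.seed(1)
n <- 500
income_k <- rnorm(n, mean = 6.5, sd = 2)   # hypothetical income, in thousands
years <- rpois(n, lambda = 7)              # hypothetical years at company

# Latent score plus logistic noise, cut into five ordered levels.
latent <- 1.0 * income_k + 0.1 * years + rlogis(n)
JobLevel <- cut(latent,
                breaks = quantile(latent, probs = seq(0, 1, by = 0.2)),
                include.lowest = TRUE, labels = 1:5, ordered_result = TRUE)
HR <- data.frame(JobLevel, income_k, years)

# Proportional-odds model; Hess = TRUE keeps what summary() needs for standard errors.
fit <- polr(JobLevel ~ income_k + years, data = HR, Hess = TRUE)

# Coefficients are on the log-odds scale.
coef(fit)

# Odds ratios: multiplicative change in the odds of a higher JobLevel per unit increase.
exp(coef(fit))
```

Rescaling income to thousands keeps the predictors on comparable scales, which tends to help the optimizer. The model assumes each predictor has the same effect across all level thresholds, which would need checking on real data.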

We noted that YearsAtCompany didn't appear to influence MonthlyIncome at times as there were often people who had a low number of YearsAtCompany but a high MonthlyIncome. This made us interested in looking at the relationship between YearsAtCompany, TotalWorkingYears, and MonthlyIncome.

Graph 10: Heatmap of Years At Company vs. Total Working Years, Points Colored by Monthly Income

The graph above is a heat map of YearsAtCompany and TotalWorkingYears with points colored by MonthlyIncome. The red scale shows the density of the plotted points: the darker the red, the more employees fall in that region. The light blue to black scale shows the MonthlyIncome of each data point.

This graph shows two distinct groups of employees: one group that has mostly worked at this one company for their entire career, and another group that worked at other companies for 0 to 20 years before coming to work at this company. The most common employees, though, have between 0 and 10 TotalWorkingYears and between 0 and 10 YearsAtCompany, as shown by the density scale. In addition, MonthlyIncome is relatively low from 0 to 10 TotalWorkingYears, medium from 10 to 20 TotalWorkingYears, and relatively high from 20 to 40 TotalWorkingYears.

Overall, the graph appears to depict a positive correlation between TotalWorkingYears and YearsAtCompany, a strong positive correlation between TotalWorkingYears and MonthlyIncome, and nearly no relationship between YearsAtCompany and MonthlyIncome. It is important to note that this graph has a built-in constraint, as YearsAtCompany cannot exceed TotalWorkingYears.

To examine the relationships among these three quantitative variables, we fit two linear regression models.

## 
## Call:
## lm(formula = YearsAtCompany ~ TotalWorkingYears + MonthlyIncome, 
##     data = HR)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -17.423  -1.986   0.025   2.686  19.683 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       1.315e+00  2.247e-01   5.853 5.94e-09 ***
## TotalWorkingYears 4.510e-01  2.517e-02  17.923  < 2e-16 ***
## MonthlyIncome     9.310e-05  4.159e-05   2.238   0.0253 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.762 on 1467 degrees of freedom
## Multiple R-squared:  0.3966, Adjusted R-squared:  0.3958 
## F-statistic: 482.1 on 2 and 1467 DF,  p-value: < 2.2e-16

The p-values demonstrate statistical significance. There is a small, positive relationship between YearsAtCompany and TotalWorkingYears and a small, positive relationship between YearsAtCompany and MonthlyIncome.

## 
## Call:
## lm(formula = MonthlyIncome ~ YearsAtCompany + TotalWorkingYears, 
##     data = HR)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -10769.4  -1750.9    -65.7   1364.6  11407.0 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        1175.68     139.09   8.453   <2e-16 ***
## YearsAtCompany       36.56      16.33   2.238   0.0253 *  
## TotalWorkingYears   449.58      12.86  34.957   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2984 on 1467 degrees of freedom
## Multiple R-squared:  0.5987, Adjusted R-squared:  0.5982 
## F-statistic:  1094 on 2 and 1467 DF,  p-value: < 2.2e-16

The p-values demonstrate statistical significance. There is a small, positive relationship between YearsAtCompany and MonthlyIncome and a large, positive relationship between TotalWorkingYears and MonthlyIncome.

We can conclude that all these variables have a weak positive relationship with each other, except for TotalWorkingYears and MonthlyIncome. These two have a strong positive correlation.
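In the two regression outputs above, the row for MonthlyIncome in the first model and the row for YearsAtCompany in the second model share the same t value (2.238) and p-value (0.0253). This is expected: both rows test the same partial association between YearsAtCompany and MonthlyIncome after adjusting for TotalWorkingYears. The sketch below checks this on synthetic data (not the real HR data), with generating values that are assumptions for illustration.

```r
# Synthetic stand-in (NOT the real dataset); the generating values are made up.
set.seed(1)
n <- 300
TotalWorkingYears <- rpois(n, lambda = 11)
# Tenure at the company cannot exceed total career length.
YearsAtCompany <- pmin(TotalWorkingYears, rpois(n, lambda = 7))
MonthlyIncome <- 1200 + 450 * TotalWorkingYears + 35 * YearsAtCompany +
  rnorm(n, sd = 3000)
HR <- data.frame(TotalWorkingYears, YearsAtCompany, MonthlyIncome)

# The same two model formulas as in the report.
fit_years  <- lm(YearsAtCompany ~ TotalWorkingYears + MonthlyIncome, data = HR)
fit_income <- lm(MonthlyIncome ~ YearsAtCompany + TotalWorkingYears, data = HR)

# t statistic for MonthlyIncome in the first model ...
t_years <- summary(fit_years)$coefficients["MonthlyIncome", "t value"]
# ... and for YearsAtCompany in the second model.
t_income <- summary(fit_income)$coefficients["YearsAtCompany", "t value"]

# Both test the partial association given TotalWorkingYears, so they agree.
all.equal(t_years, t_income)
```

The estimates themselves differ (9.310e-05 versus 36.56 in the report) only because the two models measure the same association in different units.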

Conclusion

Through our analysis of data visualizations from this dataset, we were able to gain insights into our three research questions.

We found no observed relationship between JobSatisfaction and the demographic variables Gender, Education, Marital Status, Age, and TotalWorkingYears, according to the dendrogram, mosaic plots, density plots, and statistical tests we produced. We did discover, though, that Age and Education are dependent on one another, as are TotalWorkingYears and Education.

By conducting Principal Component Analysis, we found that using the first three principal components in our graphics and other analysis would suffice. We only plotted the first two principal components in our bi-plot. It showed us that employee Attrition is related to YearsWithCurrManager, YearsInCurrentRole, YearsAtCompany, YearsSinceLastPromotion, TotalWorkingYears, Age, and NumCompaniesWorked. On the other hand, TrainingTimesLastYear, DistanceFromHome, HourlyRate, and PercentSalaryHike don't contribute as much to employee Attrition. JobSatisfaction and JobRole may have a relationship based on the faceted bar plot, but the chi-square test states that this is not statistically significant. We also found that JobInvolvement and JobSatisfaction are independent variables.

The relationships implied by the scatter plot and heat map agree with our linear regression models. After exploring the relationships among JobLevel, MonthlyIncome, YearsAtCompany, and TotalWorkingYears, we concluded that the latter three variables all have a weak positive relationship with each other, except for TotalWorkingYears and MonthlyIncome. These two have a strong positive correlation. In order to estimate the effect JobLevel has on any of these three variables, we would need to fit an ordinal regression model. This is a technique we have yet to learn.

For further analysis, we would like to see the times when JobSatisfaction and Attrition were high or low. By working with time-related variables, we could look deeper into the reasons for such a rise or drop. This would give companies a better idea of the necessary steps to take to retain employees. This dataset unfortunately only collected data from one time period, so more data would be needed to answer this question.