Employee attrition, the gradual reduction in employees as they leave and aren't replaced, is an issue that every company faces. It greatly impacts not only the company but the industry as well. In today's competitive business market, attrition results in negative impacts and complex challenges for the company. Some attrition is unavoidable but identifying people-related trends or patterns will allow companies, especially the HR department, to take the necessary steps to improve the organization and ensure the company runs smoothly in the future. We will explore how different factors affect employee attrition and job satisfaction.
The dataset was obtained from Kaggle at the following link: https://www.kaggle.com/datasets/rishikeshkonapure/hr-analytics-prediction. It includes information about 1470 employees, each with 35 attributes ranging from demographics to work factors. We cleaned the data by removing several variables that we perceived as overlapping, such as DailyRate, HourlyRate, and MonthlyRate, as they are all related to one another by calculation.
These are the variables we are focusing on in our analysis:
Age - age of employee
Attrition - employee leaving the company
Department - employee department
DistanceFromHome - distance from home to office in kilometers
Education - qualification of employee (1=Below College, 2=College, 3=Bachelor, 4=Master, 5=Doctor)
Gender - gender of employee
HourlyRate - hourly salary/rate
JobInvolvement - job involvement (1=Low, 2=Medium, 3=High, 4=Very High)
JobLevel - level of job (1=Entry-level, 2=Intermediate, 3=Mid-level, 4=Senior, 5=Executive)
JobRole - job role of employee
JobSatisfaction - satisfaction with the job (1=Low, 2=Medium, 3=High, 4=Very High)
MaritalStatus - whether employee is married or not (divorced, married, single)
NumCompaniesWorked - number of companies employee worked for
PercentSalaryHike - percentage increase in salary
TotalWorkingYears - total years worked
TrainingTimesLastYear - hours spent training
YearsAtCompany - total number of years at the company
YearsInCurrentRole - total years in current role
YearsSinceLastPromotion - years since last promotion
YearsWithCurrManager - years worked under current manager
We will address these three research questions:
1. How do demographics affect job satisfaction?
2. How do work factors affect employee attrition and job satisfaction?
3. How do different factors affect each other?
We looked into the relationships between the demographic variables and JobSatisfaction. We specifically analyzed three categorical and two quantitative variables related to demographics, namely Gender, Education, MaritalStatus, Age, and TotalWorkingYears, to discover their relationship with JobSatisfaction. We also explored the effects Education and JobSatisfaction had on Age and TotalWorkingYears.
Above is a dendrogram depicting the relationship between the employee demographic variables (Gender, Education, MaritalStatus) and JobSatisfaction. The dendrogram was constructed using hierarchical clustering, and the leaves are colored by job satisfaction level.
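As a rough sketch of how such a clustering could be set up in base R, the example below one-hot encodes the demographic columns, clusters them, and cross-tabulates the clusters against JobSatisfaction. The data frame hr_demo is synthetic and purely illustrative, and the encoding, scaling, Euclidean distance, complete linkage, and four-cluster cut are all assumptions rather than the settings behind the dendrogram above.

```r
# Synthetic stand-in for the HR data (hypothetical values, for illustration only).
set.seed(1)
n <- 40
hr_demo <- data.frame(
  Gender          = factor(sample(c("Female", "Male"), n, replace = TRUE)),
  Education       = sample(1:5, n, replace = TRUE),
  MaritalStatus   = factor(sample(c("Divorced", "Married", "Single"), n, replace = TRUE)),
  JobSatisfaction = sample(1:4, n, replace = TRUE)
)

# One-hot encode the categorical columns, then scale every column so that
# no single variable dominates the Euclidean distance (an assumed choice).
features <- model.matrix(~ Gender + Education + MaritalStatus - 1, data = hr_demo)
features <- scale(features)

# Hierarchical clustering with complete linkage (the hclust default).
tree <- hclust(dist(features), method = "complete")

# Cut the tree into four clusters and compare them with JobSatisfaction.
# If the clusters tracked satisfaction, each row would concentrate in one column.
clusters <- cutree(tree, k = 4)
cluster_by_satisfaction <- table(cluster = clusters,
                                 JobSatisfaction = hr_demo$JobSatisfaction)
print(cluster_by_satisfaction)

# plot(tree, labels = FALSE) would draw the dendrogram itself.
```

Because JobSatisfaction is left out of the distance calculation, the cross-tabulation gives a numeric counterpart to reading leaf colors off the dendrogram.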
Upon inspection of the dendrogram, there does not appear to be a strong relationship between any of the demographic variables and JobSatisfaction, as leaves with the same satisfaction level are scattered across the tree rather than forming distinct clusters.
We created several mosaic plots to check this finding, but have included only one here for demonstration purposes.
The above is a mosaic plot for Gender and JobSatisfaction. The observed frequencies in each category are close to what would be expected under the null hypothesis of independence, suggesting that the two variables are independent. The mosaic plots for the other variables, Education and MaritalStatus, displayed similar null results.
Thus, the plots suggest that JobSatisfaction is independent of Gender, Education, and MaritalStatus.
We've examined three of the five demographic variables so far. We still need to examine the relationship between JobSatisfaction and Age. We're also curious about whether Education affects this relationship.
Thus, we created a stacked density plot with Age on the x-axis and JobSatisfaction represented by colored density curves. The plot was then faceted by Education, allowing for a more detailed examination of the relationship between JobSatisfaction, Education, and Age.
This graph shows the distribution of JobSatisfaction over Age for different levels of Education.
Overall, the density curves for the JobSatisfaction levels, ranging from low to very high, share similar distributions and overlap substantially across the Education facets. This suggests that Education levels are not strongly associated with JobSatisfaction. Age does seem to have some effect on JobSatisfaction, as the density curves of JobSatisfaction over Age differ somewhat.
However, we note that the plot shows some variation in the relationship between Age, JobSatisfaction, and Education level.
The “Below College” Education facet shows less overlap between the JobSatisfaction density curves, as the “Very High” JobSatisfaction density curve is slightly taller than the rest.
For the “College” Education facet, the “Medium” JobSatisfaction density curve is slightly taller than the rest, while for both the “Bachelor” and “Master” facets the density curves almost completely overlap.
It is interesting to note that the “Doctor” facet has an extraordinarily high density curve for the “Medium” JobSatisfaction level, peaking at a density of around 0.13 (per unit of age) at around age 30, with a second, smaller peak of around 0.04 (per unit of age) at around age 55, while the remaining density curves overlap substantially with each other. It seems that employees who have a “Doctor” level of Education and “Medium” JobSatisfaction fall into two very specific ranges of Age.
We explore these relationships further with statistical tests, assuming a significance level of 0.05.
First, we explore the relationships between JobSatisfaction, Age, and Education.
Let's examine the relationship between JobSatisfaction and Education.
##
## Pearson's Chi-squared test
##
## data: table(HR$Education, HR$JobSatisfaction)
## X-squared = 13.03, df = 12, p-value = 0.3669
The p-value for the chi-square test of independence is above 0.05, which means we fail to reject the null hypothesis that the two variables are independent. This suggests that JobSatisfaction and Education are independent, a result that is also consistent with our graph.
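The call shown in the output above, chisq.test on a two-way table, can be reproduced in a few lines. The sketch below uses a synthetic data frame (hr_demo, a hypothetical stand-in, not the Kaggle data) in which the two variables are drawn independently, so only the mechanics carry over, not the numbers.

```r
# Synthetic stand-in for the HR data (hypothetical values, for illustration only).
set.seed(42)
n <- 500
hr_demo <- data.frame(
  Education       = factor(sample(1:5, n, replace = TRUE), levels = 1:5),
  JobSatisfaction = factor(sample(1:4, n, replace = TRUE), levels = 1:4)
)

# Contingency table of counts: 5 education levels by 4 satisfaction levels.
counts <- table(hr_demo$Education, hr_demo$JobSatisfaction)

# Pearson's chi-squared test of independence.
# Degrees of freedom are (5 - 1) * (4 - 1) = 12, matching the output above.
result <- chisq.test(counts)
print(result)

# Decision at the 0.05 significance level used throughout this report.
alpha <- 0.05
decision <- if (result$p.value < alpha) {
  "reject independence"
} else {
  "fail to reject independence"
}
print(decision)
```

Failing to reject only means the data show no evidence against independence; it does not prove the variables are unrelated.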
Now let's explore the relationship between Age and JobSatisfaction, and then between Age and Education.
##
## Bartlett test of homogeneity of variances
##
## data: Age by JobSatisfaction
## Bartlett's K-squared = 0.19396, df = 3, p-value = 0.9786
##
## One-way analysis of means
##
## data: Age and JobSatisfaction
## F = 0.051798, num df = 3, denom df = 1466, p-value = 0.9844
These tests both produce a p-value higher than 0.05, so we fail to reject the null hypothesis for each. This means that Age likely has similar means and variances for each group of JobSatisfaction. This implies that JobSatisfaction does not have an effect on Age, which agrees with our overall observations of the graph.
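The pairing of a Bartlett test with a one-way analysis of means can be wrapped in a small helper. This is a minimal sketch on synthetic data: compare_groups and hr_demo are hypothetical names, and the rule of switching to the unequal-variance (Welch) version whenever the Bartlett p-value falls below 0.05 is an assumption, not necessarily the rule followed for every test in this report.

```r
# Synthetic stand-in for the HR data (hypothetical values, for illustration only).
set.seed(7)
n <- 400
hr_demo <- data.frame(
  Age             = round(rnorm(n, mean = 37, sd = 9)),
  JobSatisfaction = factor(sample(1:4, n, replace = TRUE), levels = 1:4)
)

# Hypothetical helper: test equality of variances first, then test equality
# of means with or without the equal-variance assumption.
compare_groups <- function(formula, data, alpha = 0.05) {
  variance_test <- bartlett.test(formula, data = data)
  equal_var <- variance_test$p.value >= alpha
  mean_test <- oneway.test(formula, data = data, var.equal = equal_var)
  list(variance_test = variance_test, mean_test = mean_test, equal_var = equal_var)
}

res <- compare_groups(Age ~ JobSatisfaction, hr_demo)
print(res$variance_test)  # Bartlett test of homogeneity of variances
print(res$mean_test)      # One-way analysis of means
```

With four satisfaction levels, both tests have 3 numerator degrees of freedom, as in the output above.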
##
## Bartlett test of homogeneity of variances
##
## data: Age by Education
## Bartlett's K-squared = 4.5329, df = 4, p-value = 0.3387
##
## One-way analysis of means
##
## data: Age and Education
## F = 20.842, num df = 4, denom df = 1465, p-value < 2.2e-16
The test for differences in variances produces a p-value higher than 0.05, so we fail to reject its null hypothesis, but the test for differences in means produces a p-value lower than 0.05, so we reject its null hypothesis. This means that Age likely has different means and similar variances for each group of Education. This implies that Education does have an effect on Age, though the effect is likely small, which agrees with our overall observations of the graph.
Second, we explore the variations we noticed in the graph by conducting tests on Age and JobSatisfaction within the “Below College”, “College”, and “Doctor” levels of Education.
##
## Bartlett test of homogeneity of variances
##
## data: Age by JobSatisfaction
## Bartlett's K-squared = 4.4777, df = 3, p-value = 0.2143
##
## One-way analysis of means
##
## data: Age and JobSatisfaction
## F = 0.7328, num df = 3, denom df = 166, p-value = 0.5338
For the “Below College” education level, the p-values for both tests are well above 0.05, so we fail to reject the null hypotheses and do not have convincing evidence that the means or variances of Age differ across levels of JobSatisfaction, despite what we saw in the graph.
##
## Bartlett test of homogeneity of variances
##
## data: Age by JobSatisfaction
## Bartlett's K-squared = 7.5255, df = 3, p-value = 0.05691
##
## One-way analysis of means (not assuming equal variances)
##
## data: Age and JobSatisfaction
## F = 0.35087, num df = 3.00, denom df = 142.03, p-value = 0.7886
For the “College” education level facet, we had noted that there seemed to be differences in variances, though similarity in means, of Age across levels of JobSatisfaction. The tests show that the differences in variances we saw were not significant at the 0.05 level (p-value = 0.05691), and that there is no significant difference in means.
##
## Bartlett test of homogeneity of variances
##
## data: Age by JobSatisfaction
## Bartlett's K-squared = 1.4349, df = 3, p-value = 0.6974
##
## One-way analysis of means (not assuming equal variances)
##
## data: Age and JobSatisfaction
## F = 0.35043, num df = 3.000, denom df = 15.168, p-value = 0.7894
For the “Doctor” education level facet, we had noted that there seemed to be differences in the variances and means of Age across levels of JobSatisfaction. The tests show, though, that the differences in variances and means that we saw were not significant.
In short, we found that Education has a significant effect on Age, but Age and JobSatisfaction are independent, and Education and JobSatisfaction are independent.
We examined the relationship between Age, Education, and JobSatisfaction and became curious about the effect Education and JobSatisfaction would have on TotalWorkingYears. Thus, we produced a graph examining this.
We created a faceted stacked density plot of the variables TotalWorkingYears, Education, and JobSatisfaction. This graph shows the distribution of TotalWorkingYears for different levels of JobSatisfaction, where each facet represents a different level of JobSatisfaction, ranging from low to very high.
We note that the graph shows contrasting results for “Medium” JobSatisfaction, particularly for employees with a “Doctor” level of Education. It seems that employees who have “Medium” JobSatisfaction and a “Doctor” level of Education fall into three very specific ranges of TotalWorkingYears.
From the graph, it appears that the distribution of TotalWorkingYears is similar across all levels of JobSatisfaction. However, there is a slight trend of increasing TotalWorkingYears as JobSatisfaction increases. This trend is most noticeable in the facets representing high and very high JobSatisfaction, with the peaks for employees with “Below College” and “Doctor” Education levels showing a slight shift to the right of about 4 years.
But overall, this graph suggests that TotalWorkingYears may be a weak predictor of JobSatisfaction, as the right-skewed pattern is shared by all four facets, so there is no clear pattern of association between these two variables.
In addition, the graph shows that Education seems to play a part in TotalWorkingYears, as the curves for each level of Education differ slightly.
Furthermore, the graph suggests that JobSatisfaction and Education are independent, as these variables showed consistency across all TotalWorkingYears, which makes sense given our previous statistical test on these two variables.
Let's conduct several statistical tests to examine the significance of our findings using 0.05 as our level of significance.
First, we will examine whether there are differences in the means and variances of TotalWorkingYears across levels of JobSatisfaction.
##
## Bartlett test of homogeneity of variances
##
## data: TotalWorkingYears by JobSatisfaction
## Bartlett's K-squared = 6.9918, df = 3, p-value = 0.07216
##
## One-way analysis of means
##
## data: TotalWorkingYears and JobSatisfaction
## F = 0.27594, num df = 3, denom df = 1466, p-value = 0.8428
These tests both produce a p-value higher than 0.05, so we fail to reject the null hypothesis for each. This means that TotalWorkingYears likely has similar means and variances for each group of JobSatisfaction. This implies that JobSatisfaction does not have an effect on TotalWorkingYears, which agrees with our observations of the graph.
Second, we will examine whether there are differences in the means and variances of TotalWorkingYears across levels of Education.
##
## Bartlett test of homogeneity of variances
##
## data: TotalWorkingYears by Education
## Bartlett's K-squared = 12.236, df = 4, p-value = 0.01568
##
## One-way analysis of means (not assuming equal variances)
##
## data: TotalWorkingYears and Education
## F = 8.6458, num df = 4.00, denom df = 275.54, p-value = 1.379e-06
These tests both produce a p-value lower than 0.05, so we reject the null hypothesis for each. This means that TotalWorkingYears likely has different means and variances for each group of Education. This implies that Education does have an effect on TotalWorkingYears, which is consistent with the graph as well.
We've already examined the relationship between JobSatisfaction and Education, which we found to be independent.
We conclude that our graphs and statistical tests show similar results: JobSatisfaction and TotalWorkingYears are independent, Education and TotalWorkingYears are dependent, and JobSatisfaction and Education are independent.
We explored how work factors affect employee Attrition and JobSatisfaction.
We looked into the relationships between the quantitative variables and Attrition. We used 11 quantitative variables related to work factors in this data set, and it is quite difficult to visualize them simultaneously. Thus we performed a Principal Component Analysis on the quantitative variables to see which components account for the most variance.
Looking at the scree/elbow plot produced above, we see that after the 3rd component, the proportion of variance explained starts to flatten. This suggests that the first three principal components are enough to capture most of the variation in the data for our graphics and other analysis.
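The quantities behind a scree plot can be computed directly from prcomp. The sketch below uses a small synthetic data frame (hr_demo, hypothetical) with only five of the work-factor columns, and centering and scaling the variables before the PCA is an assumption about the preprocessing.

```r
# Synthetic stand-in for the HR data (hypothetical values, for illustration only).
# The three tenure columns are built to be correlated with one another.
set.seed(3)
n <- 300
tenure <- rexp(n, rate = 1 / 7)
hr_demo <- data.frame(
  YearsAtCompany       = tenure,
  YearsInCurrentRole   = tenure * runif(n, 0.3, 1),
  YearsWithCurrManager = tenure * runif(n, 0.2, 1),
  DistanceFromHome     = runif(n, 1, 29),
  PercentSalaryHike    = runif(n, 11, 25)
)

# PCA on centered and scaled variables, so each variable contributes equally.
pca <- prcomp(hr_demo, center = TRUE, scale. = TRUE)

# Proportion of variance explained by each component, and the running total.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
cumulative <- cumsum(var_explained)
print(round(rbind(var_explained, cumulative), 3))

# screeplot(pca, type = "lines") draws the elbow plot;
# biplot(pca) draws the scores together with the variable arrows.
```

The elbow is the point where var_explained stops dropping sharply, and the cumulative row shows how much variation is retained by keeping the components before it.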
To further see the trends and correlations among different job factors for employee attrition, we displayed a bi-plot above using the first two principal components, where the points are colored by Attrition. As the first principal component increases, employees staying at the company increasingly outnumber employees leaving the company. There doesn't seem to be any relationship between employee attrition and the second principal component. We can see that the variables YearsWithCurrManager, YearsInCurrentRole, YearsAtCompany, and YearsSinceLastPromotion all point in the same direction, indicating that they are all correlated. Since they point towards the upper right corner of the bi-plot, these variables are positively associated with both the first and second principal components. The variables TotalWorkingYears, Age, and NumCompaniesWorked point downwards and to the right, indicating a positive association with the first principal component but a negative association with the second principal component. TrainingTimesLastYear, DistanceFromHome, HourlyRate, and PercentSalaryHike have short arrows, which indicates that these variables contribute little to the first two principal components and so appear less related to employee attrition than the other variables.
We also wanted to see how different departments affect employee Attrition, so we used a mosaic plot to see whether there is significantly more or less employee Attrition in each Department than we would expect under the null hypothesis of independence.
The mosaic plot above is shaded by Pearson residuals. We see that the residual for the Sales Department and “Yes” employee Attrition (those leaving the company) is significant. This means that the Sales Department has significantly more employee Attrition than expected under independence. Thus we are able to conclude that Department and employee Attrition are not independent of each other.
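The residuals that drive the shading can also be inspected numerically through chisq.test. The following sketch uses synthetic data in which Sales is deliberately given a higher leaving rate; the department proportions and leaving probabilities are invented for illustration and are not estimates from the dataset.

```r
# Synthetic stand-in for the HR data (hypothetical values, for illustration only).
set.seed(11)
n <- 600
dept_levels <- c("Human Resources", "Research & Development", "Sales")
department <- factor(
  sample(dept_levels, n, replace = TRUE, prob = c(0.10, 0.60, 0.30)),
  levels = dept_levels
)
leave_prob <- ifelse(department == "Sales", 0.30, 0.12)  # invented rates
attrition <- factor(ifelse(runif(n) < leave_prob, "Yes", "No"),
                    levels = c("No", "Yes"))

counts <- table(Department = department, Attrition = attrition)
test <- chisq.test(counts)

# Pearson residuals: (observed - expected) / sqrt(expected).
# Cells beyond roughly +/- 2 are the ones a shaded mosaic plot highlights.
pearson <- test$residuals
print(round(pearson, 2))

# Standardized residuals are also available as test$stdres.
# mosaicplot(counts, shade = TRUE) draws the shaded mosaic plot in base R.
```

A large positive residual in the Sales and “Yes” cell would correspond to the pattern described above.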
After exploring how the 11 quantitative variables affect Attrition, we wanted to see how JobSatisfaction varies with JobRole and JobInvolvement.
We created a faceted stacked bar chart to display the conditional distribution of JobSatisfaction for each JobRole. It also displays the conditional distribution of JobInvolvement for each level of JobSatisfaction within each JobRole.
The faceted stacked bar chart indicates that JobSatisfaction is somewhat different for each JobRole. The levels of JobSatisfaction have different counts within each JobRole. The tendency seems to be that a JobRole has either high proportions of high and very high JobSatisfaction or somewhat equal proportions of each JobSatisfaction level. These differences may mean that JobSatisfaction and JobRole are dependent variables. There also appears to be little to no relationship between JobInvolvement and JobSatisfaction, as the proportions of JobInvolvement seem to be similar for each level of JobSatisfaction, meaning that those are likely independent of one another.
We checked if these observations were statistically significant by conducting statistical tests at a level of significance of 0.05.
First, we investigated the relationship between JobSatisfaction and JobRole.
##
## Pearson's Chi-squared test
##
## data: table(HR$JobSatisfaction, HR$JobRole)
## X-squared = 18.4, df = 24, p-value = 0.7832
We fail to reject our null hypothesis as our p-value is 0.7832, which is above the level of significance. This indicates that JobSatisfaction and JobRole are independent.
Next, we examined JobSatisfaction and JobInvolvement.
##
## Pearson's Chi-squared test
##
## data: table(HR$JobSatisfaction, HR$JobInvolvement)
## X-squared = 7.4214, df = 9, p-value = 0.5933
We again fail to reject our null hypothesis as our p-value is 0.5933, which is above the level of significance. This indicates that JobSatisfaction and JobInvolvement are independent as well.
While the graph indicates that JobSatisfaction and JobRole may have a relationship, the chi-square test shows that this is not statistically significant. Both the graph and the chi-square test of independence indicate that JobInvolvement and JobSatisfaction are independent variables.
We were particularly interested in the effect other variables had on MonthlyIncome, so we investigated JobLevel, YearsAtCompany, and TotalWorkingYears.
Above is a scatterplot depicting the relationship between MonthlyIncome, YearsAtCompany, and JobLevel. We fitted a linear regression line for each group of JobLevel.
There seems to be a somewhat positive relationship between MonthlyIncome and YearsAtCompany, though it is also common for an employee to have a relatively low number of YearsAtCompany and a high MonthlyIncome. There are no employees with a high number of YearsAtCompany and a low MonthlyIncome.
There is also a relationship between JobLevel and MonthlyIncome. The graph displays the five JobLevel groups, and there seems to be a specific range of MonthlyIncome for each JobLevel. Each group has a different mean and variance, as indicated by the regression lines shown in the graph.
However, there does not seem to be much of a relationship between JobLevel and YearsAtCompany. They seem to be independent, as each JobLevel has a mostly horizontal regression line. Some JobLevel groups do differ in their range of YearsAtCompany, though, and thus would likely have different means and variances.
We checked if our observations are statistically significant by conducting statistical tests at a level of significance of 0.05.
First, we investigated the relationship between MonthlyIncome and YearsAtCompany.
##
## Call:
## lm(formula = MonthlyIncome ~ YearsAtCompany, data = HR)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9504 -2499 -1188 1393 15484
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3733.3 160.1 23.32 <2e-16 ***
## YearsAtCompany 395.2 17.2 22.98 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4039 on 1468 degrees of freedom
## Multiple R-squared: 0.2645, Adjusted R-squared: 0.264
## F-statistic: 527.9 on 1 and 1468 DF, p-value: < 2.2e-16
The p-values for this regression model are less than 0.05, indicating that the model is statistically significant. The two variables, MonthlyIncome and YearsAtCompany, have a positive relationship: MonthlyIncome increases by about 395.2 with each additional year of YearsAtCompany.
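To make the slope interpretation concrete, the sketch below fits the same kind of model with lm on synthetic data and checks that a one-year increase in YearsAtCompany moves the prediction by exactly the slope. The data frame hr_demo and the numbers used to generate it are hypothetical, so the estimates will not match the output above.

```r
# Synthetic stand-in for the HR data (hypothetical values, for illustration only).
set.seed(5)
n <- 200
years <- sample(0:30, n, replace = TRUE)
hr_demo <- data.frame(
  YearsAtCompany = years,
  MonthlyIncome  = 3700 + 400 * years + rnorm(n, sd = 4000)
)

# Simple linear regression of income on tenure.
fit <- lm(MonthlyIncome ~ YearsAtCompany, data = hr_demo)
slope <- coef(fit)[["YearsAtCompany"]]
r_squared <- summary(fit)$r.squared

# Predictions one year apart differ by exactly the slope.
preds <- predict(fit, newdata = data.frame(YearsAtCompany = c(5, 6)))
one_year_change <- unname(preds[2] - preds[1])

print(c(slope = slope, one_year_change = one_year_change, r_squared = r_squared))
```

The R-squared value is worth reading alongside the slope: in the output above it is about 0.26, so YearsAtCompany alone leaves most of the variation in MonthlyIncome unexplained.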
Next we examined the relationship between MonthlyIncome and JobLevel. The graph showed that these seem to be dependent, but we wanted to see if this observation is statistically significant at our chosen significance level of 0.05.
##
## Bartlett test of homogeneity of variances
##
## data: MonthlyIncome by JobLevel
## Bartlett's K-squared = 400.18, df = 4, p-value < 2.2e-16
##
## One-way analysis of means (not assuming equal variances)
##
## data: MonthlyIncome and JobLevel
## F = 14732, num df = 4.00, denom df = 317.07, p-value < 2.2e-16
The p-values of these tests are below 0.05, indicating that the results are statistically significant and that we can reject the null hypothesis for each. We can conclude that the JobLevel groups likely have different means and different variances of MonthlyIncome, as the graph implied.
Finally, we examined the relationship between YearsAtCompany and JobLevel.
##
## Bartlett test of homogeneity of variances
##
## data: YearsAtCompany by JobLevel
## Bartlett's K-squared = 497.95, df = 4, p-value < 2.2e-16
##
## One-way analysis of means (not assuming equal variances)
##
## data: YearsAtCompany and JobLevel
## F = 103.93, num df = 4.0, denom df = 275.8, p-value < 2.2e-16
The p-values of these tests are below 0.05, indicating that the results are statistically significant and that we can reject the null hypothesis for each. We can conclude that the JobLevel groups likely have different means and variances of YearsAtCompany, as the graph implied.
While the two one-way analysis of means tests show that MonthlyIncome and YearsAtCompany are likely affected by JobLevel, these tests do not show how much these variables are affected.
Because JobLevel is an ordinal variable, we would suggest constructing an ordinal regression model involving these variables to discover this information.
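One possible shape for such a model is a proportional-odds regression fitted with polr from the MASS package. This is only a sketch under stated assumptions: it treats JobLevel as the ordered response and income as the predictor, it relies on MASS being installed, and it runs on synthetic data (hr_demo and IncomeThousands are hypothetical names) generated so that higher income tends to go with higher job levels.

```r
library(MASS)  # assumed to be installed; provides polr()

# Synthetic stand-in for the HR data (hypothetical values, for illustration only).
set.seed(9)
n <- 300
income_thousands <- runif(n, 1, 20)
latent <- 0.25 * income_thousands + rlogis(n)  # invented data-generating rule
hr_demo <- data.frame(
  IncomeThousands = income_thousands,
  JobLevel = cut(latent, breaks = c(-Inf, 1, 2, 3, 4, Inf),
                 labels = 1:5, ordered_result = TRUE)
)

# Proportional-odds (ordinal logistic) regression with an ordered response.
fit <- polr(JobLevel ~ IncomeThousands, data = hr_demo, Hess = TRUE)

# exp(coefficient) is the multiplicative change in the odds of being in a
# higher JobLevel for each additional thousand in income.
odds_ratio <- exp(coef(fit)[["IncomeThousands"]])
print(odds_ratio)
print(fit$zeta)  # the four estimated cut points between the five levels
```

Unlike the one-way tests, the fitted coefficient gives an effect size, which is the missing piece noted above.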
We noted that YearsAtCompany didn't appear to influence MonthlyIncome at times, as there were often people who had a low number of YearsAtCompany but a high MonthlyIncome. This made us interested in looking at the relationship between YearsAtCompany, TotalWorkingYears, and MonthlyIncome.
The graph above is a heat map of YearsAtCompany and TotalWorkingYears colored by MonthlyIncome. The red density scale shows the density of the plotted points: the darker the red, the more observations there are at that location. The light blue to black scale shows the level of MonthlyIncome for each data point.
This graph shows two distinct groups of employees: one group that has mostly worked at this one company for their entire career, and another group that worked at other companies for 0 to 20 years before coming to work at this company. The most common employees, though, have between 0 and 10 TotalWorkingYears and between 0 and 10 YearsAtCompany, as shown by the density scale. In addition, MonthlyIncome is relatively low from 0 to 10 TotalWorkingYears, relatively medium from 10 to 20 TotalWorkingYears, and relatively high from 20 to 40 TotalWorkingYears.
Overall, the graph appears to depict a positive correlation between TotalWorkingYears and YearsAtCompany, a strong positive correlation between TotalWorkingYears and MonthlyIncome, and nearly no relationship between YearsAtCompany and MonthlyIncome. It is important to note that this graph has a built-in limit, as YearsAtCompany cannot exceed TotalWorkingYears.
To examine the relationships among these three quantitative variables, we fitted two linear regression models.
##
## Call:
## lm(formula = YearsAtCompany ~ TotalWorkingYears + MonthlyIncome,
## data = HR)
##
## Residuals:
## Min 1Q Median 3Q Max
## -17.423 -1.986 0.025 2.686 19.683
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.315e+00 2.247e-01 5.853 5.94e-09 ***
## TotalWorkingYears 4.510e-01 2.517e-02 17.923 < 2e-16 ***
## MonthlyIncome 9.310e-05 4.159e-05 2.238 0.0253 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.762 on 1467 degrees of freedom
## Multiple R-squared: 0.3966, Adjusted R-squared: 0.3958
## F-statistic: 482.1 on 2 and 1467 DF, p-value: < 2.2e-16
The p-values demonstrate statistical significance. There is a small, positive relationship between YearsAtCompany and TotalWorkingYears and a small, positive relationship between YearsAtCompany and MonthlyIncome.
##
## Call:
## lm(formula = MonthlyIncome ~ YearsAtCompany + TotalWorkingYears,
## data = HR)
##
## Residuals:
## Min 1Q Median 3Q Max
## -10769.4 -1750.9 -65.7 1364.6 11407.0
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1175.68 139.09 8.453 <2e-16 ***
## YearsAtCompany 36.56 16.33 2.238 0.0253 *
## TotalWorkingYears 449.58 12.86 34.957 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2984 on 1467 degrees of freedom
## Multiple R-squared: 0.5987, Adjusted R-squared: 0.5982
## F-statistic: 1094 on 2 and 1467 DF, p-value: < 2.2e-16
The p-values demonstrate statistical significance. There is a small, positive relationship between YearsAtCompany and MonthlyIncome and a large, positive relationship between TotalWorkingYears and MonthlyIncome.
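The difference between the “small” and “large” effects can be seen by plugging the coefficients from the second model's output into the fitted equation. The sketch below copies those three estimates from the summary above; the helper predict_income and the three example employees are hypothetical.

```r
# Coefficients copied from the summary of
# lm(MonthlyIncome ~ YearsAtCompany + TotalWorkingYears) shown above.
coefs <- c(intercept = 1175.68, YearsAtCompany = 36.56, TotalWorkingYears = 449.58)

# Hypothetical helper: predicted MonthlyIncome from the fitted equation.
predict_income <- function(years_at_company, total_working_years) {
  coefs[["intercept"]] +
    coefs[["YearsAtCompany"]] * years_at_company +
    coefs[["TotalWorkingYears"]] * total_working_years
}

# Three illustrative employees (YearsAtCompany never exceeds TotalWorkingYears).
new_hire_10 <- predict_income(years_at_company = 0,  total_working_years = 10)
lifer_10    <- predict_income(years_at_company = 10, total_working_years = 10)
new_hire_20 <- predict_income(years_at_company = 0,  total_working_years = 20)

# Ten extra years at this company add about 365.6 to predicted income,
# while ten extra years of total experience add about 4495.8.
tenure_effect     <- lifer_10 - new_hire_10
experience_effect <- new_hire_20 - new_hire_10
print(c(tenure_effect = tenure_effect, experience_effect = experience_effect))
```

Holding the other variable fixed, a year of total experience is worth roughly twelve times as much predicted income as a year of company tenure in this model.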
We can conclude that all these variables have a weak positive relationship with each other, except for TotalWorkingYears and MonthlyIncome, which have a strong positive correlation.
Through our analysis of data visualizations from this dataset, we were able to gain insights into our three research questions.
We found no observed relationship between JobSatisfaction and the demographic variables Gender, Education, MaritalStatus, Age, and TotalWorkingYears, according to the dendrogram, mosaic plots, density plots, and statistical tests we created. We did discover, though, that Age and Education were dependent on one another, as were TotalWorkingYears and Education.
By conducting Principal Component Analysis, we found that using the first three principal components in our graphics and other analysis would suffice. We only plotted the first two principal components in our bi-plot. It showed us that employee Attrition is related to YearsWithCurrManager, YearsInCurrentRole, YearsAtCompany, YearsSinceLastPromotion, TotalWorkingYears, Age, and NumCompaniesWorked. On the other hand, TrainingTimesLastYear, DistanceFromHome, HourlyRate, and PercentSalaryHike don't contribute as much to employee Attrition.
JobSatisfaction and JobRole may have a relationship based on the faceted bar plot, but the chi-square test shows that this is not statistically significant. We also found that JobInvolvement and JobSatisfaction are independent variables.
The relationships implied by the scatter plot and heat map agree with our linear regression models. After exploring the relationships among JobLevel, MonthlyIncome, YearsAtCompany, and TotalWorkingYears, we concluded that the latter three variables all have a weak positive relationship with each other, except for TotalWorkingYears and MonthlyIncome, which have a strong positive correlation. In order to estimate the effect JobLevel has on any of these three variables, we would need to fit an ordinal regression model. This is a technique we have yet to learn.
For further analysis, we would like to see the times when JobSatisfaction and Attrition were high or low. By working with time-related variables, we could look deeper into the reasons for such a rise or drop. This would give companies a better idea of the steps necessary to retain employees. Unfortunately, this dataset only covers one time period, so more data would be needed to answer this question.