Dataset Description

“College Majors” is a FiveThirtyEight dataset describing 172 college majors and their graduates’ employment outcomes in the years following graduation. The recent-grads subset covers graduates under the age of 28 and contains 21 variables.

Data Exploration

First, we wanted to learn whether the popularity of each major category is correlated with the median income of its graduates, which led us to the Total and Median variables. To examine these by category, we built a new dataset with one row per major category, holding the sum of Total across its majors and the average of their Median incomes.
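The per-category summary described above could be built along these lines. This is a minimal base-R sketch rather than the code used for the report: the data frame name college comes from the regression output later in this report, the column name Major_category is an assumption, and the rows are made-up toy values that stand in for the real data.

```r
# Toy stand-in for the recent-grads data. All values are invented for
# illustration; Major_category is an assumed column name.
college <- data.frame(
  Major_category = c("Business", "Business", "Engineering",
                     "Engineering", "Interdisciplinary"),
  Total  = c(300000, 200000, 60000, 40000, 12000),
  Median = c(40000, 46000, 60000, 70000, 35000)
)

# Popularity: total number of graduates in each category.
totals <- aggregate(Total ~ Major_category, data = college, FUN = sum)

# Income: average of the per-major median incomes in each category.
medians <- aggregate(Median ~ Major_category, data = college, FUN = mean)

# Combine into one row per category and sort from most to least popular.
by_category <- merge(totals, medians, by = "Major_category")
names(by_category) <- c("Major_category", "Total_grads", "Avg_median")
by_category <- by_category[order(-by_category$Total_grads), ]

print(by_category)
```

Note that averaging medians gives each major equal weight regardless of its size; a weighted average would be a different design choice.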

We see that business has the largest number of students and interdisciplinary has the smallest. We then wanted to see whether there was a relationship between the popularity and the median income of each major category, which might help explain why a category is popular.

The trend of the graph suggests there is no relationship between the total number of graduates and median income. However, there are some notable outliers, such as engineering, whose median income is much higher than average, and, as seen before, the very popular business category.

We also wanted to better understand how women are represented across college majors and how well graduates of those majors do after college. To do this, we plotted the ShareWomen variable against the Median variable, using median earnings as the measure of success.

There is a large cluster of majors in which women make up between 50% and 75% of graduates and whose median earnings after college are on the lower end. The points become sparser as both the percentage of women and median earnings increase. There appears to be a slight negative relationship between median earnings and percentage of women.
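One way to put a number on the direction of the relationship seen in the scatterplot is a Pearson correlation coefficient. The sketch below is illustrative only: the six rows are invented toy values, so the resulting coefficient says nothing about the strength of the relationship in the real data.

```r
# Toy stand-in for the real data; values are invented for illustration.
college <- data.frame(
  ShareWomen = c(0.10, 0.25, 0.40, 0.55, 0.70, 0.90),
  Median     = c(65000, 52000, 45000, 38000, 36000, 33000)
)

# Pearson correlation between share of women and median earnings.
# use = "complete.obs" drops rows where either value is missing.
r <- cor(college$ShareWomen, college$Median, use = "complete.obs")

# A negative r means median earnings tend to fall as ShareWomen rises.
print(r)
```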

We then wanted to look at the distribution of women in STEM versus non-STEM majors, because the median income of majors with more than 50% women appears to be lower than that of majors with fewer.

The data suggest that there are more non-STEM majors with a majority of women than STEM majors with a majority of women.

To continue the exploration, we looked at the unemployment rate of graduates using the Unemployment_rate and Major variables. To see whether majors with a majority of women graduates have a higher unemployment rate, the points are colored to distinguish majors that are majority women from those that are not.

## 
## Call:
## lm(formula = ShareWomen ~ Unemployment_rate, data = college)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.4843 -0.1804  0.0153  0.1768  0.4623 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         0.4843     0.0434  11.160   <2e-16 ***
## Unemployment_rate   0.5579     0.5829   0.957     0.34    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2313 on 170 degrees of freedom
## Multiple R-squared:  0.005359,   Adjusted R-squared:  -0.0004919 
## F-statistic: 0.9159 on 1 and 170 DF,  p-value: 0.3399

We fit a linear regression to test whether the proportion of women in a major is related to its unemployment rate. The null hypothesis states that there is no linear relationship between share of women and unemployment rate. The regression produced a p-value of 0.3399, well above the conventional 0.05 threshold, so we fail to reject the null hypothesis: the data provide no evidence of a linear relationship between the two variables.
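As a check on the reasoning, the p-value can be recomputed from the slope estimate and standard error reported in the output above. This is a small sketch using only those printed numbers; the 0.05 significance level is a conventional choice assumed here, not something stated in the output.

```r
# Values copied from the regression output above.
estimate <- 0.5579   # slope for Unemployment_rate
std_err  <- 0.5829   # its standard error
df       <- 170      # residual degrees of freedom

# t statistic for the slope and its two-sided p-value.
t_value <- estimate / std_err
p_value <- 2 * pt(-abs(t_value), df)

# Decision at an assumed significance level of 0.05.
alpha <- 0.05
decision <- if (p_value < alpha) "reject H0" else "fail to reject H0"

print(round(t_value, 3))
print(round(p_value, 2))
print(decision)
```

Because the p-value exceeds the threshold, the decision is to fail to reject the null hypothesis, which is not the same as showing that no relationship exists.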

We then asked how unemployment after college varies by major category. To look into this, we summed the number of unemployed graduates within each major category and examined the totals category by category.

Based on the graph, most major categories have roughly the same unemployment totals after graduation, with a few outliers. The mean of the unemployment totals is 26,105 people, and a majority of the data points fall near that value. The main outliers are business and humanities/liberal arts. The major category with the lowest unemployment total is interdisciplinary. This is likely because interdisciplinary is a less common major category than the others, so its count of unemployed graduates is lower as well.
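The unemployment totals and their mean could be computed roughly as follows. This is a base-R sketch under stated assumptions: the column names Major_category and Unemployed are assumed, and the rows are invented toy values, so the printed mean will not match the 26,105 reported above.

```r
# Toy stand-in for the real data; values and column names are assumptions.
college <- data.frame(
  Major_category = c("Business", "Business", "Humanities & Liberal Arts",
                     "Engineering", "Interdisciplinary"),
  Unemployed = c(20000, 15000, 30000, 5000, 700)
)

# Total number of unemployed graduates in each major category.
unemp <- aggregate(Unemployed ~ Major_category, data = college, FUN = sum)

# Mean of the per-category totals, plus the lowest and highest categories.
mean_unemp <- mean(unemp$Unemployed)
lowest  <- as.character(unemp$Major_category[which.min(unemp$Unemployed)])
highest <- as.character(unemp$Major_category[which.max(unemp$Unemployed)])

print(unemp)
print(mean_unemp)
print(c(lowest = lowest, highest = highest))
```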