Our dataset contains information about college majors and the wages and employment of people who studied them. Each row of the data is a college major, and the columns describe a sample of people who had that major. There are 173 rows in total. 76 of these rows are STEM majors, which contain extra information about gender.
Categorical variables:
- Major: Identifies each major.
- Major_category: Broadly groups the majors into 5 main categories: Engineering, Computers & Mathematics, Physical Sciences, Biology & Life Science, and Health.
- is_stem: Represents whether the major is classified as STEM (Science-Technology-Engineering-Math). True corresponds to STEM majors.
- Median_cat: Groups the median income for a major into two categories: High and Low.
Quantitative variables:
- Total.x: Represents the total number of students in a particular major.
- Men: Represents the number of male students in a particular major.
- Women: Represents the number of female students in a particular major.
- Median.x: Represents the median income of a certain major.
- ShareWomen: Represents the share of people studying a certain major who are female.
- Unemployment_rate: The percent of people with a certain major who are unemployed.
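As an illustration of how ShareWomen relates to the Men and Women counts, here is a minimal Python sketch. It assumes ShareWomen is simply Women divided by Men plus Women, which the text above does not state explicitly. The function name `share_women` and the two example rows are hypothetical and are not taken from the dataset.

```python
def share_women(men: int, women: int) -> float:
    """Return the fraction of students who are women (between 0.0 and 1.0).

    Assumption: the share is women / (men + women); the report does not
    spell out the exact formula behind its ShareWomen column.
    """
    total = men + women
    if total == 0:
        raise ValueError("a major with no students has no defined share")
    return women / total


# Hypothetical rows (major, men, women); NOT values from the dataset.
rows = [
    ("Major A", 800, 200),
    ("Major B", 300, 700),
]

shares = {major: share_women(men, women) for major, men, women in rows}
for major, share in shares.items():
    print(f"{major}: {share:.2f}")
```

Expressed this way, the share is a fraction between 0 and 1, which matches the 0.4 cutoff used later in the report.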
One of our main research objectives is to examine inequalities between women and men in the workforce and how they relate to college majors. We also aim to help students decide which field to study and whether graduate school is advisable, and to help colleges decide which majors to advertise and focus their resources on.
More concretely, these are our 3 main research questions:
First Graph
To determine whether the share of women is associated with median salary, we look at the variables ShareWomen and Median.x. We compare Median.x to ShareWomen; a nonzero correlation would indicate that median salary varies with the share of women.
From the graph, we can see that the trend of Median.x against ShareWomen dips in the center and that the data are right-skewed, with median salary decreasing as the share of women increases. This suggests that the share of women is associated with median salary: the correlation is not zero, which shows up as a curved fitted line rather than a flat horizontal one.
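To make the idea of a "nonzero correlation" concrete, the sketch below computes Pearson's correlation coefficient by hand. The helper `pearson_r` and the five data points are hypothetical illustrations, not values from our data, and the report itself does not state which correlation measure (if any) was computed.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical (share of women, median salary) pairs; NOT from the dataset.
share_values = [0.10, 0.25, 0.40, 0.60, 0.85]
salary_values = [60000, 52000, 45000, 38000, 35000]

r = pearson_r(share_values, salary_values)
print(f"r = {r:.3f}")  # negative: salary falls as the share of women rises
```

Pearson's r only measures linear association, so for a curved trend like the one in the graph it may understate how strongly the two variables are related.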
Second Graph
To see how the share of women relates to median salary across fields, the graph uses the variables ShareWomen, Median.x, and Major_category.
From the graph we can see a negative relationship between median salary and the share of women. We can also see that majors in the same category cluster together and are separated from other categories: Engineering sits at a higher median salary and a lower share of women, while Health sits at a lower median salary and a higher share of women. This shows that different major categories have different shares of women, and that these differences go along with different median salaries.
Moving on to the relationship between major and gender, there seems to be a strong association between the major category and the percentage of students who are female. In particular, Computers & Mathematics and Engineering have a far lower percentage of female students than the other categories, and far lower than the naive assumption that each major would mirror the population and be roughly equal. For Health, we see the reverse, with a far higher percentage of female students than in the other categories. One particular major has a percentage of women close to 100%, which highlights how polarized some majors are by gender.
Now examining the distribution of the percentage of females across different categories, we see that the distribution is fairly similar for all but one category. Biology & Life Science has an unusually tight distribution, with the majority of majors having a fairly similar ratio of men to women. This could perhaps indicate that this field is equally attractive and has the same opportunities for both men and women.
The plot is faceted by whether the share of women is greater than 0.4 or not, and it shows the median salary for each major category. For Engineering and Computers & Mathematics, the median salary is higher when the share of women is below 0.4 than when it is above 0.4; for Physical Sciences, it is lower.
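The grouping behind a faceted plot like this one can be sketched as follows. This is only an illustration: the rows are hypothetical, the name `panel_medians` is our own, and we assume the 0.4 threshold is applied to each major's share of women before taking the median salary within each category and facet.

```python
from collections import defaultdict
from statistics import median

THRESHOLD = 0.4  # cutoff on the share of women used for faceting

# Hypothetical rows (major_category, share_women, median_salary);
# these are NOT values from the dataset.
rows = [
    ("Engineering", 0.15, 60000),
    ("Engineering", 0.30, 56000),
    ("Engineering", 0.55, 50000),
    ("Physical Sciences", 0.35, 40000),
    ("Physical Sciences", 0.50, 44000),
    ("Physical Sciences", 0.65, 42000),
]

# Collect salaries per (category, share above threshold?) pair.
groups = defaultdict(list)
for category, share, salary in rows:
    groups[(category, share > THRESHOLD)].append(salary)

# One median per category within each facet.
panel_medians = {key: median(values) for key, values in groups.items()}

for (category, above), value in sorted(panel_medians.items()):
    facet = "share > 0.4" if above else "share <= 0.4"
    print(f"{category:18s} {facet:13s} median salary = {value:.0f}")
```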
To explore different types of majors, we split our data into STEM majors and non-STEM majors. There are 76 STEM majors in our data, and 97 non-STEM majors.
We can make a boxplot comparison of the median salary between samples of non-STEM majors and STEM majors.
Visually, it appears that non-STEM majors have lower salaries on average. We can use a chi-squared test of independence to check whether median salary and the type of major are associated:
##
## Pearson's Chi-squared test
##
## data: df.all_ages$is_stem and df.all_ages$Median
## X-squared = 99.122, df = 57, p-value = 0.0004621
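Since the p-value of 0.0004621 is below alpha at level 0.05, we reject the null hypothesis that median salary and the type of major (STEM or not) are independent. The test treats each distinct median salary as its own category (df = 57 corresponds to 58 distinct values), so it shows that the two variables are associated but not, by itself, which group earns more; the direction comes from the boxplot.
We can do a similar check to see if unemployment rates are different among STEM and non-STEM majors: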
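For readers who want to see what the test statistic measures, here is a minimal Python sketch of Pearson's chi-squared statistic and its degrees of freedom for a contingency table. The function name `chi_squared_independence` and the 2x2 counts are hypothetical, and the sketch applies no continuity correction, so it will not reproduce R's default `chisq.test` output for a 2x2 table.

```python
def chi_squared_independence(table):
    """Return (statistic, degrees_of_freedom) for a table of observed counts.

    `table` is a list of rows, each a list of counts of equal length.
    Expected counts assume the row and column variables are independent.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            if expected == 0:
                raise ValueError("empty row or column in the table")
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Hypothetical counts, NOT from the dataset:
# rows = (non-STEM, STEM), columns = (low salary, high salary).
table = [[30, 10], [15, 25]]
stat, dof = chi_squared_independence(table)

CRITICAL_VALUE_DF1 = 3.841  # chi-squared critical value for alpha = 0.05, df = 1
print(f"X-squared = {stat:.3f}, df = {dof}")
print("reject independence" if stat > CRITICAL_VALUE_DF1 else "do not reject")
```

The degrees of freedom are (rows - 1) x (columns - 1). With 2 types of major and 58 distinct median values, the same formula gives the df = 57 reported in the output above.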
##
## Pearson's Chi-squared test
##
## data: df.all_ages$is_stem and df.all_ages$Unemployment_rate
## X-squared = 170.97, df = 171, p-value = 0.4863
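Since the p-value of 0.4863 is not below alpha at level 0.05, we do not reject the null hypothesis that unemployment rate and the type of major (STEM or not) are independent. This means that, although it appears visually in the boxplot that unemployment rates are lower for STEM majors, we do not have enough statistical evidence in our data to say that this is true. Note that df = 171 implies 172 distinct unemployment rates among the 173 majors, so nearly every major forms its own category and this test has very little power to detect a difference.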
From the residual mosaic plot, it seems that ‘Arts’, ‘Education’, and ‘Humanities’ have more low salary samples than would be expected if all major categories had the same median salary. ‘Business’, ‘Engineering’, and ‘Physical Science’ have more high salary samples than would be expected. There are also fewer low salary samples than expected in ‘Business’, and fewer high salary samples than expected in ‘Education’.
This suggests that ‘Arts’, ‘Education’, and ‘Humanities’, which are not STEM fields, are not good choices if you expect a high salary. ‘Business’, ‘Engineering’, and ‘Physical Science’, which apart from ‘Business’ are STEM fields, would be good choices.
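The "more than expected" and "fewer than expected" readings above can be illustrated with Pearson residuals, which residual-shaded mosaic plots typically use for coloring. We assume that is what the plot shows. The sketch below uses hypothetical category names and counts, not our data, and the helper `pearson_residuals` is our own.

```python
from math import sqrt


def pearson_residuals(table):
    """Return (observed - expected) / sqrt(expected) for every cell.

    Expected counts assume rows and columns are independent. Positive values
    mean more observations than expected, negative values mean fewer.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    residuals = []
    for i, row in enumerate(table):
        out_row = []
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            if expected == 0:
                raise ValueError("empty row or column in the table")
            out_row.append((observed - expected) / sqrt(expected))
        residuals.append(out_row)
    return residuals


# Hypothetical counts, NOT from the dataset:
# rows = major categories, columns = (low salary, high salary).
categories = ["Category A", "Category B", "Category C"]
table = [[12, 3], [8, 8], [2, 10]]

residuals = pearson_residuals(table)
for name, (low, high) in zip(categories, residuals):
    print(f"{name}: low = {low:+.2f}, high = {high:+.2f}")
```

In this made-up table, Category A has a positive residual in the low salary column (more low salary majors than expected) and Category C has a positive residual in the high salary column. The squared residuals sum to the chi-squared statistic for the table.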
Based on our analysis, we noted striking differences between majors, especially in terms of salary and gender disparities. First, we analyzed the gender wage gap and found a negative correlation between median salary and the percentage of women who studied a particular major. Looking further into the distribution across majors, we saw that the proportion of women was unusually high in Health and much lower in fields like Computers & Mathematics and Engineering. Our last research question focused on the relationship between STEM and non-STEM majors and the job prospects of these two groups. In general, STEM fields tended to have better job prospects, with a higher median salary than non-STEM majors.
Overall, we can conclude that certain majors tend to have higher incomes, and that these higher income majors have a lower proportion of women. Of course, further analysis could explore this relationship and whether the link between the proportion of women in a major and the median income of that major is causal. Further analysis could also be done within each major category, since the sheer number of majors made it infeasible to do a deep dive within each category and explore the majors individually. We think both of these would be relevant to advancing our study’s goal of understanding which major is right for students and how majors relate to inequality in the workforce.