Demand for higher education has never been greater. Federal and local governments have worked together to increase access to these institutions. Many of these programs have raised questions about their effectiveness and about where a student can get the best education without being burdened with lifelong debt. We took a data-driven approach to explore the complex relationship between outcome and institution while trying to account for as many confounders as possible. To make the problem more tractable, we broke it into a few questions:
How do public and private schools differ in terms of academic and economic traits?
What is the relationship between a school’s Pell grant fraction and the proportion of its students receiving federal loans?
How does the cost of a university relate to its student debt, and does this relationship differ across university types?
This dataset was obtained from the Department of Education and describes many facets of each college in the United States, with information ranging from average SAT scores to faculty income. The most recent data covering all United States institutions is large and contains numerous variables, so before tackling our research questions we narrowed the dataset to the explanatory variables most relevant to our study. After the analysis outlined below, we chose to focus on the variables concerning admission rate, cost, type of university (for-profit, private, public), average SAT score, the proportion of students receiving Pell grants, the proportion of students receiving federal loans, median income 10 years after graduation, average faculty salaries, and median student debt. Aside from the type of university, which we treat as a categorical variable, all of these variables are treated as continuous. Additionally, we use median household income, poverty rate, and the percentage of low-income families to further analyze the causes and effects of student debt. We treat these three variables as continuous as well.
We first explored how SAT scores differ geographically across public and private institutions. For this part, we folded for-profit institutions into the private category, since for-profit institutions are also privately controlled.
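The recoding and state-level averaging described above can be sketched as follows. This is a minimal illustration rather than the code used for the maps: the record layout, the `sector` and `mean_sat_by_state` helpers, and all of the SAT values are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (state, control type, average SAT). Values are made up.
schools = [
    ("CA", "public", 1210),
    ("CA", "private", 1050),
    ("CA", "for-profit", 990),
    ("PA", "public", 1040),
    ("PA", "private", 1260),
]

def sector(control):
    """Fold for-profit institutions into the private sector."""
    return "public" if control == "public" else "private"

def mean_sat_by_state(records):
    """Return {(state, sector): mean SAT} after recoding the control type."""
    groups = defaultdict(list)
    for state, control, sat in records:
        groups[(state, sector(control))].append(sat)
    return {key: mean(values) for key, values in groups.items()}

state_means = mean_sat_by_state(schools)
print(state_means)
```

Each (state, sector) pair then maps to one value that can be used to shade a state on the corresponding map.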
We can see that private institutions in Washington state, Florida, Pennsylvania, and Minnesota have higher SAT scores. Public institutions tend to have lower average SAT scores in general. More interesting are the states that flip colors between the public and private maps: for example, public institutions in California tend to be on the upper end of SAT scores, while private institutions there have some of the lowest average SAT scores. Pennsylvania flips the other way, with high SAT scores for its private institutions and low SAT scores for its public ones. We conclude that although private institutions generally tend to have higher SAT scores, the pattern varies greatly from state to state.
Then, we explored how median household income differs between institutions across the country.
In terms of median household income, the distributions are extremely similar between private and public institutions. We can see that the few states with higher SAT scores also have higher median household income, and Vermont has extremely low median household income. However, unlike the distribution for SAT scores, the colors between the two maps are very similar across almost all the states. We do see that the map for private institutions does lean a bit more to the right of the spectrum, though, suggesting that while the distribution between states is similar between the two types of institutions, private institutions have a higher median household income in general, which is expected.
The following graph addresses the cost of an institution, one of the most important variables for a prospective student. Our research question asks how cost, debt, and control type relate to one another; this graph takes a slightly different angle. Average faculty salary indicates how much a school invests in its education, so the graph visualizes the relationship between average faculty salary and the cost of the college.
Here, we can see the relationship between average faculty salary and cost across the three control types, along with a loess curve and its confidence bounds for each. Private colleges, as expected, have the highest costs. Interestingly, their slope is not as steep as that of public colleges, so average salary rises only modestly as cost increases. This suggests that a smaller share of the price goes to salaries at private colleges than at public colleges. Past the $60,000 mark for cost, however, salaries increase sharply, in what looks like an exponential relationship. These are presumably the top colleges, which compete with one another to retain prestigious faculty, hence the large jump. Another interesting feature is the shape of the for-profit curve, which appears to be convex. This is most likely due to a single outlier with a very high cost but low salaries. The for-profit group also appears to have the lowest slope, which is somewhat expected: the goal of these institutions differs from that of the other two types. They aim to optimize cost and should therefore lean toward saving on expenditures.
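For readers unfamiliar with loess, the idea behind the smoothed curves can be sketched as a locally weighted linear regression. The function below is a simplified illustration under stated assumptions: it uses tricube weights, a hypothetical `frac` span parameter, and made-up cost and salary values, and it omits the confidence bounds shown in the graph. It is not the routine used to draw the figure.

```python
import numpy as np

def loess(x, y, frac=0.5):
    """Locally weighted linear fit evaluated at each x, using tricube weights."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    # Number of neighbours used in each local fit (at least 3, at most n).
    k = min(n, max(3, int(np.ceil(frac * n))))
    design = np.column_stack([np.ones(n), x])
    fitted = np.empty(n)
    for i in range(n):
        dist = np.abs(x - x[i])
        # Bandwidth: distance to the k-th nearest point (guarded against zero).
        h = max(np.sort(dist)[k - 1], np.finfo(float).eps)
        w = np.clip(1.0 - (dist / h) ** 3, 0.0, None) ** 3
        # Weighted least squares for y ~ a + b * x around x[i].
        sw = np.sqrt(w)
        coef, *_ = np.linalg.lstsq(design * sw[:, None], y * sw, rcond=None)
        fitted[i] = coef[0] + coef[1] * x[i]
    return fitted

# Hypothetical data: cost in dollars and a noisy, roughly linear salary.
rng = np.random.default_rng(0)
cost = np.sort(rng.uniform(10_000, 70_000, 80))
salary = 5_000 + 0.05 * cost + rng.normal(0, 300, 80)
smooth = loess(cost, salary, frac=0.4)
```

Fitting each control type separately, as in the graph, would amount to calling `loess` once per group. A smaller `frac` follows local bumps more closely, which is also why a single outlier can bend the for-profit curve.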
With the geographic maps we explored average SAT scores, but we have not yet seen what role debt plays at these universities. The following graph shows the relationship between debt and net price across the types of universities.
Although the main relationship we wish to highlight with this chart is the one between median student debt and the type of university, a scatter plot of institution cost against the median debt of its students allows a clearer examination of this relationship. The relationship between median debt and cost may seem fairly trivial, since higher costs generally lead to more debt, but the trends in fact differ across university types. As cost increases at public universities, median debt soars, while increasing cost at private universities produces a much gentler slope. While the trend line for for-profit universities seems to fall between those of public and private universities, the for-profit data contain many outliers, so we remain skeptical about the true relationship.
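One simple way to put numbers on the difference in slopes is to fit a separate straight line to each university type. The sketch below assumes a linear trend, which may not match the smoother used in the chart, and the `debt_slopes` helper and all of its input values are hypothetical.

```python
import numpy as np

# Hypothetical (net price, median debt) values per control type; all made up.
data = {
    "public": ([8_000, 12_000, 16_000, 20_000], [9_000, 13_000, 17_000, 21_000]),
    "private": ([20_000, 30_000, 40_000, 50_000], [18_000, 20_000, 22_000, 24_000]),
}

def debt_slopes(groups):
    """Least-squares slope of median debt on cost, fitted separately per group."""
    slopes = {}
    for control, (cost, debt) in groups.items():
        slope, _intercept = np.polyfit(cost, debt, deg=1)
        slopes[control] = slope
    return slopes

slopes = debt_slopes(data)
print(slopes)
```

A larger slope for one group means that each additional dollar of cost is associated with more additional debt in that group.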
When cost is a major factor, many students choose public universities, which generally have cheaper tuition that is discounted even further for in-state students. This suggests another hypothesis: those who do attend private institutions can, to some degree, afford the costs. Although this may not hold in all cases, the graphs above show that median debt at private institutions does not necessarily increase with cost. The median debt trend may be flatter at private universities than at public ones because students who attend private universities are usually better able to afford them.
Because there are so many confounders with different effects, we applied a dimension reduction technique to bring the data from a high-dimensional space into two or three dimensions for easier visualization. We chose principal component analysis (PCA), which creates new variables that capture as much of the variation in the original dimensions as possible.
The elbow plot shows the bend at 2 principal components, so we used two components for plotting.
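The PCA step and the quantities behind an elbow plot can be sketched as follows. This is an illustration on synthetic data, not the original analysis: it assumes the variables are standardized before decomposition (the report does not say whether they were), and the `pca` helper and the generated matrix `X` are hypothetical.

```python
import numpy as np

def pca(X, n_components=2):
    """PCA via SVD on standardized columns.

    Returns the component scores, the loadings (one row per component),
    and the fraction of variance explained by every component.
    """
    X = np.asarray(X, dtype=float)
    # Standardize so that variables on large scales (e.g. cost) do not dominate.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = S**2 / np.sum(S**2)
    scores = Z @ Vt[:n_components].T
    return scores, Vt[:n_components], explained

# Synthetic stand-in for the school-level table: 200 rows, 5 variables
# driven mostly by 2 hidden factors plus a little noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 5))
X = latent @ mixing + 0.1 * rng.normal(size=(200, 5))

scores, loadings, explained = pca(X)
print(np.round(explained, 3))
```

Plotting `explained` against the component index gives the elbow plot, the two columns of `scores` give the coordinates for a two-dimensional scatter, and the rows of `loadings` show how strongly each original variable contributes to each component.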
This plot shows the 2 principal components that explain the most variation, plotted with the control type of each college. We can see roughly 2 main clusters: public versus private. The number of for-profit colleges is quite low, so we can’t really see clustering for them, though they seem to fall between public and private. The most defining differences between the two groups appear to be admission rate, cost, and federal loan proportion. Though these three variables don’t account for everything, they seem the most orthogonal to the division between the two clusters. Higher admission rates seem to indicate a public university, while increased cost makes a college more likely to be private. Interestingly, an increase in average SAT does not seem to move a school closer to either the private or the public cluster. Also of interest, average SAT and Pell grant proportion appear to be negatively correlated: schools with high SAT scores usually have low Pell proportions, and vice versa.
Median income, which measures outcome in our dataset, also doesn’t seem to split the clusters apart, though it does reveal a new cluster of private schools with some of the highest median incomes. This cluster also has some of the highest average SAT scores, coupled with low Pell grant proportions and admission rates. Our guess is that these are the Ivy League and similarly competitive schools.
The graph above shows the relationship between a college’s student body proportion of Pell grant recipients and the student body proportion of federal loan recipients. Our question asks about the relationship between the two types of assistance a student receives. The federal Pell grant does not have to be repaid, while the normal federal loan does. The graph above tries to portray the relationship between these two types of aid.
The relationship shown above is given by a loess curve and a confidence bound. The loess curve captures an interesting relationship that a linear model could not, because of the high proportion of schools with 0% of their student body receiving federal loans. For-profit institutions seem to have a positive relationship between their Pell grant fraction and their federal loan proportion. The relationship inverts, however, as the Pell grant percentage nears 100%, indicating that students at these for-profit schools do not pursue federal loans as often as students at schools in the middle percentages. Overall, both options seem to be equally valued by these students. Private institutions exaggerate this trend, with the dip occurring earlier in the graph. There also seems to be more emphasis on pursuing federal loans over Pell grants, as seen in the steeper slope. Finally, public schools seem to have a higher proportion of Pell grant recipients than federal loan recipients. This might show that federal loans are less popular than Pell grants at public institutions.
Our graphs showed that private institutions generally tend to have higher SAT scores, although these scores vary greatly from state to state. We also found that median household income, by contrast, differs little between the public and private maps from state to state, but private institutions still have higher median household income in general. Another conclusion was that a higher cost of a university translates into higher faculty salaries to a certain degree (faculty salary being an indicator of how much a university invests in its education). Additionally, we found that median debt does not necessarily increase with cost at private universities, possibly because students at private universities are more likely to be able to afford the costs. The PCA found that higher admission rates indicate public universities, that higher costs indicate private universities, and that SAT scores and Pell grant proportion are negatively correlated. Finally, we found that although federal loans and Pell grants are positively correlated, private institutions have more federal student loans relative to Pell grants than public institutions do.
Future studies should account for the endowments of these universities, because controlling for university wealth may explain more of the cost and value of these universities. A measure of networking could also be investigated to better understand the job prospects of students leaving American universities.