The Socioeconomic Roots of Racial Disparities in Hospitalizations

Authors

Jainiah Harden

Nicole Sorensen

Cristina Antonacci

Macey Kalmanek

Published

July 26, 2024

Introduction

Every year, millions of Americans are admitted to hospitals for conditions that could have been prevented. In 2017 alone, $33.7 billion was spent on 3.5 million adult inpatient stays that might have been avoided with proper and timely outpatient care (McDermott 2020). These preventable hospital stays are not just a financial burden, but also a key measure of healthcare quality, often highlighting gaps in care that lead to worse health outcomes. 

Understanding the role of socioeconomic factors in influencing preventable hospital stays for different racial groups can help identify health disparities and provide insights that can inform public health policies and develop targeted interventions. 

Motivation

A significant portion of US healthcare expenditure is devoted to hospital care, with billions spent on preventable hospital stays each year. These stays often result from diseases that can be effectively managed with appropriate outpatient care, such as diabetes, asthma, and hypertension. Beyond the financial burden, these hospitalizations cause considerable patient suffering and reduced quality of life. 

Preventable hospitalizations are particularly common among Medicare beneficiaries. Individuals aged 65 and older had a rate more than 12 times higher than those aged 18 to 44 (McDermott 2020). Additionally, 17–20% of Medicare patients were readmitted within 30 days, with 76% of these hospital readmissions considered potentially preventable. Disparities also exist among racial groups. Asian/Pacific Islander patients had the lowest rate of potentially preventable hospital stays at 580.9 per 100,000 population, while Black patients had the highest rate at 2,572 per 100,000 (Abt Associates 2023).

While these racial disparities in preventable hospital stays are well-documented, a less explored area is what underlying factors contribute to these disparities and how the determinants of preventable hospital stays differ across racial groups. 

As areas in the United States have grown more unequal, researchers have linked income inequality to worse health outcomes, with many studies finding a causal relationship between income inequality and health (Pickett 2014). Income inequality can reduce social cohesion and increase chronic stress, adversely affecting health outcomes, especially for African Americans (Avanceña 2021).

Education also plays a significant role in health disparities (Lee 2016). High school completion affects literacy and access to resources, enabling people to take preventative health measures. Investigating racial differences in the effect of high school completion on preventable hospital stays may be key to developing targeted interventions to improve health outcomes. Similarly, unemployment can reduce economic stability and lead to loss of health insurance, which may harm some minority groups more than others.

For these reasons, the aim of this paper is to understand how income inequality, high school completion, and unemployment affect preventable hospital stays for certain races. Doing so will help guide targeted interventions to improve health outcomes. 

Data

The dataset we used is sourced from the County Health Rankings and Roadmaps. This dataset contains a number of variables pertaining to county-level healthcare outcomes, racial demography, and socioeconomic factors.

Response Variable

Preventable Hospital Stays: The number of hospital stays for ambulatory-care sensitive conditions per 100,000 Medicare participants.

Explanatory Variables

  • Income Inequality: The ratio of household income at the 80th percentile to income at the 20th percentile. This measures the distribution of income across the population within each county.

  • High School Completion: The percentage of adults 25 and older who have received at least a high school diploma or equivalent.

  • Unemployment: The percentage of people ages 16 and older who are unemployed but seeking work.

  • Percent Non-Hispanic White: The percentage of the population who identify as non-Hispanic White.

  • Percent AIAN: The percentage of the population who identify as American Indian or Alaska Native.

  • Percent Asian: The percentage of the population who identify as Asian.

  • Percent Black: The percentage of the population who identify as Black or African American.

  • Percent Hispanic: The percentage of the population who identify as Hispanic.
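
As an illustration of the income inequality measure defined in the list above, the following minimal sketch computes an 80th-to-20th percentile ratio from a list of household incomes. The function name `income_inequality_ratio`, the income values, and the choice of the "inclusive" interpolation method are assumptions made for this example; County Health Rankings derives its published ratio from survey estimates, not from this code.

```python
from statistics import quantiles


def income_inequality_ratio(household_incomes):
    """Return the 80th-percentile income divided by the 20th-percentile income."""
    # n=5 yields four cut points: the 20th, 40th, 60th, and 80th percentiles.
    cuts = quantiles(household_incomes, n=5, method="inclusive")
    p20, p80 = cuts[0], cuts[3]
    return p80 / p20


# Hypothetical household incomes for one county (illustrative values only).
incomes = [20000, 30000, 40000, 50000, 60000,
           80000, 100000, 120000, 150000, 200000]
ratio = income_inequality_ratio(incomes)
print(f"80/20 income ratio: {ratio:.2f}")
```

A ratio near 1 would mean households at the 80th and 20th percentiles earn similar incomes, while larger values indicate a wider gap.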

High school completion

First, we wanted to observe the relationship between high school completion rates and rates of preventable hospital stays. Note that the high school completion variable counts not only people who graduated directly from high school but also those who completed an equivalency program such as the G.E.D.

This scatterplot shows the relationship between high school completion rates and preventable hospital stays for the various racial groups. High school completion rate is plotted on the x-axis, and log-transformed preventable hospital stays are plotted on the y-axis. The racial groups are color-coded. We see that racial groups with higher high school completion rates have lower preventable hospital stays, with the exception of American Indian / Alaska Native people. One possible explanation is that people who have completed some form of high school are more likely to understand their condition and the actions they can take than people who have not.

This raises the question of what factors lead someone not to complete high school. Possible factors include working to support a family and economic disparities; among such factors, our dataset provides teen birth rates as a variable.

This graph is a scatter plot showing the relationship between teen birth rates at the county level and high school completion rates. Each point on the graph represents data from a specific county. The x-axis shows the high school completion rates, ranging from 0.6 to 1.0, with 1.0 representing 100% completion. The y-axis shows teen birth rate at the county level, ranging from 0 to 80 births per 1,000 people. We see a clear negative trend, suggesting that counties with higher high school completion rates tend to have lower teen birth rates. 

Income Inequality

This next plot shows the relationship between income inequality and preventable hospital stays for each racial group. We applied a log transformation to preventable hospital stays to compress the range of its values so the patterns could be better visualized. We see a positive trend for every racial group except American Indian / Alaska Native people, for whom the direction of the trend is unclear. This suggests that counties with higher income inequality also tend to have more preventable hospital stays.
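
To show how a log transformation compresses a wide range of values, here is a minimal sketch. The rates are hypothetical, and the natural logarithm is an assumption, since the base used for the plots is not stated.

```python
import math

# Hypothetical county rates per 100,000 Medicare enrollees (illustrative only).
rates = [500, 1500, 3000, 6000, 12000]

# Natural log is assumed here; the base used in the analysis is not stated.
log_rates = [math.log(r) for r in rates]

raw_range = max(rates) - min(rates)
log_range = max(log_rates) - min(log_rates)
print(f"raw range: {raw_range}, log range: {log_range:.2f}")

# On the log scale, equal ratios become equal distances:
# doubling from 1500 to 3000 moves as far as doubling from 6000 to 12000.
step_low = math.log(3000) - math.log(1500)
step_high = math.log(12000) - math.log(6000)
```

Because multiplicative differences become additive, counties with very high rates no longer dominate the vertical axis.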

Given that we are working with county-level data, it is useful to identify regions of the country with high levels of income inequality and preventable hospital stays to highlight areas that may benefit from targeted interventions.

This is a choropleth map of counties in the United States. Darker teal indicates higher levels of income inequality, while darker magenta indicates higher rates of preventable hospital stays. We see that the southeast region tends to have more dark purple counties, revealing that these counties have both high levels of income inequality and high preventable hospital stays. This suggests that these dark purple counties may be areas to focus on when studying the effect of income inequality on preventable hospital stays.

Unemployment

When looking at the relationship between unemployment and preventable hospital stays, we see that the relationship is not the same for all races. For American Indian / Alaska Native people, there appears to be a decreasing trend, suggesting that counties with higher unemployment also have fewer preventable hospital stays. This is unexpected and could indicate that specific interventions or cultural factors are driving the trend. It could also be due to the relatively small sample size for this racial group, which may make the trend more susceptible to outliers. For the other races, we see a positive trend: counties with higher unemployment tend to have more preventable hospital stays.

Methods

Our project’s objectives center on interpretability, so we want models that help us determine the directions and strengths of the relationships between our predictors and our response variable, preventable hospital stays. We also suspect some nonlinear relationships. For instance, small changes in unemployment at low levels may have large effects on preventable hospital stays, while the effects of those changes may diminish at higher levels of unemployment. Because our data might therefore not satisfy the linearity assumption of linear regression, we chose a Generalized Additive Model, which maintains the interpretability of predictor effects while being flexible enough to accommodate nonlinear effects.

Generalized Additive Models (GAMs)

Generalized Additive Models, also known as GAMs, are a generalization of linear regression in which each linear term (a coefficient times a predictor) is replaced by a smooth function of that predictor, so the response is modeled as a sum of smooth functions of the predictors.

f(x) = \beta_0 + s_1(x_1) + s_2(x_2) + \dots + s_n(x_n)

GAMs assume that the effects of the predictors are additive, meaning that the combined effect of all the predictors on the response can be expressed as a sum of the individual effects of each predictor. To evaluate this assumption, we used 5-fold cross validation to compare the GAM to a random forest, which is a non-additive model. The RMSE for the GAM was 974.83, while the RMSE for the random forest was 945.47, an improvement of about 3%. Since the non-additive model did not offer a large performance gain, we decided that an additive model was sufficient. Other model assumptions include homoscedasticity and normality of errors, which we determined were reasonably valid, as shown in the appendix. One issue was that the errors showed slight signs of dependence; however, we decided to proceed with the GAM since the violation was not major.
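
The cross-validated comparison described above can be sketched as follows. This is a minimal illustration and not the code used in the analysis: the helper names (`rmse`, `k_fold_rmse`, `fit_mean`, `fit_line`) and the toy data are hypothetical, and two simple stand-in models take the place of the GAM and the random forest. Only the two RMSE values at the end come from the text.

```python
import math
import random


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def k_fold_rmse(xs, ys, fit, k=5, seed=0):
    """Average held-out RMSE over k folds; fit(xs, ys) must return a predict function."""
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in indices if i not in held]
        predict = fit([xs[i] for i in train], [ys[i] for i in train])
        scores.append(rmse([ys[i] for i in held_out],
                           [predict(xs[i]) for i in held_out]))
    return sum(scores) / k


def fit_mean(xs, ys):
    """Stand-in model 1: always predict the training mean."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_line(xs, ys):
    """Stand-in model 2: ordinary least squares with one predictor."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return lambda x: mean_y + slope * (x - mean_x)


# Toy data with an exactly linear relationship (illustrative only).
xs = [float(i) for i in range(50)]
ys = [2.0 * x + 1.0 for x in xs]
rmse_mean = k_fold_rmse(xs, ys, fit_mean)
rmse_line = k_fold_rmse(xs, ys, fit_line)

# Relative improvement implied by the two RMSEs reported in the text.
gam_rmse, rf_rmse = 974.83, 945.47
relative_improvement = (gam_rmse - rf_rmse) / gam_rmse
print(f"relative improvement: {relative_improvement:.1%}")
```

Scoring every model on the same held-out folds is what makes the two RMSE values directly comparable.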

Random Forests

We also want to figure out which of our predictor variables are more important than the others in predicting preventable hospital stays. GAMs do not have a built-in method to calculate variable importance, so we decided to go with a random forest which also has the flexibility to capture nonlinear effects and has a simple measure of variable importance based on impurity. 
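
To give a sense of what impurity-based importance measures, the sketch below computes, for each predictor, the largest reduction in response variance that a single threshold split can achieve. This is a deliberate simplification and an assumption of this example: a real random forest accumulates such reductions over every split in many trees grown on bootstrap samples. The function names and all data values are hypothetical.

```python
def variance(values):
    """Population variance, used as the impurity of a regression node."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def best_impurity_decrease(feature, target):
    """Largest drop in weighted variance from one threshold split on `feature`."""
    n = len(target)
    parent = variance(target)
    best = 0.0
    # Skip the largest value so the right-hand child is never empty.
    for threshold in sorted(set(feature))[:-1]:
        left = [t for f, t in zip(feature, target) if f <= threshold]
        right = [t for f, t in zip(feature, target) if f > threshold]
        children = (len(left) * variance(left) + len(right) * variance(right)) / n
        best = max(best, parent - children)
    return best


# Hypothetical county-level values (illustrative only).
features = {
    "high_school_completion": [0.70, 0.75, 0.80, 0.85, 0.90, 0.95],
    "unemployment": [0.05, 0.03, 0.06, 0.04, 0.05, 0.03],
}
stays = [4000, 3900, 3000, 2900, 2000, 1900]

decreases = {name: best_impurity_decrease(values, stays)
             for name, values in features.items()}
total = sum(decreases.values())
# Normalize so the importances sum to 1, as importance plots usually do.
importance = {name: value / total for name, value in decreases.items()}
print(importance)
```

In this toy data the response tracks high school completion closely, so splitting on it removes far more variance than splitting on unemployment.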

Since we are interested in discovering disparities in preventable hospital stay dynamics across races, we trained one model predicting preventable hospital stays for all races, including the percent of each race in the county as predictors.

Another objective of our project is to determine how the effects of income inequality, unemployment, and high school completion on preventable hospital stays differ for each race. To do this, we trained 5 separate random forests, one predicting preventable hospital stays for each race, and examined the relative variable importance of our predictors.

Results

Modeling preventable hospital stays using GAM:

We trained a generalized additive model using 5-fold cross validation to tune the model’s hyperparameters. Then, we visualized the effects of each predictor on the response variable using partial effects plots.

Each of these three plots shows the individual effect of one of our predictors (income inequality, unemployment, and high school completion) on preventable hospital stays, holding all other variables constant. The light gray region around each line represents the 95% confidence interval. Note, however, that in regions where the data are sparse, as indicated by the rug plot, we cannot be sure of the trend.

For the first plot, we see that income inequality has no effect at lower levels, but after an income inequality ratio of about 5, the partial effect becomes positive. This suggests that there is a positive relationship between income inequality and preventable hospital stays. 

With unemployment, we see that low levels of unemployment have a positive partial effect, with no clear effect at higher levels, suggesting that when very few people are unemployed, there are more preventable hospital stays. This may be because people are busy at work and don’t have as much time for preventative health measures, or because beyond a certain level of unemployment many people enroll in government assistance programs, which may provide more access to care services.

For high school completion, the data are sparse at low levels, making it hard to determine the trend. Above a completion rate of about 80%, however, higher high school completion rates are associated with fewer preventable hospital stays.
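
The idea of a partial effect in an additive model can be made concrete with a minimal sketch. The component functions and the intercept below are hypothetical shapes chosen for illustration; they are not the smooths fitted in this analysis. The point is only that, in an additive model, the change in the prediction from moving one predictor does not depend on the values of the others.

```python
import math

# Hypothetical smooth components; the fitted smooths are not reproduced here.
INTERCEPT = 3000.0


def s_income(ratio):
    """Flat below a ratio of 5, then increasing (illustrative shape only)."""
    return 150.0 * max(0.0, ratio - 5.0)


def s_unemployment(rate):
    """Positive at low rates, fading toward zero (illustrative shape only)."""
    return 200.0 * math.exp(-rate / 2.0)


def predict(ratio, unemployment_rate):
    """Additive prediction: intercept plus one smooth term per predictor."""
    return INTERCEPT + s_income(ratio) + s_unemployment(unemployment_rate)


# Effect of raising the income ratio from 5 to 6 at two unemployment levels.
effect_low = predict(6.0, 3.0) - predict(5.0, 3.0)
effect_high = predict(6.0, 10.0) - predict(5.0, 10.0)
print(effect_low, effect_high)
```

Both differences equal the change in `s_income` alone, which is why each partial effects plot can be read on its own.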

Since different racial groups may have varying access to healthcare services and encounters with healthcare professionals, we also included race percentages in our models to help account for these differences.

For both percent Black and percent Hispanic, we see a general decrease in the partial effect until about 75%, after which there is a sharp increase. This nonlinear trend suggests that underlying factors like income and access to care may influence preventable hospital stays. The average median household income of counties that are more than 75% Hispanic or Black is significantly lower than the average median household income for the country. This suggests a potential income effect, where areas under more economic strain tend to have more people going to the hospital for preventable conditions.

For the remaining races, the trends are less certain. For percent Asian, we see that counties with a higher share of Asian residents have fewer preventable hospital stays, although the sparsity of the data at higher percentages and the wide confidence intervals make us uncertain about this trend. Similarly, for percent Native Hawaiian or Other Pacific Islander, we see very wide confidence intervals, making it hard to determine the effect, although at low levels the partial effect is close to 0, suggesting little effect on preventable hospital stays. For American Indian / Alaska Native, there is still a lot of uncertainty, but it appears that counties with higher percentages of American Indian / Alaska Native residents have higher preventable hospital stays in general.

Variables most important for predicting preventable hospital stays

We trained a random forest using income inequality, unemployment, and high school completion, along with race percentages in order to determine relative feature importance. Training was done using 5-fold cross validation. The following plot displays the relative importance of our variables in predicting preventable hospital stays.

We see that high school completion is the most important variable in predicting preventable hospital stays. This seems to make sense, since completing high school generally helps people secure better-paying jobs (Social Security Administration 2015), allowing them to have health insurance and afford healthcare services. There could also be an effect of greater health literacy, as people are more likely to understand and follow medical advice and to navigate the healthcare system effectively.

The next most influential variable in predicting preventable hospital stays is the percent of a county that is Black. There are a variety of possible explanations for this, one being a disparity in access to care, where African Americans may face economic or systemic barriers to care that could cause delays in treatment (Connell 2019). It is also possible that percent Black is highly correlated with other factors like income or education, which could be the driving factors in increasing preventable hospital stays.

These results generally point to a need to focus resources on raising high school completion rates and investigating potential barriers to care in largely African American communities. To further investigate what factors most influence preventable hospital stays for different races, we trained 5 separate random forests, one for each race, and display the following variable importance plots.

Each bar plot above shows the relative variable importance in predicting preventable hospital stays for each race. We see that for White people, high school completion is clearly the most important factor in predicting preventable hospital stays. One potential explanation is that high school completion may be closely related to socioeconomic status (Ferguson 2007), where people who complete high school are more likely to escape the cycle of poverty, which may significantly affect health outcomes. White people living in poverty may have less access to support systems or community resources than minority groups, which might have stronger community networks or organizations aimed at supporting them. As for why high school completion is relatively most important in Black communities, many Black people have faced historical barriers and challenges that have led to persistent poverty and lower high school completion rates, which can result in worse health outcomes (Evans 2015).

For Hispanic people, unemployment and high school completion appear slightly more important than income inequality. One reason why unemployment may be more important is that employment is often directly tied to health insurance coverage in the U.S. and Hispanic people tend to rely on employment for health insurance (Baumgartner 2023), so job loss may reduce their access to preventative health care more than it does for other races. Hispanic culture also often emphasizes strong family support networks, and losing employment could not only lead to loss of income but also disrupt support from those social networks, which could make health outcomes worse.

For Asians we see that income inequality has the highest variable importance, suggesting that the effect of income inequality on preventable hospital stays is higher than the effects of the other variables. This may be because Asians tend to live in cities more than other racial groups and cities tend to have more income inequality in general (Kochhar 2018). 

For American Indian or Alaskan Native we see that income inequality, unemployment, and high school completion are equally important in predicting preventable hospital stays. This could be a result of limited data for this race, as there are only 306 counties that reported preventable hospital stays for American Indian or Alaskan Native people.

Discussion

Our analysis of how income inequality, unemployment, and high school completion impact preventable hospital stays found the following insights:

  1. Counties with higher income inequality tend to have higher rates of preventable hospital stays, meaning as societies become more unequal, more people are going to the hospital for preventable conditions.  

  2. When almost everyone has jobs, even a small increase in unemployment can lead to more preventable hospital stays. However, when many people are unemployed, changes in unemployment do not make much of a difference in preventable hospital stays.

  3. As more people in a county complete high school, fewer individuals go to the hospital for preventable conditions. 

Looking at how race percentage influences preventable hospital stays, we saw some nonlinear, unexpected effects. We found that counties with very high proportions of Black and Hispanic people tend to have more preventable hospital stays, suggesting that these communities may face unique challenges in accessing quality healthcare and preventative services.

When looking at how our variables predict preventable hospital stays separately for each race, we found that high school completion is one of the most important variables for all races except Asians, for whom income inequality is the most important. These results suggest that potential underlying factors such as poverty, health insurance, or urbanization may be driving these differences in variable importance between the races.

Limitations

Our research question asked us to look into the effects of income inequality, high school completion, and unemployment on preventable hospital stays for each race; however, we do not have data on these variables for each race. Such data would have been useful for a variable like high school completion, where knowing the completion rate for a particular racial group would help us better understand the relationship between high school completion and preventable hospital stays for that group.

A second limitation of our study is that preventable hospital stays were only recorded for Medicare enrollees, meaning we only have data for this variable on people who are 65 and older. This means that we can’t generalize our findings to younger populations, and further research would be needed to investigate if the trends we found carry through for younger people.

Additionally, with regard to modeling, some of the assumptions of our Generalized Additive Model, such as the independent errors assumption, may not fully hold, which increases the uncertainty about how well the form of the model captures the data.

We also recognize that variables other than the three we examined, such as access to healthcare services, health behaviors, community support, and other socioeconomic factors like poverty rate and housing stability, might better predict preventable hospital stays. Future research could include a similar analysis of these variables to see if there are differences in how they affect preventable hospital stays for different racial groups.

Next steps:

1. Investigate appropriateness of model

Our analysis revealed that the independent errors assumption was not fully satisfied, suggesting the presence of missing predictors that could better explain the patterns in the errors. We will explore additional predictors and consider models with interaction terms to improve model performance.

2. Examine other influencing factors

To get a better understanding of what factors are impacting preventable hospital stays for each race beyond the three we looked at in this analysis, we will investigate additional factors. These may include access to healthcare, health behaviors, community support, and poverty rate. Incorporating these variables could enhance the predictive power of our models and identify actionable strategies to improve health outcomes.

3. Expand age range of data

Since preventable hospital stays were only recorded for people 65 and older, we could incorporate data from Medicaid, private insurers, or hospital records in order to see if the trends we observed generalize to younger populations. 

4. Examine geographic differences

We briefly explored regional differences in preventable hospital stays and income inequality with our choropleth map. With future research we would like to further investigate these patterns with our other socioeconomic variables as well to see if there are substantial regional differences. This would help identify regions where the effects are more prominent which could help guide distribution of resources and interventions to improve health outcomes.

References

Abt Associates. (2023, January). Specifications for the home health within-stay potentially … Centers for Medicare & Medicaid Services. https://www.cms.gov/files/document/hh-qrp-specificationspotentiallypreventablehospitalizations.pdf

Avanceña, A. L. V., DeLuca, E. K., Iott, B., Mauri, A., Miller, N., Eisenberg, D., & Hutton, D. W. (2021, August 31). Income and Income Inequality Are a Matter of Life and Death. What Can Policymakers Do About It? https://ajph.aphapublications.org/doi/10.2105/AJPH.2021.306301

Baumgartner, J. C., Collins, S. R., & Radley, D. C. (2023, March 16). Inequities in health insurance coverage and access for Black and Hispanic adults. Commonwealth Fund. https://www.commonwealthfund.org/publications/issue-briefs/2023/mar/inequities-coverage-access-black-hispanic-adults

Connell, C. L., Wang, S. C., Crook, L., & Yadrick, K. (2019, August). Barriers to healthcare seeking and provision among African American adults in the rural Mississippi Delta region: Community and Provider Perspectives. Journal of community health. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6612316/

Evans III, A. C. (2015, September 4). A historical overview of the challenges for African Americans … Digital Commons. https://digitalcommons.fiu.edu/cgi/viewcontent.cgi?referer=&httpsredir=1&article=1454&context=sferc

Ferguson, H., Bovaird, S., & Mueller, M. (2007, October). The impact of poverty on educational outcomes for children. Paediatrics & child health. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2528798/

Kochhar, R. (2018, July 12). Key findings on the rise in income inequality within America’s racial and ethnic groups. Pew Research Center. https://www.pewresearch.org/short-reads/2018/07/12/key-findings-on-the-rise-in-income-inequality-within-americas-racial-and-ethnic-groups/

Lee, J. O., Kosterman, R., Jones, T. M., Herrenkohl, T. I., Rhew, I. C., Catalano, R. F., & Hawkins, J. D. (2016, October). Mechanisms linking high school graduation to health disparities in young adulthood: A longitudinal analysis of the role of health behaviours, psychosocial stressors, and health insurance. Public health. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5061606/

McDermott, K. W. (2020, June 16). Characteristics and costs of potentially preventable inpatient stays, 2017. Healthcare Cost and Utilization Project (HCUP) Statistical Briefs [Internet]. https://www.ncbi.nlm.nih.gov/books/NBK559945/

Pickett, K., & Wilkinson, R. (2014, December 30). Income inequality and health: A causal review. Social Science & Medicine. https://www.sciencedirect.com/science/article/pii/S0277953614008399?via%3Dihub

Social Security Administration. (2015, November). Research summary: Education and lifetime earnings. https://www.ssa.gov/policy/docs/research-summaries/education-earnings.html

Appendix

Evaluating GAM model assumptions

The QQ-plot suggests a slight deviation from normality at the upper tail, but since the deviations are not major, we can conclude the assumption is reasonably valid. The residuals vs linear predictor plot shows a relatively even spread around 0, validating the homoscedasticity assumption.
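
For readers who want to reproduce this kind of check, the sketch below computes the points of a normal QQ-plot from a list of residuals. The residual values are hypothetical and the function name `qq_points` is an assumption of this example; it does not reproduce the diagnostics of the fitted GAM.

```python
from statistics import NormalDist, mean, pstdev


def qq_points(residuals):
    """Pair each sorted standardized residual with its theoretical normal quantile."""
    n = len(residuals)
    mu, sigma = mean(residuals), pstdev(residuals)
    standardized = sorted((r - mu) / sigma for r in residuals)
    # Plotting positions (i + 0.5) / n keep the probabilities strictly inside (0, 1).
    theoretical = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, standardized))


# Hypothetical residuals with one large positive value (illustrative only).
residuals = [-1200.0, -650.0, -300.0, -80.0, 40.0, 150.0, 420.0, 700.0, 2100.0]
points = qq_points(residuals)
for theoretical_q, sample_q in points:
    print(f"{theoretical_q:6.2f}  {sample_q:6.2f}")
```

Points that lie close to the line y = x are consistent with normal errors. A largest sample quantile well above its theoretical counterpart, as in this toy data, is the kind of upper-tail deviation described above.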