Executive Summary

This study seeks to identify social factors associated with wealth inequality that predict adverse events from substance use. The study used 2023 data from the University of Wisconsin Population Health Institute’s County Health Rankings & Roadmaps. The four predictor variables were unemployment, mental health providers, primary care physicians, and food insecurity. The two response variables were drug overdose deaths and alcohol-impaired driving deaths. Preliminary exploratory data analysis was conducted to understand which geographic regions had the highest concentrations of the selected variables and which predictors had the strongest relationships with the response variables. Given the seemingly linear relationships observed in the exploratory data analysis, we compared linear regression, regularization techniques, and a random forest model; the random forest produced the most accurate predictions. A variable importance model then showed that the most important predictors of drug overdose deaths are poor mental health days, insufficient sleep, and mental health providers, whereas the most important predictor of alcohol-impaired driving deaths, apart from demographic variables, is unemployment. With this information, United Health Group (UHG) can implement targeted solutions to help reduce fatalities from substance use and can expand into further studies that use demographic and individual-level data.

Introduction

Question: Are there demographic and social factors that predict drug overdoses and alcohol-related incidents (e.g., driving accidents)?

Drug overdoses and alcohol-impaired driving affect not only the well-being of an individual but also that of the community. United Health Group is a company that provides healthcare coverage along with other health-oriented services. For a company whose primary focus is the well-being of its customers, understanding and preventing the possible causes of an issue as widespread as substance use can help reduce its effects and benefit United Health Group customers. Based on preliminary exploratory data analysis, we hypothesized that unemployment, food insecurity, primary care physicians, and mental health providers could be predictors of drug overdoses and alcohol-impaired driving. We used County Health Rankings data to test these hypotheses and to identify other predictors.

Motivation

As the COVID-19 pandemic placed massive pressure on the U.S. economy, the wealth gap between the rich and the poor widened. In the years since, this economic inequality has continued to grow as “the rich get richer and the poor get poorer.” The poverty rate in the U.S. has also increased following a steady decline in the 2010s. At the peak of the pandemic in 2020, the poverty rate rose by nearly a percentage point, from 10.5% in 2019 to 11.4% (Khattar et al., 2022). It then increased again to 11.6% in 2021 and to 14.4% in 2022 (Huq, 2022). At the same time, retail alcohol sales rose significantly during 2020, a 20% increase over 2019 (Castaldelli-Maia et al., 2021). The U.S. population also dealt with the growing opioid epidemic (U.S. Department of Health and Human Services, 2023). With these trends in mind, the U.S. healthcare system is challenged with treating and reducing substance abuse and overdose.

Because these trends in wealth inequality, poverty, and substance use follow the same increasing pattern, it is necessary to understand whether specific factors influence the use and abuse of substances. We therefore chose a set of predictor variables that we believe are affected by the wealth gap. With substantive results, UHG can tailor its efforts to help reduce substance-related incidents.

Data

The data utilized comes from the University of Wisconsin’s Population Health Institute’s County Health Rankings & Roadmaps (“How healthy”, 2023).

The data cover counties in every state, plus the District of Columbia. The measures are intended to describe the current and future health status of each county’s population. They include statistics on the population’s direct medical standing, referred to as health outcomes, and on the environmental factors that may contribute to it, referred to as health factors. Demographic data are also included to provide context for each county.

For this report we focused primarily on social factors, that is, health factor measures that could be correlated with drug overdoses, alcohol-impaired driving, and substance use in general. From these measures we selected specific predictor variables and response variables.

Predictor Variables

Four predictor variables were chosen as the center of focus for this study. They are as listed below:

  • Unemployment: Percentage of population ages 16 and older unemployed but seeking work

  • Mental Health Providers: Number of mental health care providers per 100,000 of the population

  • Primary Care Physicians: Number of primary care physicians per 100,000 of the population

  • Food Insecurity: Percentage of population who lack adequate access to food

Response Variables

The following variables were chosen to be compared against the predictor variables as they represent substance-related fatalities:

  • Alcohol-Impaired Driving Deaths: Percentage of driving deaths with alcohol involvement

  • Drug Overdose Deaths: Number of drug poisoning deaths per 100,000 population
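Several of the variables above are expressed as rates per 100,000 population. As a minimal sketch of that arithmetic, the snippet below converts a raw count into such a rate. The function name and the county figures are hypothetical and are not taken from the County Health Rankings data.

```python
def rate_per_100k(count, population):
    """Convert a raw count (deaths, providers, ...) into a rate per 100,000 residents."""
    if population <= 0:
        raise ValueError("population must be positive")
    return count / population * 100_000


# Hypothetical county: 12 drug poisoning deaths among 48,000 residents.
example_rate = rate_per_100k(12, 48_000)
print(f"{example_rate:.1f} drug poisoning deaths per 100,000 population")
```

Expressing counts this way lets counties of very different sizes be compared on the same scale.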

Exploratory Data Analysis & Data Summary

The following choropleth maps show the regions of the continental United States where the response and predictor variables are concentrated.

Drug Overdose Deaths

Drug overdose deaths appear to have the highest concentration in states such as West Virginia and Maryland, among others.

Alcohol-Impaired Driving Deaths

Alcohol-impaired driving deaths appear to have the highest concentration in states such as Montana and North Dakota, among others.

Unemployment

Unemployment appears to have the highest concentration in states such as California and New Mexico, among others.

Mental Health Providers

Mental health providers appear to be concentrated in states such as Massachusetts, among others.

Primary Care Physicians

Primary care physicians appear to be concentrated in states in the New England region, among others.

Food Insecurity

Food insecurity appears to be concentrated in the Southern region, in states such as Mississippi, among others.

Unemployment

Rising unemployment rates may contribute to higher substance use rates because of the economic and psychological effects of job loss. Job loss is a major risk factor for addiction and abuse, which can lead to more drug- and alcohol-related deaths. We hypothesized that a higher rate of unemployment would be associated with more substance abuse deaths.

Unemployment vs Drug Overdose Deaths

The ‘unemployment’ vs ‘drug overdose deaths’ scatter plot shows a positive relationship between the two variables for the majority of the data: as the unemployment percentage increases, the rate of drug overdose deaths tends to increase as well. The line and confidence band in the plot were generated by a smoothing spline, which is a flexible machine learning estimator. Throughout this section we use these lines and bands as visual aids to highlight qualitative trends in the scatter plots, such as positive or negative relationships. Outliers in the data set weaken the fitted relationship and make the trend harder to read from the graph.

Unemployment vs Alcohol Impaired Driving Deaths

The ‘unemployment’ vs ‘alcohol impaired driving deaths’ scatter plot does not show a clear relationship between the two variables. As the unemployment percentage increases, the percentage of alcohol-impaired driving deaths does not consistently increase.

Mental Health Providers

Mental health providers offer interventions and support that target underlying factors contributing to both drug overdose and alcohol-impaired driving deaths. We hypothesized that more mental health providers in a given area would be associated with fewer deaths due to substance abuse. When reading the graphs, note that few counties have an adequate number of mental health providers.

Mental Health Providers vs Drug Overdose Deaths

The ‘mental health providers’ vs ‘drug overdose deaths’ scatter plot does not show a strong relationship between the two variables. Because mental health providers make up less than 1% of the population in the majority of counties, the true impact of these providers on drug overdose deaths cannot be seen.

Mental Health Providers vs Alcohol Impaired Driving Deaths

The ‘mental health providers’ vs ‘alcohol impaired driving deaths’ scatter plot does not show a strong relationship between the two variables.

Primary Care Physicians

Primary Care Physicians are important because they help to provide essential information and treatment that can save the lives of patients dealing with substance abuse. We hypothesized that more primary care physicians in a certain area would lead to a decrease in drug overdose and alcohol impaired driving deaths.

Primary Care Physicians vs Drug Overdose Deaths

The ‘primary care physicians’ vs ‘drug overdose deaths’ scatter plot shows a weak relationship between the two variables. As the rate of primary care physicians increases, drug overdose deaths appear to decrease.

Primary Care Physicians vs Alcohol Impaired Driving Deaths

The ‘primary care physicians’ vs ‘alcohol impaired driving deaths’ scatter plot shows at most a weak relationship between the two variables.

Food Insecurity

Food insecurity is an important variable to observe because it can lead an individual to develop habits that contribute to drug overdose and alcohol-impaired driving deaths. These habits can arise from the consequences of not having enough food to eat, including stress and malnutrition, which can push an individual toward extreme coping methods. We hypothesized that the greater the food insecurity in a given area, the greater the number of substance abuse deaths.

Food Insecurity vs Drug Overdose Deaths

The ‘food insecurity’ vs ‘drug overdose deaths’ scatter plot shows a strong positive relationship between the two variables: as the food insecurity percentage increases, drug overdose deaths increase as well.

Food Insecurity vs Alcohol Impaired Driving Deaths

The ‘food insecurity’ vs ‘alcohol impaired driving deaths’ scatter plot shows at most a weak relationship between the two variables.
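The qualitative labels used for the scatter plots in this section (positive, weak, strong) could be complemented by a numeric summary. The sketch below computes a Pearson correlation coefficient, which is one possible summary and not necessarily one used in this study. The function name and the county values are hypothetical illustrations, not figures from the data set.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation: +1 is a perfect positive linear relationship, -1 a perfect negative one."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long sequences with at least two values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical values for six counties (not taken from the study data).
food_insecurity_pct = [8.0, 10.5, 12.0, 14.5, 17.0, 19.5]
overdose_deaths_per_100k = [12.0, 18.0, 17.0, 25.0, 28.0, 33.0]

r = pearson_r(food_insecurity_pct, overdose_deaths_per_100k)
print(f"Pearson r = {r:.2f}")
```

A value near zero would correspond to the weak relationships described above, while a value near one would correspond to a strong positive relationship. Pearson correlation only captures linear trends, so it supplements the plots and does not replace them.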

Methods

Before we can find the best predictors of drug overdose deaths and alcohol-impaired driving deaths, we need to determine which prediction method works best with the data. To accomplish this, we explored linear regression, regularization techniques, and a random forest.

Linear Regression

Linear regression assumes a linear relationship between the predictor variables and the outcome variable and estimates this relationship by minimizing the sum of squared errors. Linear regression is a very common model in statistics and machine learning. It has many benefits, including ease of interpretation, but it is often too inflexible to capture nuanced relationships between variables.
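To make the least squares idea concrete, the following minimal sketch fits a line with a single predictor using the closed-form solution. It is an illustration under simplifying assumptions (one predictor, toy numbers, a hypothetical function name) and is not the multi-variable model fitted in this study.

```python
def fit_simple_ols(xs, ys):
    """Return (intercept, slope) minimizing the sum of squared errors for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("the predictor must take at least two distinct values")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Toy data lying exactly on the line y = 1 + 2x (hypothetical values).
toy_x = [1.0, 2.0, 3.0, 4.0]
toy_y = [3.0, 5.0, 7.0, 9.0]

intercept, slope = fit_simple_ols(toy_x, toy_y)
print(f"fitted line: y = {intercept:.2f} + {slope:.2f} * x")
```

The fitted slope is directly interpretable as the expected change in the outcome per unit change in the predictor, which is the interpretability benefit noted above.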

Regularization

Regularization techniques can improve on linear regression by reducing the impact of overfitting and collinearity. Both lasso and ridge regression alter the linear regression objective by adding a penalty on the size of the coefficients. Lasso regression can shrink the coefficients of less relevant variables to exactly zero, effectively excluding them, whereas ridge regression shrinks all coefficients toward zero without eliminating any, which stabilizes the estimates when predictors are correlated. Elastic net regression combines the lasso and ridge penalties.
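The difference between the two penalties is easiest to see with a single centered predictor, where both estimators have closed forms. The sketch below assumes the objectives 0.5 * SSE + 0.5 * lam * b^2 for ridge and 0.5 * SSE + lam * |b| for lasso. The function names, the penalty value, and the data are hypothetical and chosen only for illustration.

```python
import math


def centered_sums(xs, ys):
    """Return (Sxx, Sxy) computed after centering both variables."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxx, sxy


def ridge_slope(xs, ys, lam):
    """Minimizer of 0.5 * SSE + 0.5 * lam * b**2: the slope is shrunk but stays nonzero."""
    sxx, sxy = centered_sums(xs, ys)
    return sxy / (sxx + lam)


def lasso_slope(xs, ys, lam):
    """Minimizer of 0.5 * SSE + lam * |b|: soft-thresholding can set the slope to exactly zero."""
    sxx, sxy = centered_sums(xs, ys)
    if sxx == 0:
        raise ValueError("the predictor must take at least two distinct values")
    shrunk = max(abs(sxy) - lam, 0.0)
    return math.copysign(shrunk, sxy) / sxx


# Hypothetical weakly related variables: the unpenalized slope is about 0.05.
weak_x = [1.0, 2.0, 3.0, 4.0, 5.0]
weak_y = [2.1, 1.9, 2.2, 2.0, 2.3]

ols_b = ridge_slope(weak_x, weak_y, lam=0.0)   # no penalty, ordinary least squares
ridge_b = ridge_slope(weak_x, weak_y, lam=1.0)
lasso_b = lasso_slope(weak_x, weak_y, lam=1.0)
print(f"OLS {ols_b:.4f}, ridge {ridge_b:.4f}, lasso {lasso_b:.4f}")
```

With the same penalty strength, the ridge slope is smaller than the unpenalized slope but still nonzero, while the lasso slope is exactly zero, which is the variable exclusion behavior described above.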

Random Forest

The building block of the random forest is the decision tree, which recursively partitions the data into increasingly similar subgroups by splitting on predictor values until a stopping criterion is reached. A random forest averages the predictions of many decision trees, each trained on a random resample of the data, thereby reducing overfitting. Due to their accuracy, robustness, and ease of use, random forests are among the most popular machine learning tools in use today.
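The following is a deliberately small sketch of the idea, written from scratch for illustration. It is not the software used in this study, and every name, parameter value, and data point in it is a hypothetical choice. Each tree is grown on a bootstrap resample, considers a random subset of features at each split, and the forest averages the tree predictions.

```python
import random
from statistics import mean


def sum_squared_error(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    center = mean(values)
    return sum((v - center) ** 2 for v in values)


def build_tree(rows, targets, rng, depth=0, max_depth=3, min_leaf=2):
    """Grow one regression tree, represented as nested dicts."""
    if depth >= max_depth or len(rows) < 2 * min_leaf or len(set(targets)) == 1:
        return {"value": mean(targets)}
    n_features = len(rows[0])
    # Only a random subset of the features is considered at each split.
    candidates = rng.sample(range(n_features), max(1, n_features // 2))
    best = None
    for feature in candidates:
        for threshold in sorted({row[feature] for row in rows}):
            left = [i for i, row in enumerate(rows) if row[feature] <= threshold]
            right = [i for i, row in enumerate(rows) if row[feature] > threshold]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            score = (sum_squared_error([targets[i] for i in left])
                     + sum_squared_error([targets[i] for i in right]))
            if best is None or score < best[0]:
                best = (score, feature, threshold, left, right)
    if best is None:
        return {"value": mean(targets)}
    _, feature, threshold, left, right = best
    return {
        "feature": feature,
        "threshold": threshold,
        "left": build_tree([rows[i] for i in left], [targets[i] for i in left],
                           rng, depth + 1, max_depth, min_leaf),
        "right": build_tree([rows[i] for i in right], [targets[i] for i in right],
                            rng, depth + 1, max_depth, min_leaf),
    }


def predict_tree(tree, row):
    """Follow the splits down to a leaf and return its stored mean."""
    while "value" not in tree:
        side = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[side]
    return tree["value"]


def fit_forest(rows, targets, n_trees=25, seed=0):
    """Train each tree on a bootstrap resample (rows drawn with replacement)."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        sample = [rng.randrange(len(rows)) for _ in rows]
        forest.append(build_tree([rows[i] for i in sample],
                                 [targets[i] for i in sample], rng))
    return forest


def predict_forest(forest, row):
    """Average the predictions of all trees."""
    return mean(predict_tree(tree, row) for tree in forest)


# Synthetic data: the outcome depends only on the first feature.
rows = [[x, (x * 7) % 5] for x in range(20)]
targets = [10.0 if x < 10 else 30.0 for x in range(20)]

forest = fit_forest(rows, targets)
low_prediction = predict_forest(forest, [2, 4])
high_prediction = predict_forest(forest, [17, 4])
print(low_prediction, high_prediction)
```

Because individual trees see different resamples and different candidate features, their errors partly cancel when averaged, which is why the forest tends to overfit less than a single deep tree.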

Comparing the predictive models

The figure shows the prediction error for the five estimators described above (linear regression, lasso, ridge, elastic net, and random forest), which we use to decide which model to focus on subsequently. For this purpose, we calculated the out-of-sample root mean squared error of each estimator using five-fold cross-validation. The points and whiskers show the average root mean squared error and 95% confidence interval for each estimator. The random forest has the lowest average error of the five.

Because the random forest model outperformed linear regression and the regularization techniques by producing lower root mean squared error values, we use a random forest model to determine the best predictors of drug overdose deaths and alcohol-impaired driving deaths.
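The comparison procedure can be sketched as follows. This is a simplified illustration: two stand-in estimators (a mean-only baseline and a one-predictor line) replace the five estimators of the study, the data are synthetic, and the normal-approximation confidence interval is an assumption, since the report does not state how its intervals were computed. All names are hypothetical.

```python
import math
import random
from statistics import mean, stdev


def kfold_indices(n, k, seed=0):
    """Shuffle the row indices and deal them into k folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def rmse(actual, predicted):
    return math.sqrt(mean((a - p) ** 2 for a, p in zip(actual, predicted)))


def fit_mean(xs, ys):
    """Baseline estimator: always predict the training mean."""
    center = mean(ys)
    return lambda x: center


def fit_line(xs, ys):
    """Simple least squares line with one predictor."""
    mean_x, mean_y = mean(xs), mean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return lambda x: mean_y + slope * (x - mean_x)


def cross_validated_rmse(fit, xs, ys, k=5, seed=0):
    """Return the average out-of-sample RMSE and an approximate 95% interval."""
    scores = []
    for held_out in kfold_indices(len(xs), k, seed):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        scores.append(rmse([ys[i] for i in held_out],
                           [model(xs[i]) for i in held_out]))
    average = mean(scores)
    # ASSUMPTION: normal approximation across folds; the report's method is not stated.
    half_width = 1.96 * stdev(scores) / math.sqrt(k)
    return average, (average - half_width, average + half_width)


# Synthetic data: a linear trend plus small deterministic noise.
xs = [float(i) for i in range(30)]
ys = [5.0 + 2.0 * x + 0.5 * ((i * 37) % 7 - 3) for i, x in enumerate(xs)]

mean_rmse, mean_ci = cross_validated_rmse(fit_mean, xs, ys)
line_rmse, line_ci = cross_validated_rmse(fit_line, xs, ys)
print(f"mean-only RMSE {mean_rmse:.2f}, line RMSE {line_rmse:.2f}")
```

Using the same folds for every estimator (here via the shared seed) keeps the comparison fair, because each model is evaluated on exactly the same held-out counties.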

Data Cleaning

Because the data set had many missing values that altered the outcome of the predictive models, we removed every variable with more than 50 missing values. This left us with a new data set of 2,635 counties with 63 variables each.
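The column filter described above could look like the following sketch. The helper names, the toy records, and the use of None for missing values are assumptions made for illustration. The second helper, which drops counties that still have gaps, is also an assumption, since the report does not say how remaining missing values were handled.

```python
def drop_sparse_columns(rows, max_missing=50):
    """Keep only the columns with at most max_missing missing (None) values."""
    columns = list(rows[0]) if rows else []
    missing = {c: sum(1 for row in rows if row.get(c) is None) for c in columns}
    keep = [c for c in columns if missing[c] <= max_missing]
    return [{c: row.get(c) for c in keep} for row in rows]


def drop_incomplete_rows(rows):
    """ASSUMPTION: remove any county that still has a missing value."""
    return [row for row in rows if all(v is not None for v in row.values())]


# Toy records with hypothetical values; the threshold is lowered to 1 to fit the toy size.
toy_records = [
    {"county": "A", "unemployment": 4.1, "sparse_measure": None},
    {"county": "B", "unemployment": None, "sparse_measure": None},
    {"county": "C", "unemployment": 5.3, "sparse_measure": 2.0},
]

filtered = drop_sparse_columns(toy_records, max_missing=1)
cleaned = drop_incomplete_rows(filtered)
print(cleaned)
```

Filtering columns first preserves more counties than dropping incomplete rows first, because a single sparse variable would otherwise remove many otherwise complete counties.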

Results

One output of the random forest is a variable importance model. This model shows which variables matter most when making predictions, for example when predicting drug overdose deaths. Each variable’s importance is determined by measuring how much the random forest’s predictive ability deteriorates in the absence of that variable.
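One common way to measure this deterioration is permutation importance, in which a variable’s values are shuffled across rows and the resulting increase in prediction error is recorded. Whether the study used this exact variant is an assumption. In the sketch below, a fixed function stands in for the fitted random forest, and all names and data are hypothetical.

```python
import math
import random


def rmse(model, rows, targets):
    return math.sqrt(sum((model(row) - t) ** 2 for row, t in zip(rows, targets)) / len(rows))


def permutation_importance(model, rows, targets, n_repeats=20, seed=0):
    """Average increase in RMSE when each column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = rmse(model, rows, targets)
    importances = []
    for j in range(len(rows[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in rows]
            rng.shuffle(column)
            # Rebuild the rows with only column j replaced by its shuffled values.
            shuffled = [row[:j] + [value] + row[j + 1:] for row, value in zip(rows, column)]
            increases.append(rmse(model, shuffled, targets) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


def stand_in_model(row):
    """Stand-in for a fitted forest: uses only the first variable."""
    return 3.0 * row[0]


# Synthetic data in which the outcome depends on the first variable only.
rows = [[float(i), float((i * 7) % 5)] for i in range(20)]
targets = [3.0 * row[0] for row in rows]

importances = permutation_importance(stand_in_model, rows, targets)
print(importances)
```

Shuffling the first variable breaks its link to the outcome and raises the error, while shuffling the second variable changes nothing because the stand-in model ignores it. Ranking variables by this increase gives an ordering like the one reported in this section.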

The most important predictor of drug overdose deaths is poor mental health days, and the number of mental health providers (ranked 3rd) is the best predictor among our four focus variables. Given that frequent mental distress (ranked 9th) is also among the top 10 predictors, a focus on improving access to mental health care could help reduce the risk of drug overdose deaths.

Concerningly, four of the top five predictors of alcohol-impaired driving deaths are demographic factors, with the American Indian or Alaska Native variable by far the most important. Of the four variables we focused on, unemployment is considerably more important than the others in predicting alcohol-impaired driving deaths. A dual focus on decreasing unemployment and increasing minority access to resources that combat alcohol-related issues could decrease the incidence of these events.

Conclusion

The random forest outperformed both linear regression and the regularization techniques. The most important predictors of drug overdose deaths are poor mental health days, insufficient sleep, and mental health providers. The most important predictors of alcohol-impaired driving deaths are American Indian or Alaska Native, unemployment, and Asian.

Limitations

The data set provided only generalized, county-level data for each predictor variable. Demographic categories were available for a few variables, but not for the ones we selected, so demographic differences within each predictor could not be explored in this study. Additionally, the values are influenced by state-level effects, which may exaggerate the differences between counties in different states and make cross-state county comparisons less reliable than within-state comparisons. Finally, for unemployment, the statistical model used to produce the estimates can vary from state to state.

Future Work

For this study, we primarily hypothesized possible socioeconomic predictors of substance use, focusing on factors that in some capacity reflect the wealth gap in the country. Future studies could analyze other socioeconomic factors, with an emphasis on race, marital status, political ideology, and completion of higher education. This study could also be adapted to look at demographic categories within each predictor variable in order to develop more specific solutions that help the most vulnerable communities.

References

Castaldelli-Maia, J. M., Segura, L. E., & Martins, S. S. (2021, November). The concerning increasing trend of alcohol beverage sales in the U.S. during the COVID-19 pandemic. Alcohol (Fayetteville, N.Y.). https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8421038/

How healthy is your county?: County Health Rankings. County Health Rankings & Roadmaps. (2023). https://www.countyhealthrankings.org/

Huq, S. (2022, July 25). 3.4 million more children in poverty in February 2022 than December 2021. Columbia University Center on Poverty and Social Policy. https://www.povertycenter.columbia.edu/news-internal/monthly-poverty-february-2022

Khattar, R., Pathak, A., Schweitzer, J., Khan, A., & Chang, R. (2022, December 15). Data on poverty in the United States. Center for American Progress. https://www.americanprogress.org/data-view/poverty-data/?yearFilter=2021&national=2021

U.S. Department of Health and Human Services. (2023, July 10). Drug overdose death rates. National Institutes of Health. https://nida.nih.gov/research-topics/trends-statistics/overdose-death-rates