Examining Premature Death at the County Level: The Role of Alcohol, Income, and Education
Introduction
Gun violence, pandemics, and addiction play massive roles in our health outcomes and length of life. More indirectly, our lifespans may be affected by challenges such as poor schooling, unemployment, and income inequality. We wish to investigate how all of these factors, whether driven by natural disease or human intervention, are related to cumulative premature death in the US. By premature death we mean the years of life not attained relative to the average life span on a per capita basis. In particular, we examine the impact of excessive drinking, high school completion rates, and suicide on the level of premature deaths at the county level.
We found that there was a negative correlation between excessive drinking and premature deaths based on geographical trends. We also quantified how household income and age impact one of the key causes of premature death, namely suicide. Additionally, we found inconsistent relationships between high school completion rates and premature death, but conjecture a better measure of education that takes into account quality would show the expected negative correlation. For each of these variables we also studied the geographic distribution and highlighted some key regional characteristics, e.g. that of the Bible Belt.
Throughout this investigation, we ultimately saw inequality as a unifying theme in premature deaths. Whether inequality in access to high quality education or ability to afford preventative health care, these key differences are associated with corresponding variation in premature death rate and suggest possible interventions.
Data
Our dataset is the County Health Rankings & Roadmaps (CHR&R), compiled by the University of Wisconsin Population Health Institute. This dataset contains a host of variables related to health outcomes at the county level for several years, although we focus on 2023 data. Our key outcome, premature death, is defined as the years of potential life lost before age 75 per 100,000 people. We then investigated numerous explanatory variables. First, we investigated deaths due to violent acts such as suicide, homicide, and firearm fatalities, all of which were recorded as totals per 100,000 people. We also examined excessive drinking, which is the percentage of adults reporting binge or heavy drinking; high school completion rates, defined as the percentage of adults ages 25 and over with a high school diploma or equivalent; and median household income.
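To make the outcome definition concrete, the following minimal Python sketch computes years of potential life lost before age 75 per 100,000 people for a single made-up county. The function name, the toy ages, and the population are hypothetical, and the sketch ignores any age adjustment that the published CHR&R measure may apply.

```python
def ypll_rate(ages_at_death, population, cutoff=75, per=100_000):
    """Years of potential life lost before `cutoff`, per `per` residents."""
    if population <= 0:
        raise ValueError("population must be positive")
    # Deaths at or after the cutoff age contribute zero years lost.
    years_lost = sum(max(0, cutoff - age) for age in ages_at_death)
    return years_lost * per / population


# Hypothetical county: five deaths in a population of 1,000.
toy_ages = [30, 60, 74, 80, 90]
rate = ypll_rate(toy_ages, population=1_000)
print(rate)
```

Under this definition a death at age 30 contributes 45 years, while deaths at 80 or 90 contribute nothing, which is why the age profile of deaths matters as much as their number.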
We display in Table 1 the number and percentage of counties that have missing data for each variable. The variable ‘homicide’ is missing most frequently, with 56.9% of counties not reporting the rate, followed by firearm fatalities (27.3%) and suicides (22.2%). Some states report sparser data than others: 90 counties in Nebraska (96% of them) are missing homicide data, for instance, with South Dakota and North Dakota similar in nature. Caution is therefore needed in interpreting observed regional trends in these states.
Variable | Counties Missing | Percent of Total Counties |
---|---|---|
Homicides | 1816 | 56.9% |
Firearm Fatalities | 871 | 27.3% |
Suicides | 709 | 22.2% |
Years Premature Death | 60 | 1.9% |
Median Household Income | 2 | 0.1% |
Excessive Drinking | 2 | 0.1% |
High School | 0 | 0.0% |
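A missingness table like the one above can be produced with a few lines of Python. The sketch below is illustrative only: the row layout, the column names, and the four toy counties are assumptions rather than the actual CHR&R file format.

```python
def missing_summary(rows, columns):
    """Return {column: (n_missing, percent_missing)} for a list of dict rows."""
    total = len(rows)
    summary = {}
    for col in columns:
        # A value counts as missing when the key is absent or set to None.
        n_missing = sum(1 for row in rows if row.get(col) is None)
        pct = round(100 * n_missing / total, 1) if total else 0.0
        summary[col] = (n_missing, pct)
    return summary


# Hypothetical four-county extract; None marks a value that was not reported.
toy_rows = [
    {"county": "A", "homicides": None, "suicides": 14.2},
    {"county": "B", "homicides": 5.1, "suicides": None},
    {"county": "C", "homicides": None, "suicides": 19.8},
    {"county": "D", "homicides": None, "suicides": 22.0},
]
summary = missing_summary(toy_rows, ["homicides", "suicides"])
print(summary)
```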
Excessive Drinking
Alcohol misuse is the seventh leading risk factor for premature death (National Institute on Alcohol Abuse and Alcoholism (NIAAA), 2023), and in our dataset, we found that excessive drinking is negatively associated with premature death among adults aged 20 to 64 years. We recall that the County Health Rankings defines excessive drinking as the percentage of adults who report participating in binge or heavy drinking, so the data may differ depending on what adults consider heavy drinking.
Figure 1 presents the percentage of excessive drinking in U.S. counties. High levels of excessive drinking and premature deaths are prevalent in the West (California, Nevada, Washington), the Midwest (North and South Dakota, Nebraska, Kansas, Missouri, and Illinois), the Northeast (Maine, New York, Pennsylvania, and Washington, D.C.), and in Texas, Louisiana, and Florida. The Bible Belt states (northern Georgia, Tennessee, Kentucky, Alabama, Mississippi, West Virginia, and North and South Carolina) have the highest levels of premature death compared to other states.
This can be related to each state’s own social and cultural influences around alcohol use. For example, in the culture of New Orleans, Louisiana, alcohol is consumed during celebrations and is the center of festivities; accordingly, drinking laws there are more lenient and allow the sale of alcohol 24/7. Lenient drinking laws can serve as a segue to heavy drinking. States such as Minnesota, Iowa, and Wisconsin permit alcohol sales at later hours as well. In Wisconsin, despite the drinking age of 21, a person who is 18 can consume alcohol in the presence of a parent or guardian. In Iowa, those under 21 can drink in any home if a parent or guardian is present. On the other hand, Utah has strict alcohol laws due to its large population of Latter-day Saints and their traditional teachings, which leads to lower levels of reported excessive drinking. Thus, the cultural background of a state is valuable for understanding its rate of excessive drinking.
We hypothesized that states with a high level of binge drinking would have more premature deaths. However, we present data in Figure 2 that shows the opposite: states with greater rates of excessive drinking also had lower levels of premature death. We conjectured that gun laws may play a role in this anomaly. However, although California has stricter gun laws than Texas, its counties saw more premature deaths from excessive drinking while Texas saw fewer. This means drinking was more lethal in the less gun-friendly state. Thus, there must be some other confounding factor that makes drinking more lethal in California. We then hypothesized that motor fatalities (caused by drunk driving) could cause California to see more deaths from alcohol consumption; however, Texas actually had more motor fatalities per capita than California at all levels of alcohol consumption, and more alcohol had a similar effect size on motor fatalities in both states.
We constructed a bivariate choropleth in Figure 3 to understand how drinking and premature death are associated by geographic location. According to the data, the Bible Belt states have lower levels of excessive drinking coinciding with higher levels of premature death, in contrast to states like Wisconsin and New York, where levels of drinking are high and premature deaths are relatively low. To understand why, note that the Bible Belt is known as a group of very conservative, religious states with strong Protestant, Fundamentalist, and Pentecostal traditions, where religion is integrated with politics. The strict drinking laws can explain the low levels of excessive drinking. For example, drink deals such as happy hours are illegal in North Carolina (Gordon, 2021). In Alabama, alcohol with “provocative” bottle labels may not be sold (Zezima, 2006).
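A bivariate choropleth assigns each county a joint class built from two variables. The sketch below shows one common way to form such classes, using terciles of each variable; the tercile breaks, the function names, and the toy values are assumptions, since the class breaks used for Figure 3 are not stated here.

```python
from bisect import bisect_right
from statistics import quantiles


def tercile_labels(values):
    """Map each value to 0 (low), 1 (middle), or 2 (high) by tercile cut points."""
    cuts = quantiles(values, n=3)  # two cut points
    return [bisect_right(cuts, v) for v in values]


def bivariate_classes(drinking, premature_death):
    """Combine the two tercile labels into a 'drinking-death' class such as '0-2'."""
    d = tercile_labels(drinking)
    p = tercile_labels(premature_death)
    return [f"{a}-{b}" for a, b in zip(d, p)]


# Hypothetical counties: percent excessive drinking and premature death level.
toy_drinking = [10, 12, 14, 16, 18, 20]
toy_death = [9000, 8500, 8000, 7000, 6500, 6000]
classes = bivariate_classes(toy_drinking, toy_death)
print(classes)
```

Each of the nine possible classes is then mapped to its own color, so a class such as "0-2" (low drinking, high premature death) stands out visually from "2-0".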
Moreover, our choropleth displays very obvious state lines, which is unexpected given that two neighboring counties across state lines shouldn’t be so dramatically different in their alcohol consumption and premature death. For example, moving from adjacent rural counties of southern Pennsylvania to West Virginia is accompanied by large changes in premature death and drinking. This could suggest the importance of state legislation on related issues or data collection differing at the state level. For example, different states might handle non-English responses differently, given that the data is self-reported.
We fail to see a consistent relationship across all states between premature deaths and excessive drinking, suggesting alternative primary causes of premature death. We did find regional patterns such as the Bible Belt that display the importance of cultural factors such as religion and conservatism. Politicians use religion, for instance, to regulate access to certain types of healthcare which may increase premature death. We also found oddities along state borders suggesting data collection differences. The negatively correlated relationship we observed between excessive drinking and premature death shows us more research on the effects of binge drinking on the population is required.
Suicides
A major driver of premature death is suicide, which has significant impacts on many people’s lives: a single suicide affects friends, family, classmates, and entire communities (CDC, 2021). The demographics of a county, particularly age, can significantly impact suicide rates due to social isolation or health problems (Motillon-Toudic et al., 2022; Ahmedani et al., 2017). Moreover, suicide rates may have differing impacts on premature death levels depending on the age distribution of a county. For instance, the death of an older individual contributes less to the total years of premature death than the death of a young individual, but this effect may be outweighed by the higher frequency of suicides among older Americans. We hypothesize that suicide rates will be higher in counties that are older and have a lower median household income.
To explore our hypothesis, we plot in Figure 4 the suicide rate per 100K against the proportion of a county’s population over 65, where the size of each dot represents the median household income of the county. In general, the older a county is, the more suicides it experiences. We also see that the counties with higher suicide rates generally have lower incomes. In contrast, the bottom of the graph shows counties with low suicide rates and typically higher incomes.
To explore how older counties differed from younger ones, we computed the median proportion of residents over 65 across all counties, then compared counties whose proportion exceeded the median to those whose proportion did not. In the ‘older’ counties, the suicide rate was about 21 per 100K, while in the ‘younger’ counties it was 17.4 per 100K. Thus, ‘older’ counties experience about 20% more suicides per 100K than ‘younger’ counties.
We then performed a similar analysis but took the median suicide rate and classified counties as having either a high or low suicide rate, depending on whether they exceeded the median rate. We then compared the median household incomes of the counties above and below the median suicide rate. Counties whose suicide rate is above the median have an average median household income of just $56,186, and counties whose suicide rate is below the median have an average median household income of $64,610. Therefore, counties with a suicide rate above the median of 17.37 per 100K have median household incomes of roughly 87% of what counties below the median suicide rate have.
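The median-split comparison described above can be sketched in Python as follows. The toy county values and the function name are hypothetical, and averaging the rate within each group is an assumption about how the group figures were summarized. The last line recomputes the relative gap from the two rates reported in the text.

```python
from statistics import mean, median


def median_split_means(group_var, outcome):
    """Split paired observations at the median of group_var.

    Returns the mean outcome for (above the median, at or below the median).
    """
    cut = median(group_var)
    above = [y for x, y in zip(group_var, outcome) if x > cut]
    below = [y for x, y in zip(group_var, outcome) if x <= cut]
    return mean(above), mean(below)


# Hypothetical counties: share of residents over 65 and suicides per 100K.
share_over_65 = [0.12, 0.15, 0.18, 0.21, 0.24, 0.27]
suicide_rate = [15.0, 17.0, 18.0, 20.0, 21.0, 23.0]
older, younger = median_split_means(share_over_65, suicide_rate)

# Relative difference using the reported rates (21 vs. 17.4 per 100K).
reported_gap = (21 - 17.4) / 17.4
print(older, younger, round(reported_gap, 3))
```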
In short, and in support of our hypothesis, counties that are older tend to experience more suicides per 100K, and of the counties above the median suicide rate, their median household incomes are, on average, significantly lower.
Having verified that median household income is correlated with suicide rates, we explore the relationship at different severities of suicide rates. In Figure 5, we separate the counties’ suicide rates into quintiles, where the lowest (red/orange) density graph contains the counties whose suicide rates are among the lowest 20%, the dark green contains the 21%-40% range, and so on. As hypothesized, as the suicide rates get higher, the median household income gets slightly lower. Most of the suicide quintiles show a unimodal distribution of median household incomes. However, the median household income density graph for the first quintile resembles a bimodal distribution, which is unexpected and worth further exploration.
We then generalized this in Figure 6 to compare the distribution of median household incomes between age groups and within each distinct quintile of suicide rates. As before, we defined an “old” county as one whose proportion over 65 is greater than the median proportion over 65, and analogously for “young” counties. At every quintile of suicide rates, we can see that the younger counties have a greater median household income. The greatest differences between the age classifications of counties are at the two lowest quintiles. Unlike the other densities, that of the old counties in the lowest quintile of suicide rates has a distinct trimodal distribution. That is, there is an unexpectedly high density of poorer and richer old counties within the lowest quintile of suicide rate. For the other quintiles and densities, there is a clear unimodal distribution with no concentration at high or low incomes for either county age classification.
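The quintile grouping behind a figure like this can be sketched as below. The data are made up, the function name is hypothetical, and summarizing each quintile by its median income is a simplification: the figure itself shows full density curves rather than a single number per group.

```python
from bisect import bisect_right
from collections import defaultdict
from statistics import median, quantiles


def income_by_suicide_quintile(suicide_rates, incomes):
    """Group incomes by suicide-rate quintile and return each group's median."""
    cuts = quantiles(suicide_rates, n=5)  # four cut points
    groups = defaultdict(list)
    for rate, income in zip(suicide_rates, incomes):
        quintile = bisect_right(cuts, rate) + 1  # 1 = lowest rates, 5 = highest
        groups[quintile].append(income)
    return {q: median(vals) for q, vals in sorted(groups.items())}


# Hypothetical counties: suicides per 100K and median household income.
toy_rates = [8, 10, 12, 14, 16, 18, 20, 22, 24, 26]
toy_incomes = [70000, 68000, 66000, 64000, 62000,
               60000, 58000, 56000, 54000, 52000]
by_quintile = income_by_suicide_quintile(toy_rates, toy_incomes)
print(by_quintile)
```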
Median Household Income
We hypothesized that the states with the highest household median income would have the lowest years of premature death. While this hypothesis was predominantly true, there were a few deviations in this trend among the states.
The rose diagram in Figure 7 shows each state’s median household income as the bar color, with the length of each bar proportional to the premature death level. Massachusetts has both the highest median household income and one of the lowest premature death levels (hence its short, light-colored bar). Mississippi, West Virginia, Louisiana, and Alabama, by contrast, have the lowest median household incomes while simultaneously having the highest premature death values, expressing a quite strong negative correlation. Washington, D.C. is a major outlier, lying in the highest median household income quartile as well as the highest quartile of premature death. Similarly, Indiana, though not an outlier, has both a fairly low median household income and a fairly low premature death rate. Moving clockwise from MN, the rose plot shows a general gradient from high incomes and low premature death to low incomes and high premature death.
All of the highest median income states also hold a spot in the top nine college graduation rates by state, which is consistent with richer households generally consuming more education. Eight of the states in the Southeast region are among the ten states with the highest premature death rates and simultaneously lack a high median household income. Potential reasons for this trend include lack of access to quality healthcare and job security due to low median household income. The Northeast and West regions each contain four of the ten states with the lowest premature death values; however, there are other states within these regions with higher median household incomes, potentially due to regional differences.
High School Completion
Past research has demonstrated that education is highly correlated with life expectancy and quality of life. Education levels often differ by state and county and are frequently associated with other factors that affect overall health (County Health Rankings and Roadmaps, 2023). With this in mind, we investigate the effects of high school completion rate on years of life lost to premature death.
Our original hypothesis was that counties with higher high school completion rates would have fewer years of life lost to premature death. Our reasoning was that individuals with lower education levels would see higher unemployment rates and lower incomes, lessening their quality of life and increasing premature deaths. For example, such individuals may be unable to afford quality health care or housing, and their diet may be less nutritious. We selected the top four and bottom five states for high school completion rate in the hope of identifying an association between the two factors at the county level.
While we originally anticipated a linear decrease, some states, such as CA and TX, showed no clear trend for premature deaths as high school completion rate increased (Figure 9). We then hypothesized that the income and wealth of a county might instead affect both variables, so the counties were classified based on whether they were above or below the median income for the state. We also filtered out counties that had a population less than the 20% quantile for the state, to exclude counties that may be outliers due to especially low populations. This analysis demonstrated that while the original hypothesis seemed to hold true for counties above the median income, counties below the median income showed more variability, with some states showing an upward trend instead for these counties. In contrast to our original hypothesis, the level of education does not seem to be a direct indicator of premature deaths among states.
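The two preprocessing steps described above, dropping each state's smallest counties and tagging the rest by state median income, might look like the following sketch. The field names and toy values are hypothetical. Computing the state median income over all of a state's counties, before the population filter, is an assumption because the order of the two steps is not specified.

```python
from collections import defaultdict
from statistics import median, quantiles


def label_counties(counties):
    """Drop each state's smallest counties, then tag the rest by state median income."""
    by_state = defaultdict(list)
    for c in counties:
        by_state[c["state"]].append(c)

    kept = []
    for rows in by_state.values():
        # 20% population quantile within the state (first of four quintile cuts).
        # Note: quantiles() needs at least two counties per state.
        pop_cut = quantiles([r["population"] for r in rows], n=5)[0]
        income_cut = median(r["income"] for r in rows)
        for r in rows:
            if r["population"] < pop_cut:
                continue  # excluded as a potential low-population outlier
            kept.append({**r, "above_median_income": r["income"] > income_cut})
    return kept


# Hypothetical state "X" with five counties.
toy_counties = [
    {"state": "X", "population": 1_000, "income": 40_000},
    {"state": "X", "population": 5_000, "income": 45_000},
    {"state": "X", "population": 20_000, "income": 50_000},
    {"state": "X", "population": 50_000, "income": 60_000},
    {"state": "X", "population": 100_000, "income": 70_000},
]
kept = label_counties(toy_counties)
print([(c["population"], c["above_median_income"]) for c in kept])
```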
We conjecture that perhaps high school completion rates are not capturing the true impact of education. Instead, the quality of schooling may matter: in a county with poor quality schooling, completing or not completing high school may be less impactful than in a county with high quality schooling. This may explain why for counties with low median incomes (and potentially worse schools), there is no clear relationship between premature death and completion rates. In a study on schools in Virginia, “High poverty schools in Virginia have less experienced teachers, lower teacher salaries, are less likely to have critical math, science, and advanced coursework, spend less per student on instructors and instructional materials with state and local dollars, and have fewer advanced courses” (Duncombe, 2021). While these schools may have similar completion rates to their wealthier counterparts, they have far fewer resources, leading to lower quality education.
Modeling
We additionally built a predictive model to assess whether we could predict premature death levels using some of our variables. We selected 11 variables: income inequality, homicides, firearm fatalities, high school completion, excessive drinking, math scores, drug overdose, food insecurity, motor fatalities, median household income, and child poverty. In all cases, we filtered for observations that had complete data. We built a random forest model, with hyperparameters of 450 trees and 5 randomly chosen variables considered at each split point. To determine these values, we ran a grid search over a predefined range of possible values for both hyperparameters and trained the model for each combination. We evaluated the performance of each model using 5-fold cross-validation and chose the hyperparameter combination with the lowest root mean square error. After training the model and picking hyperparameters, we plotted in Figure 10 the predicted premature death for a given county in our test data against its actual premature death to assess the accuracy of our model.
Next, we plotted in Figure 11 variable importance scores determined by the random forest model using the vip package. This calculation found median household income, child poverty, and motor fatalities to be the three most important variables in predicting years of premature death. However, it is important to remember that the variable importance measure is univariate. In other words, while these three variables in isolation are good predictors of premature death, there may be combinations of these or other variables with even better predictive power.
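The selection procedure described above (a grid search scored by 5-fold cross-validated root mean square error) can be sketched in plain Python. To keep the example self-contained, a simple nearest-neighbor regressor stands in for the random forest, and its single hyperparameter `k` stands in for the number of trees and the number of variables per split. All function names and the synthetic data are hypothetical.

```python
import itertools
import math
import random


def rmse(y_true, y_pred):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))


def k_fold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and deal the indices into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def knn_fit(params, X, y):
    """Stand-in learner (NOT a random forest): mean target of the k nearest points."""
    k = params["k"]

    def predict(x):
        nearest = sorted(range(len(X)), key=lambda i: abs(X[i] - x))[:k]
        return sum(y[i] for i in nearest) / len(nearest)

    return predict


def grid_search_cv(fit, grid, X, y, k=5):
    """Return (best_params, best_rmse), scoring each combination by mean fold RMSE."""
    folds = k_fold_indices(len(X), k)
    keys = sorted(grid)
    best = (None, math.inf)
    for combo in itertools.product(*(grid[key] for key in keys)):
        params = dict(zip(keys, combo))
        scores = []
        for held_out in folds:
            held = set(held_out)
            train = [i for i in range(len(X)) if i not in held]
            model = fit(params, [X[i] for i in train], [y[i] for i in train])
            preds = [model(X[i]) for i in held_out]
            scores.append(rmse([y[i] for i in held_out], preds))
        mean_score = sum(scores) / len(scores)
        if mean_score < best[1]:
            best = (params, mean_score)
    return best


# Synthetic one-feature data: y is roughly 2x plus Gaussian noise.
rng = random.Random(1)
X = [float(i) for i in range(40)]
y = [2 * x + rng.gauss(0, 1) for x in X]
best_params, best_rmse = grid_search_cv(knn_fit, {"k": [1, 3, 5]}, X, y)
print(best_params, round(best_rmse, 3))
```

With a real random forest, the grid would hold two entries (for example, candidate tree counts and candidate numbers of variables per split), and `fit` would wrap the forest's training call; the cross-validation loop itself would stay the same.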
Discussion/Conclusion
We have studied premature death and potential causes in the US using county level data for 2023. We find the following key conclusions:
Premature death and median household income are negatively correlated, with this relationship affected by geographic clustering and trends. “The socioeconomic gradient in longevity has been attributed to factors such as inequality, economic and social stress and differences in access to medical care” (Chetty et al., 2016). A regression model would help quantify this impact on premature death, to understand how economic development in a region may reduce mortality. Additionally, median household income may indirectly lead to differences in the physical environment, such as unhealthy food intake and heavy air pollution exposure, leading to increases in premature death. There may be numerous causal mechanisms through which income and premature death are associated, warranting further study.
Next, investigating the geospatial distribution of excessive drinking rates, we found variation by location, likely reflecting state alcohol laws, culture, and religious traditions. We then found that there was a negative correlation between excessive drinking and premature deaths. We conjectured that gun laws and motor fatalities could play a role, but our further analysis was inconclusive. We constructed a bivariate choropleth to understand how excessive drinking and premature death are associated by geographic location, and we found some regions had high premature death rates despite low drinking levels and vice versa (e.g. Louisiana compared to Wisconsin). We conjectured that the high premature death despite relatively lower excessive drinking rates in the Bible Belt was due to the role of religion and contraception laws. Limited prenatal care contributes to increasing levels of maternal mortality, specifically among Black women, which can contribute to the high levels of premature deaths (McNeel, 2023).
Having found that both age distributions and median household incomes of counties correlate with suicide rates, we better understand how government and private sector resources might be focused to drive suicide rates down. More specifically, we recommend interventions in counties that we know are older and in counties that have lower median household income.
Our plots of high school completion rates and premature death rates failed to show a consistent trend. We believe data that better capture the quality of the education students receive would show a clearer negative association with premature deaths.
Additionally, we found that random forests were an effective tool to predict premature death, and we used measures of variable importance to identify what may be the most salient explanatory variables. However, a limitation of the variable importance scores that we used is that they were univariate only, i.e., while they identified ‘important’ variables, the technique fails to identify possible combinations of variables that are better predictors than the identified variables in isolation. We recommend exploration of recent work with iterative random forests for more robust detection of variable interactions (Basu et al., 2018).
To reduce premature mortality, we recommend interventions that may address the negative effects of low income, excessive drinking, and low high school completion. Understanding the drivers of premature death and reducing their impact has numerous advantages. It promotes health equity in a time when health outcomes can vary drastically by race or economic status. It increases the overall standard of living and reduces the often crippling financial impact of experiencing health issues. We ultimately believe that everyone deserves a fair and just opportunity to maximize their years of life spent in good health.
References
Basu, Sumanta, et al. “Iterative random forests to discover predictive and stable high-order interactions.” Proceedings of the National Academy of Sciences 115.8 (2018): 1943-1948.
Centers for Disease Control and Prevention. (n.d.). Suicide - health, United States. Centers for Disease Control and Prevention. https://www.cdc.gov/nchs/hus/topics/suicide.htm
Chetty, Raj, et al. “The Association between Income and Life Expectancy in the United States, 2001-2014.” JAMA, 26 Apr. 2016, www.ncbi.nlm.nih.gov/pmc/articles/PMC4866586/.
County Health Rankings and Roadmaps. (2023). High School Completion. County Health Rankings & Roadmaps. https://www.countyhealthrankings.org/explore-health-rankings/county-health-rankings-model/health-factors/social-economic-factors/education/high-school-completion?year=2023
Duncombe, C. (2021). Unequal Opportunities: Fewer Resources, Worse Outcomes for Students in Schools with Concentrated Poverty. The Commonwealth Institute. https://thecommonwealthinstitute.org/research/unequal-opportunities-fewer-resources-worse-outcomes-for-students-in-schools-with-concentrated-poverty/
Gordon, B. (2021, November 17). NC Answers: Why is there no happy hour in North Carolina? The Fayetteville Observer. https://www.fayobserver.com/story/news/2021/11/17/why-happy-hour-banned-north-carolina-law-restaurants-alcohol-bars/6037393001/
McNeel, B. (2023, June 8). New Moms Die More Often in the Bible Belt. Pastors Want to Fix That. Sojourners. https://sojo.net/articles/new-moms-die-more-often-bible-belt-pastors-want-fix
Motillon-Toudic, Chloé, et al. “Social Isolation and Suicide Risk: Literature Review and Perspectives.” European Psychiatry, vol. 65, no. 1, 2022, https://doi.org/10.1192/j.eurpsy.2022.2320.
Ahmedani, Brian K., et al. “Major Physical Health Conditions and Risk of Suicide.” American Journal of Preventive Medicine, vol. 53, no. 3, 2017, https://doi.org/10.1016/j.amepre.2017.07.001.
Zezima, K. (2006, December 3). Ban on Saucy Beer Labels Brings a Free-Speech Suit. The New York Times. https://www.nytimes.com/2006/12/03/us/03beer.html
NIAAA. (2023). Global Burden | National Institute on Alcohol Abuse and Alcoholism (NIAAA). Www.niaaa.nih.gov. https://www.niaaa.nih.gov/alcohols-effects-health/alcohol-topics/alcohol-facts-and-statistics/global-burden