The Impact of Mental Health Professional Density on Poor Mental Health

Authors

Lay Len Ching

Nikhil Roy

Urvi Chaubal

Published

July 26, 2024

Introduction

Mental health providers are an important part of the population’s access to mental health care. According to a study from the Kaiser Family Foundation and CNN, 90% of the public believe there is a current mental health crisis, with almost one third of participants stating that they could not acquire mental health services and 60% of psychologists unable to see new patients. The inability to see professionals may lead to further mental health issues that worsen over time.

Mental health is a critical component of overall health, and prolonged poor mental health raises the risk of physical health issues. As chronic health conditions persist, mental illnesses continue or worsen. By studying the relationship between the density of mental health professionals in a county and the average number of poor mental health days, we can uncover underlying relationships and additional factors that contribute to the number of poor mental health days in a county.

Data

Data Source

We use the County Health Rankings 2024 dataset as the basis for our analysis. County Health Rankings is part of a program from the University of Wisconsin Population Health Institute, which focuses on improving personal and community health. The original dataset includes variables on health outcomes, health behaviors, clinical care, social and economic factors, and physical environment for each United States county. From the original dataset, we select variables related to mental and physical health. We choose these variables because of their relevance to individuals’ lifestyles and to the changes individuals can make to improve their mental health.

Main Variables

The two main variables we use are the ratio of population to mental health providers and poor mental health days. The ratio variable uses the county population and the total number of mental health providers in that county based on the National Provider Identification file. The identification file classifies family therapists, marriage therapists, alcohol and drug abuse therapists, psychiatrists, psychologists, clinical social workers, counselors, and mental health specialized nurses as mental health providers. The poor mental health days variable is the age-adjusted average number of poor mental health days reported in the past 30 days. By using an age-adjusted variable, counties with different age distributions can be fairly compared with one another. This is especially important for counties with higher proportions of specific age groups. For example, according to the World Health Organization, older adults are more susceptible to poor mental health conditions such as depression and anxiety due to living conditions (isolation, nursing home care) and poor physical health (chronic illnesses, neurological conditions). Therefore, counties with larger elderly populations may have a higher rate of poor mental health days than others, which can skew an unadjusted average. An age-adjusted variable standardizes the value and maintains comparability between counties with different age distributions.
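To make the idea of age adjustment concrete, the sketch below applies direct standardization, in which each county's age-specific values are averaged using one shared reference population. This is an assumption made for illustration: the report does not state which adjustment method County Health Rankings uses, and every number and name below (`weighted_mean`, the age bands, the shares) is hypothetical.

```python
def weighted_mean(values_by_group, weights_by_group):
    """Average the group values using the given population shares as weights."""
    total_weight = sum(weights_by_group.values())
    return sum(values_by_group[g] * w for g, w in weights_by_group.items()) / total_weight

# All numbers below are hypothetical and chosen only to illustrate the idea.
# Both counties have identical age-specific averages of poor mental health days.
days_by_age = {"18-44": 5.0, "45-64": 4.0, "65+": 6.0}

young_county_shares = {"18-44": 0.70, "45-64": 0.20, "65+": 0.10}
old_county_shares = {"18-44": 0.30, "45-64": 0.30, "65+": 0.40}
standard_shares = {"18-44": 0.45, "45-64": 0.35, "65+": 0.20}  # shared reference population

# Crude averages weight by each county's own age mix, so they differ.
crude_young = weighted_mean(days_by_age, young_county_shares)
crude_old = weighted_mean(days_by_age, old_county_shares)

# Adjusted averages weight by the shared reference population, so they match.
adjusted_young = weighted_mean(days_by_age, standard_shares)
adjusted_old = weighted_mean(days_by_age, standard_shares)

print(f"crude:    young={crude_young:.2f}, old={crude_old:.2f}")
print(f"adjusted: young={adjusted_young:.2f}, old={adjusted_old:.2f}")
```

In this toy case the crude averages differ only because of the age mix, while the adjusted values coincide, which is the comparability property described above.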

Supplemental Variables

In addition to the main predictor and response variables, we examine other health and lifestyle variables. One variable of importance is Frequent Mental Distress, an age-adjusted variable that gives the percentage of adults who report 14 or more poor mental health days in the past 30 days. Other variables, such as excessive drinking, food insecurity, adult obesity, and physical inactivity, are all age-adjusted percentages of the adult population in the county.

EDA

To begin exploring the response and predictor variables, we create a histogram of poor mental health days in the United States. The distribution of our response variable appears approximately normal, being symmetric and unimodal. This impression is supported by the median and mean, which are nearly equal at 5.222 and 5.201 respectively.
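The symmetry check described above can be sketched numerically by comparing the mean with the median and computing a skewness statistic. This is a minimal sketch using made-up county values rather than the actual dataset; the helper name `skewness` is ours.

```python
import statistics

def skewness(values):
    """Population skewness: close to 0 for a symmetric distribution."""
    mean = statistics.fmean(values)
    spread = statistics.pstdev(values)
    return sum(((v - mean) / spread) ** 3 for v in values) / len(values)

# Hypothetical county values, not taken from the dataset.
poor_days = [4.0, 4.5, 5.0, 5.2, 5.4, 5.9, 6.4]

mean_days = statistics.fmean(poor_days)
median_days = statistics.median(poor_days)
skew = skewness(poor_days)

print(f"mean={mean_days:.3f}, median={median_days:.3f}, skewness={skew:.3f}")
```

A mean close to the median and a skewness near zero are consistent with a symmetric distribution, although they do not by themselves establish normality.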

Furthermore, we build a choropleth map of Poor Mental Health Days, allowing us to visualize trends across the United States.

Looking at this map, we can discern two specific trends. First, the north of the US appears to have the lowest Poor Mental Health numbers, particularly the states of North Dakota and South Dakota. Second, we see very high numbers of Poor Mental Health Days across Southern states, such as Arkansas, Kentucky, and Tennessee. We focus on two states: Arkansas and South Dakota. The motivation behind these choices is discussed further in the Results section.

The graphs below focus on average Poor Mental Health days in Arkansas and South Dakota respectively by county.

Methods

Variable Selection

To begin our exploration of the potential variables we could employ when building our model, we begin with an analysis of the two main variables from our research question, (1) Ratio of Population to Mental Health Professionals and (2) Poor Mental Health Days. By plotting these two against each other, we create a scatter plot that allows us to visually interpret their relationship.

Looking at this graph, we see no apparent relationship between our two main variables, whether linear or otherwise. Points appear randomly scattered throughout the plot, following no direction and providing no insight into our response variable, Poor Mental Health Days. Our first conclusion is formulated here: the number of mental health professionals in a county does not predict the number of poor mental health days an individual within that same county will experience.

Now that we understand that the Ratio of Population to Mental Health Professionals (referred to as “Ratio” for the rest of the report) does not have a role in predicting Poor Mental Health Days, we aim to uncover the variables that are correlated with our response variable and can be used to predict it. To reiterate, our focus is on factors that affect health and pertain to individuals’ lifestyles, including but not limited to smoking, drinking, and exercise. We selected 22 variables fitting these criteria for our initial analysis.

For this analysis, we employ a correlation matrix of all 22 variables. We further refine our criteria by selecting variables whose correlation with the response has an absolute value greater than 0.30. This leaves eight key variables that are pertinent to the direction of our research and are strongly correlated with our response variable. A correlation matrix for these eight variables is created to further visualize relationships.
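The screening step above can be sketched as follows: compute the Pearson correlation of each candidate with the response and keep those whose absolute value exceeds 0.30. The data and variable names here are hypothetical, and the functions `pearson` and `screen` are illustrative helpers rather than the code used in the analysis.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def screen(candidates, response, threshold=0.30):
    """Keep candidates whose absolute correlation with the response exceeds the threshold."""
    kept = {}
    for name, values in candidates.items():
        r = pearson(values, response)
        if abs(r) > threshold:
            kept[name] = r
    return kept

# Hypothetical values for five counties.
poor_days = [4.0, 4.5, 5.0, 5.5, 6.0]
candidates = {
    "frequent_mental_distress": [12.0, 13.5, 15.0, 16.5, 18.0],
    "high_school_completion": [92.0, 90.0, 89.0, 87.0, 85.0],
    "ratio": [500.0, 300.0, 700.0, 300.0, 500.0],
}

kept = screen(candidates, poor_days)
for name, r in kept.items():
    print(f"{name}: r = {r:+.2f}")
```

Because the rule uses the absolute value, negatively correlated variables such as high school completion pass the screen alongside positively correlated ones.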

On the left side of the plot, we can see a correlation matrix of these eight variables, plus our original potential predictor, Ratio, and our response, Poor Mental Health Days. On the right side of the plot, we focus specifically on our response and visualize its relationship with each of the eight key variables. Interestingly, the variables high school completion and excessive drinking appear negatively correlated with our response. We can offer plausible explanations for these negative correlations. First, those who complete high school are more likely to have better mental health outcomes. A study by Kondirolli & Sunder finds that “an extra year of education led to a lower likelihood of reporting any symptoms related to depression and anxiety” (Kondirolli & Sunder, 2022). Second, while a negative correlation with excessive drinking could appear counterintuitive, for some people drinking is not only a vice but also a relief from the stresses of everyday life. According to the National Institute on Alcohol Abuse and Alcoholism, heavy drinking is defined as having 8 or more drinks per week for women and 15 or more drinks per week for men (Litten et al., 2024). While this might seem like a large amount, it amounts to two or three nights of heavy drinking per week. Those who drink “excessively” might be experiencing poor mental health in their day-to-day life, which, for them, is relieved by getting a drink after work with their friends or going to the bar on the weekends.

Linear Model Building

From this correlation matrix, we can see that Frequent Mental Distress is the variable which is most highly correlated with our response. Thus, we begin by building a model that uses Frequent Mental Distress as the only predictor to the response. We continue adding more predictors that also seem to be correlated with our response and analyze which variables contribute to the model best through performance metrics, namely adjusted R-squared, Akaike Information Criterion (AIC), and Bayesian Information Criterion (BIC). While building these models, it is imperative that we consider the possibility of multicollinearity between variables. Looking back at our correlation matrix, we make sure to test models which omit predictor variables that appear highly correlated. To test multicollinearity in our models, we use a Variance Inflation Factor (VIF), removing any variables with a VIF > 5.
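The VIF check can be illustrated with a small self-contained sketch. For each predictor, we regress it on the remaining predictors and compute 1 / (1 - R²). The data are hypothetical, the least-squares solver is a minimal one written for this example, and the helper names (`solve`, `r_squared`, `vif`) are ours rather than those of any statistics library.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def r_squared(X, y):
    """R-squared of a least-squares fit of y on the columns of X plus an intercept."""
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(XtX, Xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

def vif(X):
    """Variance inflation factor for each column of X."""
    factors = []
    for j in range(len(X[0])):
        target = [row[j] for row in X]
        others = [[v for i, v in enumerate(row) if i != j] for row in X]
        factors.append(1 / (1 - r_squared(others, target)))
    return factors

# Hypothetical predictors: x2 is almost a copy of x1, x3 is unrelated noise.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [1.1, 1.9, 3.2, 3.9, 5.1, 5.8]
x3 = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0]
X = list(zip(x1, x2, x3))

vifs = vif(X)
flagged = [name for name, v in zip(["x1", "x2", "x3"], vifs) if v > 5]
print("VIFs:", [round(v, 1) for v in vifs])
print("VIF > 5:", flagged)
```

In practice one would usually drop one flagged variable at a time and recompute, since removing a single member of a collinear pair often brings the other below the threshold.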

Additionally, we perform stepwise and backward selection to see whether automated selection procedures identify a better set of predictors for our response. For stepwise selection, we start the algorithm from a model in which Frequent Mental Distress predicts Poor Mental Health Days. While the algorithm is thorough in its search for predictors, it requires more computing power and adds complexity to our model building. Backward selection, while different from stepwise selection, produces similar results regarding practicality and efficiency. Thus, after holistically re-analyzing our EDA and background research, we select 9 models and test them against each other to find the most practical regression to model Poor Mental Health Days.

After comparing the 9 models using several statistical metrics (AIC, BIC, RMSE), we focus on the two models that appeared best at predicting Poor Mental Health Days: Models 6 and 7. Model 6 includes 4 predictors (Frequent Mental Distress, Adult Smoking, Physical Inactivity, and Adult Obesity), while Model 7 includes all predictors in Model 6 plus an additional 4: Excessive Drinking, Broadband Access, Asian ethnicity percentage, and Social Associations. The key difference between Models 6 and 7 is not the predictors themselves but the fact that Model 7 has twice as many predictors, which makes it considerably more complex.

To further understand our two chosen models, we report the same statistical metrics with the addition of degrees of freedom (df) and adjusted R-squared. Here, df relates to the number of predictors in a model, meaning Model 6 has roughly half the degrees of freedom, and therefore roughly half the complexity, of Model 7. Model 7 has a higher adjusted R-squared, but the difference is only 0.005, which is negligible on our scale. Likewise, the AIC and BIC values of the two models do not differ enough to favor Model 7, confirming our choice of Model 6, since we are looking for the simplest possible model without sacrificing performance.
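The trade-off between fit and complexity can be sketched with the standard formulas for adjusted R-squared, AIC, and BIC under a Gaussian linear model. The sample size and sums of squares below are hypothetical and are not the values from our models. AIC and BIC are written up to an additive constant, and the parameter count is taken as the predictors plus the intercept, which is an assumption that may differ from a given software package's convention.

```python
import math

def adjusted_r2(rss, tss, n, p):
    """Adjusted R-squared for a model with p predictors fit to n observations."""
    r2 = 1 - rss / tss
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

def aic(rss, n, p):
    """Gaussian AIC up to an additive constant; p + 1 counts the intercept."""
    return n * math.log(rss / n) + 2 * (p + 1)

def bic(rss, n, p):
    """Gaussian BIC up to an additive constant; penalizes parameters by log(n)."""
    return n * math.log(rss / n) + (p + 1) * math.log(n)

# Hypothetical fit summaries for a 4-predictor and an 8-predictor model.
n, tss = 3000, 1000.0
small = {"p": 4, "rss": 100.0}
large = {"p": 8, "rss": 99.75}

adj_small = adjusted_r2(small["rss"], tss, n, small["p"])
adj_large = adjusted_r2(large["rss"], tss, n, large["p"])
aic_gap = aic(large["rss"], n, large["p"]) - aic(small["rss"], n, small["p"])
bic_gap = bic(large["rss"], n, large["p"]) - bic(small["rss"], n, small["p"])

print(f"adjusted R2: small={adj_small:.4f}, large={adj_large:.4f}")
print(f"AIC(large) - AIC(small) = {aic_gap:.2f}")
print(f"BIC(large) - BIC(small) = {bic_gap:.2f}")
```

In this toy setup the larger model gains a sliver of adjusted R-squared, while its AIC is essentially unchanged and its BIC is worse, which is the kind of pattern that favors the simpler model.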

Model 6 has four predictors: Frequent Mental Distress, Adult Smoking, Physical Inactivity, and Adult Obesity.

Visualizing our four predictor variables through scatterplots, we can support the assumption of linearity.

Results

As described earlier, Model 6 was our best-fit model for assessing Poor Mental Health Days in the United States. To further test this model, we make predictions using a train-test split. We split the data randomly 80-20, with 80% as the “training” set and 20% as our “test” set. Then, we employ 10-fold cross-validation and repeat the bootstrap 3 times. In short, this resampling randomly selects data with replacement to reduce the risk of bias in our performance estimates. Finally, we use a prediction function which learns the best model parameters for each variable from the training data and creates predictions.
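The data partitioning described above can be sketched as follows. This minimal example covers only the random 80-20 split and the assignment of training rows to 10 folds; it samples without replacement and does not implement the bootstrap step. The function names, the seed, and the stand-in list of 100 county identifiers are assumptions made for illustration.

```python
import random

def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle the rows reproducibly and return (train, test)."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def k_fold(rows, k=10):
    """Yield (fit, validation) pairs; each row lands in exactly one validation fold."""
    for i in range(k):
        validation = rows[i::k]
        fit = [r for j, r in enumerate(rows) if j % k != i]
        yield fit, validation

counties = list(range(100))  # stand-in for county identifiers
train, test = train_test_split(counties)
folds = list(k_fold(train, k=10))

print(f"train={len(train)}, test={len(test)}, folds={len(folds)}")
print(f"first fold: fit={len(folds[0][0])}, validation={len(folds[0][1])}")
```

Within each fold the model would be fit on the `fit` rows and scored on the `validation` rows, with the held-out `test` rows used only once at the end.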

State Selection & Testing

To further test the accuracy of Model 6, weighed against the other models, we focus on two states: Arkansas and South Dakota. The reason for this selection was the noticeable difference in the spread of the response across the two states. Looking back at the US choropleth map of Poor Mental Health Days, we can recall our two trends: (1) the lowest Poor Mental Health Days averages in the North, and (2) the highest Poor Mental Health Days averages in the South. Regarding trend one, we choose South Dakota because it had the largest sample size of the Northern states with the lowest averages of Poor Mental Health Days. For trend two, we choose Arkansas because it has a relatively large sample size and is the state with the highest number of Poor Mental Health Days. We use Arkansas and South Dakota data to conduct a state-specific analysis of the individual lifestyle-related factors that contribute to poor mental health.

After our selection, we tested Model 6 for each state. After repeating our model performance and evaluation functions across several models, Model 6 still stood out as the most practical model. However, the coefficients estimated on data for the entire United States differed from the coefficient estimates for South Dakota and Arkansas individually. Various predictors, such as adult smoking, exhibited negative coefficients, which might appear counterintuitive. However, there is a plausible explanation. While smoking is a habit that could be said to contribute to poor mental health, for some it functions as a vice and therefore as relief from the stresses of everyday life. Additionally, it is important to note our small sample sizes, which make it hard to obtain statistically significant results. This is discussed further in our limitations.

Here are the coefficients for South Dakota using model 6:

Response: Poor Mental Health Days (raw value)

| Predictors | Estimates | CI | p |
| (Intercept) | 4.15 | 4.11 – 4.18 | <0.001 |
| Frequent Mental Distress raw value | 0.80 | 0.54 – 1.06 | <0.001 |
| Adult Smoking raw value | -0.30 | -0.60 – -0.01 | 0.043 |
| Physical Inactivity raw value | 0.00 | -0.18 – 0.18 | 0.993 |
| Adult Obesity raw value | -0.01 | -0.09 – 0.07 | 0.865 |
| Observations | 52 |
| R2 / R2 adjusted | 0.949 / 0.945 |

Likewise, here are the coefficients for Arkansas using model 6:

Response: Poor Mental Health Days (raw value)

| Predictors | Estimates | CI | p |
| (Intercept) | 5.95 | 5.90 – 5.99 | <0.001 |
| Frequent Mental Distress raw value | 0.37 | 0.27 – 0.48 | <0.001 |
| Adult Smoking raw value | -0.07 | -0.21 – 0.07 | 0.323 |
| Physical Inactivity raw value | -0.06 | -0.18 – 0.06 | 0.316 |
| Adult Obesity raw value | -0.02 | -0.10 – 0.06 | 0.631 |
| Observations | 60 |
| R2 / R2 adjusted | 0.728 / 0.708 |

As the tables show, Frequent Mental Distress is the most significant predictor of the response in both models. Since its estimate is positive, each 1-unit increase in Frequent Mental Distress is associated with an increase in Poor Mental Health Days of 0.80 days in South Dakota and 0.37 days in Arkansas. By contrast, the Adult Obesity estimate is negative for both states, which would imply a decrease in Poor Mental Health Days for every 1-unit increase in Adult Obesity. This was surprising, although both estimates are close to zero and neither is statistically significant (p = 0.865 and p = 0.631). One difference to note is that the Physical Inactivity estimate is larger in magnitude for Arkansas (-0.06, p = 0.316) than for South Dakota (0.00, p = 0.993), though it is not statistically significant in either state. This illustrates that each state in the US differs in terms of society, environment, and population. Therefore, we cannot claim that Model 6 has the best predictors for every individual state. Rather, we use Model 6 as a general model for the entire United States and keep that single model when looking more closely at an individual state.
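The reading of the two coefficient tables can be checked with a short sketch. The estimates and confidence intervals are copied from the tables above; the helper functions are illustrative, and the meaning of "1 unit" depends on how the predictors were scaled, which the tables do not state.

```python
# (estimate, CI low, CI high) copied from the South Dakota and Arkansas tables.
coefficients = {
    "South Dakota": {
        "Frequent Mental Distress": (0.80, 0.54, 1.06),
        "Adult Smoking": (-0.30, -0.60, -0.01),
        "Physical Inactivity": (0.00, -0.18, 0.18),
        "Adult Obesity": (-0.01, -0.09, 0.07),
    },
    "Arkansas": {
        "Frequent Mental Distress": (0.37, 0.27, 0.48),
        "Adult Smoking": (-0.07, -0.21, 0.07),
        "Physical Inactivity": (-0.06, -0.18, 0.06),
        "Adult Obesity": (-0.02, -0.10, 0.06),
    },
}

def excludes_zero(low, high):
    """True when the confidence interval lies entirely on one side of zero."""
    return low > 0 or high < 0

def distinguishable_from_zero(state):
    """Predictors whose confidence interval excludes zero in the given state."""
    return [
        name
        for name, (_, low, high) in coefficients[state].items()
        if excludes_zero(low, high)
    ]

def predicted_change(state, predictor, delta):
    """Change in predicted Poor Mental Health Days for a change of delta units,
    holding the other predictors fixed."""
    return coefficients[state][predictor][0] * delta

for state in coefficients:
    print(state, "->", distinguishable_from_zero(state))
print(f"{predicted_change('Arkansas', 'Frequent Mental Distress', 2):.2f} days")
```

Only predictors whose interval excludes zero support a directional interpretation; for the others, the sign of the estimate should not be read as evidence of an effect.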

Discussion

Conclusion: Answer to Research Question

Our question of interest was the following: Does the number of mental health professionals per county affect the number of poor mental health days? Our first conclusion answers this question directly. In our data, the number of mental health professionals does not appear to affect the number of poor mental health days. This is supported by the lack of a relationship between the Ratio of Population to Mental Health Providers and Poor Mental Health Days.

Conclusion: Predictor Variables

Our second conclusion focuses on the variables that do predict Poor Mental Health Days. Through an extensive analysis of both linear and non-linear regression models, we can safely state that the predictors that work best to approximate our response variable are:

Frequent Mental Distress
Adult Smoking
Physical Inactivity
Adult Obesity

Our conclusion is further supported by our state-wise analysis, where we hone in on South Dakota and Arkansas to confirm that these variables continue to be the best predictors of poor mental health days.

Notably, variables such as smoking and drinking are negatively correlated with our response variable of Poor Mental Health Days. We believe this could point to a greater issue, which we touched upon earlier in this report: these substances are being used as vices, avenues through which individuals seek relief from poor mental health and its causes. These trends raise several important points for consideration. First, the negative correlation does not imply that smoking and drinking improve mental health. Rather, we believe it suggests that individuals with poor mental health may resort to these behaviors as coping mechanisms. Second, these results and their context (studies show that those with poor mental health often turn to substances to relieve everyday symptoms) indicate a need for comprehensive mental health care that addresses not only symptoms but also lifestyle choices that might make individuals feel happier while in truth doing more harm than good. Interventions should focus on providing healthier coping mechanisms and reducing reliance on substances such as alcohol and tobacco. This could include increased access to mental health services, community support programs, and public health campaigns that promote mental well-being and educate on the risks of substance abuse.

By understanding and addressing the root causes of poor mental health, we can develop more effective interventions that promote overall well-being and reduce reliance on harmful substances.

Limitations

The first limitation we encountered is that we cannot compare across states due to differing data collection methods, which limits our ability to draw consistent and reliable conclusions about mental health trends on a state-by-state basis. Inconsistencies in survey techniques, population coverage, and reporting standards introduce biases and errors, making it challenging to directly compare results from different states. As a result, our ability to identify and analyze state-specific factors influencing mental health outcomes is compromised. To address this, we built a model using entire US data as opposed to state or county data for a more uniform approach. However, this broader model may overlook important local variations, further highlighting the need for standardized data collection practices across states to enhance the accuracy and applicability of our findings.

Our second limitation arises when comparing US data with state-specific data. When using aggregate US data, variables appear statistically significant at an alpha of 0.05. However, not all of these variables remain statistically significant when we apply our models in the state-wise analysis. While this is important to consider when interpreting our model, we believe that these variables remain relevant in terms of practical significance.

It is also important to consider the differences between states when analyzing factors affecting poor mental health. Factors such as the state’s environment, regulatory policies, healthcare infrastructure, and socioeconomic conditions can significantly influence mental health outcomes. These variables can create disparities and unique challenges that affect the generalizability and performance of statistical models. For instance, states with robust public health initiatives and better access to mental health services may experience lower rates of poor mental health days, despite the fact that these do not directly affect any of our predictor variables. Additionally, cultural norms and attitudes towards mental health play a crucial role, affecting the willingness of individuals to seek help and adhere to treatment. Another critical factor is geography and the environment. Urbanization, pollution levels, and climate can also impact mental health differently across states. The statistical significance observed on a national scale may not hold in states like South Dakota and Arkansas due to these varying local conditions and influence. In South Dakota, factors such as rural settings, limited healthcare access, and cultural attitudes towards mental health may contribute to different mental health outcomes compared to other states. Similarly, in Arkansas, socioeconomic challenges, healthcare infrastructure limitations, and specific state policies may affect the prevalence and management of mental health issues.

Furthermore, a small sample size can undermine the reliability of our findings. Because the number of counties per state provides only a limited number of data points on which to fit our model, the estimates of population parameters become less precise, which increases the likelihood of a Type II error. Therefore, while our statistical model shows significance at the national level, it is crucial to acknowledge state-specific variations, as well as other limitations, and consider them in our analysis. By understanding the unique context of each state, we can capture a more accurate and comprehensive picture of the factors influencing poor mental health across the United States.
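The effect of sample size on precision can be illustrated with the usual 1/sqrt(n) scaling of a standard error. The sketch below uses the approximate 95% confidence half-width of a mean as a simple stand-in; regression coefficient standard errors shrink at a broadly similar rate. The state sample sizes come from the tables in the Results section, while the standard deviation and the national count of 3,000 counties are assumptions made only for illustration.

```python
import math

def ci_half_width(sd, n, z=1.96):
    """Approximate 95% confidence half-width for a mean of n observations."""
    return z * sd / math.sqrt(n)

sd = 0.6  # hypothetical county-to-county standard deviation, in days
sample_sizes = {
    "South Dakota": 52,          # observations reported in the state table
    "Arkansas": 60,              # observations reported in the state table
    "national (assumed)": 3000,  # rough, assumed number of US counties
}

half_widths = {name: ci_half_width(sd, n) for name, n in sample_sizes.items()}
for name, width in half_widths.items():
    print(f"{name}: +/- {width:.3f} days")
```

Wider intervals at the state level mean that an effect of a given size is more likely to go undetected, which is the Type II error concern raised above.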

Future Research

For future research, it is crucial to explore potential avenues to address the limitations identified in our current study. One such direction is to shift from county-level data to individual-level data. By doing so, we can increase our sample size significantly, thereby enhancing the precision of our estimates and reducing the likelihood of Type II errors. Individual-level data would also allow us to control for a wider range of variables, providing a more nuanced understanding of the factors influencing mental health outcomes.

Another important area of research is the standardization of data collection practices across states. Developing and implementing uniform survey techniques and reporting standards will reduce the prevalence of biases and errors, which will enable us to more accurately perform state-by-state comparisons. In standardizing data collection methods, we can identify and analyze state-specific factors influencing mental health outcomes, which will lead us to more concrete conclusions as well as more effective interventions.

Additionally, future studies should consider the impact of state-specific variables more comprehensively. Investigating the influence of environmental factors, regulatory policies, and socioeconomic conditions at the state level can provide deeper insights into the disparities observed in mental health outcomes.

By pursuing these research directions, we can develop a more comprehensive and accurate understanding of mental health trends across the United States, which would allow us to provide better recommendations on how individuals can improve their mental health—ultimately, the aim of our study.

Appendix

Comparison of 9 Linear Models

Comparison of Top 3 Linear Models

Assure Model Does Not Overfit

The plot above illustrates how well Model 6 estimates Poor Mental Health Days across the United States. The red line marks perfect prediction, where the predicted value equals the actual value. The blue and orange points are the values the model predicted for the training and test data. The points lie relatively close to the red line, which indicates accuracy, and the typical distance from the line is summarized by the root mean squared error (RMSE). A smaller RMSE indicates a better model, and in this case the distance is small. The two colors, blue and orange, distinguish the test and training data. We plot both to check that our model does not overfit. Overfitting occurs when a model does well on the training data but performs poorly on the testing data. Since the blue and orange points follow a similar pattern, Model 6 does not appear to overfit. Overall, Model 6 predicts Poor Mental Health Days well using its four predictors.
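The overfitting check can also be expressed numerically by comparing RMSE on the training and test sets. The sketch below uses hypothetical actual and predicted values, not the output of Model 6; a test RMSE far above the training RMSE would be a warning sign of overfitting.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between actual and predicted values."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical values, for illustration only.
train_actual = [4.8, 5.1, 5.4, 5.9, 6.2]
train_predicted = [4.9, 5.0, 5.5, 5.8, 6.1]
test_actual = [4.6, 5.3, 6.0]
test_predicted = [4.8, 5.2, 5.8]

train_rmse = rmse(train_actual, train_predicted)
test_rmse = rmse(test_actual, test_predicted)
gap = test_rmse - train_rmse

print(f"train RMSE={train_rmse:.3f}, test RMSE={test_rmse:.3f}, gap={gap:.3f}")
```

How large a gap counts as overfitting is a judgment call that depends on the scale of the response, so the comparison is best read alongside the plot rather than against a fixed cutoff.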

Comparing Models by State

Works Cited

CDC. (2024, April 16). About Mental Health

Kondirolli, F., & Sunder, N. (2022). Mental health effects of education. Health Economics

Litten, R. Z., Kwako, L. E., & Gardner, M. B. (2024, February 27). The basics: Defining how much alcohol is too much. National Institute on Alcohol Abuse and Alcoholism

Stringer, H. (2024, January 1). Mental health care is in high demand. Psychologists are leveraging tech and peers to meet the need. American Psychological Association

University of Wisconsin Population Health Institute. (n.d.). About Us. County Health Rankings & Roadmaps

World Health Organization. (2023, October 20). Mental Health of Older Adults

Download: https://www.kaggle.com/datasets/justinmustaine/continental-us-counties-2022