Introduction

Substance abuse is a complicated issue with many contributing factors. We looked at three questions: two in which drug and alcohol abuse are the response variables, and one in which drug and alcohol abuse are used as predictors of another response variable. The questions are detailed below.

Predictors of Drug Overdose

In 2021, there was a 28.5% increase in drug overdoses in the United States from the previous year (CDC). Which places in the United States had the highest rates of drug overdose deaths, and what are some characteristics of these places? We wanted to look at health behaviors and quality of life factors as well as socioeconomic factors for this section.

Drug Abuse and Teen Births

A variety of factors can lead to teen births. Are excessive drinking and drug use among them? We wanted to examine whether drug and alcohol abuse are correlated with teen births, or whether factors unrelated to excessive drinking and drug use are better predictors of teen births.

Data

In order to answer our questions, we looked at a variety of datasets.

County Health Rankings Dataset

The County Health Rankings dataset, collected by the University of Wisconsin Population Health Institute, ranks every county within a given state on its Health Outcomes and Health Factors. The dataset also contains the measurements used to calculate each county's rankings. We focused primarily on these measurements, a table with 3,193 observations of 249 variables.

FIPS   State    County   Unreliable  Deaths  Years of Potential Life Lost Rate  95% CI - Low  95% CI - High  Quartile
01000  Alabama  NA       NA          88086   10350                              10246         10454          NA
01001  Alabama  Autauga  NA          836     8027                               7198          8857           1
01003  Alabama  Baldwin  NA          3377    8118                               7667          8570           1
01005  Alabama  Barbour  NA          539     12877                              11150         14604          4
01007  Alabama  Bibb     NA          460     11191                              9626          12757          2
01009  Alabama  Blount   NA          1143    10787                              9825          11749          2

(Preview of the first rows; only the first 9 of the 249 columns are shown.)

Median Household Income by State

The World Population Review published a dataset with the median household income for each state in the United States for 2022. It has 50 observations of two variables: State and Median Household Income.

State HouseholdIncome
Maryland 84805
New Jersey 82545
Hawaii 81275
Massachusetts 81215
Connecticut 78444
Alaska 77640

12 Month-ending Provisional Number and Percent Change of Drug Overdose Deaths

The National Vital Statistics System from the CDC published provisional counts of drug overdose deaths, broken down by drug and by state, for each month between January 2015 and February 2022. The dataset has 50,052 observations of 12 variables, but the only variables we used were state and total deaths.

State | Year | Month | Period | Indicator | Data.Value | Percent.Complete | Percent.Pending.Investigation | State.Name | Footnote | Footnote.Symbol | Predicted.Value
AK | 2015 | April | 12 month-ending | Cocaine (T40.5) | | 100 | 0 | Alaska | Numbers may differ from published reports using final data. See Technical Notes. Data not shown due to low data quality. | ** |
AK | 2015 | April | 12 month-ending | Heroin (T40.1) | | 100 | 0 | Alaska | Numbers may differ from published reports using final data. See Technical Notes. Data not shown due to low data quality. | ** |
AK | 2015 | April | 12 month-ending | Methadone (T40.3) | | 100 | 0 | Alaska | Numbers may differ from published reports using final data. See Technical Notes. Data suppressed (<10). Data not shown due to low data quality. | ** |
AK | 2015 | April | 12 month-ending | Number of Deaths | 4,133 | 100 | 0 | Alaska | Numbers may differ from published reports using final data. See Technical Notes. | ** |
AK | 2015 | April | 12 month-ending | Percent with drugs specified | 88.0952381 | 100 | 0 | Alaska | Numbers may differ from published reports using final data. See Technical Notes. | ** |
AK | 2015 | April | 12 month-ending | Psychostimulants with abuse potential (T43.6) | | 100 | 0 | Alaska | Numbers may differ from published reports using final data. See Technical Notes. Data not shown due to low data quality. | ** |

Methods

Predictors of Drug Overdose

Looking at this US map, we can see the total deaths caused by drug overdoses in each state.

Here, we can see that California and Florida have the highest total drug overdose deaths of all the states, and that Texas and Ohio also have high totals. From this, we decided to look at the most common drug involved in overdoses in California, Florida, Ohio, and Texas.

##   State                   Indicator Data.Value
## 1    CA Opioids (T40.0-T40.4,T40.6)       3324
## 2    CA Opioids (T40.0-T40.4,T40.6)       3249
## 3    CA Opioids (T40.0-T40.4,T40.6)       3103
##   State                   Indicator Data.Value
## 1    FL Opioids (T40.0-T40.4,T40.6)       3981
## 2    FL Opioids (T40.0-T40.4,T40.6)       3923
## 3    FL Opioids (T40.0-T40.4,T40.6)       3872
##   State                   Indicator Data.Value
## 1    TX Opioids (T40.0-T40.4,T40.6)       1452
## 2    TX Opioids (T40.0-T40.4,T40.6)       1431
## 3    TX Opioids (T40.0-T40.4,T40.6)       1412
##   State                   Indicator Data.Value
## 1    OH Opioids (T40.0-T40.4,T40.6)       3484
## 2    OH Opioids (T40.0-T40.4,T40.6)       3463
## 3    OH Opioids (T40.0-T40.4,T40.6)       3394
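
The per-state listings above pick out the indicator with the largest counts in each state. A minimal sketch of that kind of selection follows, written in Python rather than the R used for the output above. The toy records, the `SUMMARY_INDICATORS` set, and the function names are illustrative assumptions, not the project's code, and the counts are made up.

```python
# Assumed: summary rows that do not name a specific drug class.
SUMMARY_INDICATORS = {"Number of Deaths", "Percent with drugs specified"}

# Hypothetical records mirroring the State / Indicator / Data.Value columns.
# The counts are toy values for illustration only.
records = [
    {"State": "CA", "Indicator": "Opioids (T40.0-T40.4,T40.6)", "Data.Value": "3,324"},
    {"State": "CA", "Indicator": "Cocaine (T40.5)", "Data.Value": "1,020"},
    {"State": "CA", "Indicator": "Heroin (T40.1)", "Data.Value": ""},
    {"State": "CA", "Indicator": "Number of Deaths", "Data.Value": "270,000"},
    {"State": "FL", "Indicator": "Opioids (T40.0-T40.4,T40.6)", "Data.Value": "3,981"},
    {"State": "FL", "Indicator": "Cocaine (T40.5)", "Data.Value": "2,100"},
]


def parse_count(text):
    """Convert a count such as '4,133' to a float; return None when blank."""
    cleaned = text.replace(",", "").strip()
    return float(cleaned) if cleaned else None


def top_indicators(rows, state, n=3):
    """Return the n (indicator, value) pairs with the largest values for a state."""
    candidates = []
    for row in rows:
        if row["State"] != state or row["Indicator"] in SUMMARY_INDICATORS:
            continue
        value = parse_count(row["Data.Value"])
        if value is not None:  # skip suppressed or missing counts
            candidates.append((row["Indicator"], value))
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return candidates[:n]


for state in ("CA", "FL"):
    print(state, top_indicators(records, state, n=2))
```

Blank `Data.Value` cells are skipped rather than treated as zero, since the footnotes in the source data indicate suppressed or low-quality counts rather than true zeros.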

Based on this information, we decided the best way to look at the data was via clustering, in order to further detect patterns in the data. We used k-means clustering and model-based clustering.
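
To make the k-means step concrete, here is a minimal sketch in Python using only the standard library. It is not the code used for this project: the toy state values are hypothetical, and the deterministic farthest-point seeding is a simplifying assumption chosen to keep the example reproducible (k-means implementations commonly use random starting centers instead). Variables are z-scored first so that no single variable dominates the distance.

```python
from statistics import mean, pstdev


def standardize(rows):
    """Z-score each column so every variable has mean 0 and unit variance."""
    columns = list(zip(*rows))
    centers = [mean(col) for col in columns]
    scales = [pstdev(col) or 1.0 for col in columns]  # guard constant columns
    return [
        tuple((x - c) / s for x, c, s in zip(row, centers, scales))
        for row in rows
    ]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_point_seeds(points, k):
    """Pick the first point, then repeatedly the point farthest from all seeds."""
    seeds = [points[0]]
    while len(seeds) < k:
        seeds.append(
            max(points, key=lambda p: min(squared_distance(p, s) for s in seeds))
        )
    return seeds


def kmeans(points, k, max_iterations=100):
    """Lloyd's algorithm: alternate cluster assignment and centroid updates."""
    centroids = farthest_point_seeds(points, k)
    labels = None
    for _ in range(max_iterations):
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = tuple(mean(dim) for dim in zip(*members))
    return labels, centroids


# Hypothetical rows: (total overdose deaths, % completed high school, % unemployed)
toy_states = [
    (3000, 90.0, 4.0), (3200, 91.0, 4.2), (3100, 89.0, 3.9),
    (500, 82.0, 6.5), (450, 81.0, 7.0), (600, 83.0, 6.8),
]
scaled = standardize(toy_states)
labels, centroids = kmeans(scaled, k=2)
print(labels)
```

Model-based clustering differs in that it fits a mixture of probability distributions and assigns each observation to the component it most likely came from, rather than to the nearest centroid.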

Drug Abuse and Teen Births

First, we did some exploratory data analysis on the different variables.

We then fit a ridge regression on the County Health Rankings dataset to address the question about drug abuse and teen births. Finally, we fit a linear regression of teen births on excessive drinking and adult smoking, to see whether these specific behaviors were related to the teen birth rate in a county.
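
The two regressions can be viewed through one formula: ridge regression solves (XᵀX + λI)β = Xᵀy, and setting λ = 0 gives ordinary least squares. The sketch below illustrates this in plain Python under stated assumptions: predictors are standardized, the response is centered, the county values are hypothetical, and the penalty `lam=5.0` is arbitrary. It is not the model fitted in this project.

```python
from statistics import mean, pstdev


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):  # back substitution
        known = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - known) / aug[r][r]
    return x


def ridge_fit(rows, y, lam):
    """Coefficients on standardized predictors; lam = 0 gives least squares."""
    columns = list(zip(*rows))
    centers = [mean(col) for col in columns]
    scales = [pstdev(col) for col in columns]
    z = [[(x - c) / s for x, c, s in zip(row, centers, scales)] for row in rows]
    y_bar = mean(y)
    y_centered = [value - y_bar for value in y]
    p = len(columns)
    # Gram matrix X'X with the ridge penalty added on the diagonal.
    gram = [
        [sum(r[i] * r[j] for r in z) + (lam if i == j else 0.0) for j in range(p)]
        for i in range(p)
    ]
    xty = [sum(r[i] * yc for r, yc in zip(z, y_centered)) for i in range(p)]
    return solve_linear_system(gram, xty)


# Hypothetical counties: (% excessive drinking, % adult smoking) and teen birth rate.
predictors = [(14.0, 20.0), (16.0, 25.0), (22.0, 17.0),
              (18.0, 28.0), (20.0, 22.0), (15.0, 30.0)]
teen_births = [23.8, 31.2, 26.0, 37.8, 32.3, 42.3]

ols_coefficients = ridge_fit(predictors, teen_births, lam=0.0)
ridge_coefficients = ridge_fit(predictors, teen_births, lam=5.0)
print("least squares:", ols_coefficients)
print("ridge:", ridge_coefficients)
```

Increasing `lam` shrinks the coefficient vector toward zero, which is what makes ridge regression useful when many correlated predictors are included. In practice the penalty is usually chosen by cross-validation rather than fixed by hand.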

Results

Predictors of Drug Overdose

After standardizing the variables, we clustered the data to display the relationship between total drug overdoses and three different socioeconomic factors: high school completion, unemployment, and some college experience, as showcased below:

Drug Abuse and Teen Births

## # A tibble: 6 x 13
##   Name    Teen.births.raw.… Teen.births..Asia… Teen.births..Bl… Teen.births..Hi…
##   <chr>               <dbl>              <dbl>            <dbl>            <dbl>
## 1 Autaug…              23.8                  0             28.9              0  
## 2 Baldwi…              26.0                  0             32.9             45.0
## 3 Barbou…              37.1                  0             38.0             56.6
## 4 Bibb C…              37.8                  0             24.2              0  
## 5 Blount…              31.2                  0              0               30.2
## 6 Bulloc…              45.5                  0             47.3             63.3
## # … with 8 more variables: Teen.births..White. <dbl>,
## #   Excessive.drinking.raw.value <dbl>, Unemployment.raw.value <dbl>,
## #   Children.in.poverty.raw.value <dbl>,
## #   Children.in.single.parent.households.raw.value <dbl>,
## #   Poor.mental.health.days.raw.value <dbl>, Adult.smoking.raw.value <dbl>,
## #   Total <dbl>
## # A tibble: 3,142 x 13
##    Name   Teen.births.raw.… Teen.births..Asia… Teen.births..Bl… Teen.births..Hi…
##    <chr>              <dbl>              <dbl>            <dbl>            <dbl>
##  1 Autau…              23.8                  0             28.9              0  
##  2 Baldw…              26.0                  0             32.9             45.0
##  3 Barbo…              37.1                  0             38.0             56.6
##  4 Bibb …              37.8                  0             24.2              0  
##  5 Bloun…              31.2                  0              0               30.2
##  6 Bullo…              45.5                  0             47.3             63.3
##  7 Butle…              36.9                  0             37.5              0  
##  8 Calho…              32.3                  0             32.1             35.3
##  9 Chamb…              42.3                  0             44.5              0  
## 10 Chero…              32.7                  0              0                0  
## # … with 3,132 more rows, and 8 more variables: Teen.births..White. <dbl>,
## #   Excessive.drinking.raw.value <dbl>, Unemployment.raw.value <dbl>,
## #   Children.in.poverty.raw.value <dbl>,
## #   Children.in.single.parent.households.raw.value <dbl>,
## #   Poor.mental.health.days.raw.value <dbl>, Adult.smoking.raw.value <dbl>,
## #   Total <dbl>

Discussion

Predictors of Drug Overdose

Based on the clustering, we found that drug overdose deaths tend to be higher in states with a lower unemployment rate, a higher share of residents with some college experience, and a higher high school completion rate. We also found that opioids were the substance most often involved in drug overdose deaths.

With this information, we want to look further into areas where opioid usage is more common, and see if opioids have different effects on specific racial and age groups.

We encountered some limitations when working on this project. There was no drug data in the County Health Rankings dataset, which made it difficult to examine patterns at the county level. Also, due to time constraints, we could not build and tune as many models as we would have liked.

Building on the work so far, we hope to obtain more county-level data, look further into the opioid model, and experiment with more clustering and model building for each question.

Acknowledgements

We want to thank Dr. Ron Yurko, our professor during this summer, as well as Wanshan Li, YJ Choe, and all the other TAs for supporting us through this journey. We would also like to thank everyone in the Carnegie Mellon Statistics department who supported the SURE 2022 program. Thanks to Danita Kiser and everyone at Optum who advised us, including our guest speakers and our mentors.

References