Police violence is an issue that has plagued the United States for hundreds of years, but renewed attention in recent years has brought the issue to the foreground of American politics.
We examine a dataset from the Carnegie Mellon data repository, compiled by Jessica Zhiyu Guo, a Ph.D. student at Carnegie Mellon. It lists every known killing involving a police officer in the U.S. between 2013 and 2023 (12,334 records in all), with sixty-two variables for each. These include the age, race, gender, and mental health status of the deceased; the state and locality in which the killing took place; the circumstances around the killing; and whether the officer involved was ever charged.
We seek to answer three broad questions. First, how does race affect police violence? Second, what determines whether officers are actually charged with a crime? And third, how are things changing over time?
We first make a bar plot of police killings by race to get a general overview of how race relates to police violence.
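The tally behind this bar plot is simple to reproduce. Here is a minimal Python sketch, not the authors' R code, that counts race labels and converts them to shares of all killings. The label spellings, and the choice to file missing values under "Unknown", are assumptions. The sample list is made-up toy data.

```python
from collections import Counter


def race_distribution(races):
    """Count killings per race label and return (counts, shares).

    Missing labels (None or "") are grouped under "Unknown"; that grouping is
    an assumption, not necessarily how the original analysis handled gaps.
    """
    cleaned = [race if race else "Unknown" for race in races]
    counts = Counter(cleaned)
    total = sum(counts.values())
    shares = {race: n / total for race, n in counts.items()}
    return counts, shares


# Made-up toy input, NOT values from the dataset.
sample = ["White", "Black", "White", "Hispanic", None, "Black", "White", "Asian"]
counts, shares = race_distribution(sample)
for race, n in counts.most_common():
    print(f"{race:<10} {n:>3} {shares[race]:.1%}")
```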
The bar plot shows that, in raw counts, police killings of White, Black, and Hispanic people are far more frequent than killings of people of other races. There could be many reasons for this, but to first make sure the differences are real, we perform a chi-squared test for significance.
Chi-squared test for given probabilities (data: race$n): X-squared = 12266, df = 5, p-value < 2.2e-16.
The p-value is below 2.2e-16, far under the 0.05 significance level, so we reject the null hypothesis that the observed counts across the six race categories match the expected counts. There is convincing evidence that the differences between racial groups are statistically significant.
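For readers who want to see what a goodness-of-fit test computes, here is a minimal Python sketch (not the authors' code). It assumes no `p` argument was passed, in which case R's `chisq.test()` compares the counts against equal expected proportions. The statistic is the sum of (observed - expected)^2 / expected over the categories, with k - 1 degrees of freedom. The p-value uses the exact closed forms of the chi-squared tail for integer degrees of freedom. The counts in the example are made up.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-squared variable with integer df."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc term plus sum of x^((2i-1)/2) / (1*3*...*(2i-1)).
    term, acc = math.sqrt(x), 0.0
    for i in range(1, (df - 1) // 2 + 1):
        if i > 1:
            term *= x / (2 * i - 1)
        acc += term
    return math.erfc(math.sqrt(half)) + math.sqrt(2 / math.pi) * math.exp(-half) * acc


def chisq_gof(observed, probs=None):
    """Goodness-of-fit test; equal expected proportions when probs is None."""
    k = len(observed)
    if probs is None:
        probs = [1.0 / k] * k
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("expected probabilities must sum to 1")
    n = sum(observed)
    expected = [n * p for p in probs]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = k - 1
    return stat, df, chi2_sf(stat, df)


# Made-up counts, NOT the dataset's.
stat, df, p = chisq_gof([10, 20, 30])
print(f"X-squared = {stat:.2f}, df = {df}, p-value = {p:.4g}")
# X-squared = 10.00, df = 2, p-value = 0.006738
```

Passing population shares as `probs` would turn the same test into a comparison against each group's share of the population rather than against an even split.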
This raises the question of how race and police violence in America have changed over the past ten years. To answer it, we broke down police killings by race and tracked them over time.
The plot shows that police killings of White and Black individuals decreased between 2014 and 2022, while killings of Hispanic individuals increased slightly. There is also a notable rise in the number of killings where the victim’s race was unknown. Recorded police killings do not appear to show much seasonality.
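Breaking killings down by race and year amounts to a two-way tally. A minimal Python sketch (not the authors' R code) follows. It assumes dates are stored as ISO "YYYY-MM-DD" strings; the column specification only says the date column is text, so the real format may differ. The records are made up.

```python
from collections import defaultdict
from datetime import date


def yearly_counts_by_race(records):
    """Tally killings per race and year.

    `records` is an iterable of (date_string, race) pairs. Dates are assumed
    to be ISO-formatted ("YYYY-MM-DD"); missing races become "Unknown".
    """
    table = defaultdict(lambda: defaultdict(int))
    for date_string, race in records:
        year = date.fromisoformat(date_string).year
        table[race or "Unknown"][year] += 1
    # Convert to plain dicts with years in ascending order.
    return {race: dict(sorted(years.items())) for race, years in table.items()}


# Made-up toy records, NOT rows from the dataset.
records = [
    ("2014-03-02", "White"), ("2014-07-19", "Black"), ("2014-11-30", "White"),
    ("2022-01-05", "White"), ("2022-06-14", "Hispanic"), ("2022-09-09", None),
]
for race, by_year in sorted(yearly_counts_by_race(records).items()):
    print(race, by_year)
```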
Lastly, there is the question of how race and age affect the type of force police use, so we created violin plots of age by cause of death, faceted by race. For legibility, the plot includes only a subset of the race and cause-of-death categories.
There are several takeaways from the above graph. First, death by vehicle is the most evenly spread across ages, while deaths by gunshot, physical restraint, and taser appear proportionally higher among younger victims. Second, Black victims appear likelier to die at younger ages, regardless of cause. Finally, overall densities are much higher for Black victims. Among Black victims specifically, death by beating appears to be the proportionally most common outcome for younger people, followed by gunshot.
First, we wanted to see whether signs of mental illness in the victim, and whether the police had a body camera, affected whether officers were held accountable. We began by cleaning the data. An officer was considered not charged if there were no known charges, the officer was acquitted, the charges were dropped, or the officer was deceased; any other outcome counted as charged, and therefore held accountable. Cases with either police video or a body camera both counted as having a body camera; in the remaining cases, any video came from another source, such as a bystander. We also removed NA and unknown values from the mental illness variable, and grouped drug and alcohol use with signs of mental illness.
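The cleaning rules above can be written as a single recoding function. This is a minimal Python sketch, not the authors' R code. The field names (`charges`, `video`, `mental_illness`) and every category spelling are hypothetical stand-ins for whatever the dataset actually uses. Treating a missing charge field as "no known charges" is also an assumption.

```python
# Hypothetical field names and category labels; the dataset's exact
# column names and spellings may differ.
NOT_CHARGED = {"no known charges", "acquitted", "charges dropped", "deceased"}
BODY_CAMERA = {"police video", "body camera"}
MENTAL_ILLNESS = {"yes", "drug or alcohol use"}
NO_MENTAL_ILLNESS = {"no"}


def normalize(value, default=""):
    """Lower-case and trim a possibly missing string field."""
    return (value or default).strip().lower()


def recode(case):
    """Apply the cleaning rules to one case; return None if it is dropped."""
    mental = normalize(case.get("mental_illness"))
    if mental in MENTAL_ILLNESS:
        has_mental_illness = True
    elif mental in NO_MENTAL_ILLNESS:
        has_mental_illness = False
    else:
        # NA, "unknown", and unrecognized values are removed.
        return None
    # Assumption: a missing charge field means no known charges.
    charges = normalize(case.get("charges"), default="no known charges")
    video = normalize(case.get("video"))
    return {
        "charged": charges not in NOT_CHARGED,
        "body_camera": video in BODY_CAMERA,
        "mental_illness": has_mental_illness,
    }


# Made-up toy cases, NOT rows from the dataset.
cases = [
    {"charges": "Acquitted", "video": "Body camera", "mental_illness": "Drug or alcohol use"},
    {"charges": "Convicted", "video": "Bystander video", "mental_illness": "No"},
    {"charges": None, "video": "Police video", "mental_illness": "Unknown"},
]
cleaned = [r for r in (recode(c) for c in cases) if r is not None]
print(cleaned)
```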
We made a faceted bar plot showing the conditional distribution of whether the officer was charged, depending on whether there was a body camera and whether the victim displayed signs of mental illness. When officers had a body camera, they were more likely to be held accountable, and this holds for victims both with and without mental illness. Furthermore, officers were more likely to be charged when the victim did not have a mental illness than when the victim did.
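Each bar in such a plot shows a conditional proportion: the share of charged officers within one combination of body camera and mental illness status. Here is a minimal Python sketch of that calculation (not the authors' code). It assumes each case is already reduced to three booleans with the hypothetical field names `charged`, `body_camera`, and `mental_illness`, and it uses made-up toy cases.

```python
from collections import defaultdict


def charge_rates(cases):
    """Share of cases with a charged officer in each (body_camera, mental_illness) cell.

    Each case is a dict with boolean fields "charged", "body_camera", and
    "mental_illness" (hypothetical field names).
    """
    totals = defaultdict(int)
    charged = defaultdict(int)
    for case in cases:
        cell = (case["body_camera"], case["mental_illness"])
        totals[cell] += 1
        charged[cell] += int(case["charged"])
    return {cell: charged[cell] / totals[cell] for cell in totals}


# Made-up toy cases, NOT rows from the dataset.
cases = [
    {"charged": True, "body_camera": True, "mental_illness": False},
    {"charged": False, "body_camera": True, "mental_illness": False},
    {"charged": False, "body_camera": False, "mental_illness": False},
    {"charged": True, "body_camera": True, "mental_illness": True},
    {"charged": False, "body_camera": False, "mental_illness": True},
]
for (camera, mental), rate in sorted(charge_rates(cases).items()):
    print(f"body camera={camera!s:<5} mental illness={mental!s:<5} charged={rate:.0%}")
```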
We also wanted to see how a suspect’s threat level related to whether the officer was charged. To do this, we created a mosaic plot showing each threat level and its association with the status of charges brought against the officer.
In the mosaic plot, NKC means No Known Charges, C means Charged, A means Attack, B means Brandished Weapon, N means None, STM means Sudden Threatening Motion, U means Unclear/Unknown, and UW means Used Weapon. The mosaic plot shows that the majority of cases result in no known charges, and most of those involve an attack. Cases involving weapons, sudden threatening motions, or no threat at all appear proportionally more likely to result in a charge, while cases with an unknown threat appear less likely to.
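In a mosaic plot, the width of each threat-level column is that level's share of all cases. The height of each tile within a column is the conditional share of each charge status. Here is a minimal Python sketch of those two quantities (not the authors' code), using the abbreviations defined above. The input pairs are made-up toy data.

```python
from collections import Counter

# Abbreviations as defined in the text.
THREAT_LABELS = {
    "A": "Attack",
    "B": "Brandished Weapon",
    "N": "None",
    "STM": "Sudden Threatening Motion",
    "U": "Unclear/Unknown",
    "UW": "Used Weapon",
}
CHARGE_LABELS = {"NKC": "No Known Charges", "C": "Charged"}


def mosaic_proportions(pairs):
    """Return {threat: (column_width, share_charged)} for (threat, charge) pairs.

    column_width is the threat level's share of all cases; share_charged is
    the conditional proportion of "C" within that threat level.
    """
    table = Counter(pairs)
    total = sum(table.values())
    result = {}
    for threat in sorted({t for t, _ in table}):
        column = sum(n for (t, _), n in table.items() if t == threat)
        result[threat] = (column / total, table[(threat, "C")] / column)
    return result


# Made-up toy pairs, NOT counts from the dataset.
pairs = [("A", "NKC")] * 6 + [("A", "C")] + [("UW", "NKC")] * 2 + [("UW", "C")]
for threat, (width, share) in mosaic_proportions(pairs).items():
    print(f"{THREAT_LABELS[threat]:<26} width={width:.2f} charged={share:.2f}")
```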
The high rate of police shootings is a national issue, but the solutions often aren’t national; they’re decisions made at the state and local level. We were therefore curious how region affects police shootings, since the answer could help policymakers design localized solutions.
First, we took a look at a faceted bar chart to explore how the regions of the country impact the distribution of police killings by both race and gender.
In every region and across every racial group, the vast majority of those killed are men. White and Black people make up the largest shares of those killed everywhere except the West, where White and Hispanic people are killed most often, likely because of the markedly different demographics of the Southwest.
We also wanted to examine how killings per capita break down by region, and how that has changed over time. This could help policymakers see which parts of the country are doing well and which are doing less well, so that regions lagging behind can model themselves on those that seem to be on the right track.
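A per capita comparison needs two steps: map each state to a region, then divide each region's killings by its population. Here is a minimal Python sketch (not the authors' code). It assumes the four regions are the U.S. Census Bureau regions, which match the region names used here, though the report does not say so. The counts and populations are made-up toy numbers. `relative_difference` shows the arithmetic behind statements like "fifty percent more likely".

```python
from collections import defaultdict

# Assumption: the four regions are the U.S. Census Bureau regions.
CENSUS_REGIONS = {
    "Northeast": "CT ME MA NH RI VT NJ NY PA",
    "Midwest": "IL IN MI OH WI IA KS MN MO NE ND SD",
    "South": "DE DC FL GA MD NC SC VA WV AL KY MS TN AR LA OK TX",
    "West": "AZ CO ID MT NV NM UT WY AK CA HI OR WA",
}
STATE_TO_REGION = {
    state: region
    for region, states in CENSUS_REGIONS.items()
    for state in states.split()
}


def per_million_by_region(state_counts, region_population):
    """Killings per million residents for each region."""
    totals = defaultdict(int)
    for state, n in state_counts.items():
        totals[STATE_TO_REGION[state]] += n
    return {
        region: totals[region] * 1_000_000 / population
        for region, population in region_population.items()
    }


def relative_difference(rate, baseline):
    """How much higher `rate` is than `baseline`, as a fraction (0.5 = 50%)."""
    return rate / baseline - 1


# Made-up toy counts and populations, NOT real figures.
state_counts = {"NY": 3, "PA": 2, "OH": 4, "TX": 10, "FL": 6, "CA": 9, "WA": 3}
region_population = {"Northeast": 1_000_000, "Midwest": 2_000_000,
                     "South": 4_000_000, "West": 2_000_000}
rates = per_million_by_region(state_counts, region_population)
print(rates)
print(f"West vs. South: {relative_difference(rates['West'], rates['South']):+.0%}")
```

Tracking the change over time works the same way, with counts keyed by (period, state) instead of by state alone.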
When the dataset begins in 2013, the Northeast and the West are doing quite badly, while the South and Midwest are doing markedly better. Despite wide variation over the decade, most regions, with the exception of the Northeast, are largely where they were in 2013. There are some interesting aberrations, such as a spike around Christmas 2017 in the South and a large drop in the West around fall 2016.
One period worth examining in more detail is 2020, when the Covid pandemic hit and confined most Americans to their homes for over a year. We therefore applied a seasonal decomposition to this period. This helps sort the signal from the noise, to paraphrase Nate Silver, and shows how (or whether) the massive shift toward staying home affected police killings.
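The report does not say which decomposition method or seasonal period it used. As an illustration, here is a minimal Python sketch of classical additive decomposition, the approach behind R's `decompose()`. The trend is a centered moving average. The seasonal figure is the average detrended value at each position in the cycle, centered on zero. The remainder is what is left over. The `period` argument is an assumption you must choose, for example 7 for a weekly cycle in daily counts. The test series is made up.

```python
def centered_moving_average(x, period):
    """Trend estimate via a centered moving average (None at the edges).

    Odd periods use a plain average over `period` points; even periods use
    the 2 x period average (half weights on the two end points).
    """
    half = period // 2
    if period % 2:
        weights = [1 / period] * period
    else:
        weights = [0.5 / period] + [1 / period] * (period - 1) + [0.5 / period]
    trend = [None] * len(x)
    for t in range(half, len(x) - half):
        window = x[t - half : t + half + 1]
        trend[t] = sum(w * v for w, v in zip(weights, window))
    return trend


def decompose_additive(x, period):
    """Classical additive decomposition: x = trend + seasonal + remainder."""
    if len(x) < 2 * period:
        raise ValueError("need at least two full periods")
    trend = centered_moving_average(x, period)
    sums, counts = [0.0] * period, [0] * period
    for t, (value, level) in enumerate(zip(x, trend)):
        if level is not None:
            sums[t % period] += value - level
            counts[t % period] += 1
    figure = [s / c for s, c in zip(sums, counts)]
    mean = sum(figure) / period
    figure = [f - mean for f in figure]  # center the seasonal effects on zero
    seasonal = [figure[t % period] for t in range(len(x))]
    remainder = [
        None if level is None else value - level - s
        for value, level, s in zip(x, trend, seasonal)
    ]
    return trend, seasonal, remainder


# Made-up toy series (linear trend plus a period-3 pattern), NOT real data.
pattern = [1.0, -1.0, 0.0]
series = [t + pattern[t % 3] for t in range(12)]
trend, seasonal, remainder = decompose_additive(series, 3)
print([round(s, 6) for s in seasonal[:3]])
```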
We don’t see a large change in March 2020, or much of a Covid dip in police killings overall, but there are some interesting trends. The Northeast and Midwest were the best performers during this period and experienced very little variation, while the patterns in the South and West were very inconsistent, varying between 1.0 and 2.5 killings per day over the course of the year. Note that because we are comparing trends rather than overall levels, this graph uses raw numbers, not per capita numbers.
So - what have we learned?
First of all, there are absolutely racial differences in the way people are treated. African-Americans make up around a quarter of police killings despite making up only an eighth of all Americans; Hispanics see a similar disparity, and this trend has remained stubborn from 2014 to the present day. We’ve also found that the cause of death for these victims varies with age.
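The disparity can be expressed as a representation ratio: a group's share of killings divided by its share of the population. A ratio of 1 means proportional representation. Here is a minimal Python sketch of that arithmetic, using the approximate shares quoted in the text (a quarter of killings, an eighth of the population).

```python
def representation_ratio(share_of_killings, share_of_population):
    """How many times a group's share of killings exceeds its population share."""
    if share_of_population <= 0:
        raise ValueError("population share must be positive")
    return share_of_killings / share_of_population


# Approximate shares from the text: a quarter of killings, an eighth of the population.
print(representation_ratio(0.25, 0.125))  # 2.0
```

By this measure, the figures quoted above put African-Americans at roughly twice their proportional share of police killings.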
Second, we’ve learned that the vast majority of officers are not criminally charged after killing a suspect, and that charges are much less likely after an attack. We say this neither as a positive nor as a negative, simply as a statement of fact; determining whether a shooting was actually justified is beyond the scope of this project. We can also surmise that charges are far more likely to be filed when an officer has a body camera. This indicates a potential institutional failing in departments without body cameras; after all, there is no obvious reason police killings in those localities would be intrinsically more justifiable.
And finally, we’ve learned that geography plays a massive role in police killings, with someone in the West about fifty percent more likely to be victimized by police violence than someone in the South or the Midwest. This distinction has held firm over time. And in 2020, when the country’s social fabric was fundamentally different than in years past, the ever-beating drum of police killings refused to yield to the coronavirus.
This is not, of course, the end-all and be-all of research in the domain; there’s plenty that can still be examined when it comes to police killings. It would be interesting to look at data that goes further back, to see what kind of changes have taken place on a generational scale. It would also be interesting to perform principal component analysis on the dataset, or to draw choropleth maps breaking down the data on a state and county level.
We hope that our findings here can help inform public policy towards police violence in some small way, as we strive, as ever, for a more perfect union.