The Police Killings Dataset

Description of the Data

The data come from a database compiled by the Guardian, linked with census data from the American Community Survey.

Census data was calculated at the tract level from the 2015 5-year American Community Survey using the tables S0601 (demographics), S1901 (tract-level income and poverty), S1701 (employment and education) and DP03 (county-level income) which can be found on the census website. Census tracts were determined by geocoding addresses to latitude/longitude using the Bing Maps and Google Maps APIs and then overlaying points onto 2014 census tracts.

Our data contains 467 observations with 34 variables, where each observation is a person who was killed by police in 2015. For each observation, we have the name of the victim, their age, gender, and ethnicity, the date of the death, the location of the death, and details about their death. In addition, we also have demographic features calculated from the census for a victim’s particular location, such as income, racial makeup, poverty rate, unemployment rate, and proportion of people who are 25 years and older who have a college degree.

Descriptions of the variables can be found below.

Column                Description                                             Source
name                  Name of deceased                                        Guardian
age                   Age of deceased                                         Guardian
gender                Gender of deceased                                      Guardian
raceethnicity         Race/ethnicity of deceased                              Guardian
month                 Month of killing                                        Guardian
day                   Day of incident                                         Guardian
year                  Year of incident                                        Guardian
streetaddress         Address/intersection where incident occurred            Guardian
city                  City where incident occurred                            Guardian
state                 State where incident occurred                           Guardian
latitude              Latitude, geocoded from address
longitude             Longitude, geocoded from address
state_fp              State FIPS code                                         Census
county_fp             County FIPS code                                        Census
tract_ce              Tract ID code                                           Census
geo_id                Combined tract ID code
county_id             Combined county ID code
namelsad              Tract description                                       Census
lawenforcementagency  Agency involved in incident                             Guardian
cause                 Cause of death                                          Guardian
armed                 How/whether deceased was armed                          Guardian
pop                   Tract population                                        Census
share_white           Share of pop that is non-Hispanic white                 Census
share_black           Share of pop that is black (alone, not in combination)  Census
share_hispanic        Share of pop that is Hispanic/Latino (any race)         Census
p_income              Tract-level median personal income                      Census
h_income              Tract-level median household income                     Census
county_income         County-level median household income                    Census
comp_income           h_income / county_income                                Calculated from Census
county_bucket         Household income, quintile within county                Calculated from Census
nat_bucket            Household income, quintile nationally                   Calculated from Census
pov                   Tract-level poverty rate (official)                     Census
urate                 Tract-level unemployment rate                           Calculated from Census
college               Share of 25+ pop with BA or higher                      Calculated from Census
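
As a small illustration of how a calculated column relates to the raw ones, the sketch below recomputes comp_income from h_income and county_income. The two rows are made-up values, not records from the dataset, the helper names to_float and comp_income are hypothetical, and we assume missing incomes appear as empty strings or "NA".

```python
import csv
import io

# Made-up rows for illustration only; these are not records from the dataset.
SAMPLE_CSV = """name,h_income,county_income
Person A,30000,60000
Person B,NA,50000
"""

def to_float(text):
    """Parse a numeric field, returning None for blanks or 'NA' (assumed missing markers)."""
    text = text.strip()
    if text in ("", "NA"):
        return None
    return float(text)

def comp_income(h_income, county_income):
    """Ratio of tract to county median household income, or None if it cannot be computed."""
    if h_income is None or not county_income:
        return None
    return h_income / county_income

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
ratios = [
    comp_income(to_float(r["h_income"]), to_float(r["county_income"]))
    for r in rows
]
print(ratios)
```

A ratio below 1 means the tract's median household income is lower than that of its county.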

Research Questions

Our research topic is motivated by growing movements against police brutality and rising calls for greater police accountability. The main research questions we want to examine using this dataset concern which factors are common among police killings and what trends exist in police killings.

More specifically, our research questions include the following:

  • Question: How has the number of police killings changed over time? Is there a different trend if we account for ethnicity?

  • Question: How are the share of college-educated residents and the share of black residents in a victim’s census tract related across different income levels, and how does this relate to the number of police killings?

  • Question: Is there a higher concentration of police killings in any region of the US?

  • Question: Which state has the most police killings per capita? What if we account for whether the deceased was armed?

  • Question: What proportion of the police killing victims are male vs female?

  • Question: How do variables like unemployment rate, poverty rate, and median income relate to whether or not the victim was armed and their cause of death?

  • Question: Is there any ethnic group that has a higher proportion of deaths by gunshot compared to the others? Is there any ethnic group whose members were armed more often when they were killed?

Exploring the data

Question 1

We wanted to look into how the number of police killings has changed over time. Furthermore, we wanted to see if there is a different trend for different ethnic groups. This suggests that we should explore the number of police killings sorted by date, and plot separate lines for each ethnic group.

Overall, the above graph shows that the number of police killings is not constant over these six months and does not follow the same trend across ethnicities, since the moving average lines show different patterns. There is a clear uptick in the number of police killings in March of 2015 for both Black and White Americans, and in the below plot we notice that most of these killings in March happened in California. After some quick research, we found that the Los Angeles Police Department received some scrutiny for having a large number of police-related deaths in March 2015.
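
The counting and smoothing behind a plot of this kind can be sketched as follows. This is a minimal illustration rather than the code used for the graph: the records are made up, each record is assumed to already carry a `date` field built from the month, day, and year columns, and the window of 2 days is arbitrary because the window used for the moving average lines is not stated.

```python
from collections import Counter
from datetime import date, timedelta

def daily_counts(records, start, end):
    """Return the days from start to end and, for each group, the count on each day."""
    counts = Counter((r["raceethnicity"], r["date"]) for r in records)
    groups = sorted({r["raceethnicity"] for r in records})
    days = [start + timedelta(days=i) for i in range((end - start).days + 1)]
    # Counter returns 0 for missing keys, so days without incidents count as 0.
    return days, {g: [counts[(g, d)] for d in days] for g in groups}

def moving_average(values, window):
    """Trailing moving average; positions with fewer than `window` values are None."""
    averages = []
    for i in range(len(values)):
        if i + 1 < window:
            averages.append(None)
        else:
            averages.append(sum(values[i + 1 - window:i + 1]) / window)
    return averages

# Made-up records for illustration only.
records = [
    {"raceethnicity": "Black", "date": date(2015, 3, 1)},
    {"raceethnicity": "Black", "date": date(2015, 3, 2)},
    {"raceethnicity": "White", "date": date(2015, 3, 2)},
    {"raceethnicity": "White", "date": date(2015, 3, 2)},
    {"raceethnicity": "White", "date": date(2015, 3, 4)},
]
days, series = daily_counts(records, date(2015, 3, 1), date(2015, 3, 4))
smoothed = {group: moving_average(values, window=2) for group, values in series.items()}
```

Filling in zero-count days before averaging matters here: skipping them would average only over days with at least one incident and overstate the trend.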

Question 2

We are interested in the relationship between the share of the population that is black and the share of the population with a college degree in each victim’s census tract, controlling for national income quintile.

From this graph, we can tell that most of the police killings occur in the lower income quintiles, and that more police killings occur where the proportion of black people is high. There doesn’t seem to be a relationship between the proportion of college-educated people and the proportion of black people, except perhaps in the 4th income quintile, where there is a positive association for both ethnic groups.
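
A numeric companion to a faceted plot like this one is to count the observations in each national income quintile and compute the correlation between share_black and college within each quintile. The sketch below is our own illustration, not part of the original analysis: the rows are made up, and the helper names pearson and summarize_by_quintile are hypothetical.

```python
from collections import defaultdict
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences, or None if it is undefined."""
    n = len(xs)
    if n < 2:
        return None
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    return sxy / sqrt(sxx * syy)

def summarize_by_quintile(rows):
    """Map each nat_bucket to (row count, correlation of share_black and college)."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["nat_bucket"]].append((row["share_black"], row["college"]))
    summary = {}
    for bucket, pairs in sorted(grouped.items()):
        xs, ys = zip(*pairs)
        summary[bucket] = (len(pairs), pearson(xs, ys))
    return summary

# Made-up rows for illustration only.
rows = [
    {"nat_bucket": 1, "share_black": 10.0, "college": 0.10},
    {"nat_bucket": 1, "share_black": 20.0, "college": 0.20},
    {"nat_bucket": 1, "share_black": 30.0, "college": 0.30},
    {"nat_bucket": 4, "share_black": 5.0, "college": 0.40},
]
summary = summarize_by_quintile(rows)
```

Reporting the row count next to each correlation makes it easier to see when a quintile has too few observations for the correlation to mean much.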

Question 3

We wanted to understand how police killings are spread across the United States and whether there are any particular hotspots with many police killings. To explore this, we first used a LOESS contour plot over a map of the continental United States.

Looking at the above map, we can see that the highest concentration is around the Oklahoma area. This is interesting because we used police killings per person in a census tract, which means that there is an abnormally high number of police killings around Oklahoma relative to its population. Meanwhile, places like California and the East Coast have lower per capita killings, which indicates that the number of police killings there is closer to what we would expect given the number of people in the area.

Question 4

We wanted to understand what police killings look like across the United States at the state level, and whether the pattern differs depending on whether the suspect was armed when encountering the police.

Looking at the above map, we can see that the states with the highest police killings per person are Arizona and Oklahoma when the suspect was armed, and Nevada and Arizona when the suspect was unarmed. This is interesting because it echoes our previous plot, where we found that Oklahoma had an abnormally high number of police killings relative to the population of the census tracts in which they occurred. It is also notable that Arizona has a much higher rate of killings per person than Oklahoma but did not show up on our previous map. This could be because Arizona has larger cities, so the tract-level rates look unremarkable while the state-level rate is still very high for Arizona’s population. Since Arizona is also among the two highest states for unarmed suspects, it may simply have a lot of police killings in general in its cities or more populous areas.
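
A state-level rate of this kind can be sketched as below. The dataset itself only contains tract populations, so the state populations here are a hypothetical external input, and the state codes and records are made up. We also assume that the armed column uses the value "No" for unarmed victims; any other value, including ambiguous ones if present, is treated as armed in this sketch.

```python
from collections import Counter

def rates_per_million(records, state_population):
    """Killings per million residents, keyed by (state, 'armed' or 'unarmed')."""
    counts = Counter()
    for r in records:
        # Assumption: "No" marks an unarmed victim; everything else counts as armed.
        status = "unarmed" if r["armed"] == "No" else "armed"
        counts[(r["state"], status)] += 1
    return {
        (state, status): n * 1_000_000 / state_population[state]
        for (state, status), n in counts.items()
    }

# Made-up records and populations for illustration only.
records = [
    {"state": "AA", "armed": "Firearm"},
    {"state": "AA", "armed": "No"},
    {"state": "AA", "armed": "Knife"},
    {"state": "BB", "armed": "No"},
]
state_population = {"AA": 2_000_000, "BB": 500_000}
rates = rates_per_million(records, state_population)
```

Dividing by the state population rather than by tract populations is what separates this view from the tract-level contour map, which is one reason the two can rank states differently.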

Question 5

Another question we wanted to answer was whether there was an even ratio between male and female victims. To do so, we plotted the number of male victims and female victims in a side-by-side bar chart below.

Our main takeaway is that there are far more police killings of males than of females in this dataset.

Question 6

We wanted to understand whether census-level demographic features such as income, poverty rate, and college education proportion are associated with what the victim was armed with and their cause of death. To do this, we constructed a PCA biplot so that we could identify any clusters and see which covariates influence the first two principal components. Many of the census-level metrics are highly associated; for example, the unemployment rate and poverty rate are very highly correlated. We therefore dropped variables with high correlation to make the graph easier to read, without affecting the conclusion significantly.

We find that there is no obvious relationship between census-tract demographic features and what the victim was armed with. Interestingly enough, there seems to be a slight association between the share of the black population and the victim getting tased. As for cause of death, it seems that the larger the share of the black population, the more likely the victim was tased; however, this association is very weak. We can see that the vast majority of deaths are caused by gunshot, and there doesn’t seem to be an obvious association.
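
The step of dropping highly correlated variables before the PCA can be sketched as a greedy filter. This is one possible approach under our own assumptions, not necessarily the procedure used for the biplot: the 0.8 threshold, the helper names, and the sample values are all hypothetical, and every column is assumed to be non-constant.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def drop_correlated(columns, threshold=0.8):
    """Keep a column only if its |correlation| with every kept column is below threshold."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept

# Made-up tract-level values for illustration only.
columns = {
    "pov": [10, 20, 30, 40],
    "urate": [0.05, 0.10, 0.16, 0.20],
    "college": [0.3, 0.1, 0.4, 0.2],
}
kept = drop_correlated(columns)
print(kept)
```

Because the filter is greedy, the result depends on column order: of two highly correlated variables, whichever comes first is the one that is kept.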

Question 7

We want to know if any ethnic group is killed by gunshot more than any other group, and if any ethnic group tends to carry more weapons with them. To answer this question, we decided to use a stacked bar chart of proportions where the x-axis is ethnicity and the y-axis is the proportion of what the victim was armed with relative to their racial group. This graph allows us to easily visualize any difference in the proportions of armed status across different ethnicities.

We observe that gunshot is the most common cause of death across all racial groups. It is interesting to note that Native Americans and Asians/Pacific Islanders die from taser more than other groups, and Native Americans also tend to die in custody more often. However, we do not know if there is any statistical significance because of the small sample size.

As for what the victims were armed with, we notice that Asians/Pacific Islanders tend to be armed with firearms less often than other racial groups, and are more often unarmed. The proportion of armed and unarmed victims stays roughly the same across all the other groups.
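
The within-group proportions that a stacked bar chart of this kind displays can be computed as in the sketch below. The records and group labels are made up, and the helper name proportions_by_group is hypothetical; the same function could be applied to the cause column instead of armed.

```python
from collections import Counter, defaultdict

def proportions_by_group(records, group_key, category_key):
    """For each group, return the share of its records falling in each category."""
    counts = defaultdict(Counter)
    for r in records:
        counts[r[group_key]][r[category_key]] += 1
    return {
        group: {cat: n / sum(c.values()) for cat, n in c.items()}
        for group, c in counts.items()
    }

# Made-up records for illustration only.
records = [
    {"raceethnicity": "A", "armed": "Firearm"},
    {"raceethnicity": "A", "armed": "No"},
    {"raceethnicity": "A", "armed": "Firearm"},
    {"raceethnicity": "A", "armed": "Knife"},
    {"raceethnicity": "B", "armed": "No"},
]
props = proportions_by_group(records, "raceethnicity", "armed")
print(props)
```

Proportions hide group size: in this toy data group B is 100% unarmed on the basis of a single record, which is the same small-sample caveat noted above for the smaller racial groups.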

Conclusions and Future Work

From our analysis, we drew a few key conclusions. First, the trend in the number of police killings over time differs across ethnic groups. Additionally, most police killings occur at lower income quintiles, and higher proportions of Black Americans are associated with more police killings. The proportion of college-educated people and the proportion of Black Americans do not appear to be correlated. From the contour plot, we saw a high concentration of tract-level police killings per capita around the Oklahoma area, meaning that the number of police killings there was high relative to the population. Oklahoma and Arizona have higher rates of police killings when the suspect is armed, while Nevada and Arizona have higher rates of police killings for unarmed suspects. We also found that far more males than females were killed by police, and that most police killings are from gunshots, regardless of ethnicity or income level. Lastly, Asians and Pacific Islanders were unarmed more often than other racial groups when killed by police.

For future work, we recommend finding a dataset that covers police interactions as well as the outcomes of those interactions, over a longer span of time. An interesting research question we were unable to answer with our dataset is the association between tract-level ethnic demographics and the number of police killings in that census tract. One could also examine whether deaths are more or less frequent when the police officer’s and the suspect’s ethnicities match.