This dataset (https://github.com/fivethirtyeight/data/blob/master/police-killings/police_killings.csv?plain=1) comes from a FiveThirtyEight article and contains incidents in which individuals were killed by law enforcement in the United States. Each entry includes detailed information about the individual involved (name, age, gender, and race/ethnicity) and about the incident itself: the date, street address, city, state, coordinates (latitude and longitude), and the law enforcement agency involved. Each entry also lists socio-economic and demographic data for the area where the incident occurred, such as population, income levels, and racial composition, along with how the individual was armed and the cause of death (e.g., gunshot, Taser). With this data, we analyze patterns in police killings, study the socio-economic contexts of these incidents, and offer insight into the interactions between law enforcement and the community through our research questions.
To answer this question, we will look at how being armed varies based on age and race.
Looking at the density plot, we can see that the age distribution of police killings changes little between armed and unarmed victims. Both distributions peak around age 30 and taper off toward older ages. However, the distribution for unarmed victims is bimodal: it has a second, smaller peak around age 60. This suggests that victims around age 60 make up a larger share of unarmed victims than of armed victims.
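To make the density comparison concrete, here is a minimal sketch of a Gaussian kernel density estimate computed separately for each group. The ages are made up for illustration and are not values from the dataset, and the bandwidth rule (1.06 · sd · n^(-1/5)) is an assumed default that may differ from the one used for our plot.

```python
import math

def gaussian_kde(sample, bandwidth=None):
    """Return a function that estimates the density of `sample` with a Gaussian kernel."""
    n = len(sample)
    if bandwidth is None:
        # Normal-reference rule of thumb; an assumption, not the report's setting.
        mean = sum(sample) / n
        sd = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
        bandwidth = 1.06 * sd * n ** (-1 / 5)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in sample)

    return density

# Hypothetical ages, for illustration only.
ages_armed = [22, 25, 27, 29, 30, 31, 33, 35, 38, 41, 45, 52]
ages_unarmed = [21, 24, 28, 30, 31, 34, 36, 58, 60, 61, 63]

kde_armed = gaussian_kde(ages_armed)
kde_unarmed = gaussian_kde(ages_unarmed)

# Compare the two estimated densities on a coarse grid of ages.
for age in range(20, 71, 10):
    print(age, round(kde_armed(age), 4), round(kde_unarmed(age), 4))
```

With toy data shaped like this, the unarmed curve sits above the armed curve near age 60, which is the kind of second bump described above.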
We can check this relationship with a mosaic plot of the Pearson residuals for this data. The mosaic plot indicates a statistically significant excess of unarmed victims in the 61-70 age group.
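As a rough sketch of what the mosaic plot shades, the Pearson residual of a cell is (observed - expected) / sqrt(expected), where the expected count assumes age group and armed status are independent. The counts below are made up for illustration, and the cutoff of 2 is a common shading convention that we assume here rather than one taken from our plot.

```python
import math

def pearson_residuals(table):
    """Pearson residuals of a contingency table under the independence model."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    residuals = []
    for i, row in enumerate(table):
        res_row = []
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            res_row.append((observed - expected) / math.sqrt(expected))
        residuals.append(res_row)
    return residuals

# Hypothetical counts: rows are age groups, columns are (armed, unarmed).
age_groups = ["21-30", "31-40", "61-70"]
counts = [[90, 30], [80, 25], [10, 15]]

residuals = pearson_residuals(counts)
# The squared residuals sum to the chi-square statistic of the table.
chi_square = sum(r ** 2 for row in residuals for r in row)

for group, (res_armed, res_unarmed) in zip(age_groups, residuals):
    flag = "excess of unarmed" if res_unarmed > 2 else ""
    print(group, round(res_armed, 2), round(res_unarmed, 2), flag)
print("chi-square:", round(chi_square, 2))
```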
Now we will look at the relationship between the race and armed status of the victims. The faceted bar graph shows that armed victims outnumber unarmed victims for every race except possibly Asian/Pacific Islander. However, there are so few victims in this group that the exception is not reliable. The bar graph does not suggest a relationship between race and armed status. Although the drop from armed to unarmed counts is large, it is roughly proportional across races. The Unknown and Hispanic/Latino groups may show a slightly larger drop than white and Black victims, but the difference does not appear to be significant. Based on this graph, race does not appear to play a role in whether victims were armed or unarmed.
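The "proportional decrease" reading of the bar graph can be checked by computing the share of unarmed victims within each group. The sketch below uses made-up records, and it assumes that unarmed victims are coded as "No" in the armed column; both are illustrative assumptions rather than results.

```python
from collections import Counter

def unarmed_share_by_group(records, unarmed_label="No"):
    """Share of unarmed victims within each race/ethnicity group."""
    totals = Counter()
    unarmed = Counter()
    for race, armed in records:
        totals[race] += 1
        if armed == unarmed_label:
            unarmed[race] += 1
    return {race: unarmed[race] / totals[race] for race in totals}

# Hypothetical (race, armed) pairs, for illustration only.
records = [
    ("White", "Firearm"), ("White", "No"), ("White", "Knife"), ("White", "Firearm"),
    ("Black", "Firearm"), ("Black", "No"), ("Black", "Firearm"), ("Black", "Knife"),
    ("Hispanic/Latino", "Firearm"), ("Hispanic/Latino", "Knife"),
    ("Hispanic/Latino", "No"), ("Hispanic/Latino", "Firearm"), ("Hispanic/Latino", "Firearm"),
]

shares = unarmed_share_by_group(records)
for race, share in shares.items():
    print(race, round(share, 2))
```

Similar shares across groups are what an "equally proportional" drop would look like; very small groups will produce unstable shares.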
To answer this question, we will examine the following quantitative variables:
pov
urate
county_income
college
We will compare those with this categorical variable:
raceethnicity
First, we will examine how the four quantitative variables correlate with each other for Black victims:
Here, we can see that poverty rate has a strong positive correlation with unemployment, and relatively large correlations with county income and degree rate. On their own, these facts tell us little about the question we want to address; however, we can compare them with the results for communities of white victims:
The correlations follow the same pattern in both graphs. One striking difference, though, is that poverty is not as strongly correlated with the other variables for white victims as for Black victims. The same holds for the other variables: each correlation is much weaker for the white group than for the Black group.
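The comparison above amounts to computing a Pearson correlation matrix once per group. Here is a minimal sketch with made-up values for a single group; the column names follow the variables listed earlier, but the numbers are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def correlation_matrix(columns):
    """Pairwise correlations for a dict that maps column name to values."""
    names = list(columns)
    return {(a, b): pearson_r(columns[a], columns[b]) for a in names for b in names}

# Hypothetical values for one group of victims, for illustration only.
group = {
    "pov": [10, 20, 30, 40],
    "urate": [5, 9, 16, 20],
    "college": [40, 30, 22, 10],
}

corr = correlation_matrix(group)
for (a, b), r in corr.items():
    if a < b:  # print each unordered pair once
        print(a, b, round(r, 3))
```

Running the same function on each group's rows and comparing the matrices entry by entry gives the kind of side-by-side reading described above.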
To better understand how the distribution of poverty rates differs across races, we’ve created the following violin plot and boxplot:
Here, we observe that the median poverty rates for white and Asian/Pacific Islander victims are much lower than those for Black, Hispanic/Latino, and Native American victims. Note that the sample size for Native Americans is notably smaller than for the other groups, so its estimates are less reliable.
This question is motivated by the idea that crime is higher in areas with larger populations, and that areas with more crime might have a higher likelihood of police killings.
The map gives us an idea of how the police killings are distributed across the country. Each circle represents one incident. We see that many data points are clustered close to metro areas, which have large populations. For example, we see many points in the San Francisco, Los Angeles, New York City, and Baltimore areas, to name a few. However, this does not apply to all the points, so it is hard to make a claim about the relationship between population and killings from this graph alone.
The density plot shows the distribution of log population for the counties in our police killings dataset against the distribution for every county in the country. We use log population because the data are heavily skewed, spanning small rural counties and large metropolitan areas. Both distributions look roughly normal, but our subset has a much smaller variance in population size and a clearly smaller median. This adds to our observations from the map and tells us that police killings are not exclusive to highly populated urban areas. We can perform a K-S test to confirm what we see in the visualization: we get a D-statistic of 0.77292, which corresponds to a p-value of less than 2.2e-16, so we reject the null hypothesis that the two samples come from the same distribution.
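For readers unfamiliar with the test, the D-statistic is the largest vertical gap between the two empirical distribution functions. The sketch below computes it for made-up population values and adds the standard large-sample approximation for the p-value; it is an illustration under those assumptions, not a reproduction of the result reported above, and the approximation is rough for samples this small.

```python
import math
from bisect import bisect_right

def ks_two_sample(a, b):
    """Two-sample K-S statistic: the maximum gap between the empirical CDFs."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    d = 0.0
    for x in a + b:  # the maximum gap is attained at an observed value
        cdf_a = bisect_right(a, x) / n
        cdf_b = bisect_right(b, x) / m
        d = max(d, abs(cdf_a - cdf_b))
    return d

def ks_asymptotic_p(d, n, m, terms=100):
    """Large-sample p-value approximation from the Kolmogorov distribution."""
    lam = math.sqrt(n * m / (n + m)) * d
    if lam == 0:
        return 1.0
    series = sum((-1) ** (k - 1) * math.exp(-2 * k * k * lam * lam)
                 for k in range(1, terms + 1))
    return max(0.0, min(1.0, 2 * series))

# Hypothetical county populations, for illustration only.
subset_pops = [40_000, 50_000, 60_000, 75_000, 90_000]
all_pops = [5_000, 20_000, 60_000, 200_000, 1_000_000, 3_000_000]

log_subset = [math.log(p) for p in subset_pops]
log_all = [math.log(p) for p in all_pops]

d_stat = ks_two_sample(log_subset, log_all)
p_value = ks_asymptotic_p(d_stat, len(log_subset), len(log_all))
print("D =", round(d_stat, 3), "approximate p =", round(p_value, 3))
```

Because the log is a monotone transform, the D-statistic is the same on raw and log populations; the log scale matters for the plot, not for the test.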
From our investigation of the location, community demographics, and nature of the police killings, we have reached the following conclusions:
Our findings are limited because the data covers only 2015, which curbs our ability to generalize them to today’s social climate. The data also lacks observations for certain months of the year (July, August, September). This reduces our ability to understand how killings vary throughout the year, since these summer months are not covered.