About the Dataset

The dataset we are using contains 29 variables describing gun violence incidents observed in the US. It was compiled by James Ko (jamesko on GitHub) from the Gun Violence Archive website and is sorted by date in ascending order. It covers more than 260,000 gun incidents in the US between January 2013 and March 2018.

Each row corresponds to a single gun violence incident, with the following variables/columns: the incident’s ID on the Gun Violence Archive website, date of occurrence, the US state, the city/county, the address, number of people killed, number of people injured, URL of the webpage describing the incident, whether or not the gun used was stolen, the type of gun used, the characteristics of the incident, latitude and longitude, a description of the location, number of guns used, each participant’s age, age group, name, relationship to other participants, and status (“Arrested”, “Killed”, “Injured”, “Unharmed”), as well as links to news sources covering the incident, and the state House and Senate districts.

Introduction

We have chosen to work with this dataset because comprehensive data on gun violence is scarce. Our project’s goal is to analyze gun violence incidents and their characteristics in order to raise awareness and to forecast future patterns in the prevalence of such incidents in the United States.

Question 1

We wanted to learn about the geographical distribution of gun incidents, which suggests we should examine the longitude, latitude, and state of each observation, as well as the region of each state (West, Northeast, North Central, and South). Questions of concern include: Where do gun violence incidents occur? Are there certain locations/areas where gun violence incidents are more frequent?

Displayed here is a heatmap of all gun violence incidents in the United States. The heatmap uses the longitude and latitude of each incident to show how the density of gun incidents varies geographically. Brighter areas of blue indicate a higher density, while darker areas indicate a lower density of gun incidents. The heatmap makes it apparent that gun incidents are more frequent in urban areas/cities. Notable cities include Los Angeles, New Orleans, Chicago, and New York City. As a side note, Alaska and Hawaii were not included in the heatmap; when the longitude and latitude range in the code is extended to include them, both have density levels below 0.005.

Next is the state-level visualization of gun violence incidents in 2015, per 100,000 people. 2015 was chosen because it is the year for which the usmap package, which was used to code this visualization, provides census data. Darker red indicates a higher incident rate, while lighter red indicates a lower one. At first glance, Alaska stands out as a clear outlier among the states, with the highest incident rate. This could be attributed to a number of factors including, but not limited to: low relative population, lax gun policies/oversight, and lack of mental health resources and support. Another factor to consider is that Alaska also has one of the lowest population densities among the states.
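The per-100,000 rate used in this map is a simple normalization of incident counts by population. The following is a minimal Python sketch of that calculation, not the code behind the map (which relied on the usmap package in R). The state names and all figures are hypothetical placeholders chosen only to show how a state with few incidents can still have the higher rate.

```python
# Hypothetical figures for illustration only; they are NOT taken from the
# Gun Violence Archive data or from the census table shipped with usmap.
incidents_2015 = {"State A": 1200, "State B": 300}
population_2015 = {"State A": 6_000_000, "State B": 750_000}


def rate_per_100k(incidents: int, population: int) -> float:
    """Return the number of incidents per 100,000 residents."""
    return incidents / population * 100_000


rates = {
    state: rate_per_100k(incidents_2015[state], population_2015[state])
    for state in incidents_2015
}

# List states from the highest rate to the lowest.
for state, rate in sorted(rates.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{state}: {rate:.1f} incidents per 100,000 residents")
```

In this made-up example, State B has a quarter of State A's incidents but twice its rate, which is the kind of effect a small population can have on a per-capita map.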

To take a closer look at the states’ incident rates at a regional level, a dendrogram was constructed to see whether states in the same region have similar incident rates. The states were labeled by color as follows: green for Northeast, purple for South, orange for North Central, and blue for West. Looking at the dendrogram, it is difficult to draw any decisive conclusions about the regions, as each cluster is fairly mixed. Drawing informative conclusions about regional differences would require more in-depth research into factors such as regional politics, gun policies and regulation, and gun ownership levels across regions. This could be taken further with research on how differences in gun policies across states affect gun violence.

Question 2

Are there any significant differences in the number of people killed and injured in gun violence incidents across different state house and senate districts, and do these incidents show any geographical or temporal patterns?

We wanted to learn about the relationship between gun violence incidents and state house and senate districts, as well as any potential geographical and temporal patterns, which suggests we should examine the variables state_house_district, state_senate_district, n_killed, n_injured, city_or_county, state, and date.

First, let’s visualize the number of incidents per state house and senate district.

This graph shows the average number of people killed and injured per gun violence incident for the 10 states with the highest average casualties, that is, the states where gun violence incidents are the most severe on average. Illinois, Arizona, Nevada, and Alabama top the list. This is interesting because most of these states, Illinois excepted, do not have the highest numbers of incidents, only the most severe incidents on average. This raises the question: how many people are killed or injured in each state?
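A ranking like this comes from grouping incidents by state, averaging the casualties per incident, and keeping the largest values. The Python sketch below illustrates that aggregation under two assumptions: that "casualties" means n_killed plus n_injured combined, and that the rows look like the hypothetical ones shown. The function names average_casualties and top_states are invented for this example, and none of the values come from the dataset.

```python
from collections import defaultdict

# Hypothetical rows for illustration only: (state, n_killed, n_injured).
# The column names follow the dataset; the values do not come from it.
incidents = [
    ("State A", 1, 2),
    ("State A", 0, 1),
    ("State B", 0, 1),
    ("State B", 0, 0),
    ("State B", 1, 0),
    ("State C", 2, 3),
]


def average_casualties(rows):
    """Map each state to (killed + injured) divided by its incident count."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for state, n_killed, n_injured in rows:
        totals[state] += n_killed + n_injured
        counts[state] += 1
    return {state: totals[state] / counts[state] for state in totals}


def top_states(averages, n=10):
    """Return the n states with the highest average, largest first."""
    return sorted(averages.items(), key=lambda kv: kv[1], reverse=True)[:n]


averages = average_casualties(incidents)
for state, avg in top_states(averages, n=2):
    print(f"{state}: {avg:.2f} casualties per incident")
```

Note that State B has the most incidents in this toy data but the lowest average, mirroring the observation that the states with the most incidents are not necessarily those with the most severe ones.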

Now, let’s visualize the total number of people killed and injured in gun violence incidents per state.

This graph shows the total number of people killed and injured in gun violence incidents for the 10 states with the highest casualties. Illinois had the most casualties due to gun violence, followed by California. Texas and Florida have almost equal numbers of casualties. Following them, Ohio, Pennsylvania, North Carolina, New York, Louisiana, and Georgia have similar totals.

When comparing the results of this graph to the earlier graph of the number of gun incidents in each state (Question 1), we find that the results are very similar. This suggests that the higher the number of incidents in a state, the higher the number of people killed or injured. However, we must run a statistical test to verify this hypothesis.

To test the claim that a higher number of incidents in a state is associated with more people killed or injured, we can use Pearson’s correlation test. This test measures the strength and direction of the linear relationship between two continuous variables.

## 
##  Pearson's product-moment correlation
## 
## data:  incidents_by_state$total_incidents and incidents_by_state$total_casualties
## t = 39.753, df = 49, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.9734733 0.9913670
## sample estimates:
##       cor 
## 0.9848484

The correlation is very close to 1, which supports the claim that states with more incidents also have more people killed or injured. The p-value is the probability of observing a correlation at least as extreme as the one calculated if there were no correlation in the population. Since the p-value here is < 2.2e-16, we can reject H0 (no correlation) in favor of the alternative hypothesis that the true correlation is not 0. The estimate of 0.985, with a 95 percent confidence interval of 0.973 to 0.991, indicates that the correlation between the number of incidents in a state and the number of people killed or injured is strong and positive.
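The test statistic and confidence interval in the output above can be re-derived from just the reported correlation and degrees of freedom. The sketch below does this in Python as a cross-check; it is not the original analysis, which was evidently run with R's cor.test. It uses the standard formulas: t = r * sqrt(df / (1 - r^2)) for the test of zero correlation, and the Fisher z-transformation with standard error 1 / sqrt(n - 3) for the interval. The only inputs are the two numbers copied from the output.

```python
import math
from statistics import NormalDist

r = 0.9848484  # sample correlation, copied from the output above
df = 49        # degrees of freedom, copied from the output above
n = df + 2     # number of observations implied by df = n - 2

# t statistic for the null hypothesis that the true correlation is 0.
t = r * math.sqrt(df / (1 - r**2))

# 95% confidence interval via the Fisher z-transformation.
z = math.atanh(r)
half_width = NormalDist().inv_cdf(0.975) / math.sqrt(n - 3)
ci_low = math.tanh(z - half_width)
ci_high = math.tanh(z + half_width)

print(f"t = {t:.3f}")
print(f"95% CI = ({ci_low:.4f}, {ci_high:.4f})")
```

The results should agree with the reported t statistic and interval up to rounding of r. The implied n of 51 is consistent with one row per state plus one additional area, though the output itself does not say what the 51 rows are.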

Our analysis indicates that there is a strong positive correlation between the number of gun violence incidents in a state and the number of people killed or injured. This finding is supported by a Pearson’s correlation coefficient of 0.9848, which is very close to 1, and a p-value less than 2.2e-16, indicating strong evidence against the null hypothesis of no correlation.

Furthermore, our graphs highlight the top 10 states with the highest average number of people killed and injured per gun violence incident and the top 10 states with the highest total casualties. It is important to note that while some states have a higher number of incidents, they may not necessarily have the most severe incidents on average. This information could be valuable for policymakers when deciding where to focus their efforts on gun violence prevention and mitigation strategies.

Question 3

What is the trend of gun violence incidents in terms of frequency throughout the years? How does this relate to the number of people killed at the state level?

To explore this question, we should look into the sum of total incidents per year, number of people killed, and incidents by state.

This graph explores the trend of the number of gun incidents per year. Although there are only 278 recorded incidents in 2013, this number jumps drastically to above 50,000 in 2014. From 2014 to 2017 there is an upward trend, with the number of gun violence incidents growing each year. The size of the increase varies, from about 1,700 (2014 to 2015) to about 5,200 (2015 to 2016), and averages roughly 3,200 per year. It is likely that this trend would have continued in 2018, but the data collected included only up to March.

The purpose of this graph is to show how the trend of increasing gun violence is reflected in the states with the most gun violence. This is done by displaying the number of people killed in gun incidents in these states. Although the overall trend for the United States is a gradual increase, that does not seem to be the case for all states. Most states show some variation, with some earlier years having more deaths than later years. This shows that at the state level, the overarching trend is not always evident.

Year  Number of Gun Incidents  Number of Deaths
2013                      278               317
2014                    51854             12557
2015                    53579             13484
2016                    58763             15066
2017                    61401             15511
2018                    13802              3533

There is a clear trend of an increasing number of gun incidents from 2013 to 2017. Except for the drastic jump from 2013 to 2014, the yearly increase ranges from about 1,700 to about 5,200 incidents. The 2018 figure covers only January through March, so it cannot be compared directly with the full-year totals. However, the increasing trend does not hold at the state level. By observing the number of deaths in the ten states with the most gun violence, we can see that deaths do not always increase from year to year the way national incidents do.
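The size of the yearly increases can be checked directly from the incident counts in the table. The Python sketch below computes the year-over-year changes for the full years and a naive full-year projection for 2018. The projection assumes exactly three months of 2018 data and no seasonal variation, neither of which is established by the source, so it is a rough reference point rather than a forecast.

```python
# Incident counts copied from the table above; 2018 covers January to March only.
incidents = {
    2013: 278,
    2014: 51854,
    2015: 53579,
    2016: 58763,
    2017: 61401,
    2018: 13802,
}

# Year-over-year change between consecutive full years (2014 through 2017).
full_years = [2014, 2015, 2016, 2017]
changes = {year: incidents[year] - incidents[year - 1] for year in full_years[1:]}
for year, delta in changes.items():
    print(f"{year - 1} -> {year}: {delta:+d}")

mean_change = sum(changes.values()) / len(changes)
print(f"mean yearly change, 2014 to 2017: {mean_change:.0f}")

# Naive projection: ASSUMES three months of data and no seasonality.
projected_2018 = incidents[2018] * 12 / 3
print(f"naive 2018 projection: {projected_2018:.0f}")
```

Under these assumptions the projection comes out below the 2017 total, so the partial 2018 data alone do not show whether 2018 would have exceeded 2017.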

Our analysis suggests that the total number of gun incidents, as well as the total number of gun deaths, increased over the years. However, although the national totals increase, individual states often see the number of gun deaths fall as well as rise from one year to the next. For example, Georgia has an increase from 2013 to 2014, a decrease from 2014 to 2015, and another increase from 2015 to 2016. This shows that the national increase in gun incidents is not always matched by an increase in gun deaths in each state.

Conclusion and Further Work

From the above visualizations, a number of conclusions can be drawn. First, gun incidents are more frequent in urban areas/cities than in rural/non-urban areas, and gun violence rates are not equal across regions or states. Second, we found a strong positive correlation between the number of gun violence incidents in a state and the number of people killed or injured. Third, we observed that gun violence was on the rise from 2013 to 2017. However, this trend is not consistent across all states, as some states show year-to-year variation in the number of people killed.

To build upon this research, we could explore other avenues, such as: analyzing variables such as gun ownership rates and gun regulation policies to predict gun violence rates among states; analyzing the role of socio-economic factors like income, education, and employment to identify areas where interventions may help reduce gun violence; and determining the characteristics of mass shootings in order to better understand why they occur.