Hate Crimes in the United States

Introduction

Following the election of Donald Trump as U.S. president, there has been a common claim that prejudice and discrimination on the basis of race, sexual orientation, sex, gender, and religion have become both more blatant and more widespread. We were interested in testing that claim by analyzing the frequency of hate crimes in the United States. The U.S. Department of Justice defines a hate crime as a “criminal offense against a person or property, motivated in whole or in part by an offender’s bias against a race, religion, disability, sexual orientation, ethnicity, gender, or gender identity.” In addition to comparing the frequency of hate crimes before and after Trump’s election, we were interested in the factors that may influence hate crime prevalence.

The questions we aim to answer in this analysis center on finding the variables in the dataset that are most strongly related to the rate of hate crimes. To begin, we examine whether there is any clustering in the data in general, as this may help identify important interactions later on. We then examine hate crime rates around the 2016 election by comparing rates before and after it. Lastly, we examine the Gini index and education by state as possible predictors of hate crime rates.

The Data Set

The Hate Crimes dataset consists of 12 columns and 51 rows. Each observation corresponds to one of the 50 U.S. states or the District of Columbia. The variables describe demographic characteristics of each state, including the median household income, the unemployment rate, the share of the population with a high school degree, and the share of voters who voted for Trump. Two variables measure hate crimes per 100,000 people: the first is the FBI’s average annual rate for 2010-2015, while the second comes from the Southern Poverty Law Center and covers November 9-18, 2016. The remaining variables are for 2009, 2015, or 2016 and come from the Census Bureau and the Kaiser Family Foundation. Most of these variables are quantitative, though we categorize some of them in our analysis when necessary.

Clustering

Are there states that are similar across several quantitative variables?

Principal Component Analysis

Multidimensional scaling plot colored by the share of the population with a high school degree and shaped by unemployment rate

We use several techniques to look for clustering within the dataset. First, we perform principal component analysis (PCA) to reduce the data to a small number of dimensions that account for most of its variation. An analysis of our principal components is shown in figure 1. The proportion of variance explained “elbows” after the fourth component, suggesting that the first four principal components capture most of the variation in the dataset; the first two alone account for slightly more than 60%. Figure 2 shows a biplot of the first two principal components, colored by the region of each state, with normal-theory ellipses drawn around each region. The ellipses overlap heavily, suggesting that states do not separate clearly by region. The biplot also suggests that the two hate crime variables are strongly correlated with each other and with median household income, and that both are negatively correlated with the share of voters who voted for Trump. Several of our other predictors, including the Gini index, the unemployment rate, and the share of the population that is non-white, also appear positively correlated with one another.

Next, we use multidimensional scaling (MDS) to find a good two-dimensional representation of the continuous variables in the dataset. A plot of the two MDS dimensions is shown in figure 3. Points are colored by whether the state’s share of people with a high school degree is above the median and shaped by whether its unemployment rate is above the median. The plot suggests that the data have two modes, and that the variables “share_unemployed_seasonal” and “share_population_with_high_school_degree” may be associated with them: the mode around coordinates (-2, 0) is associated with states that have a below-median share of people with high school degrees and an above-median unemployment rate, while the mode around (0, 0) is associated with states that have an above-median share of high school graduates and a below-median unemployment rate.
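
The two steps above can be sketched in Python with NumPy. This is a minimal illustration under assumptions, not the code that produced figures 1-3: it assumes the quantitative columns are standardized (centered and scaled to unit variance) before analysis, and `standardize`, `pca_variance_explained`, and `classical_mds` are hypothetical helper names. With Euclidean distances on standardized data, classical (Torgerson) MDS reproduces the PCA scores up to sign; other distance measures or MDS variants will generally give different coordinates.

```python
import numpy as np


def standardize(X):
    """Center each column and scale it to unit variance (assumed preprocessing)."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)


def pca_variance_explained(X):
    """Proportion of total variance explained by each principal component."""
    Z = standardize(X)
    # Squared singular values of the standardized data are proportional
    # to the variances of the principal components (largest first).
    s = np.linalg.svd(Z, compute_uv=False)
    var = s ** 2
    return var / var.sum()


def classical_mds(X, k=2):
    """Classical (Torgerson) MDS on Euclidean distances between rows."""
    Z = standardize(X)
    n = Z.shape[0]
    # Squared Euclidean distances between every pair of observations.
    sq = np.sum(Z ** 2, axis=1)
    D2 = sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T
    # Double-center the squared distances: B = -1/2 * J D2 J.
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ D2 @ J
    # Keep the k largest eigenpairs of the symmetric matrix B.
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1][:k]
    return eigvecs[:, order] * np.sqrt(np.maximum(eigvals[order], 0.0))


if __name__ == "__main__":
    # Random placeholder data shaped like 51 states x 9 variables (not the real data).
    rng = np.random.default_rng(0)
    X = rng.normal(size=(51, 9))
    ratios = pca_variance_explained(X)
    print("cumulative variance explained:", np.round(np.cumsum(ratios), 3))
    print("MDS coordinates shape:", classical_mds(X).shape)
```

Reading off the cumulative proportions is one way to locate an elbow like the one described for figure 1.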

Hate Crime rates and the 2016 Election

Is the change in hate crime rates before and after the 2016 election moderated by whether Trump received over 50% of the vote in each state?

MDS Plot by election results

The MDS plot by election results suggests that states where more than half of voters chose Donald Trump in 2016 are similar to one another across the other quantitative variables, as are states where less than half did. In the plot above, blue dots represent states where more than half of voters chose Trump; they are grouped at lower values of the second MDS coordinate than the red dots. Because the MDS coordinates combine every quantitative variable in the dataset except the Trump vote share itself, the plot shows that the two groups differ but not which of those variables drive the difference.

LIMITATIONS: Because the FBI measurement is an annual average over 2010-2015 and the SPLC measurement covers only the days immediately after the election, we decided not to compare them directly.

Scatter plots of Hate Crimes per 100k (Nov. 2016) vs. Share of Trump Voters

The scatter plots above show the rate of hate crimes in the days following the 2016 election, as measured by the Southern Poverty Law Center, against the share of voters who voted for Donald Trump. In the left scatter plot, the points are also colored by the average hate crime rate from 2010 to 2015 as measured by the FBI, showing that states with high rates in 2010-2015 also tended to have high rates after the 2016 election. The linear fit lines show a negative relationship between the hate crime rate and the share of Trump voters: states with a higher share of Trump voters had lower rates of hate crimes after the 2016 election. However, the density plot suggests that there are three distinct groups of states, and within these groups there may be a positive association between hate crime rates and Trump vote share. This may be an example of Simpson’s paradox, but we could not find a variable in the dataset that explains the grouping.
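
To show how a pooled trend can reverse within groups, here is a small sketch using entirely hypothetical numbers, not values from the dataset: three made-up clusters each have a positive slope, but because the clusters with more Trump voters sit lower overall, the pooled least-squares slope is negative. The helper `ols_slope` and the cluster names are illustrative assumptions.

```python
def ols_slope(x, y):
    """Least-squares slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


# Hypothetical toy clusters, NOT values from the hate crimes dataset:
# (share of Trump voters, hate crimes per 100k) for three made-up groups.
groups = {
    "A": ([0.30, 0.35, 0.40], [0.50, 0.55, 0.60]),
    "B": ([0.45, 0.50, 0.55], [0.30, 0.35, 0.40]),
    "C": ([0.60, 0.65, 0.70], [0.10, 0.15, 0.20]),
}

all_x = [v for xs, _ in groups.values() for v in xs]
all_y = [v for _, ys in groups.values() for v in ys]

overall = ols_slope(all_x, all_y)
within = {name: ols_slope(xs, ys) for name, (xs, ys) in groups.items()}

print(f"pooled slope: {overall:.2f}")  # -1.10: negative when groups are pooled
for name, slope in within.items():
    print(f"group {name} slope: {slope:.2f}")  # 1.00: positive within every group
```

In real data the grouping variable would need to be measured before such a reversal could be confirmed, which is exactly what this dataset lacks.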

LIMITATIONS: The dataset appears to be missing a potential confounding variable. For these graphs, we removed the District of Columbia as an outlier, because its extreme values obscured the trends among the states. Additionally, 4 states are missing the FBI measurement, the SPLC measurement, or both.
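
The two cleaning steps can be sketched as follows, assuming records are stored as dictionaries with `None` marking a missing measurement. The records, the field names `fbi` and `splc`, and the helper names are hypothetical. Tukey’s fences (1.5 times the interquartile range beyond the quartiles) are shown only as one common rule of thumb for flagging outliers; we do not claim this is how the District of Columbia was identified.

```python
import statistics

# Hypothetical records (NOT real data); None marks a missing measurement.
records = [
    {"state": "State A", "fbi": 1.2, "splc": 0.10},
    {"state": "State B", "fbi": 2.0, "splc": None},
    {"state": "State C", "fbi": 1.8, "splc": 0.25},
    {"state": "State D", "fbi": 2.4, "splc": 0.30},
    {"state": "State E", "fbi": 1.5, "splc": 0.20},
    {"state": "State F", "fbi": 9.0, "splc": 1.50},
]


def complete_cases(rows, fields):
    """Keep only rows where every listed field is present."""
    return [r for r in rows if all(r[f] is not None for f in fields)]


def tukey_outliers(rows, field, k=1.5):
    """Names of rows whose value lies outside [Q1 - k*IQR, Q3 + k*IQR]."""
    values = [r[field] for r in rows]
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [r["state"] for r in rows if not low <= r[field] <= high]


complete = complete_cases(records, ("fbi", "splc"))
print([r["state"] for r in complete])    # State B is dropped
print(tukey_outliers(complete, "splc"))  # ['State F']
print(tukey_outliers(complete, "fbi"))   # ['State F']
```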

The Gini Index as a Predictor

How does the Gini index relate to the rate of hate crimes? How does it relate to election results?

One variable that our group wanted to investigate was the Gini index and its relationship to hate crimes. The Gini index measures how unevenly income is distributed across a population, ranging from 0 (everyone has the same income) to 1 (one household has all the income); a higher value indicates more inequality. The spatial map shows some geographic patterns in the Gini index: most of the Midwest has relatively low values, while the major states on each coast and the South have higher ones.
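
To make the definition concrete, here is a minimal sketch of the Gini coefficient computed from a list of household incomes with the mean-absolute-difference formula, G = Σᵢ Σⱼ |xᵢ - xⱼ| / (2n²μ). The function name and toy incomes are hypothetical; the dataset reports the index for each state directly, so this only illustrates what the number measures. This is the population form with no small-sample correction, so with n households its maximum is (n - 1)/n.

```python
def gini(incomes):
    """Gini coefficient via the mean absolute difference between all pairs.

    0 means every household has the same income; values near 1 mean
    income is concentrated in very few households.
    """
    n = len(incomes)
    mean = sum(incomes) / n
    if mean <= 0:
        raise ValueError("mean income must be positive")
    # Sum of |x_i - x_j| over all ordered pairs (i, j).
    total_abs_diff = sum(abs(a - b) for a in incomes for b in incomes)
    return total_abs_diff / (2 * n * n * mean)


print(gini([50_000] * 4))        # 0.0: perfect equality
print(gini([0, 0, 0, 100_000]))  # 0.75: one household holds all income
```

The double loop is O(n²), which is fine for illustration; sorting-based formulas compute the same value in O(n log n) for large samples.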

We were particularly interested not only in how the Gini index relates to the hate crime rate, but also in how other variables, such as the share of voters who voted for Trump, affect this relationship. The overall graph shows little relationship between the Gini index and the rate of hate crimes. However, in states where a majority of voters voted for Trump, the relationship between the Gini index and the average hate crime rate appears moderately negative, while in states where a majority did not vote for Trump, it appears moderately positive.
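
One way to check this kind of moderation is to fit a separate least-squares slope within each group. The sketch below uses hypothetical toy values, not the dataset, chosen so that the slope is negative in one group, positive in the other, and close to zero when the groups are pooled. The helper names are illustrative. A single regression with a Gini × majority interaction (separate intercepts and slopes per group) gives the same two fitted slopes, and its interaction coefficient equals their difference.

```python
def ols_slope(x, y):
    """Least-squares slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def slopes_by_group(x, y, labels):
    """Fit a separate least-squares slope of y on x within each group label."""
    slopes = {}
    for g in sorted(set(labels)):
        xs = [a for a, lab in zip(x, labels) if lab == g]
        ys = [b for b, lab in zip(y, labels) if lab == g]
        slopes[g] = ols_slope(xs, ys)
    return slopes


# Hypothetical toy values, NOT from the dataset:
# Gini index, hate crime rate, and whether Trump won a majority of the vote.
gini_idx = [0.42, 0.44, 0.46, 0.48, 0.42, 0.44, 0.46, 0.48]
hate_rate = [0.50, 0.40, 0.30, 0.20, 0.20, 0.30, 0.40, 0.50]
majority = ["Trump majority"] * 4 + ["no Trump majority"] * 4

print("pooled slope:", round(ols_slope(gini_idx, hate_rate), 2))  # close to zero
for group, slope in slopes_by_group(gini_idx, hate_rate, majority).items():
    print(f"{group}: {slope:.2f}")
```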

Education as a Predictor

Is the rate of hate crimes per 100,000 people in each state correlated with high school diploma rates?

Starting with some basic exploratory data analysis, I created two scatterplots. Both show the percent of each state’s population with a high school degree on the x-axis. The first shows the state’s average annual hate crime rate from 2010 to 2015 on the y-axis; the second shows the state’s hate crime rate from November 9 to 18, 2016, the days following Donald Trump’s election. From 2010 to 2015, before Trump’s election, there does not seem to be a relationship between high school degree rate and hate crime rate. For November 9 to 18, 2016, there seems to be a very slight positive relationship.

Fitting a linear model of each hate crime measure on the percent of the population with a high school degree reflected these observations. There is no significant relationship between high school degree rates and average hate crime rates from 2010 to 2015, but there is a significant relationship between high school degree rates and hate crime rates from November 9 to 18, 2016. It is important to note, however, that the R-squared value is less than 0.2, so high school degree rates explain less than 20% of the variation in hate crime rates and the linear model captures the relationship only weakly.

On both plots, the points are colored by the region that the states belong to, and the points do not appear to cluster by region. However, most Southern states have lower hate crime rates, both in 2010-2015 and in November 2016, and North Central states tend to have higher high school degree rates. A linear model with an interaction between high school degree rate and region again reflects these observations: none of the interaction terms (nor the high school degree or region terms alone) is significant.
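
A minimal pure-Python sketch of the simple linear regression behind these statements computes the slope, R-squared, and the slope’s t statistic. The function name and toy numbers are hypothetical, and this is not the software we used; a p-value would come from comparing t with a t distribution on n - 2 degrees of freedom (for example with `scipy.stats.t.sf`). It also makes the R-squared point concrete: a slope can be statistically significant while R-squared stays below 0.2.

```python
import math


def simple_linear_regression(x, y):
    """Least-squares fit of y = b0 + b1*x.

    Returns the intercept, slope, R-squared, and the t statistic for the slope.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    sse = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))  # residual sum of squares
    sst = sum((b - my) ** 2 for b in y)                          # total sum of squares
    r2 = 1.0 - sse / sst
    # Standard error of the slope, with n - 2 residual degrees of freedom.
    se_b1 = math.sqrt(sse / (n - 2) / sxx)
    t = b1 / se_b1 if se_b1 > 0 else math.copysign(math.inf, b1)
    return b0, b1, r2, t


# Hypothetical toy numbers, NOT the dataset's values.
x = [1, 2, 3, 4]
y = [1, 3, 2, 4]
b0, b1, r2, t = simple_linear_regression(x, y)
print(f"intercept={b0:.2f} slope={b1:.2f} R^2={r2:.2f} t={t:.2f}")
# intercept=0.50 slope=0.80 R^2=0.64 t=1.89
```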

Spatial plots of the average hate crime rate (2010-2015) and the hate crime rate (November 9-18, 2016), each shown next to the high school degree rate by state, make these findings a little clearer. Neither time period shows an overarching trend linking the hate crime maps to the high school degree map. However, we can again see that Southern states tend to have fewer hate crimes. We can also see that North Central states tend to have a higher share of the population with a high school degree, while Southern states tend to have a lower share.

Conclusion

In conclusion, there are a few main findings from our research that we would like to reiterate.

We were interested in the presence, or lack thereof, of clustering in our data. Using principal component analysis, we found that the first four principal components capture most of the variability in our data. Our biplot suggests that states do not separate clearly by region.

Next, we found that both in 2010-2015 and in the days after Trump was elected, there was a negative relationship between the share of Trump voters and the rate of hate crimes: states with a larger share of Trump voters tended to have lower hate crime rates. We also saw clusters in the data within which the relationship might reverse, but the dataset does not contain the confounding variable that could explain them.

We also investigated the Gini index. The spatial map suggested some geographic patterning in the Gini index. Across all states, there was no strong correlation between the Gini index and the hate crime rate. However, when we divided the states into those where a majority of voters voted for Trump and those where a majority did not, the Gini index had a negative relationship with hate crimes in the former and a positive relationship in the latter.

Looking at how high school diploma rates relate to hate crimes, we found no significant relationship in 2010-2015 but a very slight, statistically significant positive relationship in the days after Trump was elected. We noticed that the data might cluster by region, but further exploration with linear models and interaction terms led us to believe this was not an issue.

We feel that we have made a good start toward understanding the factors related to hate crimes, but there is more work to be done. One area we would like to see explored further is the collection of data on hate crimes. In many of the topics we studied, we saw some evidence of clustering. Though we ruled out region-based clustering, another variable may explain these trends, and having more variables to examine could help determine what drives hate crimes. Additionally, we could only examine reported hate crimes and cannot account for unreported or misclassified crimes. This likely introduces bias, since more crimes may be misclassified in states with more discrimination.