The city of Los Angeles (L.A.) is well known as a densely populated city and a popular tourist destination. Given the sheer number of people in the area, one might expect a wide variety of crimes to occur there, with motivations ranging from stealing money from wealthy tourists or residents to potential trafficking. For this analysis, we focus specifically on homicides, which are regarded as some of the most severe crimes because of both their impact and the punishments for those who commit them. Furthermore, one might hypothesize that the demographics of homicide victims differ, especially if people are targeted for a specific reason, such as being involved in a gang or being an affluent resident of the city. Using data provided by the Los Angeles Police Department (LAPD), we will explore a series of questions to learn more about the homicides that occur in L.A., including information such as the victim’s age, sex, and race. Another important variable is the case status, which tells us what type of arrest was made or, if there was none, whether the case is still in progress. Finally, we consider the time of day and the date on which a crime occurred, along with potential interactions between victim demographics and these variables.
The City of L.A. official website hosts a dataset of crimes that took place in the city from 2020 through 2024. We use a version of the data that was downloaded on November 20th, 2024. Before filtering or cleaning, the data contain 984,045 rows and 28 columns. Each row represents a single crime, and the columns represent the fields included in a crime report. For this analysis, we will focus on the following variables:
Crm.Cd, which was used to filter for homicides specifically (note that codes 110 and 113 are used for homicides).
Vict.Age, which has been converted to a quantitative value and indicates how old a victim was in years.
Vict.Sex, which is a categorical variable indicating “M” for male and “F” for female victims (note that “X” for unknown gender is included in the original dataset but has been excluded here since it was not found to be prevalent in homicides).
Vict.Descent, which is a categorical variable that indicates race (note that several of the categories have been combined from the original dataset).
Status, which is a categorical variable that corresponds to the progress that has been made on the case, if any.
TIME.OCC, which represents the time (as text data converted to numeric in our analysis) that the homicide occurred.
DATE.OCC, which represents the date (as text data processed as a Date in our analysis) that a homicide occurred.
Weapon.Desc, which is a categorical variable describing what weapon was used to commit a homicide.
Premis.Desc, which is a categorical variable that indicates where a homicide took place (e.g., on a street, inside of a home, etc.).
After filtering and cleaning the data, there are a total of 1,513 rows, representing 1,513 different valid homicides to explore.
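The filtering and type conversions described above could be sketched roughly as follows. This is a minimal illustration, not the code used in the analysis: the toy `raw_data` rows are invented, the column names with spaces are assumed from the `Vict Sex` column that appears in the test output, and the date format string is an assumption about how the text dates are stored.

```r
# Toy stand-in for the raw LAPD export; all values are invented for illustration.
raw_data <- data.frame(
  `Crm Cd`   = c(110, 113, 510, 110, 624, 110),
  `Vict Age` = c("27", "34", "41", "19", "52", "63"),
  `Vict Sex` = c("M", "F", "M", "X", "F", "M"),
  `DATE OCC` = c("03/14/2021", "07/04/2020", "01/09/2022",
                 "11/30/2023", "05/21/2020", "08/15/2022"),
  `TIME OCC` = c("2130", "0045", "1200", "0310", "1815", "0900"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Keep only homicide codes (110 and 113) and victims recorded as "M" or "F".
keep <- raw_data$`Crm Cd` %in% c(110, 113) & raw_data$`Vict Sex` %in% c("M", "F")
filtered_data <- raw_data[keep, ]

# Convert text columns to the types used in the analysis.
filtered_data$`Vict Age` <- as.numeric(filtered_data$`Vict Age`)
filtered_data$`TIME OCC` <- as.numeric(filtered_data$`TIME OCC`)
# The "%m/%d/%Y" format is an assumption about the raw text dates.
filtered_data$`DATE OCC` <- as.Date(filtered_data$`DATE OCC`, format = "%m/%d/%Y")

nrow(filtered_data)
```

In this toy example, three of the six rows survive: the two non-homicide codes and the one victim recorded as “X” are dropped.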
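Using the data available, we will look at the following aspects and potential relationships of homicides committed in L.A. from 2020-2024: differences in case status by victim gender, trends in homicides over time by victim gender and race, the relationship between victim age and the time of day of the homicide, and the weapons used and locations of homicides by victim gender.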
To get a breakdown of the age distribution of homicide victims in L.A., we created a density plot of victim ages. The density peaks for victims in their mid-20s, indicating that this group has the highest concentration of victims. The distribution appears to be slightly right-skewed, with the density decreasing gradually toward older age groups.
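As a rough sketch of how the peak and skew of such a distribution can be checked numerically, the example below estimates a kernel density and compares the mean with the median. The ages are simulated from a gamma distribution purely for illustration; they are not the LAPD data, and the simulation parameters are assumptions.

```r
set.seed(1)
# Simulated right-skewed ages; these are NOT the LAPD data.
ages <- 10 + rgamma(500, shape = 4, scale = 5)

# Kernel density estimate and the age at which it peaks.
age_density <- density(ages)
peak_age <- age_density$x[which.max(age_density$y)]

# For a right-skewed distribution, the mean typically exceeds the median.
c(peak = peak_age, median = median(ages), mean = mean(ages))
```

Calling `plot(age_density)` would draw the corresponding density curve.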
Using the bar chart for victim gender below, we can clearly observe that men represent the majority of homicide victims in this dataset, accounting for over 1,000 of the homicides. This could suggest that men are victims of homicide more often than women. To formally test this, we performed a chi-square test on the victim gender variable, which can be seen below.
For the chi-square test, the null hypothesis is that homicide victims are equally likely to be male or female. Looking at the output, the p-value is small enough that, at the alpha = 0.05 significance level, we reject the null hypothesis. This means that the proportions of male and female homicide victims in L.A. differ significantly, suggesting that men do experience homicides at a different rate from women in L.A.
##
## Chi-squared test for given probabilities
##
## data: table(filtered_data$`Vict Sex`)
## X-squared = 815.81, df = 1, p-value < 2.2e-16
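To make the statistic above concrete, the sketch below runs the same kind of goodness-of-fit test on a pair of counts and recomputes the statistic by hand. The counts (201 female, 1,312 male) are an assumption: they are back-calculated from the reported statistic under the further assumption that all 1,513 rows have sex recorded as “M” or “F”, and they are not taken from the source data.

```r
# Assumed counts, back-calculated from the reported statistic (not from the source data).
sex_counts <- c(F = 201, M = 1312)

# Goodness-of-fit test against equal proportions (the chisq.test default for a vector).
gof <- chisq.test(sex_counts)

# The same statistic by hand: sum((observed - expected)^2 / expected).
expected <- sum(sex_counts) / length(sex_counts)
manual_stat <- sum((sex_counts - expected)^2 / expected)

round(unname(gof$statistic), 2)
round(manual_stat, 2)
```

With two categories there is one degree of freedom, which matches the `df = 1` in the output.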
To begin analyzing this research question, we wanted to first capture the overall breakdown of homicides by gender, and then add in the crime status code as another potential variable of interest. This is because we wanted to learn if there are any differences in who is arrested for a crime, if anyone, based on the victim’s gender. As a first pass of the data, we created a stacked bar chart by gender and crime status code, shown below.
Within this plot, we can again see that, overall, men are more often the victims of homicides than women are in Los Angeles. For women specifically, we observe that the most common outcome of a case is “AA”, which corresponds to an adult being arrested for the crime. The next most common status code is “IC”, which stands for “Investigation Continuing” and indicates that no perpetrator has been arrested or otherwise found. Moving on to male homicide victims, we observe that the most common status code is again “AA”, so an adult is also typically arrested when a male is a victim of homicide. The second most common status code is again “IC”, showing that a large number of cases with male victims are still being investigated. Based on this stacked bar chart alone, we cannot tell whether homicide cases with victims of different genders also have different distributions of status codes. To examine the research question in more depth, we created a mosaic plot and ran a chi-squared test to help us determine whether these variables are related. The mosaic plot shows the relationship between the sex of the victim and the status of the case, with each section representing one combination of victim sex and case status.
##
## Pearson's Chi-squared test
##
## data: table(filtered_data$Status, filtered_data$`Vict Sex`)
## X-squared = 12.467, df = 3, p-value = 0.005943
Using the mosaic plot above, we see that the Pearson residual for the “AO”, or “Adult Other”, status among female victims is the only significant one, and the count is higher than expected since the cell is colored in blue. This indicates that female victims seem to experience an above-average number of cases where there is an adult perpetrator who is not arrested, but experiences some other outcome. Another observation about the graph is that the “AA” status appears to be the most common for each victim sex category. Therefore, there is at least one combination for which the Pearson residual is significant, but this alone does not tell us if the variables are truly related. As such, we conducted a chi-square test to determine whether or not the gender and status code are independent.
Based on the chi-square output above, the p-value is approximately 0.006, which is below 0.05, so we reject the null hypothesis that the status of the case is independent of the sex of the victim. Therefore, we can formally conclude, at the alpha = 0.05 level, that the variables of crime status and gender are related. In context, this tells us that the status of a case and the gender of the victim are not independent and, based on what we saw in the mosaic plot, that there are specific combinations of the two variables for which the counts are significantly different than expected under independence.
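The test of independence and the residuals behind the mosaic shading can be sketched as follows. The contingency table is hypothetical: the counts are invented for illustration, and the fourth status level is labeled “Other” only because four status levels are implied by the reported three degrees of freedom.

```r
# Hypothetical status-by-sex counts; invented for illustration only.
status_by_sex <- matrix(
  c(90, 600,
    30, 100,
    75, 590,
    15,  60),
  nrow = 4, byrow = TRUE,
  dimnames = list(Status = c("AA", "AO", "IC", "Other"), Sex = c("F", "M"))
)

# Pearson chi-squared test of independence.
indep_test <- chisq.test(status_by_sex)

# Standardized residuals; cells with |residual| > 2 deviate notably from independence.
std_resid <- indep_test$stdres
flagged <- abs(std_resid) > 2

unname(indep_test$parameter)  # degrees of freedom: (4 - 1) * (2 - 1)
round(std_resid, 2)
```

Base R's `mosaicplot(status_by_sex, shade = TRUE)` draws a comparable shaded mosaic, in which cells are colored according to their residuals.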
To help answer this research question, we wanted to analyze the L.A. homicide trends over time broken down by victim gender and race. To do so, we created two time series plots using a monthly moving average of the homicides. This allows us to both get a better grasp of the monthly patterns in homicide counts in L.A. and provide us with an insight into any potential differences based on the gender of the victim.
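A monthly moving average of this kind can be sketched as below. The dates are invented, and the trailing three-month window is an assumption, since the window width is not stated here.

```r
# Invented occurrence dates, for illustration only.
dates <- as.Date(c("2020-01-05", "2020-01-20", "2020-02-11", "2020-03-02",
                   "2020-03-15", "2020-03-28", "2020-04-09", "2020-05-01",
                   "2020-05-17", "2020-06-23"))

# Count events per calendar month, in chronological order.
# Note: months with zero events would need to be added as explicit levels.
month_key <- format(dates, "%Y-%m")
monthly_counts <- table(factor(month_key, levels = sort(unique(month_key))))

# Trailing moving average; the 3-month window is an assumption.
window <- 3
moving_avg <- stats::filter(as.numeric(monthly_counts),
                            rep(1 / window, window), sides = 1)

data.frame(month = names(monthly_counts),
           count = as.numeric(monthly_counts),
           moving_avg = as.numeric(moving_avg))
```

The first `window - 1` values are `NA` because a trailing average needs a full window of prior months. Repeating the calculation within each gender or race group gives one line per group.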
From an immediate observation, we find that men are generally more often victims of homicide. The monthly homicide counts for men are consistently higher than those for women throughout the observed period between 2020 and 2024. There is a noticeable peak in homicides around the middle of 2020 (most likely July), followed by some declines. The line for female victims appears to be more stable than the line for male victims, with fewer fluctuations and spikes, which suggests a lower and more consistent level of homicide. The male trend shows potential seasonal variation, as we tend to see spikes around the middle of the year (except for 2021 and for 2024, which has less data than the other years in this dataset). This suggests that periodic events or conditions may contribute to homicides of men, while the female trend shows less of this pattern and so appears less influenced by the same factors. The conclusion we can draw from this plot is that male victims generally dominate homicides in both variability and frequency.
The second plot is intended to analyze the trends in L.A. homicides by victim race. Black victims have the highest peak, followed by Hispanic and White victims, respectively. The line for Black victims peaks towards the end of 2020, and the same trend is seen for Hispanic victims. One might hypothesize that, given that the Black Lives Matter movement was prevalent during this time, these could have either been incidents that fueled the movement or perhaps crimes committed in counter-protests. After 2021, we see homicides decline for all groups. Furthermore, the monthly average lines for White and Other victims start much later, which suggests that there might not be as many homicides with victims in these two racial categories. Looking at the moving average line for Black homicide victims, there could be some potential seasonality, as we seem to see spikes in the latter half of a year (except for 2021 and 2024 which, again, we have less data for). Overall, we can see from this plot that Black and Hispanic victims account for more homicides than the other racial groups in this dataset.
We used a density plot to show the distribution of homicide victims’ ages (in years), grouped by time of day: afternoon, evening, late night, morning, and night. We created the groups based on human activity, differentiating in particular between evening, night, and late night to reflect the change from socially active settings to potentially quieter and more private environments with less visibility and law enforcement presence.
The plot shows that, for all time-of-day groups, the peak victim age is around 25 to 32 years old. That is, victims in their late twenties to early thirties have the highest density of homicides across all times of day, and the late night group has the highest overall peak. While each curve appears to be unimodal and slightly right-skewed, there still appear to be some differences in the spread of the age distributions across time bins. Afternoon and morning show a broader spread with more representation of middle-aged and older victims, while late night and night show a narrower spread with a higher concentration of relatively younger victims. Overall, this graph highlights how the age distribution of homicide victims varies across times of day, pointing to social, environmental, or temporal risk factors that could influence when homicides occur.
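Grouping times into bins like these can be sketched with `cut()`. The cutoffs below are hypothetical, since the exact boundaries of each group are not given here, and the example assumes the time is stored as a number in 24-hour HHMM form (for example, 2130 for 9:30 PM).

```r
# Invented times in 24-hour HHMM form, for illustration only.
time_occ <- c(30, 915, 1200, 1845, 2130, 459, 1700)

# Hypothetical cutoffs; the boundaries used in the analysis are not stated.
time_of_day <- cut(
  time_occ,
  breaks = c(0, 500, 1200, 1700, 2100, 2400),
  labels = c("Late Night", "Morning", "Afternoon", "Evening", "Night"),
  right = FALSE  # each bin includes its lower bound and excludes its upper bound
)

table(time_of_day)
```

With `right = FALSE`, a time of exactly 1200 falls in “Afternoon” and 1700 falls in “Evening”. A separate age density can then be estimated within each level of `time_of_day`.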
Additionally, to learn more about the most common types of weapons used and locations of homicides in L.A., we used the descriptions for both of these variables, provided in the dataset as Weapon.Desc and Premis.Desc respectively, broken down by gender. This is useful to gain more knowledge about the characteristics of homicides and how they differ by gender, and could provide residents or tourists with areas to be wary of or weapons to watch out for. Therefore, using a text analysis of these variables of interest, we produced four word clouds using the frequency that a word appears to determine which weapons and locations were the most prevalent for L.A. homicides separated by gender.
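The word frequencies behind a word cloud can be sketched in base R as follows. The descriptions and sexes are invented for illustration, and `word_freq` is a hypothetical helper, not a function from the analysis. Splitting on non-letter characters is what causes a description such as “HAND GUN” to be counted as two separate words.

```r
# Invented weapon descriptions and victim sexes, for illustration only.
weapon_desc <- c("HAND GUN", "HAND GUN", "UNKNOWN FIREARM", "KNIFE",
                 "HAND GUN", "HAMMER", "UNKNOWN FIREARM")
victim_sex  <- c("M", "M", "M", "M", "F", "F", "F")

# Hypothetical helper: lowercase, split on non-letters, and count each word.
word_freq <- function(text) {
  words <- unlist(strsplit(tolower(text), "[^a-z]+"))
  words <- words[words != ""]
  sort(table(words), decreasing = TRUE)
}

# One frequency table per gender; each table would feed one word cloud.
freq_by_sex <- lapply(split(weapon_desc, victim_sex), word_freq)
freq_by_sex
```

The same helper applied to the premise descriptions would give the frequencies for the location word clouds.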
Looking at the plot above, we can see that the most common weapon for both genders is a handgun (split here into “hand” and “gun” because the dataset separates the word in this manner), based on the size of these two words in the word cloud. Next we see the words “unknown” and “firearm”, which actually appear together in the dataset but are shown separately in this graph because they are separated by a space. Following this, the next most common words for men are “semiautomatic” and “knife”, so beyond handguns, the weapons used in homicides against men in Los Angeles are mostly other firearms and bladed weapons. For women, the less common words include “hammer” and “instrument”, which point to blunt, hand-held objects that require close proximity to the victim. Therefore, we observe that a firearm is the most common weapon in a homicide regardless of gender, while the less common weapons differ: knives appear more often for male victims, and blunt objects appear more often for female victims.
Moving on to the text analysis of locations, we can see that the most common location for a homicide to occur for both genders in L.A. is on a “street”, as this is the largest word in the word cloud. The words “sidewalk” and “dwelling” are the next most prominent, which is plausible because sidewalks run alongside streets and often lead directly to dwellings. For males, “parking lot” and “alley” appear more often, whereas for females we see “apartments”, “duplex”, and “multiunit”. This suggests that homicides committed against males tend to occur in semi-public areas, while homicides committed against women more often occur in residential settings, which may indicate a domestic nature to these homicides. In summary, homicides most commonly occur in open public locations such as streets and sidewalks for both genders, with semi-public areas appearing more often for male victims and residential settings appearing more often for female victims.
In summary, our analysis set out to characterize key aspects of homicide cases in Los Angeles, as recorded by the LAPD. These aspects included information about homicide victims, the time and date of homicide occurrences, the status of each homicide case, the weapons used to commit homicides, and the types of physical locations where homicides occur. Specifically, we looked into four research questions using these variables to identify any relationships or patterns between them. Before doing so, we looked at the broad underlying patterns in specific variables within the dataset, such as victim gender, victim age, and the number of homicides by the date on which they occurred. In our exploratory data analysis (EDA), the density plot of homicide victim age showed a unimodal, slightly right-skewed distribution with a peak in the mid-to-late twenties. This implies that certain age groups are disproportionately impacted by homicides, possibly due to external factors. We also made a bar plot of gender, which shows a much higher count of male homicide victims than female homicide victims. Therefore, in our subsequent multivariate graphs, we took the count disparity between genders into account when analyzing trends.
For our first research question, we explored whether or not there were differences in crime status based on the victim’s gender. This question was of interest because one might believe that case outcomes for the perpetrator, assuming one has been identified at all, differ based on the victim’s gender. Our stacked bar plot did not show any clear differences, but the mosaic plot showed that female victims had a significantly higher-than-expected count of cases with an adult perpetrator who was known but not arrested. Furthermore, using a chi-square test of independence, we showed that these variables were indeed related, as we rejected the null hypothesis that they were independent. Thus, we gathered that the proportions of crime status codes differ by victim gender, and that these crimes potentially have different outcomes when considering the gender of the victim.
Moving on to our second research question, we investigated differences in L.A. homicides over time by victim gender and race using monthly moving average plots to see if there were differences in patterns over time based on these variables. For the gender plot, we observed that, overall, men were victims of homicide more often than women. More specifically, we saw a potential seasonal pattern for male victims, as there seem to be recurring spikes in the number of homicides around the middle of the year, near July. For race, we saw that Black and Hispanic victims were more common than victims who fell into the White or Other racial categories. Furthermore, we also saw indications of potential seasonality for Black victims, who seem to have spikes in the latter half of the year. From these two plots, we observe that, in general, male victims and Black victims show seasonal patterns concentrated in the middle to latter part of the year, which could be due to more activity occurring during the summer months when the weather is more conducive to outdoor activities.
Thirdly, we explored the relationship between the age of a homicide victim and the time of day at which the homicide occurred. Using a series of conditional density plots, we found that, no matter the time of day, the data appear to be centered between the ages of 25 and 32, indicating that victims in their mid-twenties to early thirties are the most common regardless of the time of day. That said, the spread of the data seems to be slightly narrower for the later times of day, which is plausible given that older adults are less likely to be outside late at night.
Finally, we explored the breakdown of weapons used and locations of homicides by gender. We were interested in this question because the methods used in homicides could differ by the victim’s gender. To capture this, we used a series of four word clouds, which show the weapons and locations of homicides conditioned on the victim’s gender. Using these word clouds, we observed that, while firearms are the most common weapon used regardless of gender, the less common weapons did differ slightly. In particular, “knife” was more common for men and “hammer” was more common for women, perhaps suggesting that women are more often victims of homicide at home, where hammers would be more readily available. Moving to the word clouds for location, “street” and “sidewalk” were prominent for both genders, while residential terms such as “apartments”, “duplex”, and “multiunit” appeared more often for female victims. This suggests that women are more likely to be victims in an enclosed area that they potentially inhabit, while men are more likely to be killed in public. Therefore, while the common weapons are mostly shared across genders, there could be some interesting differences in homicide locations by gender.
The dataset appears to have limitations in consistency and completeness across observations, including inconsistencies in recorded descriptions, which could be due to the human error involved in both writing and transcribing police reports. We focused solely on homicides, as it was challenging to explore and compare different types of crime when descriptions varied from concise keywords to detailed observations. There also appear to be some missing values, which limited our ability to provide a more comprehensive understanding of broader trends, as incomplete observations may lead to underrepresentation of some cases.
For future research, we would like to examine a wider range of crime types in Los Angeles to be able to explore trends across categories. Given that this dataset focuses on a single city, it would also be interesting to expand the analysis to different regions of the United States. This could allow us to make regional comparisons and focus on unique local differences. For example, a potential research question could examine whether types of crime and victim age demographics vary across regions, and whether external socio-economic or environmental factors contribute to those differences.