Crime and public safety are critical issues facing communities across the United States. By analysing the patterns behind criminal activity, policymakers and community leaders can develop more effective strategies to prevent crime, allocate resources, and promote safer communities.
This is especially true in a major city like Los Angeles. By examining the geographic, temporal, and demographic patterns of criminal activity, this project aims to uncover insights that could enhance public safety. Proposed solutions might include new policy and resource allocation strategies; some of these strategies, and how our research applies to them, are discussed in the conclusion. However, we cannot overstate the importance of equitable law enforcement and the real problem of overpolicing that some neighborhoods face.
LA Crime Data from 2020 to Present: https://data.lacity.org/Public-Safety/Crime-Data-from-2020-to-Present/2nrs-mtv8/about_data
Supplementary dataset: US census data from tidycensus
The LA Crime dataset contains detailed information about criminal incidents, including the type of crime, location, time of occurrence, victim demographics, and premise descriptions. Each record comes from original crime reports and includes geographic coordinates. Overall, the dataset provides a comprehensive view of criminal activity across Los Angeles’s police divisions. While some location fields contain missing data (recorded as 0°, 0°) and there may be minor inaccuracies from the paper-to-digital transcription process, the dataset remains a valuable resource for understanding crime patterns in Los Angeles.
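Because missing locations are recorded as 0°, 0°, they need to be set aside before any mapping. The following is a minimal base R sketch on toy data, not the code used for this report; the column names `DR_NO`, `LAT`, and `LON` are assumptions about how the table is laid out.

```r
# Toy rows standing in for the real crime table.
# DR_NO, LAT and LON are assumed column names.
crimes <- data.frame(
  DR_NO = c(1, 2, 3, 4),
  LAT   = c(34.05, 0, 34.10, 0),
  LON   = c(-118.25, 0, -118.30, 0)
)

# A record has a usable location unless both coordinates are the 0 placeholder.
has_location <- !(crimes$LAT == 0 & crimes$LON == 0)

# Keep only records that can be placed on a map, and count what was dropped.
located   <- crimes[has_location, ]
n_dropped <- sum(!has_location)
```

Reporting `n_dropped` alongside any map makes it clear how many incidents could not be placed.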
To guide our exploration of this dataset, we focus on three key research questions, all underscored by understanding how these factors work geographically:
To begin, we examined the distribution of crime types across Los Angeles’ LAPD divisions. The histogram below shows the frequency of different crime codes.
We see significant variation in the occurrence of different crimes, with some crimes being so infrequent that they are almost invisible on this scale. There appear to be two main groups of crime counts, separated at roughly 30,000 occurrences. This threshold seems to mark a key division between the more common crimes and the less frequent ones, and we treat crimes below it as rare.
Since the dataset contains too many unique crimes to cover comprehensively, we decided to focus on the top 10 most frequent crimes. These crimes are recorded in two ways: as a description (text) and as a numerical crime code. For clarity in the visualizations of this question, we chose to use the crime codes for the most part. Below is a key that shows the crime types associated with each code.
## # A tibble: 10 × 3
## Crm.Cd Crm.Cd.Desc count
## <int> <chr> <int>
## 1 510 VEHICLE - STOLEN 111116
## 2 624 BATTERY - SIMPLE ASSAULT 74369
## 3 330 BURGLARY FROM VEHICLE 61782
## 4 354 THEFT OF IDENTITY 60764
## 5 740 VANDALISM - FELONY ($400 & OVER, ALL CHURCH VANDALISMS) 59762
## 6 310 BURGLARY 57735
## 7 230 ASSAULT WITH DEADLY WEAPON, AGGRAVATED ASSAULT 52406
## 8 440 THEFT PLAIN - PETTY ($950 & UNDER) 52136
## 9 626 INTIMATE PARTNER - SIMPLE ASSAULT 46310
## 10 420 THEFT FROM MOTOR VEHICLE - PETTY ($950 & UNDER) 41014
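A ranking like the one above can be produced by counting records per crime code and keeping the largest counts. This is a hedged base R sketch on toy data; the data frame `crimes` is hypothetical, and `Crm.Cd` is taken from the column name shown in the output above.

```r
# Toy records; each row is one reported incident.
crimes <- data.frame(
  Crm.Cd = c(510, 510, 510, 624, 624, 330)
)

# Count incidents per code and sort from most to least frequent.
counts <- sort(table(crimes$Crm.Cd), decreasing = TRUE)

# Keep the top n codes (n = 2 here; the report uses the top 10).
top_n     <- head(counts, 2)
top_codes <- as.integer(names(top_n))
```

Filtering the full table to `top_codes` then restricts later plots to the most frequent crimes.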
Below is a visualisation of the distribution of these top ten crimes.
Given these top ten crimes, we wanted to understand how they were distributed spatially.
The choropleth map reveals significant variation in crime rates across Los Angeles’s police divisions. Areas shown in red indicate higher crime concentrations, while green areas represent lower crime rates. One district (central) particularly stands out with notably higher crime rates, despite bordering areas with relatively lower crime incidents. This stark contrast suggests that crime hot spots can exist in close proximity to “safer” areas, highlighting the importance of understanding local factors that might contribute to these patterns.
A corresponding ANOVA test gives a p-value below 2 × 10^-16, meaning we can reject the null hypothesis and conclude that at least one area has a significantly different level of crime from the others.
## Df Sum Sq Mean Sq F value Pr(>F)
## AREA.NAME 20 3.704e+07 1851860 81.33 <2e-16 ***
## Residuals 617373 1.406e+10 22769
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
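The table above has the shape of a one-way ANOVA summary. The report does not show the call that produced it, so the following is only a sketch on synthetic data of how such a test can be run with `aov` from base R; the data frame `toy` and its columns `area` and `value` are hypothetical.

```r
set.seed(1)

# Synthetic data: three groups, the third with a clearly higher mean.
toy <- data.frame(
  area  = factor(rep(c("a", "b", "c"), each = 20)),
  value = c(rnorm(20, mean = 10), rnorm(20, mean = 10), rnorm(20, mean = 14))
)

# One-way ANOVA: does the mean of `value` differ between levels of `area`?
fit <- aov(value ~ area, data = toy)

# The first element of the summary holds the ANOVA table.
tab     <- summary(fit)[[1]]
df_area <- tab[["Df"]][1]      # number of groups minus one
p_value <- tab[["Pr(>F)"]][1]  # p-value of the F test
```

A small p-value only indicates that at least one group differs; a follow-up such as `TukeyHSD(fit)` would be needed to see which pairs of groups differ.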
We refer to these areas sometimes by name and sometimes visually, so a reference table is provided below for convenience.
LAPD Division | Number of Crimes |
---|---|
central | 41974 |
77th street | 36852 |
pacific | 34435 |
southwest | 34013 |
southeast | 32034 |
hollywood | 31885 |
olympic | 31547 |
newton | 30751 |
rampart | 29299 |
wilshire | 28034 |
van nuys | 27336 |
west valley | 26739 |
northeast | 26666 |
harbor | 25940 |
mission | 25565 |
devonshire | 25481 |
topanga | 24347 |
hollenbeck | 24155 |
foothill | 21726 |
As stated earlier, crime rates differ across locations. Similarly, the types of crimes also vary by area. For instance, more affluent neighborhoods may see crimes like home invasions and burglaries, while less affluent areas might experience higher rates of murder and assault. In this analysis, we won’t specifically analyze whether certain crimes are more prevalent in wealthier neighborhoods. Instead, our focus will be on identifying the locations where specific crimes are most likely to occur within our area of interest.
We will now visualize our data on a map of all LAPD divisions, showing 10 separate panels, each focusing on a different crime type.
This choropleth map allows for a more detailed exploration of the findings from the previous heat map. It provides a clearer view of the spatial distribution of each crime across different locations. For instance, crimes 230, 310, 420, and 740 appear to be concentrated in the central division of the map, while crimes 330, 354, 510, and 624 seem to be more prevalent in the division at the bottom. Certain areas are green for every crime. Those are the areas you would most likely want to move to, while the areas that are always red, including the two divisions described above, are ones you might want to avoid.
The stacked bar charts below provide a visual representation of the data shown in the table. The first chart displays the total number of crimes in each location, with colors indicating the contribution of each crime type to the overall total. The second chart shows the proportion of total crimes in each location that is attributed to each specific crime type. This chart gives a good sense of which crimes are more common in each location. In the second bar chart, theft-related crimes appear to be the most common across all locations.
After examining the data, we are now able to answer our initial question. Theft appears to be the most prevalent crime across all locations, with 6 of the 10 most common crimes being theft-related. In every location, without exception, theft-related crimes account for more than 50% of the total crimes. Based on this, we can conclude that, regardless of other factors (such as gender or race), theft is the most likely crime a person could fall victim to in any LAPD jurisdiction. In other words, a random individual who becomes a crime victim in any area under the jurisdiction of the LAPD is most likely to experience a theft-related crime. Therefore, if you were to take a casual walk in any of these areas, it would be wise to avoid carrying any object of high value.
## # A tibble: 21 × 2
## AREA.NAME Total_Percentage_For_Theft
## <chr> <dbl>
## 1 77th street 53.9
## 2 central 57.6
## 3 devonshire 68.8
## 4 foothill 62.6
## 5 harbor 59.1
## 6 hollenbeck 60.2
## 7 hollywood 59.9
## 8 mission 62.0
## 9 n hollywood 65.8
## 10 newton 59.8
## # ℹ 11 more rows
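A per-area theft share like the one tabulated above can be computed by flagging theft-related codes and averaging the flag within each area. This is a base R sketch on toy data. The report does not list which codes it counts as theft-related, so the `theft_codes` vector below is an assumption based on the crime descriptions in the top-ten key.

```r
# Toy incidents with an area and a crime code.
crimes <- data.frame(
  AREA.NAME = c("central", "central", "central", "central", "pacific", "pacific"),
  Crm.Cd    = c(510, 330, 624, 440, 354, 230)
)

# Assumed set of theft-related codes (an illustration, not the report's definition).
theft_codes <- c(510, 330, 354, 310, 440, 420)

# Flag each incident, then take the percentage of flagged incidents per area.
crimes$is_theft <- crimes$Crm.Cd %in% theft_codes
theft_pct <- aggregate(is_theft ~ AREA.NAME, data = crimes,
                       FUN = function(x) 100 * mean(x))
names(theft_pct)[2] <- "Total_Percentage_For_Theft"
```

The result depends directly on which codes are placed in `theft_codes`, so that choice should be stated next to any percentage reported.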
Analyzing how crime varies by area and time of day across the years can reveal emerging trends in criminal behavior. This leads us to our second research question, which examines the frequency of crime by area and time of day, year by year.
The crime distribution graphs from 2020 to 2024 show high levels of criminal activity concentrated in specific Los Angeles neighborhoods, with 77th Street, Central, and Devonshire repeatedly emerging as the top crime hotspots in every year except 2020. Over this period, these areas saw a steady increase in the scale of crime, then a drop-off in 2021 (most likely due to a lack of data, as seen in the graph), followed by a slower but more consistent increase. 77th Street consistently reported the most incidents, remaining one of the most crime-ridden areas throughout the years.
Mid-tier neighborhoods like Hollywood and North Hollywood have also shown notable increases, suggesting a gradual expansion of high-crime zones. In contrast, other areas, such as Wilshire, have only recently begun to experience significant growth in crime levels, signaling potential shifts in criminal activity. This underscores the need for targeted interventions in these areas.
We see significant month-to-month fluctuations in crime levels across various areas. Peaks and valleys are evident throughout the data, highlighting that certain periods of the year are more prone to criminal incidents than others.
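Month-to-month counts of this kind can be obtained by parsing the occurrence date and tabulating by year and month. The following base R sketch uses toy dates; the vector name `date_occ` and the `MM/DD/YYYY` format are assumptions, not details confirmed by the report.

```r
# Toy occurrence dates, assumed to be stored as MM/DD/YYYY strings.
date_occ <- c("01/15/2020", "01/20/2020", "02/03/2020", "02/10/2021")

# Parse to Date, then build a sortable year-month key such as "2020-01".
d         <- as.Date(date_occ, format = "%m/%d/%Y")
month_key <- format(d, "%Y-%m")

# Number of incidents in each year-month.
monthly <- table(month_key)
```

Adding the area as a second argument to `table` would give one monthly series per division.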
Spatially, high-crime areas such as 77th Street and Central consistently report higher crime volumes than lower-crime areas like Wilshire or Van Nuys. Recent data suggests a potential narrowing of this disparity, although this is most likely a result of dwindling data points as the dataset approaches the present day.
However, the apparent shift in crime seen before this effect takes hold, from February through roughly the summer, may indicate a redistribution of crime across neighborhoods, altering the traditional hotspots of activity. Seasonal trends are also apparent in certain regions, with recurring spikes in crime aligning with specific times of the year, offering potential insights into when intervention strategies may be most effective. Together, these temporal and spatial trends provide a nuanced understanding of Los Angeles’ crime landscape, helping to pinpoint the most dangerous locations and periods for further investigation and action.
To further investigate the timing of criminal events, we created a series of mosaic plots examining the relationship between crime type and time of day.
We notice that vehicle theft (code 510) remains concentrated at night, suggesting the need for increased nighttime surveillance. Identity theft (code 354), on the other hand, peaks during business hours, especially in the afternoon, likely due to heightened digital activity, while simple assaults (code 624) are more common in the evening, potentially linked to social interactions or end-of-day tensions.
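Comparing crime types by time of day requires grouping occurrence times into periods first. The sketch below shows one way to do that in base R, assuming the time is stored as a 24-hour HHMM integer. The vector names, the four period labels, and the cut points are all illustrative assumptions, not the bins used in the report.

```r
# Toy occurrence times as HHMM integers (e.g. 845 means 08:45) with crime codes.
time_occ <- c(30, 845, 1200, 1530, 1815, 2359)
crm_cd   <- c(510, 354, 354, 354, 624, 510)

# Integer division by 100 extracts the hour (0 to 23).
hour <- time_occ %/% 100

# Assumed bins: [0,6) night, [6,12) morning, [12,18) afternoon, [18,24) evening.
period <- cut(hour, breaks = c(0, 6, 12, 18, 24), right = FALSE,
              labels = c("night", "morning", "afternoon", "evening"))

# Contingency table of crime code by period.
code_by_period <- table(crm_cd, period)
```

A table such as `code_by_period` can be passed to `mosaicplot` to draw the kind of plot discussed here.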
Over time, crime patterns have shifted from a more even distribution in 2020 to more concentrated peaks by 2022-2023, allowing for more strategic law enforcement deployment. While property crimes maintain consistent patterns, the growing predictability of other crimes highlights the potential for time-specific prevention strategies, such as targeted public awareness campaigns and security measures during high-risk periods.
Understanding victimization patterns across Los Angeles is crucial for developing targeted crime prevention strategies and public safety initiatives. This leads us to our third research question, where we examine victim age, sex, and descent and how they are distributed spatially.
The bar chart comparing victim sex across different police divisions reveals several key patterns:
1. Most divisions show a consistently higher proportion of male victims
2. The gender distribution varies across different regions but not greatly (all between 40% - 60%)
3. Some divisions show a more balanced distribution between male and female victims
This suggests that victimization risk isn’t heavily skewed by gender in any particular area. The consistency of this pattern across divisions suggests that gender-based crime prevention strategies may need to be similarly balanced across the city.
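The per-division shares behind a comparison like this are row proportions of an area-by-sex table. The following is a base R sketch on toy data; the data frame `victims` and the column name `Vict.Sex` are assumptions.

```r
# Toy victim records with an area and a recorded sex.
victims <- data.frame(
  AREA.NAME = c("central", "central", "central", "central", "pacific", "pacific"),
  Vict.Sex  = c("M", "M", "F", "M", "F", "M")
)

# Cross-tabulate, then convert each row (area) to proportions that sum to 1.
counts <- table(victims$AREA.NAME, victims$Vict.Sex)
shares <- prop.table(counts, margin = 1)
```

Using `margin = 1` normalizes within each area, so divisions with very different incident totals can be compared directly.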
We can also explore the age of victims by area.
The age heatmap provides additional insights:
The heatmap pattern suggests that while working-age adults are most frequently victimized, there are specific areas where youth and elderly protection should be prioritized.
We can then explore the distribution across demographic groups, using a standardized unit: the victimization rate per 100,000 residents. To normalize for different population counts, we first obtain census data.
This normalization is crucial because it accounts for the different population sizes of racial groups across areas. Without this normalization, raw victim counts could be misleading since areas with larger populations of a particular group would naturally have more victims from that group.
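The normalization amounts to dividing each group's victim count by that group's population in the same area and scaling by 100,000. The sketch below shows the arithmetic in base R with made-up numbers. The data frames, column names, and population figures are hypothetical, and matching census geography to LAPD divisions is not shown.

```r
# Made-up victim counts per area and descent group.
counts <- data.frame(
  AREA.NAME = c("central", "central", "pacific", "pacific"),
  Descent   = c("H", "B", "H", "B"),
  victims   = c(120, 40, 50, 10)
)

# Made-up resident populations for the same area and group combinations.
population <- data.frame(
  AREA.NAME  = c("central", "central", "pacific", "pacific"),
  Descent    = c("H", "B", "H", "B"),
  population = c(10000, 5000, 25000, 2000)
)

# Join counts to populations, then compute victims per 100,000 residents.
rates <- merge(counts, population, by = c("AREA.NAME", "Descent"))
rates$rate_per_100k <- rates$victims / rates$population * 1e5
```

In this toy data, the pacific group with 10 victims has a higher rate (500) than the one with 50 victims (200), which is exactly the distortion that raw counts would hide.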
We can conduct an ANOVA test to see whether the differences in victimization rates across demographic groups are statistically significant.
## [1] "ANOVA results:"
## Df Sum Sq Mean Sq F value Pr(>F)
## Descent 3 3.974e+15 1.325e+15 6.585 0.000538 ***
## Residuals 72 1.448e+16 2.012e+14
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
With our p value less than 0.05 (0.000538) we can reject the null hypothesis and conclude that there are indeed statistically significant differences in victimization rates across these demographics.
We notice that Hispanic residents face the highest median victimization rates at around 1,200 per 100,000 residents, while Black residents show the second-highest rates at approximately 800 per 100,000. White and Asian residents show notably lower rates at around 400 and 200 per 100,000, respectively.
Additionally, the faceted plots show that victimization rates vary significantly by area for all demographic groups. Some areas consistently yield higher rates across multiple demographic groups, suggesting area-specific risk factors.
It is noteworthy how the variation in rates is particularly pronounced for Black and Hispanic residents, keeping in mind that we’ve controlled for population sizes.
Throughout this analysis we’ve uncovered several key insights that can inform public safety strategies and community engagement efforts. Importantly:
Crime patterns exhibit significant spatial variation, with certain LAPD divisions consistently experiencing higher concentrations of criminal activity across multiple offense types.
Temporal analysis reveals distinct patterns in the timing of different crimes, highlighting the need for tailored prevention strategies that address the unique risk factors and prevention opportunities associated with specific offense types and time periods.
Demographic analysis shows stark disparities in victimization rates, with certain residents facing disproportionately higher risks of becoming crime victims. Addressing these inequities will require culturally competent, community-based approaches that acknowledge and address the systemic factors contributing to these disparities.
Further analysis could examine the intersection of demographic factors (e.g., age and race combined), investigate the relationship between victimization rates and socioeconomic factors, and study the effectiveness of existing demographic-specific crime prevention programs.
Additionally, our dataset is restricted to LAPD divisions. Future work could take into account other police precincts and neighborhoods within LA to provide even more detailed analyses and recommendations.
Overall, by leveraging some of these insights, we can work together to develop strategies that address crime and promote safer communities in Los Angeles.