Ire Alarape, Eyob Dagnachew, Tchegnon Adjagbodjou

INTRODUCTION

Crime and public safety are critical issues facing communities across the United States. By analysing the patterns behind criminal activity, policymakers and community leaders can develop more effective strategies to prevent crime, allocate resources, and promote safer communities.

MOTIVATION

In a major city like Los Angeles especially, understanding the geographic, temporal, and demographic patterns of criminal activity can uncover insights that enhance public safety. Proposed solutions might include new policy and resource allocation strategies; some of these strategies, and how our research applies to them, are discussed in the conclusion section. However, we cannot overstate the importance of equitable law enforcement and the real issues of overpolicing that some neighborhoods face.

DATASETS

LA Crime Data from 2020 to Present: https://data.lacity.org/Public-Safety/Crime-Data-from-2020-to-Present/2nrs-mtv8/about_data

Supplementary dataset: US census data from tidycensus

The LA Crime dataset contains detailed information about criminal incidents, including the type of crime, location, time of occurrence, victim demographics, and premise descriptions. Each record comes from original crime reports and includes geographic coordinates. Overall, the dataset provides a comprehensive view of criminal activity across Los Angeles’s police divisions. While some location fields may contain missing data (noted as 0°, 0°), and there might be minor inaccuracies due to the paper-to-digital transcription process, the dataset remains a valuable resource for understanding crime patterns in Los Angeles.

RESEARCH QUESTIONS

To guide our exploration of this dataset, we focus on three key research questions, each framed by how the factor in question varies geographically:

  1. What are the types of crimes and which are most common by location?
  2. What are the most dangerous times by location?
  3. Which demographic groups are most vulnerable by location?

EDA

Distribution of crime type

To begin, we examined the distribution of crime types across Los Angeles’ LAPD divisions. The histogram below shows the frequency of different crime codes.

We see significant variation in the occurrence of different crimes, with some crimes being so infrequent that they are almost invisible on this scale. There appear to be two main groups of crime counts, separated at roughly 30,000 occurrences. This threshold seems to mark a division between the more common crimes and the less frequent ones, and we treat crimes below it as rare.

Since the dataset contains too many unique crimes to cover comprehensively, we decided to focus on the top 10 most frequent crimes. These crimes are recorded in two ways: as a description (text) and as a numerical crime code. For clarity in the visualizations of this question, we chose to use the crime codes for the most part. Below is a key that shows the crime types associated with each code.

## # A tibble: 10 × 3
##    Crm.Cd Crm.Cd.Desc                                              count
##     <int> <chr>                                                    <int>
##  1    510 VEHICLE - STOLEN                                        111116
##  2    624 BATTERY - SIMPLE ASSAULT                                 74369
##  3    330 BURGLARY FROM VEHICLE                                    61782
##  4    354 THEFT OF IDENTITY                                        60764
##  5    740 VANDALISM - FELONY ($400 & OVER, ALL CHURCH VANDALISMS)  59762
##  6    310 BURGLARY                                                 57735
##  7    230 ASSAULT WITH DEADLY WEAPON, AGGRAVATED ASSAULT           52406
##  8    440 THEFT PLAIN - PETTY ($950 & UNDER)                       52136
##  9    626 INTIMATE PARTNER - SIMPLE ASSAULT                        46310
## 10    420 THEFT FROM MOTOR VEHICLE - PETTY ($950 & UNDER)          41014
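A count table like the one above could be produced along the following lines. This is only a minimal base R sketch, not necessarily how the table was generated: the column names `Crm.Cd` and `Crm.Cd.Desc` come from the output above, while the data frame `crimes` and its rows are made-up stand-ins for the real dataset.

```r
# Toy stand-in for the crime data. Only the column names Crm.Cd and
# Crm.Cd.Desc come from the report; the rows are invented for illustration.
crimes <- data.frame(
  Crm.Cd = c(510, 510, 510, 624, 624, 330),
  Crm.Cd.Desc = c("VEHICLE - STOLEN", "VEHICLE - STOLEN", "VEHICLE - STOLEN",
                  "BATTERY - SIMPLE ASSAULT", "BATTERY - SIMPLE ASSAULT",
                  "BURGLARY FROM VEHICLE"),
  stringsAsFactors = FALSE
)

# Add a helper column of ones and sum it per (code, description) pair.
counts <- aggregate(
  count ~ Crm.Cd + Crm.Cd.Desc,
  data = transform(crimes, count = 1L),
  FUN = sum
)

# Sort from most to least frequent and keep at most the first 10 rows.
top_crimes <- head(counts[order(-counts$count), ], 10)
print(top_crimes)
```

On the real data, `top_crimes` would hold one row per crime code, ordered by frequency, which is the shape of the key shown above.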

Below is a visualisation of the distribution of these top ten crimes.

Geographic Distribution of Crime

Given these top ten crimes, we wanted to understand how they were distributed spatially.

The choropleth map reveals significant variation in crime rates across Los Angeles’s police divisions. Areas shown in red indicate higher crime concentrations, while green areas represent lower crime rates. One division (Central) stands out with notably higher crime rates, despite bordering areas with relatively fewer crime incidents. This stark contrast suggests that crime hot spots can exist in close proximity to “safer” areas, highlighting the importance of understanding local factors that might contribute to these patterns.

A corresponding ANOVA test yields a p-value below 2 × 10^-16, meaning we can reject the null hypothesis and conclude that at least one area has a significantly different level of crime from the others.

##                 Df    Sum Sq Mean Sq F value Pr(>F)    
## AREA.NAME       20 3.704e+07 1851860   81.33 <2e-16 ***
## Residuals   617373 1.406e+10   22769                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

We refer to these areas sometimes by name and sometimes visually, so a reference table is provided below for convenience.

Crime distribution by area reference table

Reference: Crime Counts by LAPD Division

LAPD Division    Number of Crimes
central                     41974
77th street                 36852
pacific                     34435
southwest                   34013
southeast                   32034
hollywood                   31885
olympic                     31547
newton                      30751
rampart                     29299
wilshire                    28034
van nuys                    27336
west valley                 26739
northeast                   26666
harbor                      25940
mission                     25565
devonshire                  25481
topanga                     24347
hollenbeck                  24155
foothill                    21726

RESEARCH QUESTION 1

As stated earlier, crime rates differ across locations. Similarly, the types of crimes also vary by area. For instance, more affluent neighborhoods may see crimes like home invasions and burglaries, while less affluent areas might experience higher rates of murder and assault. In this analysis, we won’t specifically test whether certain crimes are more prevalent in wealthier neighborhoods. Instead, our focus is on identifying the locations where specific crimes are most likely to occur within our area of interest.

We now visualize our data on a map of all LAPD divisions, with 10 separate panels, each focusing on a different crime type.

This choropleth map allows for a more detailed exploration of the findings from the previous heat map, giving a clearer view of the spatial distribution of each crime across locations. For instance, crimes 230, 310, 420, and 740 appear to be concentrated in the central division of the map, while crimes 330, 354, 510, and 624 seem to be more prevalent in the division at the bottom. Certain areas are green for every crime; these are the areas you would most likely want to move to. Areas that are always red, including the two described above, are ones you might want to avoid.

Crime by area faceted by crime type

The stacked bar charts below provide a visual representation of the data shown in the table. The first chart displays the total number of crimes in each location, with colors indicating the percentage contribution of each crime type to the overall total. The second chart shows the proportion of total crimes in each location that is attributed to each specific crime type. With this chart, you can get a good understanding of what crimes are more common by location. In the second bar chart, it seems theft related crimes are most common across all locations.

After examining the data, we can answer our initial question. Theft appears to be the most prevalent crime across all locations, with 6 of the 10 most common crimes being theft-related. In every location, without exception, theft-related crimes account for more than 50% of the total. In other words, regardless of other factors (such as gender or race), a random individual who becomes a crime victim in any area under LAPD jurisdiction is most likely to experience a theft-related crime. If you were to take a casual walk in any of these areas, it would therefore be wise to avoid carrying objects of high value.

## # A tibble: 21 × 2
##    AREA.NAME   Total_Percentage_For_Theft
##    <chr>                            <dbl>
##  1 77th street                       53.9
##  2 central                           57.6
##  3 devonshire                        68.8
##  4 foothill                          62.6
##  5 harbor                            59.1
##  6 hollenbeck                        60.2
##  7 hollywood                         59.9
##  8 mission                           62.0
##  9 n hollywood                       65.8
## 10 newton                            59.8
## # ℹ 11 more rows
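The per-area theft share reported above can be sketched in base R as follows. The column names `AREA.NAME` and `Total_Percentage_For_Theft` come from the output above; the toy rows, the data frame name `crimes`, and in particular the choice of which six codes count as theft-related are assumptions made for illustration.

```r
# Toy stand-in for the top-10 crime records; the rows are invented.
crimes <- data.frame(
  AREA.NAME = c("central", "central", "central", "central",
                "newton", "newton", "newton"),
  Crm.Cd = c(510, 330, 624, 440, 230, 354, 310),
  stringsAsFactors = FALSE
)

# ASSUMPTION: these six codes from the top-10 key are the theft-related ones
# (vehicle theft, burglary from vehicle, identity theft, burglary, petty
# theft, petty theft from a motor vehicle).
theft_codes <- c(510, 330, 354, 310, 440, 420)

# Flag each incident as theft-related or not.
crimes$is_theft <- crimes$Crm.Cd %in% theft_codes

# The mean of a logical vector is the share of TRUE values,
# so 100 * mean gives the percentage of theft-related incidents per area.
theft_share <- aggregate(
  is_theft ~ AREA.NAME,
  data = crimes,
  FUN = function(x) round(100 * mean(x), 1)
)
names(theft_share)[2] <- "Total_Percentage_For_Theft"
print(theft_share)
```

Because the percentage is computed within each area, divisions with very different total crime counts remain directly comparable.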

RESEARCH QUESTION 2

Analyzing how crime varies by area and time of day over the years can reveal growing trends in criminal behavior. This leads us to our second research question, where we examine the frequency of crime by area and time of day, year by year.

Cumulative distribution of crime frequency given area per year

The crime distribution graphs from 2020 to 2024 display high levels of criminal activity concentrated in specific Los Angeles neighborhoods, with 77th Street, Central, and Devonshire repeatedly emerging as the top crime hotspots in every year except 2020. Over this period, these areas saw a steady increase in the scale of crime, then a drop-off in 2021 (most likely due to a lack of data, as seen in the graph), followed by a slower but more consistent increase. 77th Street consistently reported the most incidents, remaining one of the most crime-ridden areas throughout the years.

Mid-tier neighborhoods like Hollywood and North Hollywood have also shown notable increases, suggesting a gradual expansion of high-crime zones. In contrast, other areas, such as Wilshire, have only recently begun to experience significant growth in crime levels, signaling potential shifts in criminal activity. This underscores the need for targeted interventions in these areas.

Time Series Distribution of rates of crime given area

We see significant month-to-month fluctuations in crime levels across various areas. Peaks and valleys are evident throughout the data, highlighting that certain periods of the year are more prone to criminal incidents than others.

Spatially, high-crime areas such as 77th Street and Central consistently report higher crime volumes than lower-crime areas like Wilshire or Van Nuys. Recent data suggests a potential narrowing of this disparity, although this is most likely a result of dwindling data points as the dataset approaches the present day.

However, the apparent shift in crime seen before this effect sets in, from February to around the summer in particular, may indicate a redistribution of crime across neighborhoods, altering the traditional hotspots of activity. Seasonal trends are also apparent in certain regions, with recurring spikes in crime aligning with specific times of the year, offering potential insights into when intervention strategies may be most effective. Together, these temporal and spatial trends provide a nuanced understanding of Los Angeles’ crime landscape, helping to pinpoint the most dangerous locations and periods for further investigation and action.

To further investigate the timing of criminal events, we created a series of mosaic plots examining the relationship between crime type and time of day.

Mosaic plot of expected amount of crime given time of day for each year

We notice that vehicle theft (code 510) remains concentrated at night, suggesting the need for increased nighttime surveillance. Identity theft (code 354), on the other hand, peaks during business hours, especially in the afternoon, likely due to heightened digital activity, while simple assaults (code 624) are more common in the evening, potentially linked to social interactions or end-of-day tensions.

Over time, crime patterns have shifted from a more even distribution in 2020 to more concentrated peaks by 2022-2023, allowing for more strategic law enforcement deployment. While property crimes maintain consistent patterns, the growing predictability of other crimes highlights the potential for time-specific prevention strategies, such as targeted public awareness campaigns and security measures during high-risk periods.
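The crime-type-by-time-of-day comparison behind these plots can be sketched as a contingency table in base R. Everything beyond the crime codes is an assumption here: the column name `TIME.OCC` (taken to hold a 24-hour time such as 1930), the four time-of-day bins, and the toy rows are illustrative and may not match the binning actually used for the mosaic plots.

```r
# Toy stand-in for the crime records; the rows are invented.
# ASSUMPTION: TIME.OCC stores the time of occurrence as a 24-hour
# integer (e.g. 130 = 01:30, 1930 = 19:30).
crimes <- data.frame(
  Crm.Cd = c(510, 510, 354, 354, 624, 624),
  TIME.OCC = c(130, 330, 1200, 1500, 1900, 2000)
)

# Integer division by 100 extracts the hour (0-23).
hour <- crimes$TIME.OCC %/% 100

# ASSUMPTION: four six-hour bins. right = FALSE makes each interval
# closed on the left, so hour 6 falls in "morning", not "night".
crimes$time_of_day <- cut(
  hour,
  breaks = c(0, 6, 12, 18, 24),
  labels = c("night", "morning", "afternoon", "evening"),
  right = FALSE
)

# Cross-tabulate crime code against time of day.
tab <- table(crimes$Crm.Cd, crimes$time_of_day)
print(tab)
```

Passing a table like `tab` to base R's `mosaicplot()` would draw one tile per code and time-of-day combination, with tile area proportional to the count.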

RESEARCH QUESTION 3

Understanding victimization patterns across Los Angeles is crucial for developing targeted crime prevention strategies and public safety initiatives. This leads us to our third research question, where we examine victim age, sex, and descent and how they are distributed spatially.

Sex by area stacked bar chart

The bar chart comparing victim sex across different police divisions reveals several key patterns:

1. Most divisions show a consistently higher proportion of male victims

2. The gender distribution varies across different regions but not greatly (all between 40% - 60%)

3. Some divisions show a more balanced distribution between male and female victims

This suggests that victimization risk isn’t heavily skewed by gender in any particular area. The consistency of this pattern across divisions suggests that gender-based crime prevention strategies may need to be similarly balanced across the city.

We can also explore the age of victims by area.

Age by area heat map

The age heatmap provides additional insights:

  • The 25-39 age group shows the highest concentration of victims across most divisions.
  • Central and 77th Street divisions show notably high concentrations across multiple age groups.
  • Younger (Under 18) and older (60+) age groups generally show lower victimization rates, though there are some concerning hotspots.

The heatmap pattern suggests that while working-age adults are most frequently victimized, there are specific areas where youth and elderly protection should be prioritized.

Demographic

We can then explore the demographic distribution, using a standardized unit: the victimization rate per 100,000 residents. To normalize for different demographic counts, we first obtain census data.

This normalization is crucial because it accounts for the different population sizes of racial groups across areas. Without this normalization, raw victim counts could be misleading since areas with larger populations of a particular group would naturally have more victims from that group.
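As a rough illustration of this normalization, the rate is the victim count divided by the group's population in that area, multiplied by 100,000. The base R sketch below uses invented counts and populations, and the column names `victims`, `population`, and `rate_per_100k` are hypothetical; in the analysis itself the population figures would come from the census data.

```r
# Toy victim counts and populations per area and descent group.
# All numbers and the victims/population column names are invented.
rates <- data.frame(
  AREA.NAME = c("central", "central", "newton", "newton"),
  Descent = c("Hispanic", "White", "Hispanic", "White"),
  victims = c(120, 30, 200, 10),
  population = c(10000, 15000, 40000, 5000),
  stringsAsFactors = FALSE
)

# Victimization rate per 100,000 residents of that group in that area.
rates$rate_per_100k <- rates$victims / rates$population * 1e5
print(rates)
```

In this toy data the Hispanic group in newton has more victims than in central (200 versus 120) but a lower rate (500 versus 1,200 per 100,000), which is exactly the kind of reversal that raw counts would hide.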

We can conduct an ANOVA test to see whether the differences in victimization rates across demographic groups are statistically significant.

## [1] "ANOVA results:"
##             Df    Sum Sq   Mean Sq F value   Pr(>F)    
## Descent      3 3.974e+15 1.325e+15   6.585 0.000538 ***
## Residuals   72 1.448e+16 2.012e+14                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

With a p-value of 0.000538, well below 0.05, we can reject the null hypothesis and conclude that there are indeed statistically significant differences in victimization rates across these demographics.
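A one-way ANOVA of this form can be run with base R's `aov()`. The sketch below is illustrative only: the factor name `Descent` comes from the output above, while the response column `rate_per_100k` and every value in the toy data are assumptions, so the resulting numbers do not reproduce the table above.

```r
# Toy victimization rates, three areas per descent group.
# The values are invented; only the factor name Descent is from the report.
rates <- data.frame(
  Descent = rep(c("Asian", "Black", "Hispanic", "White"), each = 3),
  rate_per_100k = c(190, 210, 200,
                    780, 820, 800,
                    1150, 1250, 1200,
                    380, 420, 400)
)

# One-way ANOVA: does the mean rate differ between descent groups?
fit <- aov(rate_per_100k ~ Descent, data = rates)

# summary() returns a list whose first element is the ANOVA table.
anova_table <- summary(fit)[[1]]
print(anova_table)

# The first row is the Descent term; its p-value is in the Pr(>F) column.
p_value <- anova_table[["Pr(>F)"]][1]
reject_null <- p_value < 0.05
```

With four groups the `Descent` term has 3 degrees of freedom, matching the `Df` column in the output above. A significant result says only that at least one group mean differs, not which ones.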

We notice that Hispanic residents face the highest median victimization rates at around 1,200 per 100,000 residents, while Black residents show the second-highest rates at approximately 800 per 100,000. White and Asian residents show notably lower rates at around 400 and 200 per 100,000, respectively.

Additionally, the faceted plots show that victimization rates vary significantly by area for all demographic groups. Some areas consistently yield higher rates across multiple demographic groups, suggesting area-specific risk factors.

It is noteworthy how the variation in rates is particularly pronounced for Black and Hispanic residents, keeping in mind that we’ve controlled for population sizes.

CONCLUSIONS

Throughout this analysis we’ve uncovered several key insights that can inform public safety strategies and community engagement efforts.

Further analysis could examine the intersection of demographic factors (e.g., age and race combined), investigate the relationship between victimization rates and socioeconomic factors, and study the effectiveness of existing demographic-specific crime prevention programs.

Additionally, our dataset is restricted to LAPD divisions; further work could take into account other police precincts and neighborhoods within LA and provide even more detailed analyses and recommendations.

Overall, by leveraging some of these insights, we can work together to develop strategies that address crime and promote safer communities in Los Angeles.