Introduction

This report examines the 2022 public transit data collected by the National Transit Database, covering urbanized area ridership, mode usage, and temporal trends in the United States. The analysis uses a range of visualizations, including bar graphs, heatmaps, geographic mappings, and time series decompositions. Central to our analysis is the concept of Unlinked Passenger Trips (UPT), which counts the instances where passengers board public transportation vehicles such as buses and trains. A UPT is counted each time a passenger boards a vehicle, regardless of how many vehicles the passenger uses to travel from origin to destination. For example, a commute that involves one bus and one train counts as two UPT.

Our research is driven by questions raised by the disruptions of the global pandemic. We aim to assess the pandemic’s overall impact on transit ridership and to discern which transportation modes have declined to the point of potential obsolescence and which have surged in popularity. Additionally, we explore the relationship between transit agencies, the types of service (TOS) they offer, and the resulting unlinked passenger trips over the observed period. These questions guide our exploration of the data as we seek to understand the current state and future trajectory of public transit in the post-pandemic landscape.

How has the pandemic impacted transit ridership?

First, we wanted to look at how the COVID-19 pandemic impacted public transit ridership in the United States. This requires us to examine UPT over time. We defined the COVID-19 pandemic as the period between the declaration of a global pandemic by the World Health Organization (WHO) on 11 March 2020 and the end of the US Public Health Emergency (PHE) on 11 May 2023.

In the above graph, we see that prior to the COVID-19 pandemic, the number of UPT stayed between 10 and 12 billion trips per month. At the onset of the pandemic, this dropped precipitously, reaching a low of 3.98 billion in March of 2021. Since then, the number of UPT has steadily increased, reaching 7.87 billion in September of 2023. However, this remains far below pre-pandemic ridership.

Next, we used a seasonal decomposition to determine how much of the drop during the COVID-19 pandemic was due to an underlying trend. As expected, the trend component is essentially a smoothed version of the observed ridership data. It is also notable that, beginning in 2020, the shape of the seasonal variation in ridership changed. Future work may seek to explain why this change occurred.
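To make the decomposition step concrete, the following is a minimal Python sketch of a classical additive decomposition with a 12-month period. It is an illustration under stated assumptions, not the code used for this report: the function names and the synthetic series are hypothetical, and the decomposition behind the plot may have used a different method.

```python
from statistics import mean


def centered_moving_average(series, period=12):
    """Centered moving average; entries are None where the window does not fit."""
    half = period // 2
    trend = [None] * len(series)
    for t in range(half, len(series) - half):
        window = series[t - half : t + half + 1]
        if period % 2 == 0:
            # Even period: half weight on both end points keeps the window centered.
            total = 0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]
        else:
            total = sum(window)
        trend[t] = total / period
    return trend


def decompose_additive(series, period=12):
    """Split a series into trend + seasonal + remainder (classical additive form)."""
    trend = centered_moving_average(series, period)
    detrended = [x - m if m is not None else None for x, m in zip(series, trend)]
    # Average the detrended values observed at each position in the cycle.
    raw = [
        mean(d for i, d in enumerate(detrended) if i % period == k and d is not None)
        for k in range(period)
    ]
    offset = mean(raw)  # center the seasonal pattern on zero
    cycle = [s - offset for s in raw]
    seasonal = [cycle[i % period] for i in range(len(series))]
    remainder = [
        x - m - s if m is not None else None
        for x, m, s in zip(series, trend, seasonal)
    ]
    return trend, seasonal, remainder


# Synthetic monthly series (made-up numbers, not NTD data):
# a linear trend plus a repeating 12-month pattern that sums to zero.
pattern = [5, 3, 0, -2, -4, -6, -5, -1, 1, 2, 3, 4]
toy = [100 + 2 * t + pattern[t % 12] for t in range(36)]
trend, seasonal, remainder = decompose_additive(toy, period=12)
```

On this synthetic series the trend component recovers the straight line and the seasonal component recovers the repeating pattern. A change in the seasonal shape, like the one observed from 2020 onward, would violate the fixed-pattern assumption and show up in the remainder.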

In this graph, we sought to look at how widespread this drop in ridership was. To do so, we compared the annual UPT for 2019 with that for 2022 in each Urbanized Area (UZA). Of the 278 Urbanized Areas studied, only 9 (about 3%) had more UPT in 2022 than in 2019. The plot shows that ridership losses (colored in shades of red) dominate public transit systems across the country and across system sizes.
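The comparison for each UZA amounts to a percent change between the two annual totals. A minimal Python sketch follows; the area names and figures are hypothetical placeholders rather than NTD values, and `ridership_change` is an illustrative helper, not part of the original analysis.

```python
def ridership_change(upt_2019, upt_2022):
    """Percent change in annual UPT from 2019 to 2022 for areas present in both years."""
    changes = {}
    for uza, before in upt_2019.items():
        after = upt_2022.get(uza)
        if after is None or before == 0:
            continue  # skip areas with no 2022 value or no 2019 baseline
        changes[uza] = 100.0 * (after - before) / before
    return changes


# Hypothetical annual UPT totals (not real data).
upt_2019 = {"Area A": 1_000_000, "Area B": 250_000, "Area C": 40_000}
upt_2022 = {"Area A": 620_000, "Area B": 200_000, "Area C": 44_000}

changes = ridership_change(upt_2019, upt_2022)
gainers = [uza for uza, pct in changes.items() if pct > 0]
```

Negative values correspond to the red shades described above, and `gainers` lists the areas whose 2022 ridership exceeded their 2019 ridership.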

Which mode of transportation is the most popular?

After examining the impact of COVID-19 on UPT, we see a large drop in ridership, which may in turn affect the usage of the various types of transportation. Hence, the next question we would like to answer is which modes of transportation are the most popular and which are the least. This will help us pinpoint the modes that merit more time, money, and energy and the modes that could be slowly discontinued to save on financial and operating costs, especially considering the effect of COVID-19 on ridership.

To begin, we wanted to look at the proportion of each transportation mode, categorized by whether the mode was active or inactive.

We accomplished this by creating a side-by-side, stacked, and faceted proportions plot, as seen above.

In the Bus category, the Monorail and Automated Guideway (MG) and Demand Response (DR) modes have similar proportions of active and inactive service, which suggests that no changes are needed: a good share is in use and a good share is out of use. Demand Response does not operate over a fixed route but instead serves a broad area; its service area encompasses all the origin and destination points where people can be picked up and dropped off. We see a similar pattern for ferryboats (FB).

In the Other category, the active section is dominated by Commuter Rail (CR), so directing more funds to that sector would be advisable, as it is the most popular. It might also be wise to expand funding for less represented active modes. Among inactive modes, there is a split between CR and Over Road (OR), but we focus on OR because CR is already widely used. OR consists of transportation modes where luggage compartments are under the seating arrangements. Because a large proportion of OR service is inactive, it would be wise to look into its discontinuation.

Lastly, in the Rail category, many of the same modes take up a good proportion of both the active and inactive categories, for example SR and LR. However, some modes appear only in the inactive category or are heavily represented there. Monorail Automated Guideways (MG) and Automated Guideway (AG) hold high proportions in the inactive category but smaller proportions in the active category. Hence, it would be wise to consider discontinuing these modes.

Next, we extend this study of popular modes by grouping the 15+ modes into three general modes (bus, ferry, and rail) plus an "other" category. We then pair each general mode with a type of service and visualize the frequency of each combination.

We can see from the heat map that, for the taxi (TX) and Transportation Network Company (TN) types of service, every mode except bus has a count of 0, which means these combinations do not exist. This is either because the combination is impossible (you can’t have ferry taxis) or simply because it is not used. The most used mode is the bus, and the Directly Operated (DO) and Purchased Transportation (PT) types of service are the most common. The least common are TN and TX, so it would be wise to put more resources into PT and DO than into the other two. PT and DO service for ferries, rail, and the other modes is far less common but still present, so these may be avenues worth exploring because they are still used by some share of the population. The biggest takeaways, however, are that PT and DO are the most popular types of service and that buses are more widely used than the other modes. With this information, further research can examine possible courses of action: whether to increase investment in initiatives that fund these transportation methods, or to look into other, less used areas that could prove to be worthwhile expansions.
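The counts behind a heat map like this one come from a cross-tabulation of general mode against type of service. The following Python sketch shows one way to build such a table; the records are hypothetical examples, not rows from the NTD data, and the variable names are illustrative.

```python
from collections import Counter

# Hypothetical (general mode, type of service) pairs, one per reported service.
records = [
    ("Bus", "DO"), ("Bus", "DO"), ("Bus", "PT"), ("Bus", "TX"), ("Bus", "TN"),
    ("Rail", "DO"), ("Rail", "PT"), ("Ferry", "DO"), ("Other", "PT"),
]

modes = ["Bus", "Ferry", "Rail", "Other"]
services = ["DO", "PT", "TN", "TX"]

counts = Counter(records)

# Full grid of counts; combinations that never occur are filled with 0,
# which is how the empty cells of the heat map arise.
table = {m: {s: counts.get((m, s), 0) for s in services} for m in modes}

for m in modes:
    print(m.ljust(6), [table[m][s] for s in services])
```

Filling the missing combinations with explicit zeros keeps the grid rectangular, so an unused pairing such as ferry with TX appears as an empty cell rather than being dropped.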

Next, we compared Vehicle Revenue Miles (VRM), the number of miles public transit vehicles traveled while in service, with UPT to see how the two were related across the bus, ferry, and rail modes.

In this plot, bus has both the highest UPT and the highest VRM, while ferry and other have low values for both. Interestingly, while rail has almost as much UPT as bus, it has substantially less VRM. To look into this further, we fit linear regressions of UPT on VRM for both buses and rail vehicles.

## 
## Call:
## lm(formula = UPT ~ VRM, data = data2022 %>% filter(BFT == "Bus"))
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -71123818     51621   1288505   1460436 262303762 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -1.460e+06  1.764e+05  -8.278 2.23e-16 ***
## VRM          2.393e+00  4.008e-02  59.706  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7710000 on 2069 degrees of freedom
## Multiple R-squared:  0.6328, Adjusted R-squared:  0.6326 
## F-statistic:  3565 on 1 and 2069 DF,  p-value: < 2.2e-16

## 
## Call:
## lm(formula = UPT ~ VRM, data = data2022 %>% filter(BFT == "Rail"))
## 
## Residuals:
##        Min         1Q     Median         3Q        Max 
## -342958825    4682232    9475911    9504228  156513666 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -9.504e+06  4.395e+06  -2.162   0.0328 *  
## VRM          4.867e+00  1.282e-01  37.979   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 45380000 on 109 degrees of freedom
## Multiple R-squared:  0.9297, Adjusted R-squared:  0.9291 
## F-statistic:  1442 on 1 and 109 DF,  p-value: < 2.2e-16

In these linear regressions, the R-squared values (0.63 for buses and 0.93 for rail) show that UPT and VRM are strongly correlated for both modes of transportation, and more tightly for rail. Looking at the VRM coefficient in each regression, the slope of the regression line is about twice as steep for rail (4.87) as for buses (2.39). This shows that each additional vehicle revenue mile is associated with roughly twice as many additional UPT on rail systems as on buses. Therefore, this could account for some of the discrepancies between the VRM for the two modes.
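The arithmetic behind this comparison can be checked directly from the coefficients reported in the regression output above. The short Python sketch below copies those estimates from the two summaries; the helper `predicted_upt` and the 10 million VRM example value are illustrative assumptions, not part of the original analysis.

```python
import math

# Estimates copied from the two lm() summaries above.
bus = {"intercept": -1.460e6, "slope": 2.393, "r_squared": 0.6328}
rail = {"intercept": -9.504e6, "slope": 4.867, "r_squared": 0.9297}


def predicted_upt(fit, vrm):
    """Fitted UPT for a given VRM under the simple linear model UPT = a + b * VRM."""
    return fit["intercept"] + fit["slope"] * vrm


# Ratio of the rail slope to the bus slope (roughly 2.03).
slope_ratio = rail["slope"] / bus["slope"]

# In a one-predictor regression with a positive slope, the correlation is
# the positive square root of R-squared.
r_bus = math.sqrt(bus["r_squared"])
r_rail = math.sqrt(rail["r_squared"])

# Fitted UPT at a hypothetical 10 million vehicle revenue miles.
example_vrm = 10_000_000
bus_upt = predicted_upt(bus, example_vrm)
rail_upt = predicted_upt(rail, example_vrm)
```

Under these fits, an agency running 10 million VRM would be expected to carry about 22.5 million UPT by bus and about 39.2 million by rail. The implied correlations are roughly 0.80 for buses and 0.96 for rail.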

Overall, our results provide a starting point for further analysis of the popularity of transportation modes. Subsequent analysis should determine which modes riders prefer and which they tend to avoid.

What is the relationship between type of service and ridership?

The provided ggplot2 code generates a bar plot illustrating the distribution of ridership status across different types of service. The x-axis represents the various types of service (TOS), while the bars are color-coded based on the transportation mode (3 Mode). Each bar is divided into segments corresponding to different ridership statuses.

Analyzing the graph, it is evident that the distribution of ridership status varies across different types of service. The plot allows us to visually compare the counts of ridership status categories within each type of service. The dodge position of bars makes it easy to distinguish between different transportation modes for a given type of service.

Specifically, DO (Directly Operated) and PT (Purchased Transportation) make up most of the operating lines, while TX (taxi) and TN (Transportation Network Company) have relatively few lines. It is also noticeable that DO and PT include all types of transportation, while TX and TN have only bus routes. Overall, bus is the dominant mode of transportation across all types of service.

The provided ggplot2 code generates a bar plot depicting the total unlinked passenger trips across various types of service, with bars color-coded based on the transportation mode. The x-axis represents different types of service (TOS), while the y-axis shows the total unlinked passenger trips, presented on a linear scale.

Analyzing the graph reveals insights into the distribution of total unlinked passenger trips across different types of service and transportation modes. The plot provides a visual comparison of the total passenger trips for each category, allowing for an assessment of their relative magnitudes.

Specifically, the graph highlights notable patterns within the data. The majority of total unlinked passenger trips are associated with the DO (Directly Operated) type of service, at more than 85%, followed by PT (Purchased Transportation), at about 10%. These two categories dominate total passenger trips, suggesting that they play a significant role in overall ridership.

Furthermore, when examining the transportation modes within each type of service, we observe that rail makes up a large proportion (about 40%) of unlinked passenger trips for DO, while total unlinked passenger trips for PT are dominated by bus.

In summary, the graph effectively communicates the distribution of total unlinked passenger trips, highlighting the dominance of DO and PT across different types of service and the prevalence of bus transportation. The visual representation aids in quickly identifying patterns and trends within the data.

Conclusion

The analysis of the 2022 public transit data sheds light on the nuanced aftermath of the pandemic on transportation ridership, revealing trends of recovery and change. However, several questions remain that warrant further investigation to fully comprehend the evolving transit landscape. Future research could explore the long-term behavioral changes in commuters’ preferences, ascertaining whether the shift away from certain modes is permanent or if there will be a resurgence as the pandemic’s impact diminishes. Additionally, an in-depth analysis of the economic implications for transit agencies resulting from these shifts in ridership could provide valuable insights for strategic planning and financial sustainability. Another area of study would be the role of emerging transportation technologies and services, such as ride-sharing and autonomous vehicles, in shaping future ridership patterns. By addressing these questions, subsequent studies can build upon the current findings to offer predictive insights and inform more resilient and adaptive public transportation systems in the United States.

Analysis of Trends in US Public Transit Ridership Data

36-315 Final Project HTML

Charlie Murphy, Robert Dai, Matthew Dai, Steven Liu

2023-12-11
