Analysis
We wanted to explore the difference between Customers and Subscribers to analyze how the COVID-19 pandemic affected riders. To do this, we looked at the number of rides started by each user type. For simplicity, we consider all of 2019 to be pre-pandemic and all of 2020 to be part of the pandemic.
The above graph shows the conditional distribution of the number of rides started by each user type, given the year in which the rides were started. In both 2019 and 2020, there were many more Subscriber rides than Customer rides. The number of Customer rides increased slightly from 2019 to 2020, but there was a large drop in the number of Subscriber rides. This suggests either that BlueBikes lost many Subscribers during the pandemic or that Subscribers took fewer trips on average.
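The tally behind a comparison like this can be sketched in a few lines of Python. This is only an illustration, not necessarily the code used to produce the graph; the field names `starttime` and `usertype` and the sample rows are assumptions made for the example.

```python
from collections import Counter

def rides_by_year_and_type(trips):
    """Count rides per (year, user type).

    `trips` is an iterable of dicts with the assumed keys
    'starttime' (e.g. '2019-07-04 08:15:00') and 'usertype'.
    """
    counts = Counter()
    for trip in trips:
        year = int(trip["starttime"][:4])  # the year is the first four characters
        counts[(year, trip["usertype"])] += 1
    return counts

def conditional_shares(counts):
    """Share of each user type within its year, i.e. P(user type | year)."""
    totals = Counter()
    for (year, _user_type), n in counts.items():
        totals[year] += n
    return {key: n / totals[key[0]] for key, n in counts.items()}

# Made-up rows, only to show the shape of the input.
sample_trips = [
    {"starttime": "2019-07-04 08:15:00", "usertype": "Subscriber"},
    {"starttime": "2019-07-04 09:30:00", "usertype": "Subscriber"},
    {"starttime": "2019-07-05 14:00:00", "usertype": "Subscriber"},
    {"starttime": "2019-07-06 11:45:00", "usertype": "Customer"},
    {"starttime": "2020-07-04 10:00:00", "usertype": "Subscriber"},
    {"starttime": "2020-07-04 16:20:00", "usertype": "Customer"},
]

counts = rides_by_year_and_type(sample_trips)
shares = conditional_shares(counts)
```

Conditioning on the year means the shares within each year sum to one, so the two years can be compared even though their total ride counts differ.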
We know that fewer rides were taken in 2020 than in 2019, but we also wanted to see whether there were spatial differences in where rides took place across the two years, to further address our first research question. To do this, we examined the number of rides started at each unique docking station.
The above map has a circle centered at each unique bike station that had at least one ride start from it. The size and color of each circle are proportional to the square root of the number of rides started at that station in the given year. In downtown Boston, where the most heavily used stations are, the stations were used much more frequently in 2019 than in 2020. We can also see that a number of stations are present in 2020 but not in 2019, especially in the areas farther from downtown, and that a number of outlying stations actually had more rides in 2020.
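The square-root scaling can be illustrated with a small Python sketch. It assumes that "size" refers to the circle's radius; the function name and the `max_radius` parameter are hypothetical and are not taken from the code behind the map.

```python
import math

def circle_radius(ride_count, max_count, max_radius=20.0):
    """Radius proportional to the square root of the ride count.

    The busiest station (ride_count == max_count) gets `max_radius`;
    every other station is scaled relative to it.
    """
    if max_count <= 0:
        raise ValueError("max_count must be positive")
    if ride_count < 0:
        raise ValueError("ride_count must be non-negative")
    return max_radius * math.sqrt(ride_count / max_count)

# A station with a quarter of the busiest station's rides
# gets half of its radius.
quarter_radius = circle_radius(100, 400)
```

Under this assumption, a circle's area grows linearly with the ride count, which keeps the busiest downtown stations from visually swamping the rest of the map.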
Since the start and end of each ride are timestamped, we can carry out a more fine-grained time series analysis than a year-to-year comparison in order to further answer the first research question. To see how the number of rides changed throughout the two years, we examine the number of rides started on each day.
We have shown the number of rides started each day in black, along with a 14-day rolling average in blue. The number of daily rides is fairly low in the winter months and peaks in late summer. The red line denotes the day the Governor of Massachusetts issued a stay-at-home order to mitigate the spread of COVID-19, and there is a dramatic fall in the number of rides following it. Most of the days with the fewest rides occurred in the month following the shutdown. Ridership appears to have struggled to recover from the dip, as the 2020 summer peak in the rolling average was noticeably lower than the one in 2019.
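For readers unfamiliar with rolling averages, the following Python sketch shows one way to compute the blue line from the daily counts. It assumes a trailing window (each value averages that day and the 13 days before it); the graph may instead have used a centered window, and the function name is our own.

```python
def rolling_mean(values, window=14):
    """Trailing rolling mean over `window` consecutive observations.

    Positions that do not yet have a full window are returned as None.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    result = []
    running_sum = 0.0
    for i, value in enumerate(values):
        running_sum += value
        if i >= window:
            # Drop the observation that just left the window.
            running_sum -= values[i - window]
        result.append(running_sum / window if i >= window - 1 else None)
    return result

# Small made-up series with a 3-day window to keep the arithmetic visible.
smoothed = rolling_mean([1, 2, 3, 4, 5], window=3)
```

A 14-day window spans two full weeks, so the weekday and weekend pattern averages out and the seasonal trend is easier to see.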
Next, we examine how Customers and Subscribers differ in the way they use the bikes, using a density graph of trip duration.
From our data, we see that the distribution of trip duration differs noticeably by user type. The trip duration for Subscribers is generally less than 30 minutes, and almost no Subscribers use their bikes for more than an hour. Customers also often ride for less than 30 minutes, but their distribution has a much heavier right tail, because many Customers use a bike for longer than 30 minutes. We conducted a two-sample Kolmogorov-Smirnov (KS) test and found that the two distributions are significantly different at a significance level of α = 0.05. Together, the graph and the test indicate that trip duration differs significantly between Customers and Subscribers.
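To make the KS test concrete, here is a minimal Python sketch of the two-sample statistic: the largest vertical gap between the two empirical cumulative distribution functions. It is an illustration written with the standard library, not the routine used for the result above, and the critical value relies on the usual large-sample approximation. The sample values in the usage line are made up.

```python
import math
from bisect import bisect_right

def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: max |ECDF_a(x) - ECDF_b(x)| over all x."""
    a_sorted, b_sorted = sorted(sample_a), sorted(sample_b)
    n, m = len(a_sorted), len(b_sorted)
    if n == 0 or m == 0:
        raise ValueError("both samples must be non-empty")
    largest_gap = 0.0
    # The gap can only change at observed values, so checking those suffices.
    for x in a_sorted + b_sorted:
        ecdf_a = bisect_right(a_sorted, x) / n
        ecdf_b = bisect_right(b_sorted, x) / m
        largest_gap = max(largest_gap, abs(ecdf_a - ecdf_b))
    return largest_gap

def ks_critical_value(n, m, alpha=0.05):
    """Approximate large-sample critical value for the two-sample test."""
    c_alpha = math.sqrt(-0.5 * math.log(alpha / 2))
    return c_alpha * math.sqrt((n + m) / (n * m))

# Completely separated made-up samples give the maximum possible statistic.
d = ks_statistic([1, 2, 3], [4, 5, 6])
```

The null hypothesis that both samples come from the same distribution is rejected when the statistic exceeds the critical value. With samples as large as a year of trips the critical value becomes very small, so even modest differences in shape are flagged as significant.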
Our approach to the third research question is to analyze rider behavior by calculating the change in longitude and latitude over every trip. If the majority of trips fall within a small range of longitude or latitude, we would recommend that BlueBikes refrain from placing stations closer together than this range, since doing so would increase operational cost and take up extra public space.
The first histogram above shows that the majority of changes in longitude are within 0.025 degrees. We removed extreme values that were probably the result of errors. The second histogram above shows that the majority of changes in latitude are within 0.015 degrees. Here too, we removed outliers that looked implausible.
The ggplot above confirms that most trips fall within these ranges, because the points form a roughly circular cluster around a single center (even though the cluster is fairly large). This check rules out the possibility that trips with a small change in longitude have an extreme change in latitude, or that trips with a small change in latitude have an extreme change in longitude.
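The same joint check can be expressed numerically. The Python sketch below computes the share of trips within each range separately and within both at once. The tuple layout `(start_lat, start_lon, end_lat, end_lon)` and the coordinates are assumptions for illustration, not rows from the BlueBikes data.

```python
def displacement_shares(trips, max_dlon=0.025, max_dlat=0.015):
    """Share of trips within the longitude range, the latitude range, and both.

    `trips` is a list of (start_lat, start_lon, end_lat, end_lon) tuples
    in degrees; the default limits are the ranges read off the histograms.
    """
    n = len(trips)
    if n == 0:
        raise ValueError("trips must be non-empty")
    lon_ok = lat_ok = both_ok = 0
    for start_lat, start_lon, end_lat, end_lon in trips:
        in_lon = abs(end_lon - start_lon) <= max_dlon
        in_lat = abs(end_lat - start_lat) <= max_dlat
        lon_ok += in_lon
        lat_ok += in_lat
        both_ok += in_lon and in_lat
    return {"lon": lon_ok / n, "lat": lat_ok / n, "both": both_ok / n}

# Made-up trips: two short ones, one long north-south, one long east-west.
example_trips = [
    (42.360, -71.060, 42.365, -71.070),
    (42.360, -71.060, 42.400, -71.065),
    (42.360, -71.060, 42.361, -71.120),
    (42.360, -71.060, 42.360, -71.060),
]
shares_within = displacement_shares(example_trips)
```

If the "both" share is close to the two separate shares, the trips that are short in one direction are mostly short in the other as well, which is what a single compact cluster in the plot suggests.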
Therefore, it is safe to say that, in most scenarios, the difference in longitude between neighboring stations should be around 0.025 degrees and the difference in latitude should be around 0.015 degrees. This spacing would satisfy most riders' needs while minimizing the number of stations, saving space and decreasing business cost.
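To give these spacings a physical scale, the Python sketch below converts them to kilometers. It assumes a spherical Earth with a mean radius of 6371 km and a reference latitude of about 42.36° N for downtown Boston; both are approximations introduced here rather than values from our data.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical approximation)
KM_PER_DEGREE = math.pi * EARTH_RADIUS_KM / 180.0  # about 111.2 km

def degrees_to_km(dlat_deg, dlon_deg, at_latitude_deg):
    """Convert small latitude/longitude differences to kilometers.

    A degree of longitude shrinks with the cosine of the latitude,
    while a degree of latitude is roughly constant.
    """
    north_south_km = dlat_deg * KM_PER_DEGREE
    east_west_km = dlon_deg * KM_PER_DEGREE * math.cos(math.radians(at_latitude_deg))
    return north_south_km, east_west_km

# Assumed reference latitude for downtown Boston.
north_south_km, east_west_km = degrees_to_km(0.015, 0.025, at_latitude_deg=42.36)
```

Under these assumptions, 0.015 degrees of latitude is roughly 1.7 km and 0.025 degrees of longitude is roughly 2.1 km, so the two ranges describe broadly similar distances on the ground.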
Main Conclusions
We have seen that BlueBikes experienced a sharp drop in ridership at the start of the COVID-19 pandemic. The number of rides did not completely recover after Boston's lockdown, although ridership remained fairly strong considering the various COVID-19 surges and new variants. There was a large decrease in Subscriber rides during the pandemic but a slight increase in Customer rides, and we found that Subscribers generally took shorter rides than Customers. We were also able to analyze the distances between start and end stations to provide guidance on the spacing of new docking stations. In the future, we would like to repeat this analysis with 2021 data to see how BlueBikes has performed as the pandemic has continued.