Group Members: Zeke Rong (zer), Tina Luo (tinal), Sohini Gupta (sohinig), Lakshmi Tumati (ltumati)
This project aims to determine how shared bike usage in London changes and which factors affect that usage. The data come from Kaggle (https://www.kaggle.com/hmavrodiev/london-bike-sharing-dataset) and contain hourly counts of bike shares from January 2015 to January 2017. Along with the hourly count, the data also include climate information such as temperature, humidity, and weather code. We approach this goal through three research questions:
Do season, temperature, holidays, or day of the week have a relationship with the number of bikes used?
What correlations exist between the number of bike rides per day and the different climate factors?
How does the usage of bikes change in different seasons of a year?
The first research question concerns how bike usage relates to the time of year, so we explored bike usage against day of the week, season, and temperature.
We first use a density plot to compare the distribution of the number of bikes used in each season, split by weekday and weekend.
In the density plot, we note that the distribution of log(counts) on weekdays is fairly similar across seasons. This suggests that bike usage varies less on weekdays and that people mainly use bikes as a necessary mode of transportation, such as commuting to work. On weekends, we observe many more differences across seasons, with each curve having multiple modes. The Summer and Spring distributions have more cases of higher bike counts, which leads us to believe that bikes are used more frequently on weekends in the warmer months, for example for leisure.
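As a rough illustration of what sits behind such a density plot, the sketch below groups log(count) by season and weekend flag and evaluates a simple Gaussian kernel density estimate for each group. It is not the code used for the figure. The field names `cnt`, `season`, and `is_weekend`, the season labels, the bandwidth of 0.3, and the randomly generated rows are all assumptions made for the example; the rows are synthetic and are not the Kaggle data.

```python
import math
import random


def gaussian_kde(samples, grid, bandwidth):
    """Evaluate a Gaussian kernel density estimate of `samples` at each grid point."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((g - s) / bandwidth) ** 2) for s in samples)
        for g in grid
    ]


def log_counts_by_group(rows):
    """Group log(cnt) by (season, is_weekend); hours with zero rides are skipped."""
    groups = {}
    for row in rows:
        if row["cnt"] > 0:
            key = (row["season"], row["is_weekend"])
            groups.setdefault(key, []).append(math.log(row["cnt"]))
    return groups


# Synthetic stand-in rows (assumption: NOT the real dataset).
rng = random.Random(0)
rows = [
    {"season": season, "is_weekend": weekend,
     "cnt": int(rng.lognormvariate(mu, 0.8)) + 1}
    for season, mu in [("summer", 7.0), ("winter", 6.3)]
    for weekend in (0, 1)
    for _ in range(200)
]

groups = log_counts_by_group(rows)
grid = [i * 0.1 for i in range(121)]  # log(count) values from 0 to 12
densities = {
    key: gaussian_kde(values, grid, bandwidth=0.3)
    for key, values in groups.items()
}
```

Each entry of `densities` is one curve, so curves for the same weekend flag can be overlaid to compare seasons.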
Next, we use a histogram to see how bike usage differs based on the temperature and whether it is a weekend or not.
We see that the distributions for weekdays are fairly symmetric, with spring and summer having higher overall counts. Weekends, on the other hand, show much less usage across all seasons. The main takeaway from this graph is that bike usage is lower in colder seasons and that bikes tend to be used most when the temperature is moderate for the season.
After seeing how season and temperature affect usage differently on weekdays and weekends, we wanted to look more closely at the weather and understand how bike use relates to the different weather codes. We therefore explored the relationship between weather code and new bike shares, conditional on whether the day was a holiday, using a faceted density plot.
In the faceted density plot, we can see that the distributions of the log number of bike shares for the extreme weather codes (i.e. rain, thunderstorm, and snowfall) differ markedly between holidays and normal working days. It appears that, in extreme weather conditions, riders are more likely to use the bike sharing system on normal working days than on holidays. A potential reason is that riders are not required to go to work or run other errands on holidays, so they can choose to limit travel when the weather is poor. The moderate weather codes, on the other hand, had relatively similar distributions of the log number of bike shares on holidays and non-holidays. This suggests that, in pleasant weather conditions, riders are about equally likely to use the bike sharing system on holidays and normal working days.
From the previous discovery, we realized that bike usage has a strong connection with climate factors. The second research question therefore focuses on how climate factors relate to the number of bike rides per day. To address it, we used scatter plots to examine the correlation between the different climate factors and the log(count) of bike shares.
Temperature and humidity have correlations of similar magnitude but in opposite directions. As temperature increases, the number of bike shares tends to increase, while the opposite occurs for humidity.
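To make "similar magnitude, opposite direction" concrete, the following minimal sketch computes the Pearson correlation coefficient of temperature and of humidity with log(count). The numbers are made-up illustrative values, not values from the dataset, and the variable names are our own; with the real data the same function would be applied to the temperature, humidity, and count columns.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up illustrative values (assumption: NOT taken from the dataset).
temperature = [2.0, 5.0, 9.0, 14.0, 18.0, 23.0]    # degrees Celsius
humidity = [93.0, 88.0, 80.0, 71.0, 62.0, 55.0]    # percent
counts = [150, 320, 700, 1300, 2100, 2900]         # bike shares
log_counts = [math.log(c) for c in counts]

r_temp = pearson(temperature, log_counts)  # positive for these values
r_hum = pearson(humidity, log_counts)      # negative for these values
```

A coefficient near +1 or -1 indicates a strong linear relationship, and the sign gives its direction, which is what the scatter plots show visually.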
The plots also show the distribution of the season variable, which further conveys that log(count) is larger in the warmer seasons. The key result from this graph is that temperature and humidity have opposite correlations with the number of bike shares, at a similar magnitude. We recognize that this analysis has a limitation: people are more likely to ride bikes during the daytime, when it is usually warmer. Our next research question helps us address this limitation and provides further analysis of temperature and bike usage.
Earlier in the project, we showed that bike usage differs across seasons. This drew our attention to the exact distribution of bike usage over the seasons. To examine it, we drew a time series plot.
The graph presents the overall trend in the number of trips across the two years in the data. Based on the graph, we can see that bike usage is higher in summer and lower in winter. The highest numbers of trips usually occur in July and August, including two days on which usage rises above 60,000 trips. In winter, by contrast, usage sometimes drops below 10,000 trips per day in January.
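Because the source data are hourly, a daily time series like this one requires summing the hourly counts per calendar day first. A minimal sketch of that aggregation step is shown here. It assumes timestamps formatted as `YYYY-MM-DD HH:MM:SS`, and the four sample rows are invented for illustration.

```python
from collections import defaultdict
from datetime import datetime


def daily_totals(hourly_rows):
    """Sum hourly ride counts into one total per calendar day."""
    totals = defaultdict(int)
    for timestamp, count in hourly_rows:
        day = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S").date()
        totals[day] += count
    return dict(sorted(totals.items()))


# Invented sample rows (assumption: NOT real observations).
hourly_rows = [
    ("2015-01-04 00:00:00", 200),
    ("2015-01-04 01:00:00", 120),
    ("2015-01-05 08:00:00", 2510),
    ("2015-01-05 09:00:00", 1980),
]

totals = daily_totals(hourly_rows)
busiest_day = max(totals, key=totals.get)  # day with the highest total
```

The resulting dictionary maps each date to its total number of trips and can be plotted directly against time.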
To explore the trend in shared bike usage further, we plotted a seasonal decomposition, which shows not only the visible trend but also the seasonality and the noise in the data.
The above plot presents not only the observed data and the trend, but also the seasonality and the noise in the data. The trend matches what we found before: usage is higher in summer and lower in winter. The seasonal component shows a regular variation in bike usage throughout the two years, with about 50-60 ups and downs per year, which we believe represent the change in usage over the course of each week. As found earlier in the project, usage appears to be higher on weekdays than on weekends, which is consistent with this weekly pattern. Lastly, the plot shows the noise in the data, with two periods of unusually high bike usage: the summer of 2015 and the holiday season (December) in winter 2016.
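For readers unfamiliar with the technique, the sketch below shows one common form of seasonal decomposition: a classical additive decomposition with a weekly period, where the trend is a centered 7-day moving average and the seasonal component is the average detrended value for each position in the week. This is an illustration under our own assumptions and not necessarily the routine that produced the plot. The function name `decompose_additive` and the synthetic series are hypothetical.

```python
def decompose_additive(series, period=7):
    """Classical additive decomposition for an odd period (e.g. 7 days)."""
    if period % 2 == 0:
        raise ValueError("this sketch only handles odd periods")
    n = len(series)
    half = period // 2
    # Trend: centered moving average; undefined (None) at both ends.
    trend = [None] * n
    for i in range(half, n - half):
        window = series[i - half : i + half + 1]
        trend[i] = sum(window) / period
    # Seasonal: mean detrended value at each position within the period.
    sums = [0.0] * period
    hits = [0] * period
    for i in range(n):
        if trend[i] is not None:
            sums[i % period] += series[i] - trend[i]
            hits[i % period] += 1
    means = [s / h for s, h in zip(sums, hits)]
    centre = sum(means) / period
    pattern = [m - centre for m in means]  # centred so it sums to zero
    seasonal = [pattern[i % period] for i in range(n)]
    # Residual (noise): what the trend and seasonal component do not explain.
    residual = [
        None if trend[i] is None else series[i] - trend[i] - seasonal[i]
        for i in range(n)
    ]
    return trend, seasonal, residual


# Synthetic daily series (assumption: NOT the real data): a linear trend plus
# a weekly pattern in which the last two days of each week are lower.
weekly = [3000, 3500, 3500, 3000, 2000, -7000, -8000]  # sums to zero
series = [20000 + 50 * day + weekly[day % 7] for day in range(70)]

trend, seasonal, residual = decompose_additive(series, period=7)
```

With a weekly period, the seasonal component repeats about 52 times per year, which is in line with the 50-60 ups and downs visible in the plot.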
From our analysis of the London bike sharing dataset, we found that the time of year and related factors, namely season, temperature, humidity, and day of the week, all have some form of relationship with the number of bikes used. The number of bike rides also relates differently to weather conditions on holidays and on normal working days, with riders being more sensitive to poor weather on holidays. Throughout the year, bike usage is higher in summer, especially in July, and lower in winter, usually reaching its lowest point in January. With respect to humidity, bikes are used more when it is less humid outside.
The results of this project would be beneficial for shared bike operators, such as governments or transportation companies. The information would help them understand how the number of users might change under different circumstances, so that they could put more bikes into service or take some bikes out for storage or repair at different times of the year.