The growing popularity of bike sharing worldwide has made it convenient to rent a bike at one location and return it at another. In general, bike-sharing systems provide a sustainable alternative for short-distance trips and help solve the last-mile problem. However, is that the whole picture? Do people living in different regions use shared bikes for different purposes? If so, what factors affect their choices?
To address these questions, our project explores the performance of the bike-sharing industry in different regions and how external factors, including environmental conditions and culture, affect people’s bike-rental behavior.
Specifically, we first explore the differences between locations and examine the variables that may influence people’s choice to use shared bikes. Then, we explore how these variables change over time. Finally, we use these observations to explore the relationship between the potentially influential variables and the total number of bike rides in each region, and compare the regions.
Our datasets are obtained from the UCI Machine Learning Repository. They contain the hourly and daily counts of rental bikes, along with the corresponding weather and seasonal information, for D.C. between 2011 and 2012 and for Seoul between 2017 and 2018.
To examine the two original datasets together, we clean them and combine them into one. We first select the variables that are present in both datasets. The Seoul dataset contains additional variables, such as dew point temperature and functional days, which we do not explore. We convert variables such as date and wind speed so that they share the same format and units. We also multiply the D.C. humidity values by 41 because the original dataset divided them by 41 to normalize the data. The Seoul data record ride counts for each hour rather than each day, so we sum the rides within each day and take the daily mean of temperature, humidity, and wind speed to bring the Seoul data into the same format as the D.C. data. The result is the “combined” dataset, which we use for most of this project.
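The hourly-to-daily step described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the code used in the project; the field names (`date`, `rides`, `temperature`, `humidity`, `wind_speed`) and the sample values are assumptions made up for the example.

```python
from collections import defaultdict
from statistics import mean


def hourly_to_daily(hourly_rows, location):
    """Collapse hourly records into one record per date.

    Rides are summed; temperature, humidity, and wind speed are averaged.
    Field names are illustrative, not the datasets' actual column names.
    """
    by_date = defaultdict(list)
    for row in hourly_rows:
        by_date[row["date"]].append(row)

    daily = []
    for date, rows in sorted(by_date.items()):
        daily.append({
            "date": date,
            "cnt": sum(r["rides"] for r in rows),
            "temperature": mean(r["temperature"] for r in rows),
            "humidity": mean(r["humidity"] for r in rows),
            "wind_speed": mean(r["wind_speed"] for r in rows),
            "location": location,
        })
    return daily


# Made-up sample: two hourly records on one day, one on the next.
sample = [
    {"date": "2018-01-01", "rides": 100, "temperature": -2.0, "humidity": 40.0, "wind_speed": 2.0},
    {"date": "2018-01-01", "rides": 300, "temperature": 2.0, "humidity": 60.0, "wind_speed": 1.0},
    {"date": "2018-01-02", "rides": 50, "temperature": 0.0, "humidity": 50.0, "wind_speed": 3.0},
]
daily = hourly_to_daily(sample, "Seoul")
```

Each resulting record has the same shape as a daily D.C. record, so the two sources can be concatenated once their units agree.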
In particular, we are interested in how the total number of rental bike rides relates to temperature, holidays, seasons, wind speed, and humidity at each location. Specifically, we use the following variables in our project:
cnt
: Number of rides recorded on a particular date.
Date
: month-day-year (m/d/yyyy)
Seasons
: Winter, Spring, Summer, Fall
Temperature
: Temperature in Celsius
Wind speed
: Wind speed in m/s
Humidity
: Humidity in %
Holiday
: Holiday/No holiday
Location
: Seoul/D.C.
We attempt to address these three research questions with this data set:
Temperature, Wind speed, and Humidity change over the year for both locations?
We would like to explore how the number of rides changes over time and how it is related to weather conditions such as temperature and wind speed. To do this, we split the shared bike data into Seoul and D.C. subsets and, for each location, create time series plots of the number of rides, the temperature, and the wind speed against the date on which the data were collected.
The time series plots are shown below:
As the top-left plot shows, the number of rides in Seoul generally increases from January 2018 to July 2018 and then decreases until the end of 2018. To understand this trend, we also plot temperature and wind speed over time. The middle-left plot shows that the temperature in Seoul increases from January 2018 to August 2018 and then decreases until the end of the year. The bottom-left plot shows that the oscillations in Seoul’s wind speed are large from January 2018 to April 2018, smaller from April 2018 to October 2018, and larger again from then until the end of the year. These observations suggest that as the weather in Seoul gets warmer and the wind speed decreases, people tend to ride bikes more. The number of rides in Seoul peaks in July 2018, when the weather is warm and the wind speed is at its minimum. The number of rides then decreases as winter approaches and the wind speed increases.
The same reasoning applies to Washington, D.C. On the right side of the time series plot, the overall temperature trend follows the trend in the number of rides in D.C., both tracing an “M” shape. This makes sense, since people tend to ride bikes more when the weather is warm and less when it is cold. The wind speed shows no clear overall trend, but its oscillations are smaller around July 2011 and July 2012 than at other times. These periods correspond to the peaks in the number of rides in D.C., which is consistent with people riding more when the wind is weaker.
Hence, from the time series plots above, we observe that people in Seoul and in D.C. seem to share the same riding habits: they tend to ride more as the weather gets warmer and the wind speed drops, and to ride less as it gets colder and the wind speed rises. This addresses one aspect of our second research question.
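One way to cross-check these visual trends numerically is to average the daily values per month for each location. The sketch below is illustrative only and was not part of the original analysis: it assumes daily records stored as dictionaries with the `Date` format listed earlier (m/d/yyyy), and the lowercase field names and sample values are hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def monthly_means(daily_rows, field):
    """Average `field` per (location, year, month) from daily records.

    Dates are expected as m/d/yyyy strings, e.g. "7/1/2018".
    """
    buckets = defaultdict(list)
    for row in daily_rows:
        day = datetime.strptime(row["date"], "%m/%d/%Y")
        buckets[(row["location"], day.year, day.month)].append(row[field])
    return {key: mean(values) for key, values in sorted(buckets.items())}


# Made-up daily records for illustration.
rows = [
    {"date": "1/1/2018", "location": "Seoul", "cnt": 100, "temperature": -3.0},
    {"date": "1/2/2018", "location": "Seoul", "cnt": 200, "temperature": -1.0},
    {"date": "7/1/2018", "location": "Seoul", "cnt": 900, "temperature": 27.0},
]
rides_by_month = monthly_means(rows, "cnt")
temp_by_month = monthly_means(rows, "temperature")
```

Comparing `rides_by_month` with `temp_by_month` month by month gives a rough numerical counterpart to reading the two curves side by side.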
From our analysis above, we found that people in Seoul and in D.C. exhibit different patterns of shared bike use. People in Seoul tend to ride more on non-holidays, whereas people in D.C. tend to ride more on holidays. This might suggest that people in Seoul use shared bikes more for commuting to work, whereas people in D.C. use them more for leisure.
We also found that weather conditions play a significant role in how often people ride. For example, in both Seoul and D.C., people tend to ride more as the weather gets warmer and the wind speed drops. We also developed a linear regression model relating temperature to the number of rides for Seoul and for D.C., which can be readily used for estimation and prediction.
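As an illustration of what such a model involves, the sketch below fits a simple least-squares line of rides against temperature and uses it for a prediction. It is a generic ordinary least squares fit under assumed inputs, not the fitted model from this project; the function names and the sample numbers are made up for the example.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    if n < 2 or n != len(y):
        raise ValueError("x and y must have the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, temperature):
    """Estimated number of rides at the given temperature (Celsius)."""
    return intercept + slope * temperature


# Made-up data lying exactly on a line, so the fit is easy to verify.
temperatures = [0.0, 10.0, 20.0, 30.0]
rides = [100.0, 300.0, 500.0, 700.0]
slope, intercept = fit_line(temperatures, rides)
estimate = predict(slope, intercept, 25.0)
```

Fitting one line per location, using only that location's rows, allows the slopes to be compared between Seoul and D.C.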
We acknowledge that there is a lot of room for improvement in our research. Because we combine two entirely separate datasets for Seoul and D.C. into one merged dataset, the sample sizes for the two locations are not equal. The original D.C. dataset covers two years, 2011 to 2012, giving it a larger sample size than Seoul. The Seoul dataset was collected in 2018, which leaves a gap of approximately six years between the two datasets. Nevertheless, we still think it is meaningful to compare and contrast the shared bike systems in Seoul and in D.C., and we did make some notable observations. In future research, we would like to gather datasets that include more variables, such as customer feedback, which could support text analysis.