Data Description

Our dataset contains daily ridership data for the Capital Bikeshare system in Washington, D.C., for the years 2011 and 2012. There are 731 observations in total; each day is recorded as a single observation with 15 variables, namely:

  1. dteday: date (format: yyyy-mm-dd)

  2. season: season (1: winter, 2: summer, 3: fall, 4: spring)

  3. mnth: month (1: January, 2: February, 3: March, 4: April, 5: May, 6: June, 7: July, 8: August, 9: September, 10: October, 11: November, 12: December)

  4. holiday: whether or not the day is a public holiday (0: not a holiday, 1: holiday)

  5. weekday: day of week (0: Sunday, 1: Monday, 2: Tuesday, 3: Wednesday, 4: Thursday, 5: Friday, 6: Saturday)

  6. workingday: whether or not the day is a working day, i.e., neither a weekend day nor a public holiday (0: not a working day, 1: working day)

  7. weathersit: weather condition (1: clear / partly cloudy, 2: cloudy / misty, 3: light rain / light snow)

  8. temp: normalized temperature, calculated as (t - tmin)/(tmax - tmin), where t is the temperature in Celsius, tmin = -8, and tmax = +39

  9. atemp: normalized feeling temperature, calculated as (t - tmin)/(tmax - tmin), where t is the feeling temperature in Celsius, tmin = -16, and tmax = +50

  10. hum: normalized humidity (units: percentage divided by 100)

  11. windspeed: wind speed (units: miles per hour divided by 67)

  12. casual: daily ridership count of non-registered users

  13. registered: daily ridership count of registered users

  14. cnt: total daily ridership count including both registered and non-registered users

  15. yr: year (0: 2011, 1: 2012)
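Because temp, atemp, hum, and windspeed are stored in normalized form, it can help to map them back to physical units before interpreting them. The Python sketch below simply inverts the definitions in the list above; the function and constant names are hypothetical, and it assumes the tmin/tmax bounds and the divisors 100 and 67 exactly as stated.

```python
# Bounds and divisors quoted in the variable descriptions above.
TEMP_MIN, TEMP_MAX = -8.0, 39.0     # temp, in Celsius
ATEMP_MIN, ATEMP_MAX = -16.0, 50.0  # atemp, in Celsius
HUM_SCALE = 100.0                   # hum was divided by 100
WIND_SCALE = 67.0                   # windspeed was divided by 67


def denormalize(value, lo, hi):
    """Invert value = (t - lo) / (hi - lo) to recover t."""
    return value * (hi - lo) + lo


def to_physical_units(temp, atemp, hum, windspeed):
    """Convert one day's normalized weather readings back to raw units."""
    return {
        "temp_c": denormalize(temp, TEMP_MIN, TEMP_MAX),
        "atemp_c": denormalize(atemp, ATEMP_MIN, ATEMP_MAX),
        "hum_pct": hum * HUM_SCALE,
        "windspeed": windspeed * WIND_SCALE,  # in the units stated above
    }


print(to_physical_units(0.5, 0.5, 0.6, 0.2)["temp_c"])  # 15.5
```

A normalized temp of 0.5 sits halfway between -8 and 39, which is 15.5 degrees Celsius.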

Our dataset was uploaded to Kaggle by Mark Kaghazgarian and can be found at the following link: https://www.kaggle.com/marklvl/bike-sharing-dataset.

Research Questions

We would like to explore the customer base by looking at who rents bikes and when bike rentals occur. Using this dataset, we would like to answer three main questions:

Research Question 1: Who are our main customers?

The first question we want to address is who our main customers are by examining ridership and whether users are registered or casual riders.

First, we look at how ridership is distributed in several different ways.

The first graph shows the overall distribution of ridership across both groups for each month of the year. The distribution is roughly bell-shaped, with a peak in the summer, which makes sense since bikes are used more in warmer weather. This pattern is expected; the graph was made mainly to check for large anomalies in the data, and we do not see any.

Next, we examine how the proportions of rentals by registered and non-registered users are distributed across the months.

An overwhelming proportion of rentals are made by registered rather than casual users, which is not unexpected. The proportion of casual users also changes with the month of the year, with a larger proportion of casual users in non-winter months. This makes sense, since most people would not go biking for fun in the winter.

To examine the relationship between user type and month further, we run a test for equality of proportions, comparing the proportion of casual (non-registered) rides across the 12 months of the year.

## 
##  12-sample test for equality of proportions without continuity
##  correction
## 
## data:  allData$count.x out of allData$total
## X-squared = 41705, df = 11, p-value < 2.2e-16
## alternative hypothesis: two.sided
## sample estimates:
##     prop 1     prop 2     prop 3     prop 4     prop 5     prop 6     prop 7 
## 0.08924429 0.09886225 0.19414643 0.22595078 0.22697672 0.21339023 0.22657618 
##     prop 8     prop 9    prop 10    prop 11    prop 12 
## 0.20512594 0.20325095 0.18538740 0.14363637 0.10279289

This test gave a p-value below 2.2e-16 (effectively 0), so we reject the null hypothesis that the proportion of non-registered rides is the same in every month. The estimated proportions range from about 9% in January to about 23% in May, so who is riding appears to differ by month.
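The statistic reported above is Pearson's chi-squared statistic for the 2-by-12 table of casual and registered rides by month; without a continuity correction, this is the quantity R's prop.test computes. The Python sketch below reproduces that calculation under the assumption, suggested by the sample estimates, that count.x holds monthly casual counts and total holds monthly total rides. The function name is hypothetical, and the counts in the example are made up rather than the real monthly totals. The p-value would then come from the upper tail of a chi-squared distribution with df degrees of freedom, for example scipy.stats.chi2.sf(stat, df) if SciPy is available.

```python
def prop_test_chisq(successes, totals):
    """Pearson chi-squared statistic for H0: every group has the same proportion.

    successes[i] is the count of interest (e.g. casual rides) in group i and
    totals[i] is that group's total. Returns (statistic, degrees_of_freedom).
    """
    if len(successes) != len(totals) or len(totals) < 2:
        raise ValueError("need two or more groups of matching length")
    pooled = sum(successes) / sum(totals)  # common proportion under H0
    stat = 0.0
    for x, n in zip(successes, totals):
        expected_x = n * pooled           # expected casual rides in this group
        expected_rest = n * (1 - pooled)  # expected registered rides
        stat += (x - expected_x) ** 2 / expected_x
        stat += ((n - x) - expected_rest) ** 2 / expected_rest
    return stat, len(totals) - 1


# Hypothetical counts for two groups, not the real monthly totals.
stat, df = prop_test_chisq([10, 20], [100, 100])
print(round(stat, 4), df)  # 3.9216 1
```

Because each term scales with the group totals, large monthly ride counts turn even moderate differences in proportion into a very large statistic, which is consistent with the p-value here being effectively 0.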

Finally, we have a contour plot of daily registered versus non-registered rides, meant to show how days group together.

The graph shows five distinct clusters. Based on these clusters, there appears to be a shift between 2011 and 2012 in how many rides came from each type of user. In 2012 there were more riders in general, while the proportion of rides from registered versus non-registered users stayed about the same. The two upper clusters appear to share the same proportion but differ in year and total count, and the three lower clusters also lie along a common line, with similar proportions but different total counts. This could be due to the bike program gaining popularity. It also suggests that both types of riders used the service in similar proportions across the two years, rather than one type of rider dominating in the second year.

Research Question 2: When do people use bike sharing?

Here, we examine when people use bike sharing, in order to know when to increase or decrease the number of available bikes. First, we look at time series plots of the number of bikes rented per day, per week, per month, and per season.

Bike rentals appear to be higher in non-winter months, peaking in the spring and fall. Overall, there are more rentals in 2012 than in 2011. The seasonal graph makes the rise and fall of bike rentals throughout the year especially clear.
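The weekly and monthly series can be derived from the daily cnt column by summing over calendar groups. The following is a minimal Python sketch, assuming the daily counts are held in a dict keyed by datetime.date; the function name and the four example counts are hypothetical.

```python
from collections import defaultdict
from datetime import date


def aggregate_counts(daily):
    """Sum daily ridership into ISO-week and calendar-month totals.

    daily maps a datetime.date to that day's count (the cnt column).
    """
    weekly = defaultdict(int)   # key: (ISO year, ISO week number)
    monthly = defaultdict(int)  # key: (calendar year, month)
    for day, cnt in daily.items():
        iso_year, iso_week, _ = day.isocalendar()
        weekly[(iso_year, iso_week)] += cnt
        monthly[(day.year, day.month)] += cnt
    return dict(weekly), dict(monthly)


# Hypothetical counts for four days.
weekly, monthly = aggregate_counts({
    date(2011, 1, 1): 100,
    date(2011, 1, 2): 200,
    date(2011, 1, 3): 300,
    date(2011, 2, 1): 400,
})
print(monthly)  # {(2011, 1): 600, (2011, 2): 400}
```

One design choice worth flagging: ISO weeks can straddle years, so January 1 and 2, 2011 land in ISO week 52 of 2010 in this sketch; a plot that starts weeks on Sunday or at January 1 would group those days differently.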

Next, we look at whether a day being a working day (neither a weekend nor a holiday) affects how many bikes are rented, and take a closer look at how the season changes the number of bikes rented.

In the plot above, the boxplots for winter have a much lower median number of bike rentals than those for summer, spring, and fall. In winter, there are also more rentals on working days than on non-working days, which makes sense since people are unlikely to bike in the cold unless they need to commute to work or school. The medians for the other seasons are similar, although they seem slightly higher in the fall. In general, there are more rentals on working days than on non-working days, except in the summer, presumably because of summer vacation.

Overall, bike rentals greatly decrease during winter and increase during summer, spring, and fall. Bike rentals are more common on working days than on non-working days in every season except summer.
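The center line of each boxplot marks the median, so the season and working-day comparison above comes down to grouped medians. Below is a minimal Python sketch of that summary, assuming rows of (season, workingday, cnt) coded as in the variable list; the function name and the example rows are hypothetical.

```python
from collections import defaultdict
from statistics import median


def grouped_medians(rows):
    """Median daily count for each (season, workingday) combination.

    rows: iterable of (season, workingday, cnt) tuples using the codes above.
    """
    groups = defaultdict(list)
    for season, workingday, cnt in rows:
        groups[(season, workingday)].append(cnt)
    return {key: median(values) for key, values in groups.items()}


# Hypothetical rows, not values from the dataset.
rows = [(1, 1, 100), (1, 1, 300), (1, 0, 50), (3, 0, 40), (3, 0, 60), (3, 1, 45)]
print(grouped_medians(rows))  # {(1, 1): 200.0, (1, 0): 50, (3, 0): 50.0, (3, 1): 45}
```

Medians are less sensitive than means to the occasional very low day (for example, a storm), which is one reason boxplots center on them.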

Research Question 3: Under what conditions do people use bike sharing?

The final question we investigate is under what conditions people use the bike sharing system. By analyzing how weather, wind speed, humidity, and temperature relate to the number of users, we can better understand how these variables are associated with ridership and with each other.

This density curve shows the distribution of total daily bike rentals by weather condition. When the weather is clear or partly cloudy, the distribution is trimodal, with the largest mode around 4500 rentals, followed by a peak around 7000 rentals and a third around 2000 rentals.

When the weather is cloudy or misty, the distribution is bimodal, with the largest mode around 4000 rentals, followed by a peak around 1800 rentals.

In light rain or light snow, the distribution is right-skewed with two modes: the highest peak is around 2000 rentals, followed by a much smaller peak around 4500 rentals.

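Clearer weather is associated with more daily rentals: the largest mode shifts from about 4500 rentals in clear weather, to about 4000 in cloudy weather, to about 2000 in light rain or snow.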
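Density curves like these are usually kernel density estimates (R's density(), for instance, uses a Gaussian kernel by default), and the modes quoted above are the local maxima of such a curve. The Python sketch below illustrates the idea; the function names, the bandwidth of 300 rentals, the 50-rental grid step, and the sample values are assumptions chosen for illustration, and the curves in the plot may have used a different bandwidth.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a function that evaluates a Gaussian kernel density estimate."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        # Average of Gaussian bumps centered on each observation.
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density


def find_modes(samples, bandwidth, grid_step=50.0):
    """Return (location, density) for each local maximum, highest first."""
    f = gaussian_kde(samples, bandwidth)
    lo = min(samples) - 3 * bandwidth
    hi = max(samples) + 3 * bandwidth
    grid = [lo + i * grid_step for i in range(int((hi - lo) / grid_step) + 1)]
    values = [f(x) for x in grid]
    peaks = [
        (grid[i], values[i])
        for i in range(1, len(grid) - 1)
        if values[i - 1] < values[i] >= values[i + 1]
    ]
    return sorted(peaks, key=lambda p: p[1], reverse=True)


# Hypothetical daily totals with two clusters.
samples = [1900, 2000, 2100, 4400, 4500, 4500, 4600, 4500]
for location, dens in find_modes(samples, bandwidth=300):
    print(location, dens)
```

The number of modes depends on the bandwidth: a much smaller bandwidth can split a single cluster into several peaks, while a much larger one can merge the two clusters here into one.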

As wind speed increases, both gamma 1 and gamma 2 tend to decrease, and bike rentals in clear weather tend to be associated with higher wind speeds. As humidity increases, both gamma 1 and gamma 2 tend to increase, and bike rentals in cloudy weather and in light rain or snow tend to be associated with higher humidity. As the month number increases, both gamma 1 and gamma 2 tend to increase, and bike rentals in clear and cloudy weather tend to be associated with later months. As normalized temperature increases, gamma 1 increases and gamma 2 decreases, and bike rentals in clear weather tend to be associated with higher temperatures. As the total rental count increases, gamma 1 increases and gamma 2 decreases, and bike rentals in clear weather tend to be associated with larger total counts.

In the principal component plot, windspeed is negatively correlated with hum, mnth, temp, and cnt. hum is positively correlated with mnth and approximately uncorrelated with temp and cnt. mnth is positively correlated with temp and cnt, and temp is positively correlated with cnt. The variables' arrows have approximately equal lengths, meaning each variable is about equally well represented by the first two principal components.
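In a principal component biplot, the cosine of the angle between two variables' arrows only approximates their correlation, and the approximation is better when the first two components explain most of the variance. The readings above can be checked by computing pairwise Pearson correlations directly. Below is a minimal Python sketch; the function names and the example columns are hypothetical, not values from the dataset.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)


def correlation_table(columns):
    """Pearson correlation for every pair of named columns."""
    return {(a, b): pearson(columns[a], columns[b]) for a, b in combinations(columns, 2)}


# Hypothetical columns, not values from the dataset.
table = correlation_table({
    "temp": [0.2, 0.4, 0.6, 0.8],
    "cnt": [1500, 3000, 4200, 6000],
    "windspeed": [0.30, 0.25, 0.20, 0.10],
})
for pair, r in table.items():
    print(pair, round(r, 3))
```

A direct correlation near 0 for a pair whose arrows look nearly parallel would indicate that the two-component view is distorting that relationship.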

Overall, the clearer the weather, the more bike rentals. Wind speed and temperature seem to be positively associated with clearer weather, while humidity is associated with rainy or snowy weather, which makes sense given precipitation. Taken together, this suggests that the ideal conditions for bike share users are clear weather with high temperature and wind speed, and low humidity.

Conclusion

Overall, we find higher daily ridership in summer months than in winter months, with gradual month-to-month increases and decreases. We also find a larger proportion of ridership by casual users in summer months than in winter months. Overall ridership increased in 2012 compared to 2011. Whether a day is a working day is also associated with ridership: summer is the only season in which ridership on non-working days exceeds ridership on working days. Weather is another factor that affects ridership. We find substantially lower ridership on days with light rain or light snow, whereas whether the weather is clear, partly cloudy, cloudy, or misty has a much smaller impact.