Short Data Description

For our final project, we explore a Kaggle dataset on Yellowstone National Park. The dataset records the monthly number of visitors to the park from 12/31/1985 to 11/30/2016. It has 372 rows, each of which corresponds to one month in the observation period. In addition to raw visitation numbers, the dataset includes variables that might influence monthly visitation: the weather during this period and economic indicators such as unemployment, the Consumer Price Index, and the Consumer Sentiment Index. The dataset has no categorical variables in its original form, but we create categorical variables by binning the quantitative ones.
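
As a minimal sketch of the binning step, the snippet below maps a monthly mean temperature to one of the group labels used later in the report ("Freezing", "Cold", "Warm"). The cut points, the Fahrenheit unit, and the sample values are assumptions made for illustration; they are not the thresholds or data used in the analysis.

```python
from bisect import bisect_right

# Hypothetical cut points in degrees Fahrenheit. The report does not state
# the thresholds actually used, so treat these as placeholders.
TEMP_EDGES = [32.0, 50.0]
TEMP_LABELS = ["Freezing", "Cold", "Warm"]


def bin_temperature(mean_temp_f):
    """Map a monthly mean temperature to one of the labelled groups.

    Values below the first edge fall in the first group; a value equal to
    an edge falls in the group above that edge.
    """
    return TEMP_LABELS[bisect_right(TEMP_EDGES, mean_temp_f)]


# Made-up monthly mean temperatures, for illustration only.
sample_temps = [12.5, 31.9, 32.0, 45.0, 50.0, 63.2]
groups = [bin_temperature(t) for t in sample_temps]
```

The same idea extends to any other quantitative column (for example, unemployment rate) by swapping in a different list of edges and labels.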

Motivation and Research Questions

Over 4 million people visit Yellowstone each year, so our analysis aims to determine which factors influence the number of visitors and to identify trends in visitation over the years. We focus primarily on climate and macroeconomic factors. We first examine how each set of factors affects visits independently, then how they combine to affect visits collectively, and finally how visitation changes over time. Together, these steps show how external factors affect visitation and give a broader picture of visits to Yellowstone National Park.



1: How Does Weather Impact Visitorship?

With all temperature groups plotted together, it is hard to discern any trend between precipitation and the number of recreation visits. Within the facets, however, we see that the warmer the average temperature, the more people visit Yellowstone. In the “Cold” weather group, the number of visits appears to increase as total precipitation increases. In the “Warm” weather group, the number of visits appears to decrease slightly as precipitation increases. The “Freezing” weather group does not appear to show any correlation between precipitation and the number of visits.


This plot shows a slight negative correlation between the amount of precipitation and the number of park visitors. While the majority of points seem relatively unaffected for both precipitation types, the points with the highest visitation numbers at each precipitation amount show a strong negative relationship with precipitation.



2: How Do Economic Conditions Impact Visitorship?

As can be seen here, the correlations between the economic factors and visitorship are weak. The consumer price index has the highest correlation with visitorship, but because all of the estimates are low in magnitude and differ only slightly, we chose unemployment as our economic variable of interest: we think it is more indicative of the economy as a whole and more easily understood by the general public. The relationship between consumer sentiment and the unemployment rate is stronger, which suggests that the two pick up on similar trends in the economy.
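
For readers who want to reproduce this kind of comparison, the sketch below computes a Pearson correlation coefficient from two equal-length lists. It is a generic illustration, not the code behind the figure: the function name `pearson_r` and the sample values are made up, and the report does not state which correlation measure was used.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


# Made-up monthly values, for illustration only (not taken from the dataset).
visits = [30_000, 28_000, 45_000, 310_000, 820_000, 900_000]
unemployment_rate = [5.1, 5.0, 5.2, 4.9, 5.0, 5.1]

r_visits_unemployment = pearson_r(visits, unemployment_rate)
```

Running `pearson_r` once per economic indicator against visits gives the set of estimates that a correlation plot like the one described here would display.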




3: How Do The Effects of Weather and Economic Conditions on Visitorship Compare?


This boxplot picks up on trends similar to those in the weather scatterplot but contrasts the impact of weather with that of unemployment. There is a large difference in mean visitorship across temperature groups but a smaller difference across unemployment groups.


This PCA plot shows that park visitation varies in the same direction as weather conditions, and that this direction is orthogonal to the direction of variance in economic factors. This tells us that variations in temperature and precipitation both have sizable impacts on park visitation, whereas variation in economic conditions is uncorrelated with variation in visitation. These findings support our conclusion that weather conditions are the primary drivers of visitation fluctuations, whereas economic factors have negligible impact on visitation.
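
The geometric reading of a PCA plot can be illustrated on synthetic data. The sketch below is not the analysis behind the figure: it assumes NumPy is available, generates three made-up standardized variables (a "temperature" series, a "visits" series built from it, and an independent "unemployment" series), and computes the principal components with a singular value decomposition. Variables that move together load on the same component, while an unrelated variable loads mostly on a different one.

```python
import numpy as np

# Synthetic data only: none of these values come from the Yellowstone dataset.
rng = np.random.default_rng(0)
n_months = 372
temperature = rng.normal(size=n_months)
visits = 2.0 * temperature + rng.normal(scale=0.5, size=n_months)
unemployment = rng.normal(size=n_months)  # independent of the other two

# Columns: temperature, visits, unemployment.
X = np.column_stack([temperature, visits, unemployment])

# Standardize each column so that PCA is not dominated by scale.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# PCA via SVD: the rows of Vt are the principal directions (loadings).
U, S, Vt = np.linalg.svd(Z, full_matrices=False)
explained_ratio = S**2 / np.sum(S**2)
loadings = Vt

# First component: large loadings on temperature and visits,
# a comparatively small loading on unemployment.
first_component = loadings[0]
```

In a biplot, the arrows for temperature and visits would point in nearly the same direction, while the arrow for unemployment would be close to perpendicular to them.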



4: How Have the Impacts of External Factors Changed over Time?

This time series plot shows the number of park visitors over time. We can see strong seasonality, as visitorship in the on-season dominates visitorship in the off-season. This agrees with our earlier findings regarding temperature and visitation, since visitation is higher in the summer, when mean temperatures are higher, and lower in the winter, when mean temperatures are lower. Next we examine seasonal decompositions (SDCs) that focus on weather and the unemployment rate.


As can be seen in the weather time series plot, visitation follows a seasonal pattern, which aligns with our findings that visitation increases in warmer months and decreases in colder months. The unemployment rate, on the other hand, does not appear to have a relationship with the shifts in visitorship over time. Low unemployment is linked with slightly higher visitation, as we saw in previous graphs, but unemployment in general does not align with the seasonality.
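
The seasonal decompositions discussed in this section split a monthly series into a trend, a repeating seasonal pattern, and a remainder. The sketch below shows one common approach, a classical additive decomposition with a centered 12-month moving average. It is an assumption that this resembles the method behind the plots; the function names and the synthetic series are made up for illustration.

```python
def centered_moving_average(series, period=12):
    """2 x period centered moving average; None where the window does not fit."""
    half = period // 2
    trend = [None] * len(series)
    for t in range(half, len(series) - half):
        window = series[t - half : t + half + 1]  # period + 1 values
        # The two end points get half weight so the window spans one period.
        total = 0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]
        trend[t] = total / period
    return trend


def seasonal_component(series, trend, period=12):
    """Average detrended value for each position in the cycle, centered at 0."""
    buckets = [[] for _ in range(period)]
    for t, (value, level) in enumerate(zip(series, trend)):
        if level is not None:
            buckets[t % period].append(value - level)
    means = [sum(b) / len(b) for b in buckets]
    overall = sum(means) / period
    return [m - overall for m in means]


# Synthetic series: a linear trend plus a zero-sum monthly pattern.
pattern = [-30, -30, -20, -10, 10, 40, 60, 50, 20, -20, -30, -40]
series = [100 + 0.5 * t + pattern[t % 12] for t in range(48)]

trend = centered_moving_average(series)
seasonal = seasonal_component(series, trend)
# Remainder: what is left after removing the trend and the seasonal pattern.
remainder = [
    None if level is None else value - level - seasonal[t % 12]
    for t, (value, level) in enumerate(zip(series, trend))
]
```

On real visitation data, the seasonal component would capture the summer peak and winter trough, and the trend could then be compared with slower-moving series such as the unemployment rate.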



Conclusions

Overall, it seems that weather is the primary factor that affects the number of visitors to Yellowstone. Significantly more people visit the park in the summer months, when the weather is warmer, than in the colder months. Although increasing precipitation is associated with fewer visitors, the effects of rain and snow are dominated by the effect of temperature. We also found that economic indicators are largely uncorrelated with park visitation. Specifically, we examined the relationship between visitation and each of the Consumer Price Index, the Consumer Sentiment Index, and the unemployment rate, and none of these three indicators displayed a significant correlation with the number of visitors to Yellowstone. This conclusion raises the question of whether any economic factors beyond the ones in our dataset affect visitation. Our economic variables are macro-focused, and it would be interesting to see how they compare with factors more specific to Yellowstone, such as the hypothetical cost for a family of four to visit the park at any time of year, and how those more specific factors correlate with park visitation.