Data Description

The data set explored in this project contains information on roller coasters from around the world. It records 17 characteristics of each roller coaster, of which we analyze 12 for the purposes of our research: 5 categorical and 7 quantitative variables.

Main Research Questions

In this paper, we analyze relationships between multiple characteristics of roller coasters to identify which types are most popular and most frequently built. Specific topics include the materials roller coasters use and how they relate to seating types; the relationship between height and the number of inversions; height ranges versus speed and how both relate to roller coaster rankings; the location and frequency of award-winning roller coasters; and how the height, speed, length, and number of inversions of roller coasters explain their award records. Together, these research questions provide a multi-faceted analysis of the factors that make rides popular to experience and to manufacture.

Graphical Analysis

Research Question 1

First, we want to learn about the most common height ranges of roller coasters and how height relates to speed. To see how these factors relate to roller coaster popularity, we then compare them with the ranks of the roller coasters. For this question, we use the variables speed, height, and rank.

We created a new variable, speedLevel, that categorizes roller coasters as slow, medium, or fast. Slow roller coasters have speeds from 0 mph to 60 mph, medium roller coasters from 60 mph to 100 mph, and fast roller coasters above 100 mph. One outlier was hidden so that the graph zooms in on the bulk of the distribution. From the graph, we learn that shorter roller coasters are much more frequent than taller ones. We also see that speed increases with height: shorter roller coasters fall into the slow category, while fast speeds appear only among taller roller coasters.
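The binning behind speedLevel can be sketched as follows. This is a minimal illustration, not the code used for the report: the function name speed_level and the sample speeds are hypothetical, and because the text does not say which category the boundary value 60 mph belongs to, the sketch assumes it is placed in the lower bin.

```python
def speed_level(speed_mph: float) -> str:
    """Bin a speed in mph into 'slow', 'medium', or 'fast'."""
    if speed_mph < 0:
        raise ValueError("speed must be non-negative")
    # Assumption: 60 mph is counted as slow (the text leaves this open).
    if speed_mph <= 60:
        return "slow"
    # 100 mph is medium, since fast is defined as strictly above 100 mph.
    if speed_mph <= 100:
        return "medium"
    return "fast"


# Hypothetical speeds, for illustration only.
sample_speeds = [35.0, 60.0, 72.5, 100.0, 120.0]
for mph in sample_speeds:
    print(mph, speed_level(mph))
```

Stating the boundary rule explicitly matters because a coaster recorded at exactly 60 mph would otherwise fit both the slow and the medium definition.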

In the above graph, the y-axis is flipped so that better ranks, which are represented by lower numbers, appear higher. Slow roller coasters are rarely ranked: only one data point falls into that speed level. Surprisingly, both medium and fast roller coasters have received high and low ranks, with a nearly even spread across all rank values. An increase in height does not appear to correlate strongly with a better rank, while speed seems to be positively associated with better ranks. This can be tested further using the Pearson correlation.
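The Pearson correlation mentioned above could be computed along the following lines. This is a sketch under stated assumptions: the function pearson_r is our own helper, and the speed and rank values are made up for illustration and are not taken from the data set.

```python
import math


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical values, for illustration only.
speeds = [65.0, 70.0, 82.0, 95.0, 110.0]
ranks = [40, 12, 25, 8, 3]  # lower number = better rank
print(round(pearson_r(speeds, ranks), 3))
```

Because a lower rank number means a better roller coaster, a negative coefficient between speed and rank number would indicate that faster roller coasters tend to rank better.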

Research Question 2

Next, we examine the materials roller coasters use and how they relate to seating types. Specifically, we look at the marginal distribution of material type and at whether material type and seat type are independent. We use a mosaic plot to visualize the relationship between the two variables.

According to the mosaic plot, the marginal distribution of material type shows that steel roller coasters are the most common. No seat type is distributed equally across the material types, which suggests that seat type depends on material type. According to the Pearson residuals, significantly more roller coasters combine the sit down seat type with the steel material type, and the sit down seat type with the wooden material type, than expected under independence. In addition, roller coasters with an unknown material type have alpine and unknown seat types more often than expected under independence. Since alpine seats are an uncommon swivel seat type, and we don’t know what is classified as unknown, we could assume that these seat types belong to specific kinds of roller coasters that are not the common thrill rides.
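For reference, the Pearson residual of a cell in a contingency table is (observed - expected) / sqrt(expected), where the expected count under independence is the row total times the column total divided by the grand total. The sketch below computes these residuals for a small table; the function name and the counts are hypothetical and do not come from the roller coaster data.

```python
import math


def pearson_residuals(table):
    """Return Pearson residuals for a two-way table of counts (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column needs a positive total")
    residuals = []
    for i, row in enumerate(table):
        res_row = []
        for j, observed in enumerate(row):
            # Expected count if the row and column variables were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            res_row.append((observed - expected) / math.sqrt(expected))
        residuals.append(res_row)
    return residuals


# Hypothetical counts: rows are two material types, columns are two seat types.
counts = [[30, 10],
          [10, 30]]
for res_row in pearson_residuals(counts):
    print([round(r, 2) for r in res_row])
```

A positive residual marks a cell with more roller coasters than independence would predict, and a negative residual marks a cell with fewer; residuals with absolute value above roughly 2 are commonly read as notable departures.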

According to the mosaic plot above, the most common material-seating pair is the steel material type with the sit down seating type, which makes sense because steel is the most common material and sit down is the most common seating type. Less frequent seating types such as bobsleigh, 4th dimension, stand up, and wing mostly use the steel material type. The wooden material type is used much less frequently than the unknown type, which could be due to safety issues that arise with wooden coasters. The hybrid material type is used only with the sit down seating type. Meanwhile, the unknown material type appears with many different seating types, possibly because roller coasters labeled unknown actually belong to one of our other three material categories.

Research Question 3

After examining the design characteristics of award-winning roller coasters, we are also interested in learning where these award-winners are located.

From the above choropleth map, we can see that the state with the most awards is Pennsylvania (77 awards), followed closely by Ohio (69 awards). All other states have fewer than 40 awards. However, the states with the most awards also tend to be the states with the most award-winning roller coasters (not presented here), so we also examine the average number of awards per roller coaster:

From the above map, we can see that North Carolina has the highest number of awards per roller coaster (8), followed by New Jersey (6.33) and Illinois (5.67). Pennsylvania, which has the greatest total number of awards, averages only 3.21 awards per roller coaster.
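The per-state average is the total number of awards in a state divided by the number of award-winning roller coasters there. A minimal sketch of that aggregation follows; the function name, the state labels, and the award counts are hypothetical and serve only to show the calculation.

```python
from collections import defaultdict


def awards_per_coaster(records):
    """Map each state to its mean number of awards per award-winning coaster.

    `records` is an iterable of (state, awards_for_one_coaster) pairs.
    """
    totals = defaultdict(int)
    counts = defaultdict(int)
    for state, awards in records:
        totals[state] += awards
        counts[state] += 1
    return {state: totals[state] / counts[state] for state in totals}


# Hypothetical records, for illustration only.
records = [
    ("State A", 8),                                   # one heavily awarded coaster
    ("State B", 5), ("State B", 3), ("State B", 1),   # several coasters
]
print(awards_per_coaster(records))
```

The example shows why the two maps can disagree: a state with a single heavily awarded roller coaster can have the highest average while a state with many award winners has the highest total.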

Research Question 4

In this section, we seek to address the following research question: How do a roller coaster’s speed, height, length, and number of inversions relate to whether it has won awards?

We first make a pairwise scatter plot for the quantitative variables, colored by whether the roller coaster has won awards:

We see that speed, height, and length are strongly positively correlated with one another. The number of inversions is also positively correlated with speed, height, and length, though these correlations are rather weak. From the diagonal plots, we see that roller coasters with awards generally tend to have greater speed, height, and length, while the number of inversions does not seem to be related to whether a roller coaster has won an award.

To visualize the relationship between award status and the four quantitative variables, we first use PCA to perform dimension reduction, and then make a biplot of the first two principal components, colored by whether the roller coaster has received awards:

We see that roller coasters that have received awards tend to have smaller PC1 values, while award status does not seem to correlate with PC2. From the direction of the arrows, we see that roller coasters with greater length, speed, height, and number of inversions have smaller PC1 values and thus are more likely to have awards. This is consistent with our previous observation that roller coasters with awards tend to have greater speed, height, and length.
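To make the PC1 interpretation concrete, the sketch below standardizes four variables and finds the first principal component by power iteration on their correlation matrix. It is an illustration under stated assumptions rather than the procedure used for the biplot: the helper names and the five data rows are hypothetical, and power iteration is a simple stand-in for the decomposition a statistics package would perform.

```python
import math


def standardize(columns):
    """Center each column and scale it to unit sample standard deviation."""
    result = []
    for col in columns:
        n = len(col)
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / (n - 1))
        result.append([(v - mean) / sd for v in col])
    return result


def first_component(columns, iterations=500):
    """Return (loadings, scores) of the first principal component."""
    z = standardize(columns)
    p, n = len(z), len(z[0])
    # Correlation matrix of the standardized variables.
    corr = [[sum(a * b for a, b in zip(z[i], z[j])) / (n - 1)
             for j in range(p)] for i in range(p)]
    # Power iteration: repeated multiplication converges to the leading eigenvector.
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    scores = [sum(v[j] * z[j][k] for j in range(p)) for k in range(n)]
    return v, scores


# Hypothetical coasters, for illustration only (one list per variable).
speed = [40.0, 55.0, 70.0, 85.0, 120.0]
height = [60.0, 110.0, 150.0, 200.0, 420.0]
length = [1800.0, 2500.0, 3200.0, 4100.0, 5300.0]
inversions = [0.0, 1.0, 2.0, 4.0, 5.0]

loadings, scores = first_component([speed, height, length, inversions])
print([round(x, 3) for x in loadings])
print([round(s, 3) for s in scores])
```

The sign of a principal component is arbitrary: negating all loadings gives an equally valid component. In this sketch larger roller coasters get larger PC1 scores, whereas in the biplot above they have smaller PC1 values; both describe the same axis, so the direction of the arrows, not the sign of PC1 itself, carries the interpretation.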

Main Conclusions and Takeaways

Through multivariate graphical analysis, we were able to identify multiple relationships between the popularity and the characteristics of roller coasters. For our first research question, we concluded that height and speed have a strong positive correlation. However, while speed may affect the highest rank a roller coaster has received, height seemingly does not. For the second research question, the mosaic plots showed that material type and seat type are related. The most common material-seating pair was the steel material type with the sit down seating type, which matches what we would expect to be manufactured most frequently. Next, we explored the locations of award-winning roller coasters. Within the US, Pennsylvania and Ohio have the most awards, but North Carolina has the most awards per roller coaster. For our final research question, we learned that the height, speed, and length of roller coasters are strongly positively correlated. Also, roller coasters with greater speed, height, and length are more likely to receive awards. This is plausible since larger roller coasters are likely to provide a better user experience (e.g., more stimulus with greater speed).