Main Research Questions
In this paper, we analyze relationships between multiple
characteristics of roller coasters to identify which types are most
popular and most frequently built. The topics we research include the
materials roller coasters use and how they compare to seating types,
the relationship between height and the number of inversions, height
ranges versus speed and how they relate to roller coaster rankings, the
location and frequency of award-winning roller coasters, and how the
height, speed, length, and number of inversions of roller coasters
explain their award records. Together, these research questions provide
a multi-faceted analysis of the factors that make rides popular to
experience and to manufacture.
Graphical Analysis
Research Question 1
First, we want to learn about the most common height ranges in roller
coasters and how that compares to the speed. To delve further into how
these factors impact roller coaster popularity, we will compare these
ranges with the ranks of said roller coasters. For this question, we
will be looking at the variables speed, height, and rank.

A new variable called speedLevel was created to categorize the roller
coasters as slow, medium, or fast. Slow roller coasters have speeds
from 0 mph to 60 mph, medium roller coasters from 60 mph to 100 mph,
and fast roller coasters above 100 mph. One outlier was hidden so that
the graph zooms in and gives a clearer view of the distribution. From
the graph, we learn that shorter roller coasters are much more frequent
than taller roller coasters. With regard to speed, we see that as the
height of a roller coaster increases, so does its speed. Shorter roller
coasters fall into the slow category, while fast speeds are only seen
in taller roller coasters.
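
The speedLevel categorization described above can be sketched in a few
lines of Python. This is only an illustration, not the code used for
the analysis: the function name speed_level and the example speeds are
hypothetical, and because the stated ranges share the values 60 mph and
100 mph, the sketch assumes that each boundary belongs to the lower
category.

```python
def speed_level(speed_mph):
    """Return 'slow', 'medium', or 'fast' for a speed in mph.

    Assumption: boundaries are right-inclusive, so exactly 60 mph is
    'slow' and exactly 100 mph is 'medium'. The text does not say
    which category the boundary values belong to.
    """
    if speed_mph <= 60:
        return "slow"
    if speed_mph <= 100:
        return "medium"
    return "fast"


# Hypothetical speeds, for illustration only (not from the dataset).
example_speeds = [35.0, 60.0, 72.5, 100.0, 120.0]
levels = [speed_level(s) for s in example_speeds]
print(levels)
```

If the opposite convention is preferred, replacing `<=` with `<` moves
the boundary values into the higher category.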

In the above graph, the y-axis is flipped because higher ranks, meaning
better roller coasters, are represented by lower numbers. The graph
shows that slow roller coasters are rarely ranked: only one data point
falls into that speed level. Surprisingly, however, both medium and
fast roller coasters have received high and low ranks, with a nearly
even spread across all rank values. We also see that greater height
does not necessarily correspond to a better rank, while speed and rank
seemingly have a strong positive correlation. This can be further
tested using Pearson correlation.
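
As a rough sketch of the Pearson correlation test mentioned above, the
coefficient can be computed directly from its definition. The function
name pearson_r and the speed and rank values below are hypothetical and
are not taken from the dataset. Note that because better ranks are
lower numbers, a negative coefficient between speed and rank number
would mean that faster roller coasters tend to rank better.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical values, for illustration only (not from the dataset).
speeds = [60, 70, 80, 90, 100]   # mph
ranks = [5, 4, 3, 2, 1]          # 1 is the best rank
r_speed_rank = pearson_r(speeds, ranks)
print(r_speed_rank)
```

Since rank is an ordinal variable, a rank-based coefficient such as
Spearman's could also be considered; the sketch above follows the
Pearson approach named in the text.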
Research Question 2
Next, we examine the types of materials the roller coasters use and how
they compare to the seating types. Specifically, we look at the
marginal distribution of the material types and at whether material
type and seat type are independent. We used a mosaic plot to visualize
the relationship between them.

According to the mosaic plot, the marginal distribution of material
types shows that steel roller coasters are the most common. None of the
seat types seems to be equally distributed across all the material
types, suggesting that the seat type is designed for the type of
material. According to the Pearson residuals, significantly more roller
coasters than expected under independence combine the sit down seat
type with the steel material type, or the sit down seat type with the
wooden material type. In addition, roller coasters with unknown
material types have alpine and unknown seat types more often than
expected under independence. Since alpine seats are swivel seats that
are not common, and we don't know what is classified as unknown, we
could assume that these seat types are for specific kinds of roller
coasters that are not the common thrill rides.
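
To make the reading of the Pearson residuals concrete, the sketch below
computes them for a small two-way table as (observed - expected) /
sqrt(expected), where the expected count under independence is the row
total times the column total divided by the grand total. The function
name pearson_residuals and the counts are hypothetical and are not the
counts behind the mosaic plot.

```python
import math


def pearson_residuals(table):
    """Pearson residuals of a two-way table of counts.

    Assumes every row total and column total is positive, so that all
    expected counts are nonzero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    residuals = []
    for i, row in enumerate(table):
        residual_row = []
        for j, observed in enumerate(row):
            # Expected count if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / total
            residual_row.append((observed - expected) / math.sqrt(expected))
        residuals.append(residual_row)
    return residuals


# Hypothetical counts, for illustration only.
# Rows: two material types; columns: two seat types.
counts = [[30, 10],
          [10, 30]]
residuals = pearson_residuals(counts)
for row in residuals:
    print([round(value, 2) for value in row])
```

A positive residual marks a cell with more roller coasters than
independence would predict, and a negative residual marks a cell with
fewer; a table whose rows are proportional to each other gives
residuals of zero.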

According to the mosaic plot above, the most frequent material-seating
pair was the steel material type with the sit down seating type, which
makes sense as each is the most common in its category. Less frequent
seating types like bobsleigh, 4th dimension, stand up, and wing all
mostly use the steel material type. The wooden material type is much
less frequently used for roller coasters than the unknown types, which
could be due to the safety issues that arise from wooden-built
coasters. The hybrid material type was only used for the sit down
seating type, which is the most common seating type. Meanwhile, the
unknown material type was common with many different seating types, but
that could be because the unknown types may in fact belong to our other
three categories for material type.
Research Question 3
After examining the design characteristics of award-winning roller
coasters, we are also interested in learning where these award-winners
are located.
From the above choropleth map, we can see that the state with the
most awards is Pennsylvania (77 awards), followed closely by Ohio (69
awards). The rest of the states all have below 40 awards. However, the
states with the most awards tend to be the states with the most
award-winning roller coasters as well (not presented here), so we also
examine the average number of awards per roller coaster:
From the above map, we can see that North Carolina has the highest
number of awards per roller coaster (8), followed by New Jersey (6.33)
and Illinois (5.67). Pennsylvania, which has the greatest total number
of awards, averages only 3.21 awards per roller coaster.
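
The two quantities mapped above, total awards per state and average
awards per award-winning roller coaster, can be computed as in the
following sketch. The function name awards_by_state, the record layout,
and the records themselves are hypothetical and are not taken from the
dataset; they only illustrate how a state can lead in total awards
while trailing in awards per roller coaster.

```python
from collections import defaultdict


def awards_by_state(records):
    """Map each state to (total awards, average awards per coaster).

    Each record is a (state, coaster_name, awards) tuple. A coaster
    that appears in several records is counted once in the average.
    """
    totals = defaultdict(int)
    coasters = defaultdict(set)
    for state, coaster, awards in records:
        totals[state] += awards
        coasters[state].add(coaster)
    return {
        state: (totals[state], totals[state] / len(coasters[state]))
        for state in totals
    }


# Hypothetical records, for illustration only.
records = [
    ("State A", "Coaster 1", 5),
    ("State A", "Coaster 2", 3),
    ("State A", "Coaster 3", 1),
    ("State B", "Coaster 4", 8),
]
summary = awards_by_state(records)
for state, (total, average) in sorted(summary.items()):
    print(state, total, round(average, 2))
```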
Research Question 4
In this section, we seek to address the following research question:
How do a roller coaster's speed, height, length, and number of
inversions relate to whether it has won awards?
We first make a pairwise scatter plot for the quantitative variables,
colored by whether the roller coaster has won awards:

We see that speed, height, and length are strongly positively
correlated with one another. The number of inversions is also
positively correlated with speed, height, and length, though these
correlations are rather weak. From the diagonal plots, we see that, in
general, roller coasters with awards tend to have greater speed,
height, and length, while the number of inversions does not seem to be
related to whether a roller coaster has an award.
To visualize the relationship between award status and the four
quantitative variables, we first use PCA to perform dimension reduction,
and then make a biplot of the first two principal components, colored by
whether the roller coaster has received awards:

We see that roller coasters that received awards tend to have smaller
PC1 values, but award status does not seem to correlate with PC2. From
the direction of the arrows, we see that roller coasters with greater
length, speed, height, and number of inversions have smaller PC1, and
thus are more likely to have awards. This is consistent with our
previous observation that speed, height, and length are positively
associated with the likelihood of winning awards.
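
The PCA step can be sketched without any external library by
standardizing the four variables, forming their correlation matrix, and
extracting the first two principal directions with power iteration.
This is a minimal illustration under stated assumptions, not the
procedure used for the biplot: the text does not say whether the
variables were standardized, the function names are hypothetical, and
the six rows of data are invented. The sign of a principal component is
arbitrary, which is why larger roller coasters can appear with smaller
PC1 values in one analysis and larger values in another.

```python
import math


def standardize(rows):
    """Center each column and scale it to unit sample standard deviation."""
    n = len(rows)
    columns = list(zip(*rows))
    means = [sum(col) / n for col in columns]
    sds = [math.sqrt(sum((v - m) ** 2 for v in col) / (n - 1))
           for col, m in zip(columns, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)]
            for row in rows]


def correlation_matrix(z):
    """Correlation matrix of already standardized rows."""
    n, p = len(z), len(z[0])
    return [[sum(r[i] * r[j] for r in z) / (n - 1) for j in range(p)]
            for i in range(p)]


def leading_eigenpair(matrix, iterations=500):
    """Largest eigenvalue and unit eigenvector by power iteration."""
    p = len(matrix)
    v = [float(k + 1) for k in range(p)]  # arbitrary nonzero start
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * matrix[i][j] * v[j]
                     for i in range(p) for j in range(p))
    return eigenvalue, v


def first_two_components(rows):
    """Return (l1, v1), (l2, v2), and the PC1/PC2 scores of each row."""
    z = standardize(rows)
    corr = correlation_matrix(z)
    p = len(corr)
    l1, v1 = leading_eigenpair(corr)
    # Deflation: remove the first component, then repeat the iteration.
    deflated = [[corr[i][j] - l1 * v1[i] * v1[j] for j in range(p)]
                for i in range(p)]
    l2, v2 = leading_eigenpair(deflated)
    scores = [(sum(a * b for a, b in zip(row, v1)),
               sum(a * b for a, b in zip(row, v2))) for row in z]
    return (l1, v1), (l2, v2), scores


# Hypothetical rows: speed (mph), height (ft), length (ft), inversions.
coasters = [
    [30, 40, 1200, 0],
    [45, 80, 2000, 2],
    [55, 110, 2600, 0],
    [70, 160, 3500, 3],
    [85, 210, 4200, 1],
    [100, 300, 5500, 4],
]
(l1, v1), (l2, v2), scores = first_two_components(coasters)
print("share of variance:", round(l1 / 4, 3), round(l2 / 4, 3))
print("PC1 loadings:", [round(x, 3) for x in v1])
print("PC2 loadings:", [round(x, 3) for x in v2])
```

The loadings v1 and v2 play the role of the arrows in a biplot, and the
scores are the coordinates of each roller coaster on the first two
principal components.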
Main Conclusions and Takeaways
Through multivariate graphical analysis, we were able to identify
multiple relationships between the popularity and characteristics of
roller coasters. With regard to our first research question, we
conclude that height and speed have a strong positive correlation.
However, while speed may affect the highest rank a roller coaster has
received, height seemingly does not. For the next research question,
the mosaic plots showed that the type of material is associated with
seat type. The most common material-seating pair was the steel material
type with the sit down seating type, which is consistent with these
being the most frequently manufactured types. Next, we explored the
locations of the award-winning roller coasters. Within the US,
Pennsylvania and Ohio have the most awards, but roller coasters in
North Carolina have the most awards per roller coaster. For our final
research question, we learned that the height, speed, and length of
roller coasters are strongly positively correlated. Also, roller
coasters with greater speed, height, and length are more likely to
receive awards. This is plausible since larger roller coasters are
likely to provide a better user experience (e.g., more stimulation from
greater speed).