Olympic Medals

Marc Edwards, Sean Price, Caleb Yoder

May 6th, 2020

Introduction to the Data

Research Questions

There are several questions we hope to answer with the analysis and visuals we created for our Olympic dataset.

These are the questions that led us to perform the analysis that follows. By answering them, we hope to better understand the less obvious and lesser-known attributes of an athlete that may affect whether they win an Olympic medal.

Part I

First, we will explore graphs and visualizations that show the number of medals won by countries across several variables, including time, sex, and season. Then, using word clouds, we will analyze each season to see in which one each country finds the majority of its success. These graphs and visualizations will pave the way for further analysis later in the presentation.

Graph #1: Time-series, with Countries’ Medals Over Time, Faceted by ‘Season’

This time series plot depicts the medals won per Olympics by the seven winningest countries. Several trends are apparent. Countries such as the United States, Great Britain, and France have large collections of medals thanks to consistent performance over the past 120 years. Others, such as the Soviet Union, had shorter stints of athletic dominance. Still others, such as Canada, have not found much success in the Summer Olympics but have found success in the Winter Olympics over the last 40 years.

Graph #2: Stacked Chart of Nations (NOC) and Medals Won

## 
##  Pearson's Chi-squared test
## 
## data:  table(medals_500$NOC, medals_500$Season)
## X-squared = 3849.9, df = 22, p-value < 2.2e-16

Graph #2 (cont.)

This graph visualizes the number of medals won by each nation (NOC) during the Olympics. A stacked bar chart, faceted by Season and Sex, best shows the number of each type of medal won by each nation. In the Summer Games, the United States clearly has the most medals of any nation by a large margin for both men and women. The overall distribution is roughly the same for males and females in the Summer Games.

As for the Winter Games, far fewer medals have been awarded, but the distribution of medals by nation is again similar for both sexes, for the most part. The United States, Canada, Norway, and Finland appear to have the most medals among the nations in these Games for both males and females, while the Soviet Union has more medals for males and fewer for females.

Using a chi-square test for independence, we tested whether the medal-winning nation is independent of the Season in which the medals were won (i.e., does the Season affect nations’ medal-winning performance?). Testing at a 0.05 significance level gave a p-value of less than 2.2e-16, so we reject the null hypothesis of independence and conclude that nations’ medal-winning performance differs significantly by Season. In context, some countries perform better at the Winter Games than at the Summer Games, and vice versa.
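To make the mechanics of the test concrete, here is a minimal base R sketch. The counts and the nation codes AAA, BBB, and CCC are hypothetical and stand in for the real table; only the call to chisq.test mirrors the output shown earlier. It also shows where the degrees of freedom come from: (rows - 1) * (columns - 1). Assuming the Season variable has two levels, the reported df = 22 would be consistent with a table of 23 nations.

```r
# Hypothetical medal counts (NOT the presentation's data):
# rows are nations, columns are seasons.
counts <- matrix(
  c(120, 30,
     40, 90,
     80, 20),
  nrow = 3, byrow = TRUE,
  dimnames = list(NOC = c("AAA", "BBB", "CCC"),
                  Season = c("Summer", "Winter"))
)

# Pearson's chi-squared test of independence between NOC and Season.
result <- chisq.test(counts)

# Degrees of freedom: (3 - 1) * (2 - 1) = 2 for this toy table.
result$parameter

# TRUE means we reject independence at the 0.05 level.
result$p.value < 0.05
```

With the real data, the same call would take table(medals_500$NOC, medals_500$Season) in place of the toy matrix.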

Graph #3: Side by Side Bar Chart for Top 5 Seasonal Sports

This graph shows the performance of the eight highest medal-producing nations relative to one another in the top 5 sports of both the Summer and Winter Games. From this graph, we can see that some nations consistently control certain sports. A few examples include the United States’ domination of both Swimming and Track & Field and France’s large lead in the Fencing medal count. Faceting on Summer and Winter Games makes it easy to see that Winter sports rarely have a single dominant nation the way Summer sports do. While these sports often have a few leading nations, success is spread more evenly across the strong-performing nations shown in the graph. The most even sports across the board seem to be Rowing and Speed Skating.

Graph #4: Word Clouds Comparing the Frequency of NOC based on Season and Overall Commonality

Graph #4 (cont.)

The comparison word cloud on the left shows each National Olympic Committee’s 3-letter country code (NOC) according to whether it has more occurrences in the Summer or the Winter Olympic Games. Larger text signifies that the country is more heavily represented in that type of Games than in the other. Canada and Finland are the largest for the Winter Games, while Great Britain is the largest for the Summer Games. However, one thing the comparison word cloud cannot show is the total number of times an NOC appears. The commonality word cloud shows this overall count through the size of the text. The United States (usa) is relatively small in the comparison word cloud, where it sits on the Summer Olympics side, but it is the largest in the commonality cloud, showing that it is the most awarded NOC in the dataset.

In the end, if audiences want to see which Season of games a nation performs better at, they should look at the comparison cloud and if they want to see overall results, look at the commonality word cloud.

Part II

The second part of the presentation will focus on the athletes and their attributes. By focusing on their heights, weights, and ages when they won, we can begin to draw conclusions about these aspects of an athlete and how they may determine success at the Olympics.

Graph #5: Height and Weight of Medal Winning Athletes

Graph #5 (cont.)

## 
## Call:
## lm(formula = G2$Height ~ G2$Weight)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -61.837  -3.682   0.225   4.068  32.256 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 1.339e+02  2.097e-01   638.6   <2e-16 ***
## G2$Weight   5.937e-01  2.802e-03   211.9   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.364 on 23938 degrees of freedom
##   (7965 observations deleted due to missingness)
## Multiple R-squared:  0.6523, Adjusted R-squared:  0.6523 
## F-statistic: 4.491e+04 on 1 and 23938 DF,  p-value: < 2.2e-16

When checking for a relationship between the heights and weights of Olympic medal winners, a linear regression of height on weight gave an \(R^2\) of 0.6523, which corresponds to a correlation of about 0.81, a moderately strong relationship. This suggests a general relationship between height and weight that corresponds to a typical body type for an athlete who competes in the Olympics. We expected more variance due to the different body builds required for different sports, but were surprised to see how the relationship remains relatively proportional over the last 120 years.
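As a quick check on how to read the regression output above, the following base R sketch copies the reported coefficients and \(R^2\) by hand (it does not refit the model). In a simple linear regression with one predictor, the correlation equals the square root of \(R^2\), carrying the sign of the slope. The weight of 70 is a hypothetical input, and the units are whatever the dataset records (assumed here to be kilograms and centimetres).

```r
# Values copied from the lm() summary shown above.
intercept <- 133.9    # 1.339e+02
slope     <- 0.5937   # 5.937e-01
r_squared <- 0.6523

# Correlation implied by R-squared in a one-predictor regression.
r <- sign(slope) * sqrt(r_squared)
round(r, 3)           # about 0.808

# Fitted height for a hypothetical athlete with weight 70.
predicted <- intercept + slope * 70
predicted             # 175.459
```

The slope can be read the same way: each additional unit of weight is associated with roughly 0.59 additional units of height among medal winners.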

Graph #6: Histogram of Age at Time of Winning

Graph #6 (cont.)

This histogram depicts the distribution of Olympic medal winners’ ages at their time of winning. Across the entire dataset, it’s clear that Olympic athletes’ prime medal-winning years fall between ages 20 and 30, with a mode of 23 years. Interestingly, there is no obvious interaction between the medal color and an athlete’s age. So, assuming an athlete receives a medal at a given age, it appears about equally likely to be any of the three medals.

The most notable trend in this graph is the distinct difference between the shapes of the two curves when faceted by Olympic Games season. There is an obvious difference in the sheer volume of medals won, with a far larger number won in the Summer Games than in the Winter Games. While both graphs share a roughly normal base shape with a right skew, the graph of Summer Games medal winners has a sharp peak from ages 21 to 25, with a marked decline in the number of medals won for each additional year after age 23. Summer Games medal winners have an overall median age of 25 years. This contrasts clearly with the curve for Winter Games medal winners, which appears much flatter, with a relatively even distribution of winners from ages 22 to 28. The overall median age of Winter Games medal winners is 26 years. The distinction between these two curves suggests that age is a better predictor of medal-winning potential in Summer events than in Winter events.
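The summary statistics quoted above (a mode and a median per season) can be computed along these lines. This is only a sketch: the data frame winners, its values, and the helper age_mode are hypothetical and chosen for illustration, not taken from the Olympic dataset.

```r
# Hypothetical ages, for illustration only (NOT the presentation's data).
winners <- data.frame(
  Season = c("Summer", "Summer", "Summer", "Summer", "Summer",
             "Winter", "Winter", "Winter", "Winter"),
  Age    = c(21, 23, 23, 25, 30, 22, 26, 26, 28)
)

# Hypothetical helper: most frequent age; ties resolve to the smallest age.
age_mode <- function(x) {
  counts <- table(x)
  as.numeric(names(counts)[which.max(counts)])
}

# One value per season for each statistic.
median_by_season <- tapply(winners$Age, winners$Season, median)
mode_by_season   <- tapply(winners$Age, winners$Season, age_mode)

median_by_season
mode_by_season
```

Base R has no built-in function for the statistical mode, which is why the sketch defines one from a frequency table.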

Conclusions and Main Takeaways

Overall, we found some interesting results and some that were not surprising to us.

We were not surprised to see that the United States dominates both the Summer and Winter Games with both males and females. The margins were different for the seasons, but the same distribution by sex typically held. The different graphs and visuals we used provided us a new way to see data that we have seen our entire lives. We were also not surprised to see that certain nations relied on a particular Season of games for their success, as we saw in the word cloud. Typically, cold-weather nations like Canada, Sweden, Norway, and Finland rely on the Winter Games for their success.

One of the interesting results we found was with the Age variable. Compared to our experiences watching American professional sports, where the usual ‘prime’ is in the late 20s through early 30s, the prime age of Olympic athletes winning their medals was in the early to mid-20s. Younger competitors often have the most success winning in their respective sports and events. Another interesting result came when looking at the relationship between the height and weight of the athletes at the time they won their medals. Throughout the last 120 years, the relationship between an athlete’s height and weight has stayed relatively stable, and we found little to no difference in heights and weights across medal types for either male or female athletes.

Possible Future Work:

In the future, we could include more nations in our analysis. This would allow for more variability and could lead us to different conclusions. Since we only focused on nations with at least 500 medals, another natural next step is an analysis of nations with fewer than 500 medals. It would be very interesting to see whether the conclusions we reached in this project hold under those constraints.
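One way such a split could be set up is sketched below in base R. The data frame medals, its rows, and the tiny threshold are hypothetical stand-ins so the example runs on its own; the presentation's actual cutoff was 500 medals, and its filtering code is not shown in the slides.

```r
# Hypothetical rows, one per medal won (NOT the presentation's data).
medals <- data.frame(NOC = c("USA", "USA", "USA", "CAN", "CAN", "FIN"))

# Toy cutoff so the example is small; the presentation used 500.
threshold <- 2

# Count medals per nation and keep the codes at or above the cutoff.
counts <- table(medals$NOC)
keep   <- names(counts)[as.vector(counts) >= threshold]

# Split the medal rows into the two groups of nations.
medals_above <- medals[medals$NOC %in% keep, , drop = FALSE]
medals_below <- medals[!(medals$NOC %in% keep), , drop = FALSE]

nrow(medals_above)
nrow(medals_below)
```

Rerunning the earlier analyses on the second group would show whether the same patterns hold for nations with fewer medals.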


Thank you for viewing our presentation.

We are happy to answer any questions you may have during the allotted time period.