Introduction

The Super Bowl is a hallmark of American culture. Every year, close to 100 million people tune in to watch the three-hour extravaganza. Although people tune in for the action on the field, the game is not the only part of the spectacle that draws attention. Many viewers tune in purely for what happens when the game is not being played: the advertisements. This year, commercials reached a record price, with a thirty-second advertisement costing $7 million on average. We analyzed a dataset of Super Bowl ads, hoping to understand which factors influence an advertisement's popularity. Using this dataset, we answer three main research questions:

  1. What factors make a commercial popular?
  2. How do the themes change in the commercials over the years?
  3. Do brands stick to the same themes over the years? If not, which themes are becoming more prevalent, and do they make a commercial more popular?

Through these three questions, we analyze Super Bowl ad data from this century.

Data Description

The data was collected in 2021 and covers the previous twenty years of Super Bowl commercials. Observations were selected from the ten brands that aired the most spots over that period, resulting in 233 observations. Researchers watched each advertisement and noted whether certain criteria were present, such as whether it was funny or involved danger. We cleaned the data to keep the year, brand, all seven themes, view count, like count, and dislike count. To analyze trends, we added a new variable that groups years by half decade. We chose half decades over decades because the data only span 2000 to 2020, so grouping by decade would leave too few groups to show a trend.
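As a minimal sketch of the half-decade grouping, assuming bins that start at years divisible by five and a hypothetical label format such as "2005-2009" (the original analysis was done in R and its exact labels are not shown), the mapping could look like this in Python:

```python
def half_decade(year: int) -> str:
    """Map a year to a five-year bin label, e.g. 2007 -> "2005-2009".

    Assumption: bins start at years divisible by 5, and the label format
    is hypothetical; the original analysis may label bins differently.
    """
    start = year - year % 5  # round down to the nearest multiple of 5
    return f"{start}-{start + 4}"


if __name__ == "__main__":
    for y in (2000, 2004, 2007, 2013, 2020):
        print(y, half_decade(y))
```

With data running from 2000 to 2020, this rule puts 2020 alone in a "2020-2024" bin; folding it into "2015-2019" is one alternative if a single-year group is too small to compare.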

For several reasons, we narrowed the 233 observations down to 103. First, we removed any observations with a value of N/A for any criterion or quantitative variable, such as like or view count. The second reason points to our largest limitation. As with any video posted on YouTube, the view, like, and dislike counts can increase every day, so a snapshot of the data taken three years ago does not reflect where the videos stand today. Similarly, some videos have dramatically higher counts, sometimes gained within days or hours, while others had counts that were unreasonably low; both had the potential to skew the data. We therefore limited our dataset to observations with view counts between 100,000 and 1 million.
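The two filtering steps can be sketched in Python as follows. This is an illustration of the logic only, assuming each ad is a dict keyed by the column names used in this report, with None standing in for R's NA; the inclusive bounds on the view-count window are also an assumption.

```python
from typing import Any

# Hypothetical record shape: one dict per ad, keyed by the column names used
# in this report. None stands in for R's NA (a missing value).
Ad = dict[str, Any]

THEMES = ["funny", "show_product_quickly", "patriotic", "celebrity",
          "danger", "animals", "use_sex"]
COUNTS = ["view_count", "like_count", "dislike_count"]

MIN_VIEWS = 100_000    # lower bound stated in the text
MAX_VIEWS = 1_000_000  # upper bound stated in the text


def is_complete(ad: Ad) -> bool:
    """Step 1: True if no theme flag or count is missing."""
    return all(ad.get(col) is not None for col in THEMES + COUNTS)


def in_view_range(ad: Ad) -> bool:
    """Step 2: True if the view count is inside the window (inclusive bounds assumed)."""
    return MIN_VIEWS <= ad["view_count"] <= MAX_VIEWS


def filter_ads(ads: list[Ad]) -> list[Ad]:
    # is_complete runs first, so in_view_range never sees a missing view count.
    return [ad for ad in ads if is_complete(ad) and in_view_range(ad)]
```

Applied to a list of such dicts, filter_ads keeps only complete rows whose view count falls inside the window; the same two conditions could be expressed as a single filter in R.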

Five example rows from the cleaned data (17 columns in total; the first 10 are shown):

year  brand      funny  show_product_quickly  patriotic  celebrity  danger  animals  use_sex  view_count
2018  Toyota         0                     0          0          0       0        0        0      173929
2006  Bud Light      1                     0          0          0       1        1        0      142310
2020  Coca-Cola      1                     0          0          1       0        1        0      304254
2010  Hynudai        0                     1          1          0       0        0        0       68458
2007  Budweiser      1                     1          0          0       0        1        1      184689

Columns not shown: like_count, dislike_count, title, numThemes, half_decade, like_ratio, dislike_ratio.

Question #2: How do the themes change in the commercials over the years?

This figure shows how often each theme was used each year, as a proportion of all commercials that aired that year. The most common themes throughout the 2000s were funny and showing the product quickly, while patriotic commercials and commercials featuring celebrities were generally the least common. There were also notable spikes in the proportion of funny commercials that showed the product quickly in 2003 and 2013.
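The per-year proportions behind a figure like this one amount to a grouped mean of 0/1 theme flags. A minimal Python sketch, assuming each ad is a dict with a "year" key and the seven 0/1 theme columns from the sample data (the plotting itself is omitted):

```python
from collections import defaultdict

THEMES = ["funny", "show_product_quickly", "patriotic", "celebrity",
          "danger", "animals", "use_sex"]


def theme_proportions(ads):
    """Return {year: {theme: share of that year's ads flagged with the theme}}.

    Each ad is assumed to be a dict with a "year" key and 0/1 theme flags.
    """
    totals = defaultdict(int)                     # number of ads per year
    hits = defaultdict(lambda: defaultdict(int))  # flagged ads per year and theme
    for ad in ads:
        year = ad["year"]
        totals[year] += 1
        for theme in THEMES:
            hits[year][theme] += ad[theme]
    return {
        year: {theme: hits[year][theme] / totals[year] for theme in THEMES}
        for year in sorted(totals)
    }
```

Because each flag is 0 or 1, summing it counts the ads with that theme, and dividing by the number of ads that year gives the proportion for that year.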

Having observed that ‘funny’ and ‘show_product_quickly’ were the most common themes, we now focus on how often those two themes were used over the span of the dataset. They are often used a similar amount each year, with the exception of 2004 (twice as many funny commercials as ones that showed the product quickly) and 2014 (no commercials were funny, while 3 showed the product quickly). One might expect commercials intended to be funny not to show the product in the opening seconds of the video, but the data suggest otherwise.
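One way to check whether funny commercials really do tend to show the product early is to compare the observed share of ads with both themes against the share expected if the two themes were independent; the ratio of the two is often called lift. A minimal Python sketch, assuming each ad is a dict of 0/1 theme flags. It is a descriptive check only: a test such as chi-squared or Fisher's exact test would be needed to judge whether the gap exceeds chance.

```python
import math


def co_occurrence(ads, a="funny", b="show_product_quickly"):
    """Compare how often two 0/1 themes appear together with the rate
    expected if they were independent.

    Returns (observed, expected, lift), where
      observed = share of ads with both themes,
      expected = P(a) * P(b) under independence,
      lift     = observed / expected (> 1 means more co-occurrence than expected).
    """
    n = len(ads)
    p_a = sum(ad[a] for ad in ads) / n
    p_b = sum(ad[b] for ad in ads) / n
    observed = sum(ad[a] * ad[b] for ad in ads) / n  # both flags equal 1
    expected = p_a * p_b
    lift = observed / expected if expected > 0 else math.nan
    return observed, expected, lift
```

For instance, if three of four ads are funny, two of four show the product quickly, and two are both, the observed share is 0.5 against an expected 0.375, a lift of about 1.33.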

Conclusion

Our research yielded several interesting findings. Regarding the first research question, we found that ads with three themes were the most common, but the more successful ads had four themes. Regarding the second research question, we found that funny and show-product-quickly were the most common themes, and that the two appear together more often than one would expect. For the third research question, we explored how brands used the celebrity theme over the years. We found that celebrities were used more in recent years (after 2015), but that ads with celebrities did not receive more views than ads without. In addition, the proportion of people who liked an ad did not differ substantially between ads with and without a celebrity, but people seemed to dislike ads more when a celebrity was not shown than when one was. Overall, our research into Super Bowl ads helped us better understand which themes to include and how to earn a better reception from viewers.
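The like and dislike comparison for celebrity ads is a grouped average of per-view ratios. As a minimal Python sketch, assuming the ratios are likes and dislikes divided by views (the report does not spell out how its like_ratio and dislike_ratio columns are defined) and that each ad is a dict with the column names from the sample data:

```python
from statistics import mean


def ratios_by_flag(ads, flag="celebrity"):
    """Mean per-view like and dislike ratios for ads with and without a theme.

    Assumption: like_ratio = like_count / view_count and
    dislike_ratio = dislike_count / view_count; the report's own columns
    may be defined differently.
    """
    summary = {}
    for value in (0, 1):
        group = [ad for ad in ads if ad[flag] == value]
        if not group:  # skip an empty group instead of averaging nothing
            continue
        summary[value] = {
            "n": len(group),
            "like_ratio": mean(ad["like_count"] / ad["view_count"] for ad in group),
            "dislike_ratio": mean(ad["dislike_count"] / ad["view_count"] for ad in group),
        }
    return summary
```

Averaging per-ad ratios weights every ad equally; pooling total likes over total views instead would let a few heavily viewed ads dominate each group.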

Future Research

While this dataset gave us a good introduction to analyzing Super Bowl commercials, specifically their themes and what makes them successful, there is much more research that could be done on this subject. To begin with, it would be worthwhile to see other themes recorded in future datasets. For example, does the commercial relate to football? Does it use popular trends from that year? There are many possibilities in this regard.

When measuring the success of a commercial, we think it would be relevant to compare its views to the overall popularity of the game itself, as we expect that more popular Super Bowls generally garner more views for their commercials. It would also be valuable to see how many views the YouTube video gets in the first 24 hours after publishing (when the commercials are typically freshest in everyone’s mind) rather than the total up to the date the data were recorded. This would help standardize view counts across commercials.

It would also be valuable to have data on commercials outside the 10 brands in the dataset. Having only 10 brands limits the number of data points we can work with, and not every brand had a commercial each year. Data on other, potentially smaller, brands would open up more opportunities for analysis, since more data points would let us better answer the questions of how themes are used and what makes a commercial popular. We would also be able to compare how smaller brands, perhaps making their first Super Bowl commercial, fare in making popular commercials compared to more established brands.

Appendix