Video games have undoubtedly had an impact on pop culture and on the world at large. Since the 1970s, games like “Pac-Man”, “Super Mario”, “The Legend of Zelda”, “Final Fantasy”, and “Call of Duty” have attracted billions of fans to the video game industry and paved the way for gaming to become mainstream today. More recently, the emergence of social networks, smartphones, and tablets has introduced new categories such as mobile and social games to today’s gamers. As of 2020, the global video game market has estimated annual revenues of US$159 billion across hardware, software, and services, three times the size of the 2019 global music industry and four times that of the 2019 film industry.
In this report, we introduce three of the biggest publishers of the past decade that appear in our dataset. Based on our online research, the four biggest publishers in the video game industry were Sony, Tencent, Nintendo, and Microsoft, but because the dataset contains hardly any games from Tencent, we selected the other three. We identify each publisher’s most popular video games and compare and contrast attributes across the three publishers.
Finally, we are interested in better understanding how the popularity of video game platforms changes over time, as well as what constitutes a high-grossing video game. We also examine several word clouds to identify the most commonly used words in video game titles.
This dataset is titled Video Games Sales 2019. The dataset is from Kaggle and contains 55,792 video games released from 1970 to 2019. You can find the specific dataset here. Each row corresponds to a game that generated sales in 2019. There are 16 columns in the dataset.
Categorical variables include: Rank, Name, basename, Genre, ESRB-Rating, Platform, Publisher, Developer, Last_Update, status
Quantitative variables include: Critic_Score, User_Score, Total_Shipped, Global_Sales, NA_Sales, PAL_Sales, JP_Sales, Other_Sales
Other variables include: VGChartz_Score, Year, url, Vgchartzscore, img_url
The time series graph plots the five most popular platforms of all time, measured by the number of games released per platform. We can see platforms such as the PlayStation rise and fall as they are replaced by newer consoles. For instance, the fall of the PS (PlayStation) appears to be partly due to the rise of its replacement, the PS2: the number of games created for the PS decreases as the number of games for the PS2 increases. We also see a fairly consistent rise of the PC.
Next, we explore the genres of the video games in the dataset.
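The counting behind a plot like this can be sketched in a few lines. This is only an illustration in Python using the standard library, not the code used for the report; the function name `top_platform_counts` and the toy rows are hypothetical, while `Platform` and `Year` are the dataset's column names.

```python
from collections import Counter

def top_platform_counts(rows, n=5):
    """Return {platform: Counter({year: games})} for the n platforms with the most games."""
    # Rank platforms by the total number of games released on them.
    totals = Counter(row["Platform"] for row in rows)
    top = [platform for platform, _ in totals.most_common(n)]
    # Count games per year, keeping only the top platforms and rows with a known year.
    per_year = {platform: Counter() for platform in top}
    for row in rows:
        if row["Platform"] in per_year and row["Year"] is not None:
            per_year[row["Platform"]][row["Year"]] += 1
    return per_year

# Hypothetical toy rows, not taken from the dataset.
rows = [
    {"Platform": "PS", "Year": 1998},
    {"Platform": "PS", "Year": 1998},
    {"Platform": "PS", "Year": 2001},
    {"Platform": "PS2", "Year": 2001},
    {"Platform": "PS2", "Year": 2001},
    {"Platform": "PC", "Year": 2001},
]
counts = top_platform_counts(rows, n=2)
```

Each inner counter is one line of the time series: years on the x-axis and the number of games released on the y-axis.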
The Global_Sales of video games follow a general trend of increasing, reaching a peak, and then decreasing. For this dataset, we looked at the five genres that generated the most sales: Sports, Action, Shooter, Racing, and Role-Playing. Misc was also among the top genres by Global_Sales, but we excluded it because it does not identify a specific genre, and we believe the next-highest genre gives more insight. The genres peak around 2008 to 2011 and then slowly decrease in later years. Action games had the highest sales in 2011, followed by Sports games in 2008. Note that the Global_Sales column contains many n/a values.
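The genre selection described above (sum Global_Sales per genre, skip n/a values, drop Misc, keep the top five) can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the report's own code: `top_genres_by_sales` and the toy rows are hypothetical, and missing sales are represented as `None`.

```python
from collections import defaultdict

def top_genres_by_sales(rows, n=5, exclude=("Misc",)):
    """Return the n genres with the highest summed Global_Sales."""
    totals = defaultdict(float)
    for row in rows:
        sales = row.get("Global_Sales")
        # Skip missing (n/a) sales and genres we chose to leave out.
        if sales is None or row["Genre"] in exclude:
            continue
        totals[row["Genre"]] += sales
    return sorted(totals, key=totals.get, reverse=True)[:n]

# Hypothetical toy rows, not taken from the dataset.
rows = [
    {"Genre": "Sports", "Global_Sales": 3.0},
    {"Genre": "Sports", "Global_Sales": 1.0},
    {"Genre": "Misc", "Global_Sales": 10.0},
    {"Genre": "Action", "Global_Sales": 2.5},
    {"Genre": "Shooter", "Global_Sales": None},
    {"Genre": "Racing", "Global_Sales": 0.5},
]
top_two = top_genres_by_sales(rows, n=2)
```

In this toy example Misc has the largest total but is excluded, so the next genres move up, which mirrors the choice made in the text.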
We then ran a statistical test to see whether there is a statistically significant difference in Global_Sales among the different genres.
##
## Bartlett test of homogeneity of variances
##
## data: sales by Genre
## Bartlett's K-squared = 51.511, df = 4, p-value = 1.746e-10
##
## One-way analysis of means (not assuming equal variances)
##
## data: sales and Genre
## F = 5.4447, num df = 4.00, denom df = 107.53, p-value = 0.0004965
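Bartlett's test rejects equal variances across genres (p-value = 1.746e-10), so the one-way analysis of means that does not assume equal variances is the appropriate test here. Since its p-value of 0.0004965 is less than alpha = 0.05, we reject the null hypothesis. We conclude that there is a statistically significant difference in sales between the genres, which further shows that different genres generate different Global_Sales.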
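The output above appears to come from R. As a rough illustration of what the two statistics measure, the sketch below computes Bartlett's K-squared and the Welch one-way F statistic in plain Python. The function names and the toy sales figures are hypothetical, and p-values are left out because they need chi-squared and F distribution functions that the standard library does not provide.

```python
import math
from statistics import mean, variance

def bartlett_k_squared(groups):
    """Bartlett's K-squared statistic and its degrees of freedom (k - 1)."""
    k = len(groups)
    dfs = [len(g) - 1 for g in groups]
    variances = [variance(g) for g in groups]  # sample variances
    total_df = sum(dfs)                        # N - k
    pooled = sum(d * v for d, v in zip(dfs, variances)) / total_df
    numerator = total_df * math.log(pooled) - sum(
        d * math.log(v) for d, v in zip(dfs, variances)
    )
    correction = 1 + (sum(1 / d for d in dfs) - 1 / total_df) / (3 * (k - 1))
    return numerator / correction, k - 1

def welch_anova(groups):
    """Welch's one-way F statistic with numerator and denominator df."""
    k = len(groups)
    sizes = [len(g) for g in groups]
    means = [mean(g) for g in groups]
    # Each group is weighted by n / s^2, so noisy groups count for less.
    weights = [n / variance(g) for n, g in zip(sizes, groups)]
    total_weight = sum(weights)
    weighted_mean = sum(w * m for w, m in zip(weights, means)) / total_weight
    between = sum(w * (m - weighted_mean) ** 2 for w, m in zip(weights, means))
    lam = sum((1 - w / total_weight) ** 2 / (n - 1) for w, n in zip(weights, sizes))
    f_stat = between / ((k - 1) * (1 + 2 * (k - 2) * lam / (k ** 2 - 1)))
    return f_stat, k - 1, (k ** 2 - 1) / (3 * lam)

# Hypothetical toy sales per genre, not taken from the dataset.
sales_by_genre = {
    "Sports": [1.0, 2.0, 3.0],
    "Action": [2.0, 4.0, 6.0],
    "Racing": [0.5, 1.0, 3.0],
}
k_squared, bartlett_df = bartlett_k_squared(list(sales_by_genre.values()))
f_stat, num_df, denom_df = welch_anova(list(sales_by_genre.values()))
```

The non-integer denominator degrees of freedom in the reported output (107.53) are typical of the Welch correction, which adjusts the degrees of freedom instead of pooling the group variances.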
We then wanted to dive deeper into the distribution of genres for the top 3 publishers: Microsoft, Nintendo, and Sony. We looked at the Global_Sales of different genres for the three publishers and plotted the 10 genres that generated the most sales combined.
Note that the labels on the x-axis are ordered from the highest-selling genre across all three companies to the lowest. In this case, the genre with the highest Global_Sales across all three is puzzle games, followed by action. Of the top 10 genres, simulation generates the least sales. From the graph we can see that puzzle games generate high sales for Microsoft and Nintendo but relatively low sales for Sony; that is, puzzle games do not generate as much Global_Sales for Sony as they do for the other two publishers.
For Microsoft, out of the top 10 genres, action games generate the most global sales, followed by shooter and puzzle games. Adventure games generate the least sales for Microsoft. For Nintendo, puzzle games generate the most sales, followed by platform games. Shooter, strategy, and simulation games generate the least sales. For Sony, sports games generate the most global sales, while strategy and simulation games generate the least.
From the plot, the yellow line suggests that the trend between Critic_Score and Global_Sales might not be very strong, but higher scores tend to come with higher sales (Action and Shooter games with high or low scores also have correspondingly high or low sales), although most data points gather around Critic_Score = 7 and Global_Sales = 1. The plot shows the relationship between Critic_Score and Global_Sales and also distinguishes between the different Genre values.
We also want to explore the difference in critic and user scores for the top 5 sales-generating genres, which are Action, Racing, Role-Playing, Shooter, and Sports.
Now that we have seen a possible relationship between Critic_Score and a high Global_Sales, we want to explore whether there is a relationship between the Critic_Score and the User_Score. We will explore this by plotting box plots and running a t-test.
With the side-by-side boxplots, we can see that the center of Critic_Score for the top five genres is mostly around 7.5, while the center of User_Score is greater than 8 for all five. The boxplots indicate that the medians of Critic_Score and User_Score differ for the top five genres.
##
## Welch Two Sample t-test
##
## data: top_genre$Critic_Score and top_genre$User_Score
## t = -10.514, df = 219.56, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -1.2596148 -0.8619493
## sample estimates:
## mean of x mean of y
## 7.184903 8.245685
We also ran a statistical test to examine whether the mean critic score differs from the mean user score. From the result of the hypothesis test, we observe that the p-value < 2.2e-16, which is less than 0.05. We reject the null hypothesis and have sufficient evidence to conclude that the average Critic_Score and average User_Score across the top 5 genres by sales are not the same (sample means of about 7.18 and 8.25, respectively). This means that users and critics likely rate video games differently. We also want to note that fewer data are available for User_Score than for Critic_Score.
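For readers who want to see what the Welch two-sample t-test computes, here is a minimal Python sketch of the statistic and its Welch-Satterthwaite degrees of freedom. The function name `welch_t` and the toy scores are hypothetical; the output above appears to come from R, and this block is only an illustration of the same calculation, without the p-value or confidence interval.

```python
import math
from statistics import mean, variance

def welch_t(x, y):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    # Squared standard error of each sample mean.
    vx = variance(x) / len(x)
    vy = variance(y) / len(y)
    t_stat = (mean(x) - mean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t_stat, df

# Hypothetical toy scores, not taken from the dataset.
critic_scores = [7.0, 7.5, 6.5, 8.0]
user_scores = [8.0, 8.5, 8.2, 9.0, 7.8]
t_stat, df = welch_t(critic_scores, user_scores)
```

A negative statistic means the first sample has the lower mean, which matches the reported output: t = -10.514 with a mean Critic_Score of 7.184903 against a mean User_Score of 8.245685. The reported 95 percent confidence interval is centered on the difference of those means, about -1.06.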
To answer our question, we use a stacked bar plot with x = Publisher and y = Total_Shipped. The bars are stacked by platform, since each platform falls under a Publisher, for example the DS and the Switch under Nintendo.
In the past decade, Sony Computer Entertainment has sold more copies of video games than both Microsoft and Nintendo. Nintendo has sold games across more platforms than Microsoft or Sony. We can also observe that Microsoft’s Xbox Live platform has sold the most video game copies in the past decade, accounting for most of Microsoft’s numbers. Compared with Microsoft and Sony, Nintendo published games for a notably diverse set of platforms from 2010 to 2019.
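The aggregation behind a stacked bar plot of this kind can be sketched as below: Total_Shipped is summed per Publisher and Platform for games released from 2010 to 2019. This is a hedged Python illustration, not the report's code; the function name, the platform labels, and the shipped figures in the toy rows are all hypothetical.

```python
from collections import defaultdict

def shipped_by_publisher_platform(rows, publishers, start=2010, end=2019):
    """Sum Total_Shipped per publisher and platform within a range of years."""
    stacks = {publisher: defaultdict(float) for publisher in publishers}
    for row in rows:
        shipped = row.get("Total_Shipped")
        year = row.get("Year")
        # Skip other publishers and rows with missing values.
        if row["Publisher"] not in stacks or shipped is None or year is None:
            continue
        if start <= year <= end:
            stacks[row["Publisher"]][row["Platform"]] += shipped
    return {publisher: dict(stack) for publisher, stack in stacks.items()}

# Hypothetical toy rows, not taken from the dataset.
rows = [
    {"Publisher": "Nintendo", "Platform": "DS", "Year": 2010, "Total_Shipped": 2.0},
    {"Publisher": "Nintendo", "Platform": "NS", "Year": 2017, "Total_Shipped": 5.0},
    {"Publisher": "Nintendo", "Platform": "NS", "Year": 2018, "Total_Shipped": 3.0},
    {"Publisher": "Nintendo", "Platform": "Wii", "Year": 2008, "Total_Shipped": 9.0},
    {"Publisher": "Microsoft", "Platform": "XBL", "Year": 2012, "Total_Shipped": None},
    {"Publisher": "Tencent", "Platform": "PC", "Year": 2015, "Total_Shipped": 1.0},
]
stacks = shipped_by_publisher_platform(rows, ["Microsoft", "Nintendo"])
```

Each publisher maps to one bar, and each platform entry inside it is one stacked segment. The number of entries per publisher is also a direct measure of how many platforms that publisher released games on.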
This word cloud shows the 100 most common words that appear in the video game titles in the dataset. Some of the most popular words across all video games include game, world, adventure, super, star, and edition. Next, we examine the most widely used words in the video game titles of the top 3 publishers.
From the first word cloud, we can see which words were most common in the titles of video games from Sony Computer Entertainment. For example, we can clearly see the words “singstar”, “mlb”, “show”, and “nba”. This makes sense, as these words make up the titles of some of the biggest franchises (MLB the Show, NBA 2K, SingStar) published by Sony.
From the second word cloud, we can see which words were most common in the titles of video games from Nintendo. For example, we can clearly see the words “super”, “mario”, “pokemon”, “legend”, and “zelda”. This makes sense, as these words make up the titles of some of the biggest franchises (Super Mario, Pokemon, and The Legend of Zelda) for Nintendo, and arguably in all of video games.
From the third word cloud, we can see which words were most common in the titles of video games from Microsoft. For example, we can clearly see the words “avatar”, “zombie”, “ninja”, “monster”, “microsoft”, “flight”, and “simulator”. This makes sense, as some of these words make up the titles of a couple of Microsoft’s biggest franchises (Xbox Avatar, Microsoft Flight Simulator). The rest of the most common words in the word cloud come from games that are not necessarily part of a video game series.
Comparing these word clouds, we see hardly any similarities among them. At a glance, the only common word is “game”, which appears in the word clouds for both Nintendo and Microsoft but not in Sony’s.
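The word frequencies that feed word clouds like the ones above can be sketched with a simple counter. This is a minimal Python illustration under stated assumptions, not the report's method: the function name `top_title_words`, the tokenizing pattern, the small stop-word list, and the toy titles are all hypothetical choices.

```python
import re
from collections import Counter

def top_title_words(titles, n=100, stopwords=frozenset({"the", "of", "and", "a"})):
    """Return the n most common lowercase words across the given titles."""
    counts = Counter()
    for title in titles:
        # Lowercase the title and split it into runs of letters, digits, and apostrophes.
        words = re.findall(r"[a-z0-9']+", title.lower())
        counts.update(word for word in words if word not in stopwords)
    return counts.most_common(n)

# Hypothetical toy titles standing in for the Name column of one publisher.
titles = ["Super Mario World", "Super Mario Galaxy", "The Legend of Zelda"]
top_words = dict(top_title_words(titles, n=5))
```

Running the same function once on all titles and once per publisher gives the inputs for the overall word cloud and the three publisher-specific ones.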
Through our research questions and our findings from the plots and statistical analyses, we have noticed some interesting points about the games in our dataset.
Firstly, the popularity of a platform rises and falls as it gets replaced by other consoles. We also see a fairly consistent rise in the popularity of the PC. The top five genres that generate the most sales are Sports, Action, Shooter, Racing, and Role-Playing games, and there is a statistically significant difference in global sales between those five genres. These genres share a similar trend of increasing to a peak and then decreasing. Among Microsoft, Nintendo, and Sony, we learned that puzzle games generated the highest amount of global sales.
We also found that the trend between critic score and global sales might not be very strong, but higher scores do tend to have higher sales. In addition, we found that critics and users rated games differently. This is similar to what we expected, because video games at their very core are supposed to be all about the player’s experience, and each person’s experience playing a game can be different.
Finally, the most popular words that we found in video game titles from the original dataset were “game”, “world”, “adventure”, “super”, “star”, and “edition”. For each of the three companies, most of the common words we observed in our word clouds were found in the titles of their respective popular video game franchises. There were no clear similarities among the three companies. We did observe that both Nintendo and Sony Computer Entertainment released games on numerous platforms over the last decade compared to Microsoft. However, most of Microsoft’s sold video game copies came from Xbox Live.
Throughout the years, video games have progressed, and we look forward to seeing how they continue to transform.