The video game dataset is originally from kaggle.com and contains data on video games with sales greater than 100,000 copies. There are 16,598 rows in the dataset, each of which corresponds to a video game. Each video game is ranked based on overall sales (in millions), and sales figures are reported separately for North America, Europe, and Japan. All other sales are grouped in an “other” category, and there is an additional column with global sales for each video game. Other columns in the dataset detail a video game’s name, platform of release, year of release, genre, and publisher. Platforms of release include the Wii, PS4, and PC, among others; there are 31 unique platforms in the dataset. The year of release is unavailable for 271 video games in the dataset; for the other video games, the year of release ranges from 1980 to 2020. There are 12 unique genres (including a miscellaneous category) and 579 unique publishers in the dataset, with the most common publisher being Electronic Arts.
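The summary figures above (row count, unique platforms, missing years, year range) can be reproduced with a few lines of code. The following is a minimal sketch using only the Python standard library. The column names (`Platform`, `Year`, `Publisher`, and so on), the `N/A` marker for a missing year, and the sample rows are assumptions made for illustration and are not taken from the dataset itself.

```python
import csv
import io

# Made-up sample in an ASSUMED column layout; these are not real rows.
SAMPLE_CSV = """Rank,Name,Platform,Year,Genre,Publisher,NA_Sales,EU_Sales,JP_Sales,Other_Sales,Global_Sales
1,Game A,Wii,2006,Sports,Publisher X,4.0,3.0,1.0,0.5,8.5
2,Game B,PS4,N/A,Action,Publisher Y,2.0,1.5,0.2,0.3,4.0
3,Game C,PC,2010,Action,Publisher X,1.0,1.0,0.0,0.1,2.1
"""


def summarize(rows):
    """Return basic descriptive counts for a list of row dictionaries."""
    # Treat any non-numeric Year value (assumed "N/A") as missing.
    years = [int(r["Year"]) for r in rows if r["Year"].isdigit()]
    return {
        "n_rows": len(rows),
        "n_platforms": len({r["Platform"] for r in rows}),
        "n_genres": len({r["Genre"] for r in rows}),
        "n_publishers": len({r["Publisher"] for r in rows}),
        "missing_year": len(rows) - len(years),
        "year_range": (min(years), max(years)) if years else None,
    }


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
summary = summarize(rows)
print(summary)
```

To run this on the real file, `io.StringIO(SAMPLE_CSV)` would be replaced by an open file handle, provided the column names match the assumed layout.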
Through our research, we hope to analyze the key factors that make a video game successful and how this success varies throughout the world. To complete our analysis, we will address the following questions:
How do specific game attributes such as name, genre, and publisher affect rankings of video games?
Which genres are more financially successful in North America? Are these genres consistently successful in different regions?
How do sales of video games differ across regions? How have sales changed over time?
Here we analyze our first research question and study whether the success of a video game is related to the words used in its title. In the graph below, we look at words used in the titles of the 1,000 best-selling video games.
Wordcloud of video game names in the top 1000 ranked video games
The graph above is important because it shows the most common words used in the names of the highest-selling video games. We can see that words such as “adventure”, “war”, and “super” appear in the titles of many of the highest-selling video games. In some cases, a frequent word reflects a popular series rather than a general theme; for example, “Mario” is common because of all the Super Mario video games. Overall, this graph gives insight into what kinds of games are the most popular.
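The word frequencies behind a word cloud like this one can be computed directly. The sketch below is one possible way to do it, not necessarily how the figure was produced. The stopword list and the sample titles are illustrative assumptions.

```python
import re
from collections import Counter

# Hypothetical stopword list; a real analysis would likely use a fuller one.
STOPWORDS = {"the", "of", "and", "a", "in"}


def title_word_counts(titles):
    """Count lowercase alphabetic words across titles, skipping stopwords."""
    counts = Counter()
    for title in titles:
        words = re.findall(r"[a-z]+", title.lower())
        counts.update(w for w in words if w not in STOPWORDS)
    return counts


# Illustrative titles only; the real input would be the top 1,000 names.
titles = ["Super Mario Bros.", "Super Mario World", "The Legend of Zelda"]
counts = title_word_counts(titles)
print(counts.most_common(3))
```

Counting words this way makes the series effect visible: a word repeated across many entries of one franchise rises to the top even if few distinct franchises use it.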
Our project analyzes what factors make video games successful, and one factor that could play a role is the title of a video game. We run a sentiment analysis on video game titles to see whether popular video games have more positive or more negative words in their titles. We create two word clouds below, one showing the positive words in popular video game titles and the other showing the negative words.
The first word cloud shows that “super” and “hero” appear frequently in video game titles; other positive words include “magic,” “love,” “grand,” and “marvel.” There appear to be more video game titles with negative words, such as “dead,” “monster,” “dark,” “madden,” and “evil.” Note that “madden” most likely reflects the Madden NFL series rather than negative sentiment, so some matches come from series names. This could suggest that video game titles containing negative words are more attractive to video game users, or that specific video game genres are more appealing to video game users and the titles happen to reflect those genres.
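Splitting title words into positive and negative groups amounts to a lexicon lookup. The sketch below illustrates the idea with a tiny hand-made lexicon built from the words mentioned above. The lexicon and the sample titles are hypothetical; the sentiment lexicon actually used for the word clouds is not specified here.

```python
import re
from collections import Counter

# Hypothetical mini-lexicon for illustration only.
POSITIVE = {"super", "hero", "magic", "love", "grand", "marvel"}
NEGATIVE = {"dead", "monster", "dark", "evil"}


def sentiment_counts(titles):
    """Return (positive, negative) word counters for the given titles."""
    pos, neg = Counter(), Counter()
    for title in titles:
        for word in re.findall(r"[a-z]+", title.lower()):
            if word in POSITIVE:
                pos[word] += 1
            elif word in NEGATIVE:
                neg[word] += 1
    return pos, neg


# Made-up titles, not taken from the dataset.
titles = ["Super Hero Quest", "Dark Monster Island", "Evil Castle", "Grand Race"]
pos, neg = sentiment_counts(titles)
print(sum(pos.values()), sum(neg.values()))
```

A lexicon lookup of this kind matches words without context, which is why a proper noun that happens to appear in the lexicon can be counted as sentiment.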
We aim to investigate our second research question: which video game genres are more financially successful in North America, and are these same genres also successful elsewhere in the world?
To do so, we create side-by-side boxplots of North American sales for each genre. The sales were log transformed to account for the few sales that are much larger in magnitude than the others and to make it easier to see the overall trend in North American sales by genre.
The graph above indicates that video games within the Platform genre may be more successful on average than other genres and that the Adventure and Strategy genres are less successful than other genres. Every boxplot has one or more outliers, indicating that each genre has one or more highly successful video games, so video game success is not entirely dependent on the genre trend. Several video games in the Sports genre appear to have been highly successful compared to other video games.
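The center line of each boxplot is the median of the log-transformed sales for that genre, which can be computed as in the sketch below. How zero sales were handled before taking the logarithm is not stated in this report; dropping them here is an assumption, and the sample records are made up.

```python
import math
from collections import defaultdict
from statistics import median


def median_log_sales_by_genre(records):
    """Map each genre to the median of log(sales) over its games."""
    by_genre = defaultdict(list)
    for genre, sales in records:
        # log(0) is undefined; dropping zero sales is an ASSUMPTION.
        if sales > 0:
            by_genre[genre].append(math.log(sales))
    return {genre: median(values) for genre, values in by_genre.items()}


# Made-up (genre, North American sales in millions) pairs.
records = [
    ("Platform", 2.0), ("Platform", 0.5), ("Platform", 8.0),
    ("Adventure", 0.1), ("Adventure", 0.2), ("Adventure", 0.0),
]
medians = median_log_sales_by_genre(records)
print(medians)
```

Because the logarithm is monotonic, the genre ordering by median is the same on the log scale as on the original scale; the transformation mainly makes the spread easier to read.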
We continue our analysis of genre success by looking at whether video game success by genre varies by region. To do this, we create a density plot of sales specifically for games within the Platform genre, which was one of the most successful genres identified in the previous section.
We see that the distribution of log(sales) is fairly different among the different regions. The distribution of log(sales) also tends to be skewed to the right. We see that Global sales and North American sales are the most similar, which makes sense because North American sales tend to make up the majority of Global sales. Among the individual regions, North American video game sales are visually most similar to European video game sales, which could suggest that video games that are popular in North America also tend to be popular in Europe.
Here we analyze our third research question: How do sales differ across regions? To do this, we first create a pairs plot to determine the relationship between sales in different regions.
Pairs plot of sales across each region
The pairs plot shows the distribution of log(sales) in North America, the EU, Japan, Other regions, and globally. These distributions show that, in each region, many games have very low sales. Ignoring these low-selling games, however, sales seem to follow a roughly normal distribution in each region. The peak of this distribution varies by region, with North America having the most sales and Japan having the least. Another interesting thing to note is that the correlation between log(sales) in Japan and log(sales) in either the EU or North America is negative. While these correlations are not very strong, it is surprising that they are not strongly positive, since we expected games that do well in one region to do well in the others.
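The correlations reported in a pairs plot are Pearson correlations between pairs of columns. The sketch below computes one such correlation on log(sales) by hand. The sample values are made up and chosen only to show a negative relationship; they do not reproduce the figures in this report, and dropping games with zero sales in either region is an assumption.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


def log_sales_correlation(pairs):
    """Correlate log(sales) of two regions, skipping non-positive values."""
    # ASSUMPTION: games with zero sales in either region are dropped.
    kept = [(math.log(a), math.log(b)) for a, b in pairs if a > 0 and b > 0]
    xs, ys = zip(*kept)
    return pearson(xs, ys)


# Made-up (North America, Japan) sales pairs in millions.
na_jp = [(4.0, 0.5), (2.0, 1.0), (1.0, 2.0), (0.5, 0.0)]
r = log_sales_correlation(na_jp)
print(round(r, 3))
```

How zero sales are treated can noticeably change a correlation of this kind, so the choice is worth stating alongside any reported value.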
Our data is ranked in descending order of global sales. We are interested in the relationship between sales in North America and sales globally. Intuitively, North American sales should have an impact on Global sales because of the region's sheer size and status as a consumer-based economy. The figures below show the relationship between the two sales metrics in billions of dollars. The second figure is zoomed into the cluster in the lower right-hand corner of the first graph for ease of visualization.
There appears to be a positive linear association between sales in North America and global sales. There also appears to be some separation or clustering in the colors of the points in the graphs. The Sony and Microsoft games, represented by the pink and purple dots, are clustered in the upper part of the roughly linear trend. In contrast, Sega and “other”, represented by the blue and green points, are more present in the lower half of the data. Nintendo seems to be evenly spread throughout the data without being clustered in one particular region.
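A positive linear association of this kind can be quantified with an ordinary least-squares line. The sketch below fits one by hand; it is an illustration of the technique rather than a step performed in this analysis, and the sample values are made up.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up North American and global sales values.
na_sales = [1.0, 2.0, 3.0, 4.0]
global_sales = [2.1, 3.9, 6.2, 7.8]
slope, intercept = fit_line(na_sales, global_sales)
print(slope, intercept)
```

Fitting a separate line per company group would be one way to check whether the color clustering described above corresponds to different slopes.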
While the previous plot shows the overall sales of games, we also want to know how video game sales have fared over time. Specifically, how have video game sales changed since the 1980s? To answer this, we look at video game sales in each region in each year since 1980.
Video Game sales by year per region
Looking at the graph above, we can see that the sales trend is very similar for all the regions. All of them start out with little to no sales until the late 1990s, increase rapidly starting in the early 2000s, and peak in 2008. From 2008 to 2020, all of the regions show a rapid decline in video game sales. Even though all the regions follow a similar trend, North America consistently had higher sales than any other region. This leads us to the next question: why is there a sudden decline in sales after 2008? Is it due to the Great Recession, or are there other factors that influenced the rapid decline in sales?
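The yearly totals behind a plot like this come from summing sales per year within each region, after which the peak year can be read off directly. The sketch below shows one way to do this; the field names (`Year`, `NA_Sales`, `EU_Sales`) and the sample rows are assumptions made for illustration.

```python
from collections import defaultdict


def yearly_totals(rows, region):
    """Sum sales for one region per release year, skipping missing years."""
    totals = defaultdict(float)
    for row in rows:
        if row["Year"] is not None:
            totals[row["Year"]] += row[region]
    return dict(totals)


def peak_year(totals):
    """Return the year with the largest total."""
    return max(totals, key=totals.get)


# Made-up rows with ASSUMED field names; sales in millions.
rows = [
    {"Year": 2007, "NA_Sales": 3.0, "EU_Sales": 2.0},
    {"Year": 2008, "NA_Sales": 4.0, "EU_Sales": 2.5},
    {"Year": 2008, "NA_Sales": 1.5, "EU_Sales": 1.0},
    {"Year": 2009, "NA_Sales": 2.0, "EU_Sales": 1.0},
    {"Year": None, "NA_Sales": 9.0, "EU_Sales": 9.0},
]
na_totals = yearly_totals(rows, "NA_Sales")
eu_totals = yearly_totals(rows, "EU_Sales")
print(peak_year(na_totals), peak_year(eu_totals))
```

Games with a missing release year cannot be placed on the time axis, so they are left out of the totals in this sketch.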
To further explore why video game sales declined after 2008, we looked at how the sales of each publisher fared over time, since we thought publisher could be a factor in declining sales. In the plot below, we plot the number of video games sold by the top 5 publishers between 1980 and 2020. These publishers were chosen based on the word cloud depicting the top 10 publishers who consistently made good video games.
Global video Game sales by year per publisher
From the graph, we can see that after 2008, Nintendo, the most popular publisher with the highest number of sales, had a major decline in the number of games sold. This seems to be the case for Activision as well. The major drop in sales at these two companies could explain why global and regional video game sales dropped, as shown in prior graphs.
From our analysis, we found that many attributes might affect video game rankings. For example, whether the name of the game contains negative words and whether it includes the name of a popular series are possible contributing factors. Genre and publisher could also affect rankings, since different genres had different amounts of sales, with platform games having the largest. Similarly, different publishers had varying amounts of sales, with Nintendo having the highest. In North America, platform games are the most popular, but their sales vary by region, and they are considerably less popular in Japan. When looking at sales over time, we noticed a sharp decline in sales after 2008 across all regions. We found that most of this decline was in the sales of Nintendo, and we think more research is needed to determine whether Nintendo’s decline in sales is the reason behind the large drop in global sales or whether some other external factor was responsible. To do this research, we would need to obtain more data, such as data about the economy, that could allow us to test whether the 2008 recession was the cause of this decline.