36-315 Final Project: Factors Influencing the Popularity of Animes

Introduction - A Description of the dataset

MyAnimeList is an anime and manga social networking and social cataloging website. The site gives its users a list-like system to organize and score anime and manga, helps them find users who share similar tastes, and maintains a large database on anime and manga. The dataset we are working with comes from Tam Nguyen and MyAnimeList.net via Kaggle, and it includes information on 14,478 anime up until 2019.

As a popular medium of entertainment, anime has created its own culture that is open and inclusive to all audiences. However, in recent years, animation companies have struggled to create captivating content comparable to the famous anime of the past. This report therefore examines the factors influencing the popularity and quality of an anime, as well as the trend over time, to provide direction for animation companies and for anime lovers who would like to learn more about animation culture.

In our project, we focus only on anime ranked in the top 1,000, and on the following variables:

'animeID'       # anime ID (as in https://myanimelist.net/anime/animeID)
'name'          # anime title
'title_english' # title in English
'type'          # anime type (e.g. TV, Movie, OVA)
'source'        # source of anime (e.g. original, manga, game, music, visual novel)
'genres'        # list of strings: anime genre; there can be multiple genres.
'score'         # score 0 to 10 (median score by users)
'rank'          # weighted according to MyAnimeList formula
                  (lower numeric value means higher rank)
'popularity'    # a comprehensive ranking based on members and favorites 
                  (lower numeric value means higher popularity)
'members'       # number of members that added this anime in their list
'favorites'     # number of members that marked this anime as a favorite
'premiered'     # date on which the anime premiered
'broadcast'     # when the anime is (regularly) broadcast

In summary, our report aims to investigate the factors influencing the quality and popularity of an anime. We intend to answer the following three questions:
1. What are the principal components contributing to the variation in the anime dataset,
and which factors determine the quality of an anime?
2. What are the factors influencing the score of an anime?
3. How does broadcasting time relate to an anime's score and popularity?

Question 1: What are the principal components contributing to the variation in the anime dataset, and which factors determine the quality of an anime?

To investigate factors influencing the popularity of anime, we first need to reduce the dimensionality of the dataset and select relevant quantitative variables for deeper analysis. Thus, a principal component analysis (PCA) is conducted on the quantitative variables in the dataset related to the quality of an anime: score, rank, popularity, members, and favorites.

anime_q <- anime_quant %>%
  select(score, rank, popularity, members, favorites)

anime_pca <- prcomp(anime_q, center = TRUE, scale. = TRUE)
summary(anime_pca)

## Importance of components:
##                           PC1    PC2    PC3     PC4     PC5
## Standard deviation     1.7316 1.0935 0.8024 0.34234 0.21119
## Proportion of Variance 0.5997 0.2392 0.1288 0.02344 0.00892
## Cumulative Proportion  0.5997 0.8389 0.9676 0.99108 1.00000

The result of the principal component analysis shows that PC1 and PC2 together account for 84% of the variation in the five quantitative variables, so we can use PC1 and PC2 for visualization and further analysis. To confirm that k = 2 principal components are sufficient, a scree (elbow) plot is drawn to visualize the trade-off between the number of dimensions and the marginal gain in information. A horizontal line at 1 divided by the number of variables (in this case, 1/5, or 20%) serves as a reference for locating the elbow: a component above the line explains more variance than a single standardized variable would on average.
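The proportions reported by summary() follow directly from the component standard deviations, so the 84% figure and the 1/5 reference line can be checked by hand. The sketch below is only a check under that reading: it copies the standard deviations from the output above, and the object names (`sdev`, `prop_var`, `threshold`, `keep`) are illustrative rather than part of the original analysis.

```r
# Standard deviations of the five components, copied from summary(anime_pca).
sdev <- c(PC1 = 1.7316, PC2 = 1.0935, PC3 = 0.8024, PC4 = 0.34234, PC5 = 0.21119)

# The variance of each component is its squared standard deviation.
prop_var <- sdev^2 / sum(sdev^2)
cum_var  <- cumsum(prop_var)

# Reference line: the share of variance of one standardized variable (1/5 here).
threshold <- 1 / length(sdev)

# Components whose share of variance lies above the reference line.
keep <- names(prop_var)[prop_var > threshold]

print(round(prop_var, 4))
print(round(cum_var, 4))
print(keep)
```

With the real analysis object, `anime_pca$sdev` could replace the hard-coded vector.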

library(factoextra)

fviz_eig(anime_pca, addlabels = TRUE, barfill = "#BEBCDF") +
  geom_hline(yintercept = 100 * (1 / ncol(anime_q)),
             linetype = "dashed", color = "darkred") +
  labs(title = "The Scree Plot of Dimensionality and Variance Explained")

In this elbow plot, the x-axis shows the component numbers 1 to 5 (one per quantitative variable), and the y-axis shows the percentage of variance that each principal component accounts for. PC1 and PC2 explain most of the variance in the data and are the only components above the 20% reference line, so the “elbow” presents a strong argument to stop at k = 2. This confirms that we can use the first two principal components in graphics and other analyses to capture most of the variation in the data.

Next, to make the principal components interpretable in terms of the original variables (rank, score, popularity, members, favorites), a biplot is used to present the linear relationship between the original variables and the principal components.

First, we check the linear combinations of the original variables that compose the principal components.

anime_pca$rotation

##                   PC1        PC2         PC3        PC4         PC5
## score      -0.4696155 -0.5028756 -0.13976199 -0.1245823  0.70108724
## rank        0.4512593  0.5511906  0.04674095 -0.1973657  0.67187497
## popularity  0.4046458 -0.1827796 -0.84381719  0.3003267  0.02509582
## members    -0.4702002  0.4765615 -0.06607223  0.7259960  0.14270656
## favorites  -0.4370354  0.4275345 -0.51175079 -0.5729395 -0.18990981

Then, using the autoplot() function from the ggfortify package on the prcomp output, a biplot is created to visualize these linear relationships.

library(ggfortify)
autoplot(anime_pca, 
         data = anime_q,
         color = "#9BDFDF", alpha = 0.25,
         loadings = TRUE, loadings.colour = '#FCB2AF',
         loadings.label.colour = '#FCB2AF',
         loadings.label = TRUE, loadings.label.size = 3,
         loadings.label.repel = TRUE) +
  labs(title = "The Biplot of Principal Components 1 and 2") +
  theme_bw()

From the biplot, we can see that score and rank point in opposite directions: as score increases, PC1 and PC2 decrease, while as rank increases, PC1 and PC2 increase. In addition, popularity points away from favorites and members: as popularity increases, PC1 increases and PC2 decreases, while as favorites and members increase, PC1 decreases and PC2 increases.

The angle between vectors is also indicative of the correlation between variables. In this biplot, score and popularity point in different directions (their angle is greater than 90 degrees), suggesting that they are negatively correlated. The lengths of the arrows indicate that score and rank relate most strongly to the first two principal components.
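The angle and length readings can be cross-checked numerically from the PC1 and PC2 loadings printed earlier. The sketch below assumes the biplot arrows are drawn proportional to those raw loadings (a uniform rescaling would not change angles or relative lengths); the helper `angle_deg()` is a hypothetical name introduced here. The angle only approximates the correlation, because the PC1-PC2 plane holds about 84% of the variance.

```r
# Loadings on PC1 and PC2, copied from anime_pca$rotation.
loadings <- rbind(
  score      = c(PC1 = -0.4696155, PC2 = -0.5028756),
  rank       = c(PC1 =  0.4512593, PC2 =  0.5511906),
  popularity = c(PC1 =  0.4046458, PC2 = -0.1827796),
  members    = c(PC1 = -0.4702002, PC2 =  0.4765615),
  favorites  = c(PC1 = -0.4370354, PC2 =  0.4275345)
)

# Length of each arrow in the PC1-PC2 plane.
arrow_length <- sqrt(rowSums(loadings^2))

# Angle in degrees between the arrows of two variables (hypothetical helper).
angle_deg <- function(a, b) {
  cos_ab <- sum(loadings[a, ] * loadings[b, ]) /
    (arrow_length[[a]] * arrow_length[[b]])
  acos(cos_ab) * 180 / pi
}

print(round(sort(arrow_length, decreasing = TRUE), 3))
print(angle_deg("score", "popularity"))
print(angle_deg("score", "rank"))
```

An angle above 90 degrees corresponds to a negative cosine, which is what the negative-correlation reading of the biplot relies on.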

In a nutshell, score and rank are negatively correlated, which makes sense: the higher the score, the higher the rank (lower numeric value). Popularity is negatively correlated with favorites and members, which also makes sense: the more members and favorites, the higher the popularity (lower numeric value). Score and rank contribute the most to the first two principal components.

Thus, the following graphics and analyses focus on the quantitative variables score and popularity, as both contribute substantially to principal component 1. Furthermore, score is a comprehensive indicator of the likability and quality of an anime, and popularity is a comprehensive indicator of how many people added the anime to their list and marked it as a favorite. Here, we use a correlation plot to visualize the correlations between the variables and to confirm that popularity and score are negatively correlated.

anime_cor <- anime_q %>%
  cor()
anime_cor

##                 score       rank popularity    members  favorites
## score       1.0000000 -0.9472024 -0.3875429  0.3753362  0.4067914
## rank       -0.9472024  1.0000000  0.3954584 -0.3366193 -0.3173943
## popularity -0.3875429  0.3954584  1.0000000 -0.6130525 -0.3660310
## members     0.3753362 -0.3366193 -0.6130525  1.0000000  0.8316261
## favorites   0.4067914 -0.3173943 -0.3660310  0.8316261  1.0000000

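library(reshape2)  # provides melt()

# Reshape the correlation matrix to long format (one row per variable pair).
melted_anime <- melt(anime_cor)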
head(melted_anime)

##         Var1  Var2      value
## 1      score score  1.0000000
## 2       rank score -0.9472024
## 3 popularity score -0.3875429
## 4    members score  0.3753362
## 5  favorites score  0.4067914
## 6      score  rank -0.9472024

melted_anime %>%
  ggplot(aes(x = Var1, y = Var2, fill = value)) +
  geom_tile(color = "white") +
  scale_fill_gradient2(low = "#9BDFDF", mid = "white", high = "#E995C9",
                      midpoint = 0, limits = c(-1, 1),
                      name = "Correlation",
                      guide = guide_colorbar(barwidth = 5, barheight = 3,
                                             title.position = "top"))

What is the association between score and popularity of animes, and is the association clustered by the period of the anime?

To begin examining score, popularity, and their interaction with time, a preliminary analysis of the association is conducted.

From the scatterplot, we can see that score and popularity are indeed negatively correlated: anime with higher scores tend to have lower (better) popularity ranks. However, this association does not appear to differ by the period in which the anime was released. Thus, in the following sections, we further investigate the factors influencing the score and popularity of anime.
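One rough way to check whether the association differs by period is to compare the overall correlation with the correlation inside each period. The sketch below runs on simulated toy data, not the project data; the data frame `toy` and the `period` labels "Old" and "New" are assumptions made for illustration.

```r
set.seed(315)

# Toy stand-in for the project data; every value here is simulated.
n <- 300
toy <- data.frame(
  period = rep(c("Old", "New"), each = n / 2),
  score  = runif(n, min = 7.5, max = 9.2),
  stringsAsFactors = FALSE
)
# Popularity is a rank, so a lower value means a more popular anime.
toy$popularity <- round(3000 - 1200 * (toy$score - 7.5) + rnorm(n, sd = 600))

# Correlation over all rows, then separately within each period.
overall   <- cor(toy$score, toy$popularity)
by_period <- sapply(split(toy, toy$period),
                    function(d) cor(d$score, d$popularity))

print(overall)
print(by_period)
```

Similar within-period correlations would support the reading that the association is not specific to one period.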

Question 2: What are the factors influencing the scores of an anime?

In analyzing the trajectory of anime success, this question delves into the historical trends of anime scores and examines potential predictors of high-scoring anime in the future. We wanted to learn “what kinds of anime are more likely to gain high scores in the future, based on type and genre”, which suggests we should examine type, genre, and start_yr.

Analyzing the Relationship Between Start Year and Score

This section explores the overarching trend in anime scores over time. Using a linear model, we investigate whether there is a general increase or decrease in anime scores from the year of release.
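A minimal sketch of such a linear model is given here on simulated data, since the fitting code is not part of this report. The column name `start_yr` follows the text; the data frame `toy` and the simulated slope of 0.01 points per year are assumptions for illustration only.

```r
set.seed(36315)

# Simulated data: one row per anime, with a small built-in upward trend.
n <- 400
toy <- data.frame(start_yr = sample(1980:2019, n, replace = TRUE))
toy$score <- 7.8 + 0.01 * (toy$start_yr - 1980) + rnorm(n, sd = 0.3)

# Simple linear regression of score on start year.
fit   <- lm(score ~ start_yr, data = toy)
slope <- coef(fit)[["start_yr"]]

# The slope is the estimated change in score per additional year.
print(summary(fit)$coefficients)
print(confint(fit, "start_yr"))
```

A confidence interval for the slope that excludes zero is what would justify describing the trend as increasing.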

This trend plot presents a clear upward trajectory in anime scores over the years, suggesting an increase in the quality or changes in the rating behaviors across time. With a positive slope indicated by the regression line, there’s an implication of gradual improvement in scores. However, the wide variability within each year, as seen by the spread of points, denotes a diverse range of anime ratings, regardless of the year of release.

Notably, the plot reveals a denser cluster of scores in the more recent years, starting from 2000, indicating either an increase in anime production or a more comprehensive collection of scoring data. Despite this general upward trend, the presence of high-scoring outliers in earlier years reminds us that high-quality anime is not a new phenomenon and has consistently emerged over the decades.

The variability and outliers observed here call for a deeper investigation into other influential factors such as genre and type. These elements will be essential for understanding and predicting which anime genres and types are more likely to achieve high scores in the future, serving as the foundation for the subsequent sections of this report.

Analyzing the Relationship Between Start Year, Score, and Type

Building on the initial analysis, this section introduces the type of anime (Movie, TV, Other) as a variable. We assess the interaction between the start year and anime type to understand how different formats of anime have evolved in their reception over time.
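One way to express this interaction is a linear model with a `start_yr * type` term, which gives each type its own slope. The sketch below uses simulated data; the data frame `toy`, the vector `true_slope`, and the choice of TV as the reference level are assumptions, not the report's actual model.

```r
set.seed(2)

# Simulated data with a different built-in trend for each type.
n <- 600
toy <- data.frame(
  start_yr = sample(1980:2019, n, replace = TRUE),
  type     = factor(sample(c("TV", "Movie", "Other"), n, replace = TRUE),
                    levels = c("TV", "Movie", "Other"))
)
true_slope <- c(TV = 0.012, Movie = 0.006, Other = 0)
toy$score <- 7.8 +
  unname(true_slope[as.character(toy$type)]) * (toy$start_yr - 1980) +
  rnorm(n, sd = 0.3)

# Interaction model: the effect of start year is allowed to vary by type.
fit <- lm(score ~ start_yr * type, data = toy)

# TV is the reference level; other types add their interaction coefficient.
b <- coef(fit)
slopes <- c(
  TV    = b[["start_yr"]],
  Movie = b[["start_yr"]] + b[["start_yr:typeMovie"]],
  Other = b[["start_yr"]] + b[["start_yr:typeOther"]]
)
print(round(slopes, 4))

# F-test comparing the additive model with the interaction model.
print(anova(lm(score ~ start_yr + type, data = toy), fit))
```

The anova() comparison indicates whether allowing separate slopes improves on a single common slope.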

From the scatter plot, we see an array of data points that represent the scores of different types of anime over various start years. Each type of anime shows a trend line, with the TV series trend line suggesting a steady increase in scores over time. The plot points for TV series are densely populated around their trend line, which steadily ascends, indicating a consistent rise in TV anime scores as years progress. For movies, while the trend line is positive, the scores are more varied and spread out. The “Other” category shows a more horizontal trend, suggesting that scores for this category have not seen a significant upward trend over the years.

The faceted plot for TV anime shows that while there is a large volume of data points and considerable variability in scores, the overall trend line still suggests a rise in scores over time. The faceted plot, dedicated to TV anime, presents a clearer view of this upward trajectory by eliminating visual competition from movies and other types of anime. It allows for an unobstructed view of the increase in scores, reaffirming the potential for continued success and high scoring among TV series in the anime industry.

Overall, the combined scatter plot serves to show the overarching trends across all types of anime, which is useful for drawing comparisons. However, the individual plots allow for a more detailed analysis of each type without the visual interference from the others. The facet for TV anime is particularly telling as it supports the hypothesis that TV anime has a rising trend in scores. This focused view highlights the potential for TV anime to continue gaining high scores in the future, making it a promising area for further investigation and perhaps investment or development within the industry.

Analyzing Top Genres by Score in the “New” Period and Predicting the Future

Focusing on the “New” period (2000 to 2020), we narrow down the top five genres within the TV type based on their average scores. This analysis aims to identify which genres stand out in recent times and to forecast their scoring potential.

## # A tibble: 5 × 2
##   genre      average_score
##   <chr>              <dbl>
## 1 Samurai             8.60
## 2 Police              8.44
## 3 Historical          8.37
## 4 Josei               8.33
## 5 Parody              8.30
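A table of this shape could be produced by filtering to TV anime from 2000 to 2020 and averaging the score within each genre. The sketch below assumes the data have one row per anime-genre pair; the data frame `toy` and all of its scores are invented for illustration and are unrelated to the results above.

```r
# Toy data: one row per anime-genre pair (all values invented).
toy <- data.frame(
  name     = c("A", "A", "B", "C", "C", "D", "E", "F"),
  type     = c("TV", "TV", "TV", "TV", "TV", "Movie", "TV", "TV"),
  start_yr = c(2005, 2005, 2010, 1995, 1995, 2012, 2018, 2003),
  genre    = c("Samurai", "Historical", "Police", "Samurai", "Parody",
               "Historical", "Josei", "Parody"),
  score    = c(8.7, 8.7, 8.4, 8.9, 8.9, 8.8, 8.3, 8.2),
  stringsAsFactors = FALSE
)

# Keep TV anime from the "New" period only.
recent_tv <- subset(toy, type == "TV" & start_yr >= 2000 & start_yr <= 2020)

# Mean score per genre, plus the number of titles behind each mean.
genre_means <- aggregate(score ~ genre, data = recent_tv, FUN = mean)
names(genre_means)[2] <- "average_score"
genre_means$n_titles <-
  as.integer(table(recent_tv$genre)[as.character(genre_means$genre)])

# Sort from highest to lowest mean and keep the top five.
genre_means <- genre_means[order(-genre_means$average_score), ]
top_genres  <- head(genre_means, 5)
print(top_genres)
```

Reporting `n_titles` next to each mean helps flag genres whose average rests on only a few titles.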

The table and trend plot provide an overview of the average scores for the top five genres of TV anime from 2000 to 2020. “Samurai” leads with the highest average score (8.60), suggesting enduring appeal. “Police” (8.44) and “Historical” (8.37) follow, and “Josei” (8.33) and “Parody” (8.30) complete the top five. The gaps among the last four genres are small (at most 0.14 points), so their relative order should be read with caution.

Interpreting the data, the leading position of “Samurai” reflects its consistent ability to engage and satisfy audiences. The mid-2000s peak for “Historical” might point to specific series that captured viewers’ imagination during that era. The “Police” data show volatility and a notable gap in the records after 2012, which may indicate a shift in industry focus or a lack of data collection rather than a definitive decline in quality or popularity. The steadiness of “Josei” scores suggests a dedicated, if possibly niche, audience, whereas the slightly lower scores for “Parody” could reflect the inherent challenge of hitting the mark with humor and satire in anime.

In conclusion, while “Samurai” is likely to maintain its high scores, the future of “Police” is uncertain without recent data. The consistent scoring of the “Historical” and “Josei” genres suggests they will continue to have a dedicated audience. “Parody”, however, may need to evolve or see standout productions to improve its standing. These trends offer useful hints for predicting which genres may garner high scores and viewer attention in the coming years.

Question 3: How does broadcasting time relate to an anime’s score and popularity?

This question examines the relationship between the broadcast timing of an anime and two key performance metrics: its score and its popularity. Specifically, we seek to understand how the time of day and the day of the week on which an anime airs might correlate with its critical reception and viewer engagement. We first investigate the distributions of the exact broadcast time (on a 24-hour scale, JST) and of the broadcast day of the week for the anime in our sample.
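Both variables have to be derived from the text in the `broadcast` column. The sketch below assumes entries look like "Sundays at 17:00 (JST)", which is an assumption about the raw format rather than something shown in this report; the names `pattern`, `day`, `hour`, and `parsed` are illustrative.

```r
# Example strings in the assumed "<Day>s at HH:MM (JST)" layout.
broadcast <- c("Sundays at 17:00 (JST)", "Fridays at 01:25 (JST)", "Unknown")

# One pattern captures the day name, the hour, and the minutes.
pattern <- "^([A-Za-z]+day)s at ([0-9]{1,2}):([0-9]{2}).*$"
ok <- grepl(pattern, broadcast)

# Entries that do not match the pattern stay missing.
day  <- rep(NA_character_, length(broadcast))
hour <- rep(NA_real_, length(broadcast))

day[ok] <- sub(pattern, "\\1", broadcast[ok])

# Broadcast time as a decimal hour on the 24-hour clock.
hour[ok] <- as.numeric(sub(pattern, "\\2", broadcast[ok])) +
  as.numeric(sub(pattern, "\\3", broadcast[ok])) / 60

parsed <- data.frame(broadcast, day, hour, stringsAsFactors = FALSE)
print(parsed)
```

Leaving unmatched entries as NA keeps anime without a regular slot out of the time-of-day summaries instead of assigning them an arbitrary hour.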

The histogram of broadcast time is trimodal, with the largest peak in the late-night hours around midnight and smaller peaks in the late afternoon and in the evening after 8 pm, suggesting that these are the preferred slots for broadcasting anime. One plausible explanation is that most anime are broadcast late at night to target older teens and adults rather than children, to benefit from lower broadcasting costs, and to give creators greater creative freedom.

Adjacent to the histogram, a bar graph of the distribution of broadcasts across days of the week shows a higher frequency of broadcasts on Fridays, Saturdays, and Sundays. This indicates a possible strategic placement of anime at the end of the week and on the weekend to capture a larger audience, likely when viewers have more leisure time.

The density plot of score versus broadcast time shows the highest concentration of points in the late-night hours. The color gradient indicates density, with a darker shade denoting a higher density of data points, which suggests that the majority of highly scored anime are broadcast late at night and a smaller proportion in the late afternoon and evening.

Similarly, the density plot of popularity versus broadcast time shows points spread throughout the day. The most popular anime appear mostly in the evening, while the late-night hours are densely populated with less popular anime, implying that the evening might be the golden hours for anime broadcasts.

## 
##  Pearson's product-moment correlation
## 
## data:  anime_1000$score and anime_1000$broadcast_time
## t = -11.963, df = 7377, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.1602673 -0.1155010
## sample estimates:
##        cor 
## -0.1379546

## 
##  Pearson's product-moment correlation
## 
## data:  anime_1000$popularity and anime_1000$broadcast_time
## t = 9.5233, df = 7377, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.08760584 0.13268631
## sample estimates:
##       cor 
## 0.1102028

Our Pearson correlation tests measure the strength and direction of the linear relationship between broadcast time and score, and between broadcast time and popularity. Both tests are statistically significant (p-value < 0.05), but both correlations are weak. For score, the correlation is weakly negative (r ≈ -0.14), suggesting that scores are slightly lower for later broadcast times on the 24-hour clock. For popularity, the correlation is weakly positive (r ≈ 0.11); because a lower numeric value means higher popularity, this indicates that anime broadcast later on the 24-hour clock tend to be slightly less popular.
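To put the size of these correlations in perspective, squaring r gives the share of variance that a straight-line relationship would account for. The sketch below does this for the two reported estimates and then shows the general shape of a cor.test() call on simulated data; the data frame `toy` and its built-in effect size are assumptions, not the project data.

```r
# Reported estimates, copied from the test output above.
r_reported <- c(score = -0.1379546, popularity = 0.1102028)

# Squared correlation: share of variance explained by a linear relationship.
r_squared <- r_reported^2
print(round(r_squared, 3))

# Shape of the test call, on simulated data with a weak negative trend.
set.seed(1)
n <- 500
toy <- data.frame(broadcast_time = runif(n, min = 0, max = 24))
toy$score <- 8.3 - 0.01 * toy$broadcast_time + rnorm(n, sd = 0.4)

res <- cor.test(toy$score, toy$broadcast_time, method = "pearson")
r   <- unname(res$estimate)
print(res)
```

Pearson's r treats broadcast time as a straight line from 0 to 24, so slots just before and just after midnight count as far apart; that is worth keeping in mind when most broadcasts cluster around midnight.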

## Picking joint bandwidth of 0.0758

Our ridge plot visualizes the distribution of anime scores for each day of the week, showing nuanced variations. Each day’s distribution is represented by a colored layer, with the peak of each layer indicating the modal score. Saturday and Sunday have wider distributions that sit slightly higher, indicating somewhat higher and more varied scores for anime broadcast on weekends. Monday and Thursday also show modes at high scores.

## 
##  One-way analysis of means (not assuming equal variances)
## 
## data:  score and broadcast_dayofweek
## F = 79.265, num df = 6.0, denom df = 2777.5, p-value < 2.2e-16

To test whether mean scores differ across days of the week, we use a one-way analysis of means that does not assume equal variances (Welch's ANOVA). The result indicates a statistically significant difference in mean anime scores across broadcast days: the F-statistic is large (F = 79.3), and the p-value is less than 0.05, providing strong evidence to reject the null hypothesis that mean scores are equal across days. This suggests that broadcast day is associated with an anime’s score, although the test alone does not show which days differ or establish a causal effect.
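Output titled "One-way analysis of means (not assuming equal variances)" is what base R's oneway.test() prints when `var.equal = FALSE`. The sketch below shows that call on simulated data, followed by one possible follow-up for locating which days differ; the data frame `toy`, the vector `day_shift`, and the choice of Holm-adjusted pairwise t-tests are assumptions rather than steps taken in this report.

```r
set.seed(7)

# Simulated scores for seven broadcast days, 40 anime per day.
days <- c("Monday", "Tuesday", "Wednesday", "Thursday",
          "Friday", "Saturday", "Sunday")
toy <- data.frame(day = factor(rep(days, each = 40), levels = days))

# Built-in mean shift per day (invented for illustration).
day_shift <- c(0.3, 0, 0, 0.2, 0, 0.1, 0.1)
toy$score <- 8.0 + rep(day_shift, each = 40) + rnorm(nrow(toy), sd = 0.3)

# Welch's one-way test: compares group means without assuming equal variances.
welch <- oneway.test(score ~ day, data = toy, var.equal = FALSE)
print(welch)

# Possible follow-up: pairwise Welch t-tests with Holm correction.
pairwise <- pairwise.t.test(toy$score, toy$day,
                            p.adjust.method = "holm", pool.sd = FALSE)
print(pairwise)
```

The numerator degrees of freedom equal the number of groups minus one, which matches the value of 6 in the output above for seven days.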

In conclusion, broadcast timing shows a statistically significant, though weak, relationship with an anime’s score and popularity. Late-night slots hold most of the high-scoring anime, likely due to a dedicated audience, while the most popular anime are concentrated in the evening; weekends attract more shows, suggesting an intent to capture the leisure time of a broader audience. The data imply that both the time of day and the day of the week are worth considering for broadcasters aiming to optimize their anime’s reception and popularity.

Conclusion

This project addressed the dynamics affecting anime’s popularity and quality through three questions. Our analyses reveal that genres such as “Samurai”, “Police”, and “Historical” hold the highest average scores among TV anime from 2000 to 2020, although the “Police” record is volatile and has a gap after 2012. The upward trajectory in the scores of TV series suggests an overall improvement in their quality or a shift in rating behavior over time. Furthermore, our investigation into broadcast timing found that late-night and weekend slots are the most common, that late-night slots hold most of the high-scoring anime, and that the most popular anime cluster in the evening, while the linear correlations between broadcast time and these outcomes are weak.

These findings are grounded in the analyzed data and supported by visualizations and statistical tests. They describe associations rather than causal effects, and some of them, such as the correlations with broadcast time, are weak. Within those limits, they answer our research questions about which factors are most closely related to the popularity and quality of anime. The conclusions shed light on the current state of anime genres and types and outline the broadcasting strategies that potentially contribute to an anime’s success, offering a coherent view of the factors that shape the trajectory of anime within the competitive landscape of entertainment.

Discussions and Future Directions

Looking ahead, several questions beckon further inquiry. The unexpected data gap in the “Police” genre post-2012 requires additional scrutiny to understand whether this reflects a true decline in production or a lapse in data recording. Moreover, while “Josei” and “Parody” genres command niche audiences, identifying what specific elements within these genres resonate with viewers could lead to a surge in their popularity.

The advent of online streaming services and digital platforms has undeniably disrupted traditional broadcast metrics, calling for an updated analysis framework that incorporates these modern content consumption pathways. Future research should extend beyond broadcast time and day to include digital engagement metrics, such as streaming numbers, social media buzz, and global accessibility. These digital footprints could offer a more holistic view of an anime’s popularity and its reception across diverse, international audiences. Additionally, the interrelation between genre-specific trends and changing demographic patterns presents an uncharted territory. Exploring how evolving societal trends, shifts in global entertainment consumption, and the increasing cross-cultural exchange impact anime’s success will be pivotal.

Our conclusions lay the groundwork for subsequent scholarly work in this field. To maintain the relevance and rigor of anime research, scholars should continually adapt to the changing entertainment landscape, embracing both traditional and emerging analytics tools. As we venture into this new era of entertainment, a fusion of quantitative data analysis with qualitative cultural studies will be essential to capture the full spectrum of factors that influence the success of anime.