Data Set Description

For our final project, we picked the Avengers data set from FiveThirtyEight, which details the deaths of Marvel comic book characters between the time they joined the Avengers and April 30, 2015.

Below is a brief description of each variable and what they represent:

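For i = 1 through 5, the dataset includes the following two variables: a Death variable, recording whether the character died an i-th time, and a Return variable, recording whether the character came back from that death.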

The data set has a total of 173 data points.

Research Questions

We are interested in investigating what factors influence the popularity of characters. For the purposes of our research, we treat the number of appearances as a proxy for popularity: the more appearances, the more popular the character. In particular, we want to explore:

  1. How should Marvel go about naming its characters? Should it consider alliteration? Common names? The number of letters in a name?

  2. What other factors should Marvel take into account when constructing a popular superhero? Specifically, do Gender, Current?, and Honorary have an effect?

  3. Should Marvel kill its characters? If so, should it bring them back, and how often?

Question 1

## 
##  Bartlett test of homogeneity of variances
## 
## data:  Appearances by similar
## Bartlett's K-squared = 15.977, df = 1, p-value = 6.413e-05
## 
##  One-way analysis of means (not assuming equal variances)
## 
## data:  Appearances and similar
## F = 3.4066, num df = 1.000, denom df = 57.487, p-value = 0.07009

## 
##  Welch Two Sample t-test
## 
## data:  Appearances by top_first
## t = -2.3069, df = 89.796, p-value = 0.02336
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  -516.93166  -38.55155
## sample estimates:
## mean in group 0 mean in group 1 
##        323.4375        601.1791
## 
##  Welch Two Sample t-test
## 
## data:  Appearances by normal_first
## t = -0.83091, df = 33.024, p-value = 0.412
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  -346.1100  145.3779
## sample estimates:
## mean in group 0 mean in group 1 
##        349.5500        449.9161

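Using the Census Bureau’s list of the top 100 names over the last century, the group whose names appear on the list (top_first = 1) averages about 601 appearances, compared with about 323 for the group whose names do not. Welch’s t-test finds this difference significant at the 5% level (p = 0.023). The comparison on normal_first, by contrast, shows no significant difference (p = 0.412). Names with similar beginnings or endings (a measure of alliteration and/or assonance and a proxy for ‘catchiness’) tend to be more popular as well. Because Bartlett’s test rejects equal variances between the two groups (p < 0.001), we used a one-way test that does not assume equal variances; the relationship is significant at the 10% level but not the 5% level (p = 0.070). This gives some indication that catchy names are associated with more popular characters.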

Perhaps readers relate more to characters whose names they recognize, and so enjoy reading about them more. If a character shares a name with the reader or with people the reader knows, it may be easier to become immersed in the world.
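The Welch comparisons reported above can be reproduced by hand. The sketch below is an illustration in Python rather than the R call that produced the output; the function name `welch_t` and the appearance counts are made up for the example and are not the Avengers data. It computes the t statistic and the Welch-Satterthwaite degrees of freedom, which is why the reported df values are not whole numbers.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group0, group1):
    """Return (t, df) for Welch's unequal-variance two-sample t-test."""
    n0, n1 = len(group0), len(group1)
    # Squared standard error of each group mean (sample variance, n - 1 denominator).
    se0 = variance(group0) / n0
    se1 = variance(group1) / n1
    # Group 0 minus group 1, so a negative t means group 1 has the larger mean.
    t = (mean(group0) - mean(group1)) / sqrt(se0 + se1)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (se0 + se1) ** 2 / (se0 ** 2 / (n0 - 1) + se1 ** 2 / (n1 - 1))
    return t, df


# Hypothetical appearance counts, NOT taken from the data set.
uncommon_names = [120, 300, 85, 410, 230, 95]
common_names = [150, 900, 60, 2100, 480, 1300, 75]

t_stat, dof = welch_t(uncommon_names, common_names)
print(f"t = {t_stat:.3f}, df = {dof:.2f}")
```

A negative t, as in both reported tests, indicates that group 1 has the higher mean number of appearances.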

## 
## Call:
## lm(formula = Appearances ~ total_letters, data = a)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -964.8 -355.9 -197.5   80.0 3668.6 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)   
## (Intercept)     -44.44     155.83  -0.285  0.77587   
## total_letters    37.31      11.35   3.286  0.00125 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 671.6 on 161 degrees of freedom
## Multiple R-squared:  0.06286,    Adjusted R-squared:  0.05704 
## F-statistic:  10.8 on 1 and 161 DF,  p-value: 0.001246
## 
## Call:
## lm(formula = Appearances ~ letters_per_name, data = a)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -446.1 -373.5 -279.3   92.7 3899.3 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)  
## (Intercept)        502.05     287.56   1.746   0.0827 .
## letters_per_name   -10.79      47.26  -0.228   0.8198  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 693.7 on 161 degrees of freedom
## Multiple R-squared:  0.0003234,  Adjusted R-squared:  -0.005886 
## F-statistic: 0.05209 on 1 and 161 DF,  p-value: 0.8198

Letters per name is not associated with the number of appearances of Marvel characters (p = 0.82), but the total number of letters across the entire name is positively associated with appearances (about 37 additional appearances per letter, p = 0.001). We see two possible reasons for this. First, readers may simply remember characters with longer names better because there is a higher chance that one of the names is memorable. Second, the effect may reflect aliases, which are included in the name variable. Characters who appear more often might be more likely to have an alias: the alias may be easier to remember, or a character may acquire an alias (or have their normal name revealed) only after being around for longer. In that case, more appearances would lead to a longer name rather than the other way around.

Overall, we recommend giving characters a longer full name that includes both a normal name and an alias; the length of the individual names does not matter. We also recommend that comic creators make names catchier by matching starting or ending letters (e.g. Peter Parker, Stephen Strange, Bruce Banner, Susan Storm), and that they choose relatively common names, as these tend to perform better.
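As a rough illustration of the name variables discussed above, the sketch below derives total letters, letters per name, and a starting-or-ending-letter match from a name string. It is written in Python and is only a guess at the construction: the function `name_features`, the whitespace splitting, and the exact matching rule are assumptions, not the definitions used in the analysis.

```python
def name_features(full_name):
    """Derive simple name-shape features from a character's name string."""
    # Split on whitespace and keep only the alphabetic characters of each part,
    # so quotes or punctuation around an alias do not count as letters.
    parts = ["".join(ch for ch in part if ch.isalpha()).lower()
             for part in full_name.split()]
    parts = [part for part in parts if part]
    if not parts:
        raise ValueError("name contains no letters")

    total_letters = sum(len(part) for part in parts)
    first_letters = [part[0] for part in parts]
    last_letters = [part[-1] for part in parts]
    return {
        "total_letters": total_letters,
        "letters_per_name": total_letters / len(parts),
        # 1 if any two parts share a starting letter or share an ending letter.
        "similar": int(len(set(first_letters)) < len(first_letters)
                       or len(set(last_letters)) < len(last_letters)),
    }


for name in ["Peter Parker", "Bruce Banner", "Tony Stark"]:
    print(name, name_features(name))
```

Under these assumptions, "Peter Parker" has 11 total letters, 5.5 letters per name, and counts as similar, while "Tony Stark" does not.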

That said, the relationships between appearances and name catchiness or commonality are not strong. Leaning too heavily into catchy or common names could backfire.
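The same caution applies to the name-length regression. The short Python sketch below plugs two name lengths into the fitted line reported in the lm output for total_letters; the coefficients are copied from that output, while the function name and the two example lengths are chosen here purely for illustration.

```python
# Values copied from the lm(Appearances ~ total_letters) output.
INTERCEPT = -44.44
SLOPE = 37.31
RESIDUAL_SE = 671.6
R_SQUARED = 0.06286


def predicted_appearances(total_letters):
    """Fitted number of appearances for a name with the given total length."""
    return INTERCEPT + SLOPE * total_letters


short_name = predicted_appearances(11)  # 365.97
long_name = predicted_appearances(20)   # 701.76

print(f"11 letters: {short_name:.2f}")
print(f"20 letters: {long_name:.2f}")
print(f"gap: {long_name - short_name:.2f} vs residual SE {RESIDUAL_SE}")
print(f"share of variance explained: {R_SQUARED:.1%}")
```

Nine extra letters shift the prediction by about 336 appearances, roughly half of the residual standard error, and the model explains only about 6% of the variation in appearances. Name length is therefore a weak guide to any individual character's popularity.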

Question 2

To address our second research question, we explored what other factors could lead to an Avenger being more popular. The first variable we examined was gender, using a boxplot. Males have a slightly higher average than females, a much larger upper bound, and more outliers. From this chart, males appear to be slightly more popular overall and to have a higher chance of becoming very popular heroes, given the number of high outliers. In general, though, there does not seem to be a big difference in popularity between genders.

## 
##  Pearson's Chi-squared test
## 
## data:  tab1a
## X-squared = 154.3, df = 147, p-value = 0.3235
## 
##  Pearson's Chi-squared test
## 
## data:  tab2a
## X-squared = 38.691, df = 38, p-value = 0.4383
## 
##  Pearson's Chi-squared test
## 
## data:  tab3a
## X-squared = 118.65, df = 112, p-value = 0.3156

To test further whether gender is related to a character’s popularity, we ran a chi-square test of independence. The p-value is 0.3235, so we fail to reject the null hypothesis that gender and appearances are independent. This does not prove independence; it means the data show no significant evidence of an association. We also ran the test separately on Avengers created before 1985 and after 1985, to see whether the picture differs for modern comic characters. Both tests again gave high p-values (0.4383 for the older group and 0.3156 for the newer group), so in neither period do we find evidence that an Avenger’s popularity depends on gender.
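For readers unfamiliar with the test, the Python sketch below computes Pearson's chi-squared statistic and its degrees of freedom, (rows - 1) x (columns - 1), for a small contingency table. The function name and the counts are hypothetical, and binning appearances into three bands is an assumption made only for this example. The tables used in the report have many more cells, as their degrees of freedom show.

```python
def chi_square_independence(table):
    """Return (statistic, df) for Pearson's chi-squared test on a 2-D count table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the row and column variables were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected

    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df


# Made-up counts: rows are two genders, columns are low / medium / high appearances.
toy_table = [
    [30, 20, 10],
    [25, 20, 15],
]

stat, df = chi_square_independence(toy_table)
print(f"X-squared = {stat:.4f}, df = {df}")
```

A small statistic relative to its degrees of freedom, as in this toy table and in the three reported tests, gives a large p-value and no grounds to reject independence.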

To explore the differences between genders further, we created a time series plot of the cumulative number of Avengers by gender. More males are added initially, but after that both genders seem to grow at a similar rate. More recently, however, many more males have been added than females. This may reflect a belief within the Marvel Comics team, perhaps based on their own research, that male characters are more popular or attractive to readers. As shown above, though, there is no significant difference in popularity between the genders, so this may be misguided.

A second factor we examined was the Avengers’ status (Academy, Full, Honorary, Probationary). The violin plot shows that the most popular Avengers are full Avengers, which should come as little surprise. This suggests that, to increase popularity, the writers should try to elevate a hero to full membership as soon as possible. Other takeaways from this chart are that Academy Avengers seem to be the least popular, and that there are some fairly popular honorary and probationary characters. Avengers writers may want to feature more honorary members, or more controversial characters who are put on probation, as those groups have also seen some popularity.

Question 3

So far, we’ve looked at the characteristics that make up successful, marketable Marvel Comics Avengers superheroes. However, much of what makes a superhero popular is their stories. We can look at Marvel’s decisions to kill and bring back “dead” characters, and how these relate to the number of appearances the characters have.

Here, we can view “popular” characters as the points plotted above the local regression line. Note that the older popular characters have often died, while the newer ones haven’t. One possible reason is that as characters age and accumulate appearances, Marvel Comics tries to rejuvenate interest in them by putting them in increasingly dire situations. In the comics, this may go so far as having the Avenger die and possibly bringing them back in the future.

Next, we want to look at how Marvel Comics decides whether to bring back characters that have been killed. Here, we’ll consider only the characters that have died, and try to understand which of them Marvel brings back and which it keeps dead.

Here, we use the Death and Return variables to label a character Alive if they have returned from each of their deaths and Dead if they haven’t returned from their last death. The points fall into two broad groups.
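The labeling rule can be sketched as follows. This is a Python illustration, not the code used for the plot: the function name `final_status` and the boolean list encoding of the Death and Return variables are assumptions. Because a character must return before they can die again, checking the return from the last recorded death is equivalent to checking all of them.

```python
def final_status(deaths, returns):
    """Label a character 'Alive' or 'Dead' from per-death flags.

    deaths[i] is True if the character died an (i + 1)-th time;
    returns[i] is True if they came back from that death.
    Returns None for characters who never died.
    """
    last_death = None
    for i, died in enumerate(deaths):
        if died:
            last_death = i
    if last_death is None:
        return None  # never died, so excluded from this part of the analysis
    # Only the most recent death decides whether the character is alive now.
    return "Alive" if returns[last_death] else "Dead"


# Hypothetical characters, not rows from the data set.
died_once_and_returned = final_status(
    [True, False, False, False, False], [True, False, False, False, False])
died_twice_stayed_dead = final_status(
    [True, True, False, False, False], [True, False, False, False, False])
never_died = final_status([False] * 5, [False] * 5)

print(died_once_and_returned, died_twice_stayed_dead, never_died)
```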

The largest group is characters who first appeared in the Avengers after 1975 and have fewer than 1,000 appearances. Characters in this group are more often dead, which supports the argument that Marvel lets its newer, less frequently appearing Avengers die.

The second, smaller group has 750+ appearances and is made up of characters from before 1975. This suggests that Marvel Comics tries to bring back its older, higher-appearing characters to appeal to older fans.

Conclusion

Based on our findings, Marvel should create characters with multiple names (ideally common names in the top 100) in addition to an alias. Catchy names that use alliteration or rhyme are also recommended. Regarding gender, we found no statistically significant differences, although the most popular characters happen to be male. Full Avengers members tend to be more widely recognized and to appear more in the comics, so Marvel should try to elevate a hero to full membership as soon as possible. Regarding death and resurrection, popular characters should not be killed until they’ve had a chance to appear in comics for a sustained period of time; if they are killed, they should be brought back.

When working with this data set, we initially wanted to do text analysis and build visualizations on those findings. However, most of the notes in the data set simply record the comic issue in which the character died or returned, without providing much further insight. To continue this research, we could incorporate data from Marvel movies and TV shows. This could add to the Appearances counts and provide additional sources of data (such as script text, budgets, and box office figures) for determining which other factors affect a character’s popularity.