For our final project, we use the ‘Bechdel Test’ dataset, which is originally from FiveThirtyEight. The description of the dataset and its variables is at https://github.com/rfordatascience/tidytuesday/blob/master/data/2021/2021-03-09/readme.md. The dataset includes 1793 movies and 43 variables, a mix of quantitative and qualitative. We look specifically at the following:
Our main response variable, rating, is categorical and describes a movie’s score on the Bechdel test. The test is named for Alison Bechdel, who introduced it in 1985 as a measure of female representation in movies. A rating of 0 represents unscored, 1 means that the movie has two named women, 2 means that the two named women talk to each other, and 3 means that the two named women talk about something other than a man. The test is one of the most commonly used metrics for gender representation in Hollywood films.
We are seeking to observe how gender roles in movies have changed over time, and what factors most greatly affect a movie’s Bechdel rating. We have four main questions that our project aims to answer:
This dataset contains data for 1793 movies released from 1970 to 2013. There are 993 movies with a Bechdel rating of 3, 185 movies with a rating of 2, 486 movies with a rating of 1, and 129 movies that were not rated. We were pleasantly surprised to see that rating 3 had the most movies, since this dataset spans more than four decades and Hollywood has not been known for excellent gender representation over that time. However, it is very likely that within a single rating, the quality of the film is highly variable. Next we explore some of the other variables and their relationships with one another.
From our pairs plot, we see that domgross and intgross are positively correlated: movies that grossed more in the United States tended to gross more internationally as well. Budget and gross both seem to increase over time. There may also be a slight relationship between Bechdel rating and each of the other variables. In either case, we want to look more closely at each of these variables and relationships to see what we can conclude.
For our first question we decided to look at what variables affect the budget of a movie. Does its guidance rating matter? Have movies’ budgets increased over time?
We started our analysis with PCA, in order to see which variables explain the most variance in the data.
## Importance of components:
## PC1 PC2 PC3 PC4 PC5 PC6 PC7
## Standard deviation 2.0206 1.1317 1.0024 0.65106 0.35322 0.2813 0.06299
## Proportion of Variance 0.5832 0.1830 0.1435 0.06055 0.01782 0.0113 0.00057
## Cumulative Proportion 0.5832 0.7662 0.9097 0.97030 0.98813 0.9994 1.00000
From the scree plot, we keep only the first 2 principal components, which together explain about 77% of the variance. Each has a variance well above 1, so each conveys more information than any single standardized column in the original dataset, whereas the third component's variance is only marginally above 1. From the biplot, we can see that the variables budget, intgross, domgross, intgross_2013, and domgross_2013 all point towards the left, signaling that movies with a low first principal component tend to have higher values of these variables. On the other hand, rating is almost independent of these variables, since its vector is close to orthogonal to the other vectors. Thus, using the high vs low budget colors, we can see that movies with a higher budget do not necessarily have a higher Bechdel rating.
This plot shows a time series of movie budgets by release year, with a separate line for each gross level. The gross levels were determined by the quartiles of the gross variable: movies below the first quartile are labeled low, those between the first and second quartiles low-middle, those between the second and third quartiles high-middle, and those above the third quartile high. From the graph, we see a clear relationship between the year a movie was released and its budget, with movies released in later years having higher budgets than those released in earlier years. We also see that for each year, the movies with the highest gross levels also had the highest budgets, and the movies with lower gross levels had lower budgets. Interestingly, all the gross levels start off with very similar budgets, but as the years go on, a difference in gross level corresponds to a much larger difference in budget. In recent years, a larger budget appears to go along with a larger box office gross, which makes sense given that large franchises with high budgets, such as those in the Marvel Cinematic Universe, have taken over the list of highest grossing films.
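The component summary above can be reproduced in outline with base R's prcomp. The sketch below is only an illustration: it uses a small synthetic data frame rather than the real Bechdel data, and it assumes the variables were centered and scaled, which is consistent with the printed variances summing to about 7 but is not confirmed by the report.

```r
# Synthetic stand-in for the numeric columns; NOT the real Bechdel data.
set.seed(1)
n <- 200
budget <- rlnorm(n, meanlog = 17, sdlog = 1)
movies <- data.frame(
  budget   = budget,
  domgross = budget * rlnorm(n, meanlog = 0,   sdlog = 0.5),
  intgross = budget * rlnorm(n, meanlog = 0.5, sdlog = 0.5),
  rating   = sample(0:3, n, replace = TRUE)
)

# Center and scale (assumption) so each column contributes unit variance.
pca <- prcomp(movies, center = TRUE, scale. = TRUE)

# Proportion of variance per component and the running total.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
cumulative <- cumsum(var_explained)

# Components whose variance exceeds 1, i.e. more than one scaled column's worth.
keep <- which(pca$sdev^2 > 1)

print(summary(pca))
print(round(cumulative, 3))
```

Calling screeplot(pca) and biplot(pca) on the fitted object would draw the two plots discussed in this section.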
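One way to build quartile-based gross levels like these is with quantile and cut. This is a minimal sketch on synthetic data; the column names year, gross, and budget mirror the report, while gross_level and the exact binning call are assumptions about how the levels might have been constructed.

```r
# Synthetic data for illustration only; NOT the real Bechdel data.
set.seed(2)
n <- 400
movies <- data.frame(
  year  = sample(1970:2013, n, replace = TRUE),
  gross = rlnorm(n, meanlog = 18, sdlog = 1.2)
)
movies$budget <- movies$gross * rlnorm(n, meanlog = -1, sdlog = 0.5)

# Quartile cut points: minimum, Q1, median, Q3, maximum.
breaks <- quantile(movies$gross, probs = c(0, 0.25, 0.5, 0.75, 1))

# include.lowest = TRUE keeps the minimum value in the "low" bin.
movies$gross_level <- cut(
  movies$gross,
  breaks = breaks,
  labels = c("low", "low-middle", "high-middle", "high"),
  include.lowest = TRUE
)

# Each level should hold about a quarter of the movies.
level_counts <- table(movies$gross_level)
print(level_counts)

# Mean budget per year within each gross level, ready for a line plot.
mean_budget <- aggregate(budget ~ year + gross_level, data = movies, FUN = mean)
print(head(mean_budget))
```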
##
## Welch Two Sample t-test
##
## data: lateryears$budget and earlyyears$budget
## t = 15.43, df = 40.827, p-value < 2.2e-16
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
## 33850756 Inf
## sample estimates:
## mean of x mean of y
## 45378999 7383921
In order to test whether movies produced in later years truly had higher budgets than those produced in earlier years, we first separated the data into earlyyears, which included all the movies released before 1990, and lateryears, which included all the movies released in or after 1990. We then conducted a Welch two-sample t-test on the budgets of the two groups. Our null hypothesis is that there is no difference in mean budget between movies produced in earlier years and later years, and our alternative hypothesis is that movies produced in later years have a higher mean budget. Our t-test gave us a p-value of < 2.2*10^(-16), which is smaller than our alpha of 0.05. Therefore, we have enough evidence to reject the null hypothesis and conclude that the movies released in later years do have higher budgets than those released in earlier years.
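The test above follows a standard pattern that can be sketched as follows. The data here are synthetic, and the split at 1990 (earlier years strictly before 1990) is an assumption about how the two groups were formed; only the names earlyyears, lateryears, and budget come from the output shown.

```r
# Synthetic data for illustration only; budgets drift upward with year.
set.seed(3)
n <- 500
movies <- data.frame(year = sample(1970:2013, n, replace = TRUE))
movies$budget <- rlnorm(n, meanlog = 15 + 0.05 * (movies$year - 1970), sdlog = 0.8)

# Non-overlapping split (assumption): before 1990 vs. 1990 and later.
earlyyears <- movies[movies$year < 1990, ]
lateryears <- movies[movies$year >= 1990, ]

# One-sided Welch test: is the mean budget in later years greater?
# var.equal = FALSE (the default) gives the Welch correction.
result <- t.test(
  lateryears$budget, earlyyears$budget,
  alternative = "greater",
  var.equal = FALSE
)
print(result)
```

Because alternative = "greater" is one-sided, the reported confidence interval has an infinite upper bound, as in the output shown earlier.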
Next, we are going to pivot slightly in order to look at other things that affect budget such as guidance rating and Bechdel rating. We wanted to see whether there is a difference in how movies with different types of content, as defined by guidance rating and Bechdel rating, are funded.
This plot shows the relationship between a film’s Bechdel rating and its budget, broken down by guidance rating. Notably, G rated films with a Bechdel rating of 2 seem to have the largest mean budgets, with TV-PG movies with a Bechdel rating of 1 following close behind. This may have to do with the very large budgets of Disney, which focuses primarily on children’s movies that are often G rated. Additionally, PG-13 movies have similar mean budgets across all Bechdel ratings, with only the Bechdel rating of 3 being slightly lower. For some guidance ratings, particularly G, the mean budget does differ by Bechdel rating. However, the differences do not follow a consistent direction within any guidance rating, which suggests that Bechdel rating may not be something considered when deciding the budget for a movie.
To conclude, budget seems to have increased over time. We also see that a movie’s guidance rating seems to be related to budget; for example, G rated movies seem to have higher budgets on average. However, it is not clear that Bechdel rating is a factor in a movie’s budget.
In the second question we ask whether or not a movie’s Bechdel test rating impacts its overall performance. Hollywood has been infamously bad at quality representation of women, and we wanted to know whether that is simply because of how such movies perform with audiences. If movies with quality representation do more poorly overall, then we gain some insight into why there is limited representation. We chose to measure overall performance using gross revenue and awards, since a movie that wins many awards and makes a lot of money can reasonably be called successful.
In the graph above we observe that there does not seem to be a significant relationship between the Bechdel rating and the number of awards that a movie won. If a combination had more or fewer observations than expected, we would see the corresponding boxes of the mosaic plot filled with red or blue, and none are. This means that having good female representation in a movie does not seem to hurt or help the number of awards it receives.
In this graph, we see that for most Bechdel ratings there does not seem to be a clear association between year and a film’s gross level. However, for Bechdel rating 3, there may be a slightly positive association between the two variables, meaning that for this specific Bechdel rating, films’ gross levels have grown over time. This could make sense, as audiences have likely become more willing to see better female representation on screen as society has become more progressive, and therefore these types of films may earn higher gross levels in more recent years. However, to check that this positive trend actually exists, we should conduct a statistical test to see if the interaction between year and Bechdel test rating is significant in a model with gross level as the response variable.
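The shading logic described above rests on residuals from an independence model, which can be inspected directly. The sketch below is illustrative only: the data are synthetic, and award_level with its three bins is a hypothetical variable, since the report does not say how the number of awards was categorized.

```r
# Synthetic data for illustration only; rating and awards drawn independently.
set.seed(4)
n <- 600
rating <- factor(
  sample(0:3, n, replace = TRUE, prob = c(0.07, 0.27, 0.10, 0.56)),
  levels = 0:3
)
# Hypothetical award bins; the report's actual categories are not given.
award_level <- factor(
  sample(c("none", "few", "many"), n, replace = TRUE),
  levels = c("none", "few", "many")
)

tab <- table(rating, award_level)

# Chi-squared test of independence between rating and award level.
chi <- chisq.test(tab)

# Standardized residuals: large positive values mean more observations than
# expected under independence, large negative values mean fewer.
print(round(chi$stdres, 2))
print(chi$p.value)
```

In base R, mosaicplot(tab, shade = TRUE) colors cells by residuals of this kind, leaving cells unshaded when the residuals are small in absolute value.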
##
## Call:
## lm(formula = gross ~ year.y * as.factor(rating), data = bechdel)
##
## Residuals:
## Min 1Q Median 3Q Max
## -750044598 -405741409 -222650229 143402172 6922351067
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.306e+08 2.053e+08 3.559 0.000382 ***
## year.y -2.170e+04 1.694e+04 -1.281 0.200242
## as.factor(rating)1 1.037e+08 2.314e+08 0.448 0.654038
## as.factor(rating)2 -2.840e+07 2.604e+08 -0.109 0.913181
## as.factor(rating)3 -1.124e+08 2.245e+08 -0.501 0.616699
## year.y:as.factor(rating)1 4.463e+02 1.903e+04 0.023 0.981292
## year.y:as.factor(rating)2 1.810e+03 2.148e+04 0.084 0.932845
## year.y:as.factor(rating)3 1.141e+04 1.840e+04 0.620 0.535299
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 671500000 on 1767 degrees of freedom
## (18 observations deleted due to missingness)
## Multiple R-squared: 0.01079, Adjusted R-squared: 0.006867
## F-statistic: 2.752 on 7 and 1767 DF, p-value: 0.007617
After fitting this model, we see that none of the interaction terms are significant, and the model as a whole explains only about 1% of the variance in gross (multiple R-squared of 0.01079). This is an interesting result, since when we had looked solely at the graphs, we thought there was potentially a positive association between year and gross level for Bechdel rating 3. However, it turns out that Bechdel test rating has no significant bearing on how a film’s gross level changes over time.
Thus, from the previous two analyses we see that Bechdel rating (which reflects a movie’s female representation) does not appear to be associated with total gross level or the number of awards a movie earns. This means that there should be nothing preventing directors from including quality female interactions, at least materially. Many studies cite the importance of representation in movies and the power it has to inspire. Therefore, there should be more female interactions in movies.
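Rather than reading the three interaction coefficients one at a time, the interaction can also be tested jointly by comparing nested models. This is a sketch on synthetic data, offered as a complementary check and not as the analysis the report ran; the names additive and with_interaction are illustrative.

```r
# Synthetic data for illustration only; gross is unrelated to year and rating.
set.seed(5)
n <- 400
bechdel <- data.frame(
  year   = sample(1970:2013, n, replace = TRUE),
  rating = factor(sample(0:3, n, replace = TRUE), levels = 0:3)
)
bechdel$gross <- rlnorm(n, meanlog = 18, sdlog = 1)

# Model without and with the year-by-rating interaction.
additive         <- lm(gross ~ year + rating, data = bechdel)
with_interaction <- lm(gross ~ year * rating, data = bechdel)

# Partial F-test: do the three interaction terms jointly improve the fit?
comparison <- anova(additive, with_interaction)
print(comparison)

# Share of variance explained by the larger model.
print(summary(with_interaction)$r.squared)
```

A single F-test on all interaction terms avoids drawing a conclusion from several separate t-tests on the same question.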
In the third question we wanted to look at how gender equality in movies has changed over time. We think there is a narrative that things have generally improved over time but we wanted to verify that through data.
The time series plot here shows that Bechdel ratings have indeed increased over time. We plotted the number of movies that received each rating in each year from 1970 to 2013, and observed the trends across groups. Aside from the sheer increase in the total number of movies rated over time, we also saw that while the number of movies in each of the three scored categories was quite similar in the 1970s, movies with a rating of 3 increased much faster than the other two ratings after 1995.
The violin plot above displays the conditional distribution of release year given a movie’s rating on the Bechdel scale. We see that the highest rating of 3 has the latest median year, showing that later movies tend to have higher ratings. The score of 2, which represents two women speaking to each other, has the lowest median year aside from 0, which represents unscored. We also see that for the three scored ratings, there are several outliers in the earlier years.
##
## Welch Two Sample t-test
##
## data: as.numeric(lateryears$rating) and as.numeric(earlyyears$rating)
## t = 2.5224, df = 26.194, p-value = 0.009031
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
## 0.1341694 Inf
## sample estimates:
## mean of x mean of y
## 2.144878 1.730769
In order to formally test the conclusion we made above, we conducted a t-test on the ratings of movies, with the movies again separated into early years and later years. Our t-test gave us a p-value of 0.009031, which is smaller than our alpha of 0.05. Therefore, we have enough evidence to reject the null hypothesis and conclude that later movies have higher ratings on average.
After looking at the relationship between Bechdel rating and overall performance, as well as how Bechdel rating has changed over time we decided to look more closely at what kinds of topics characterize each Bechdel rating.
In order to see which words were most common at each rating level, we first split the data by Bechdel rating. The words we examine are those used to describe the plot of each movie. We then made a word cloud for each of the four ratings. The first word cloud represents movies with a Bechdel rating of zero, or unscored, which is not particularly useful for our study. In the second word cloud, representing the movies with a Bechdel rating of one, the most common words were “stories” and “life”. The other common words had to do with families and relationships. This suggests that movies about personal life stories tended to score low on the Bechdel test. The third word cloud represents the movies with a Bechdel rating of two. The most common words here are “life” and “world”, and the other common words are “mysteries” and “murder”. From this, we conclude that movies about the world and murder mysteries tend to score a two on the Bechdel test. The last word cloud covers movies that received the highest score on the Bechdel test. The most common words used to describe the plots of these movies are “life” and “woman”. This can be expected, as movies about women are more likely to have dialogue between named women than other movies.
We began by looking at budget and found that movie budgets have increased significantly over time. We also saw that a movie’s budget has a close relationship with how much it will gross. We found that guidance rating has some relationship with budget, but Bechdel rating does not have a clear one. Perhaps this points to movie budgets being affected more by the target audience’s age and the genre than by the movie’s representation. Those funding movies do not necessarily see quality gender representation as a reason to fund or defund them. Because of this, we chose to examine whether a movie’s Bechdel rating impacts its overall performance, to see if such movies are worth funding more.
When asking whether a movie’s Bechdel rating impacts its overall performance, we found that it does not. To measure overall performance, we looked at a movie’s total gross level and the number of awards that it won. Using various plots and a regression model, we saw that there was not a strong relationship between having quality female representation and overall performance. There have been many studies on the societal benefits of quality representation, so we think that if it does not hurt overall performance, we should advocate for more female representation in general. It would be interesting to try to quantify the societal benefit of having quality female representation in movies to better solidify that argument.
Next we looked at how the Bechdel ratings of movies have changed over time. There we found that more recent movies tend to perform better on the Bechdel test. This generally makes sense, as women have played an increasingly prominent role in society and there have been increasing pushes for quality representation of women in the media. According to a study in 2021, 60% of movies released that year passed the Bechdel test. While looking at this, we realized that one limitation of the Bechdel test is that it is very basic. It might be interesting to add more categories that are more “progressive.” For example, the highest rating on the Bechdel scale is given when two named female characters have a conversation that is not about a man. We could instead have another category for two named female characters of color having a conversation about certain topics. We think the Bechdel test sets a low bar that we could build upon in a future study. Furthermore, we have to keep in mind that our dataset has only a limited number of variables for us to explore. With more data and more categories on the Bechdel scale, we could in the future answer broader questions that touch on many more important aspects of cinema, such as how central the female roles are, the representation of people of color, and those working behind the camera.
In the last question we looked at some of the key words in the plots of movies based on their Bechdel ratings. We found fairly stereotypical outcomes. Movies that performed poorly were “war” and “team” related movies, which are typically male-centric. For the higher Bechdel ratings, we see more relationship, family, and women related films, which are typically female related themes. It would be nice for the keywords in the highest Bechdel rating category to be less stereotypical. While we are working towards that to some extent with movies like Captain Marvel, Encanto, and Soul, as a society we can continue pushing for more.
In our research we see a lot of improvement in representation over time, but we also see a lot of room for further improvement, both in representation itself and in how we measure it, which we have begun to explore in this report.
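The counts behind word clouds like these come from tokenizing each plot description and tabulating words within each rating. The sketch below uses base R only and a handful of invented plot sentences; the data frame, the stop word list, and the top_words helper are all hypothetical and stand in for whatever text-mining tools were actually used.

```r
# Invented plot descriptions for illustration only; NOT from the dataset.
plots <- data.frame(
  rating = c(1, 1, 3, 3),
  plot = c(
    "A soldier returns from war",
    "The team goes to war",
    "A woman rebuilds her life",
    "A young woman travels the world"
  ),
  stringsAsFactors = FALSE
)

# Tiny hypothetical stop word list; a real analysis would use a fuller one.
stop_words <- c("a", "the", "to", "from", "her", "his", "of", "and", "in")

# Hypothetical helper: the n most frequent non-stop words in a set of texts.
top_words <- function(text, n = 3) {
  words <- unlist(strsplit(tolower(text), "[^a-z]+"))
  words <- words[nchar(words) > 0 & !(words %in% stop_words)]
  head(sort(table(words), decreasing = TRUE), n)
}

# Split the plot text by Bechdel rating and count words within each group.
by_rating <- lapply(split(plots$plot, plots$rating), top_words)
print(by_rating)
```

Each element of by_rating is a frequency table that could be passed to a word cloud function, with word size proportional to its count.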