Damsel in distress
A recurrent pattern in movies?

🚀 Introduction

An evildoer has kidnapped the princess! Will someone save her? ... Pheeew, her saviour, naturally a man, is on his way to rescue her and defeat her evil captor. How would she have escaped this dangerous predicament otherwise, helpless as she is? Needless to say, our hero must be rewarded for his unsolicited good deed. A kiss shall suffice. Sound familiar? The trope of the damsel in distress recurs in stories throughout history. From Andromeda in Greek mythology, through the princesses of the Grimm fairy tales, to Lois Lane, who constantly needs Superman's help, women always need men to take care of them. They are useless on their own. Ah... what would women do without men?

More generally, women in the media are often portrayed as passive participants in narratives, mere tools that provide motivation and purpose to the male lead. Moreover, even though they make up more than half of the population, they are underrepresented in movies: only 34% of speaking roles in the top-grossing films of 2021 went to women, and about 85% of these films featured more men than women that year. Female characters also tend to have goals related to their personal lives, whereas male characters' goals are much more work-related, as shown below [1]. Besides, female characters tend to have less pivotal roles than their male counterparts. This reflects a major issue in society at large: women's prospects are still limited, and gender stereotypes are still rampant, even though a shift in mentality is under way. We see more and more women in diverse fields and in positions of power. But do we also see this improvement in women's conditions in movies? Are they portrayed differently now than in the past? In particular, do they have more power to determine their own fate, or are they still only tools to drive male characters' plotlines?

Source: [1] Dr. Martha M. Lauzen, It’s a Man’s (Celluloid) World 2021: Portrayals of Female Characters in the Top U.S. Films of 2021 (2021)

Using the CMU Movie Summary Corpus, our project aims to explore the "activeness" of female characters in movies and how it has evolved over time. To do so, we use natural language processing to extract the grammatically active and passive verbs associated with female and male characters in the plot summaries, and we investigate the type of power associated with these actions. Our analysis focuses on movies produced in the US, as they make up the majority of the movies in the CMU corpus and are products of the same culture. Additionally, we only consider movies released since 1940, as the data for earlier years is too sparse for our analysis.
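The movie selection can be sketched in plain Python as below. This is an illustration under assumptions, not the project's code: it supposes each movie row has already been read from the corpus metadata into a dict with hypothetical keys `release_date` and `countries`, and that the countries field is a JSON object mapping Freebase IDs to country names. The toy rows and their IDs are made up.

```python
import json

def release_year(date_str):
    """Return the year of a date such as '1988', '1988-08' or '1988-08-26', or None if missing."""
    if not date_str:
        return None
    try:
        return int(date_str[:4])
    except ValueError:
        return None

def is_selected(row, min_year=1940, country="United States of America"):
    """Keep movies produced in `country` and released in or after `min_year`.

    `row` uses hypothetical keys 'release_date' and 'countries'; 'countries' is
    assumed to be a JSON object {freebase_id: country_name}.
    """
    year = release_year(row.get("release_date", ""))
    if year is None or year < min_year:
        return False
    countries = json.loads(row.get("countries") or "{}")
    return country in countries.values()

# Toy rows; the Freebase IDs are placeholders.
movies = [
    {"name": "Film A", "release_date": "1955-03-01",
     "countries": '{"/m/id1": "United States of America"}'},
    {"name": "Film B", "release_date": "1932",
     "countries": '{"/m/id1": "United States of America"}'},
    {"name": "Film C", "release_date": "2004-07-12",
     "countries": '{"/m/id2": "Canada"}'},
    {"name": "Film D", "release_date": "",
     "countries": '{"/m/id1": "United States of America"}'},
]

selected = [m["name"] for m in movies if is_selected(m)]
print(selected)  # ['Film A']
```

Movies with a missing release date are dropped here; whether the project discarded them the same way is an assumption.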

Now that we're equipped with all the necessary background, let's start this ADAventure! 🚀

🔎 A glimpse into our data...

To investigate the portrayal of women in movies, we used the CMU Movie Summary Corpus, which contains movie plot summaries from Wikipedia along with metadata extracted from Freebase [2]. For each movie, we focus on its list of characters and its plot summary.

Data in numbers

7 353 US movies 🎬

64 167 female characters 🦸‍♀️

133 660 male characters 🕵️

Source: [2] David Bamman et al., Learning Latent Personas of Film Characters, ACL 2013, Sofia, Bulgaria (2013)

Women are underrepresented in movies

We have seen that, at least among the top-grossing movies released in 2021, the large majority feature more male than female characters. But has this always been the case, and is the trend changing? As the graph shows, this inequality has always been present: the fraction of female characters varies between 25% and 36%. Interestingly, there is a slight upward trend from the 80s onward, coinciding with the slow improvement of women's rights since the late 70s.
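The yearly fraction plotted here is a simple ratio: female characters divided by all characters with a known gender, per release year. A minimal sketch, assuming the characters are available as (release year, gender) pairs with gender coded 'F' or 'M' and missing values skipped; the toy data is invented.

```python
from collections import defaultdict

def female_fraction_by_year(characters):
    """Fraction of female characters per release year.

    `characters` is an iterable of (release_year, gender) pairs with gender
    'F' or 'M'; any other value (e.g. None for unknown gender) is skipped.
    """
    counts = defaultdict(lambda: [0, 0])  # year -> [female, total]
    for year, gender in characters:
        if gender not in ("F", "M"):
            continue
        counts[year][1] += 1
        if gender == "F":
            counts[year][0] += 1
    return {year: female / total for year, (female, total) in sorted(counts.items())}

toy = [(1980, "F"), (1980, "M"), (1980, "M"), (1980, "M"),
       (1990, "F"), (1990, "M"), (1990, None)]
print(female_fraction_by_year(toy))  # {1980: 0.25, 1990: 0.5}
```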

Is the underrepresentation of women the same for all movie genres?

In this plot, we observe that the top 10 genres for female characters differ from those for male characters. Some genres, such as drama, thriller and romantic comedy, appear in both lists, while others appear in only one of them. Additionally, some of the genres are very stereotypical: 4 of the 10 top genres for female characters relate to romance or family, whereas male characters are much more present in action and science fiction movies.
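One way to build such per-gender rankings is to count, for each gender, how many characters appear in movies of each genre and keep the most common ones. The sketch below assumes this raw-count ranking (the exact ranking behind the plot is not stated) and represents each character as a (gender, list of the movie's genres) pair; the toy data is invented.

```python
from collections import Counter

def top_genres_by_gender(characters, k=10):
    """Rank genres by the number of characters of each gender appearing in them.

    `characters` is an iterable of (gender, genres) pairs, where `genres` is the
    list of genres of the character's movie. Returns {gender: [(genre, count), ...]}.
    """
    counters = {"F": Counter(), "M": Counter()}
    for gender, genres in characters:
        if gender in counters:
            counters[gender].update(genres)
    # Ties keep the order in which genres were first encountered.
    return {g: c.most_common(k) for g, c in counters.items()}

toy = [
    ("F", ["Drama", "Romance Film"]),
    ("F", ["Drama", "Family Film"]),
    ("M", ["Action", "Drama"]),
    ("M", ["Action", "Science Fiction"]),
]
print(top_genres_by_gender(toy, k=2))
```

Because drama is frequent overall, a raw-count ranking naturally places it in both lists, which is consistent with it being shared between genders.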

Now that we know that the genres in which women are most featured differ from those of men, let's look at a few of the most common movie genres in the dataset and see whether the fractions of male and female characters have evolved differently over time.

In this initial exploratory analysis, we found that the fraction of female characters is much smaller than that of male characters and has not changed much over time. This raises a question: what typical roles and actions do female characters take on in movies? To answer it, we analyze movie summaries using natural language processing.

📝 Running NLP pipeline...

Here is an example story, analyzed with the same process as the movie summaries:

"Once upon a time, there lived a young prince called Flynn Rider. He was known by all to be kind, caring, and handsome. However, on one fateful day, his step-brother Cyrus, consumed by jealousy, decided to imprison him in a high tower far far away from the family kingdom. The search goes on for weeks without success: the prince has vanished. Rapunzel, an ADAventurer from a small hamlet at the border of the kingdom, likes to explore the deep forests with her chameleon Pascal. On one of her wanderings, she discovers the high tower where dear Flynn is kept captive. Sensing a human presence in the tower, Pascal rushed to the top, where he encountered Flynn. The chameleon squeaked loudly in surprise, which worried Rapunzel, who joined him at the top by using her long hair as a rope and saved Flynn. She then denounced Cyrus and his atrocious behaviour, and he is sentenced to jail."

Table with extraction of active and passive verbs


Above are the tables containing the extracted active and passive verbs and the number of mentions of each character. The NLP process first identifies the characters in the summary, then counts the number of times each character (or a reference to them) is mentioned and extracts the verbs linked to them. Verbs are classified as active or passive according to their grammatical voice. However, if a verb is extracted as active but appears in a list of descriptive verbs, as in "she seems sad", it is moved to the passive group. Finally, the activeness score is computed by dividing the number of active verbs (after this shift) by the total number of extracted verbs.
The first table is the output of the NLP process and the second one corresponds to what a human would extract. The two tables are almost identical: only a few verbs (indicated by colours) differ, which suggests that our algorithm extracts the verbs reliably. Additionally, the missing or misplaced verbs mostly occur in complex sentence structures, which is tolerable. All the summaries of the selected movies are processed with the same pipeline.
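The voice classification, the shift of descriptive verbs and the activeness score described above can be sketched as follows. This is a hedged illustration, not the project's pipeline: it assumes the verbs linked to a character have already been extracted as (verb lemma, dependency label) pairs by a dependency parser such as spaCy, whose English models use labels like `nsubj`, `nsubjpass` and `dobj`. Treating direct objects as passive and the contents of `DESCRIPTIVE_VERBS` are our assumptions; the project's actual list is not given.

```python
# Hypothetical list of descriptive (state) verbs.
DESCRIPTIVE_VERBS = {"be", "seem", "appear", "look", "become", "feel", "remain"}

ACTIVE_DEPS = {"nsubj"}               # the character performs the action
PASSIVE_DEPS = {"nsubjpass", "dobj"}  # the action is done to the character

def classify(verb_lemma, dep):
    """Return 'active', 'passive' or None for one (verb, dependency label) link."""
    if dep in ACTIVE_DEPS:
        # Grammatically active but merely descriptive verbs move to the passive group.
        return "passive" if verb_lemma in DESCRIPTIVE_VERBS else "active"
    if dep in PASSIVE_DEPS:
        return "passive"
    return None

def activeness_score(links):
    """Number of active verbs divided by the total number of classified verbs."""
    labels = [classify(verb, dep) for verb, dep in links]
    labels = [label for label in labels if label is not None]
    if not labels:
        return None
    return labels.count("active") / len(labels)

# Links for Rapunzel, written by hand from the example story above.
rapunzel = [("like", "nsubj"), ("discover", "nsubj"), ("join", "nsubj"),
            ("save", "nsubj"), ("denounce", "nsubj"), ("worry", "dobj")]
print(activeness_score(rapunzel))  # 5 active out of 6 verbs
```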

📊 Data Analysis

Are female characters as essential to plotlines as male characters?

Previously, we noticed that the fraction of female characters in the movie metadata is much smaller than the fraction of male characters. Even though the fraction of female characters seems to increase between the 70s and 2010, it was still only around 35% in 2010, far from parity. We therefore wanted to check whether this difference is also present in the movie summaries. A character may appear in a movie's character list without being mentioned in its summary, for example a housekeeper who appears only once and is not important for the story. Conversely, one can assume that the characters mentioned in a summary, especially those mentioned several times, are important for the story and contribute to its development.
As we can observe below, female characters account for around 35% of the character mentions in the summaries. This value is very similar to the one computed from the CMU character metadata. It thus confirms that female characters are underrepresented and seem to have less importance overall than male characters. In addition, no statistically significant change occurred between the 40s and 2010: the fraction has remained relatively constant.
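The text does not say which statistical test was used. As one simple option, the sketch below computes the female share of mentions per decade and runs a permutation test on the least-squares slope: if shuffling the values across decades often produces a slope at least as steep as the observed one, the trend is not significant. The mention counts are made up, not the project's data.

```python
import random

def female_mention_fraction(mentions):
    """`mentions` maps decade -> (female_mentions, male_mentions)."""
    return {d: f / (f + m) for d, (f, m) in sorted(mentions.items())}

def slope(xs, ys):
    """Ordinary least-squares slope of ys on xs."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

def permutation_trend_test(xs, ys, n_perm=10_000, seed=0):
    """Two-sided permutation p-value for the null hypothesis of no linear trend."""
    rng = random.Random(seed)
    observed = abs(slope(xs, ys))
    shuffled = list(ys)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(slope(xs, shuffled)) >= observed:
            extreme += 1
    return (extreme + 1) / (n_perm + 1)

# Toy counts per decade.
toy = {1940: (35, 65), 1950: (34, 66), 1960: (36, 64), 1970: (35, 65),
       1980: (34, 66), 1990: (36, 64), 2000: (35, 65), 2010: (35, 65)}
fractions = female_mention_fraction(toy)
decades, values = list(fractions), list(fractions.values())
p_value = permutation_trend_test(decades, values)
print(fractions)
print(p_value)
```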

To analyze the activeness of the characters, we extracted all the verbs related to them and determined whether each action was done by the character (active) or to them (passive). At first glance, the graph suggests that men are more often described as doing things, i.e. in the active voice: the difference in activeness score between male and female characters is positive in every year. However, the magnitude of this difference is very small (around 3-5%). Moreover, the activeness of male characters seems to decrease over the years. There is a statistically significant difference, but its effect size is very small, almost negligible.
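Statistical significance and effect size answer different questions: with thousands of characters, even a tiny gap can be significant. One common effect-size measure is Cohen's d, the difference in means divided by the pooled standard deviation. The sketch below applies it to two groups of activeness scores (for instance male vs female characters, or one group in two periods); the choice of Cohen's d is our assumption and the scores are invented.

```python
from math import sqrt
from statistics import mean, variance

def cohens_d(a, b):
    """Standardized mean difference between samples a and b (pooled standard deviation)."""
    na, nb = len(a), len(b)
    # statistics.variance is the sample variance (n - 1 in the denominator).
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var)

# Invented per-character activeness scores.
male = [0.62, 0.70, 0.55, 0.66, 0.71, 0.58]
female = [0.60, 0.65, 0.52, 0.64, 0.69, 0.55]
print(round(mean(male) - mean(female), 3), round(cohens_d(male, female), 2))
```

By a common rule of thumb, |d| below about 0.2 is considered a very small effect, whatever the p-value.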
With this naive analysis alone, one might conclude that our initial hypothesis is incorrect: these values seem to indicate that women and men in movies are portrayed as performing active actions about equally. However, not all actions are equivalent. For example, the two following sentences are both in the active voice but convey completely different levels of “activeness” for their subjects:
“Sophie fell in love with him at first sight.” or “Mark rushed to her rescue.”

What do male and female characters actually do?

As mentioned above, the context and underlying meaning of verbs are crucial to investigate how characters are portrayed. However, our NLP pipeline does not keep the context of each verb, so we could not identify meaningful clusters among the most frequent verbs with computational methods alone. Upon manual inspection of the most frequent verbs, we identified three meaningful clusters: many of the verbs relate to violent actions, romantic relationships or powerful actions. These clusters are particularly meaningful because their prevalence differs between female and male characters. Once the clusters were defined, we counted the number of verbs in each and gave each verb a score based on its number of occurrences: the more occurrences, the higher the score.
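A minimal sketch of this counting step, assuming that a cluster's score is the total number of occurrences of its verbs (our reading of the description) and using a hypothetical, much shorter verb-to-cluster assignment than the real one.

```python
from collections import Counter

# Hypothetical cluster definitions; the project's full verb lists are not given.
CLUSTERS = {
    "Violence": {"kill", "attack", "shoot", "kidnap"},
    "Love": {"love", "marry", "kiss"},
    "Power": {"lead", "rule", "order", "save"},
}

def cluster_scores(verbs):
    """Per cluster: (number of distinct cluster verbs used, total occurrences of those verbs)."""
    counts = Counter(verbs)
    result = {}
    for name, members in CLUSTERS.items():
        used = {verb: n for verb, n in counts.items() if verb in members}
        result[name] = (len(used), sum(used.values()))
    return result

# Invented list of verb lemmas extracted for female characters (active voice).
female_active = ["love", "love", "marry", "kiss", "save", "kill", "cry"]
print(cluster_scores(female_active))
```

Verbs outside every cluster (here "cry") are simply ignored.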

Clustering of active and passive actions of movie characters (number of verbs / score)

Cluster     Active, male    Active, female    Passive, male    Passive, female
Violence    10 / 468        7 / 291           30 / 1527        27 / 1293
Love        3 / 136         6 / 323           6 / 140          7 / 338
Power       15 / 732        10 / 534          17 / 935         16 / 791

We can see that stereotypical gender behaviours are indeed also present in the summaries: men have high scores in violence and power, whereas women have a high score in love. Interestingly, while women have a relatively low violence score in active actions, female characters receive a lot of violence in the passive form. This is likely because, in many plotlines, women are the victims of violence rather than its perpetrators. Here is a sample of our clusters:

Table with a sample of the verbs in each cluster

Conclusion

We have seen that male characters outnumber female characters by about 2 to 1. However, the percentage of female characters seems to increase slowly. But how long would it actually take to reach parity if it kept growing at the current rate?
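One simple way to answer this question is a linear extrapolation: fit the fraction of female characters as a straight line over time by least squares and solve for the year where it reaches 50%. The sketch below assumes a linear trend and uses made-up yearly fractions, so its output is not the project's answer.

```python
def year_of_parity(years, fractions, target=0.5):
    """Fit fraction = a + b * year by least squares; return the year where it reaches target."""
    n = len(years)
    mx, my = sum(years) / n, sum(fractions) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(years, fractions))
         / sum((x - mx) ** 2 for x in years))
    a = my - b * mx
    if b <= 0:
        return None  # no upward trend: parity is never reached under this model
    return (target - a) / b

# Made-up fractions of female characters.
years = [1980, 1990, 2000, 2010]
fractions = [0.28, 0.30, 0.32, 0.34]
print(year_of_parity(years, fractions))  # roughly 2090 for these toy values
```

A linear model is the most naive choice; the real trend may well slow down or speed up.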

Place your bet. How long do you think it will take? Scroll to the right on the graph below to reveal the answer!

*Sigh* That's a long time to wait for women to finally be represented… Moreover, as we have seen, it is not just about having more female characters in movies, but also about how they are portrayed. Movies need more female and male characters who go beyond gender stereotypes.
The very slow evolution, as well as the stereotypical portrayal of women, can probably be explained in part by the fact that most people working in the movie industry are men: only 17% of directors and writers are women [3]. If we want the movies we watch to better represent women in today's society and stop reinforcing dangerous stereotypes, more women must be given the opportunity to write and tell their own narratives.

Gentlemen, step aside and make way for the damsels in power 👊

Source: [3] Katharina Buchholz, This is how female representation is rising across the film industry, World Economic Forum (2022)


OUR AMAZING TEAM

Ben M'Rad Imane

SV student 🧪🏀

Dormann Alexia

SV student 🧬🐈

Wifak Imane

SV student 🔬🌊

Maffei Theo

SV student 🥽🍝