NEW DELHI: Male characters appear in books four times as often as female ones, claims a new AI-aided study that examined more than 3,000 English-language works published from 1800 to 1950.
Researchers at the USC Viterbi School of Engineering used artificial intelligence to scan books ranging from science fiction and adventure to mystery and romance, across short stories, poetry and novels.
The study outlined several methods for defining female prevalence in literature. The researchers used named entity recognition, a prominent natural language processing method, to extract gender-specific characters.
“One of the ways we define this is through looking at how many female pronouns are in a book compared to male pronouns,” said Mayank Kejriwal, a research lead at USC’s Information Sciences Institute. Another technique is to count how many of a book’s main characters are female.
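The pronoun-based measure Kejriwal describes can be illustrated with a short sketch. This is not the study's code: the pronoun lists, the function names and the sample sentence are assumptions made for illustration, and a real pipeline would need more careful tokenisation.

```python
import re
from collections import Counter

# Assumed pronoun lists; the article does not say which pronouns the study counted.
FEMALE_PRONOUNS = {"she", "her", "hers", "herself"}
MALE_PRONOUNS = {"he", "him", "his", "himself"}


def pronoun_counts(text):
    """Return (female, male) pronoun counts for a piece of text."""
    # Lowercase the text and split it into simple word tokens.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    female = sum(counts[w] for w in FEMALE_PRONOUNS)
    male = sum(counts[w] for w in MALE_PRONOUNS)
    return female, male


def female_to_male_ratio(text):
    """Return female pronouns per male pronoun, or None if there are no male pronouns."""
    female, male = pronoun_counts(text)
    return female / male if male else None


# Hypothetical sample text, not taken from the corpus.
sample = "He said she was late. She told him that her train was delayed."
print(pronoun_counts(sample))        # (3, 2)
print(female_to_male_ratio(sample))  # 1.5
```

A ratio below 1 would indicate that male pronouns outnumber female ones in the text.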
In the study, the differences between male and female character prevalence were defined and quantified using three robust measures, on a set of copyright-expired literary texts from the Project Gutenberg English-language corpus.
Using computationally replicable methodologies relying on modern natural language processing tools, the researchers found that female character prevalence is significantly lower than male character prevalence, although the difference narrows (while remaining significant) when controlling for the gender of the author.
It was also found that male character ratios have not varied much over time in the sample.
Akarsh Nagaraj, co-author of the study, said the methods and the study’s findings gave the researchers a greater understanding of biases in society and their implications.
There was a 4:1 ratio of male versus female main characters, the study found.
Female characters were also more often described with negative terms such as ‘weak’ and ‘stupid’, compared with ‘strong’ and ‘power’ for men.
On average, there are 32 unique male and eight unique female characters per male-authored book, compared with 38 male and 21 female characters per female-authored book.
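The per-book averages reported above can be turned into ratios with simple arithmetic. The sketch below uses only the four figures quoted in the article; the variable and function names are illustrative assumptions.

```python
# Average unique characters per book, as reported in the article.
male_authored = {"male": 32, "female": 8}
female_authored = {"male": 38, "female": 21}


def male_per_female(counts):
    """Male characters for every female character."""
    return counts["male"] / counts["female"]


def female_share(counts):
    """Fraction of all characters that are female."""
    return counts["female"] / (counts["male"] + counts["female"])


for label, counts in [("male-authored", male_authored),
                      ("female-authored", female_authored)]:
    print(label,
          round(male_per_female(counts), 2),
          round(female_share(counts), 3))
# male-authored 4.0 0.2
# female-authored 1.81 0.356
```

On these averages, male-authored books have four male characters for every female one, while female-authored books have fewer than two.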
The female character prevalence did not change much over the years from 1800 to 1950, the study said.
The authors say research continues to shed light on the extent and significance of gender disparity in social, cultural and economic spheres. More recently, computational tools from the natural language processing literature have been proposed for measuring such disparity using relatively extensive datasets and empirically rigorous methodologies.
They said the study contributes to this line of research by examining gender disparity, at scale, in copyright-expired literary texts published in the pre-modern period (defined in the work as the period ranging from the mid-19th through the mid-20th century).
According to them, one of the challenges in using such tools is to ensure quality control, and by extension, trustworthy statistical analysis.
Another challenge, they added, is to use materials and methods that are publicly available and have been established for some time, both so that they can be used and vetted in the future and to add confidence to the methodology itself. (PTI)