Researchers from the Queensland University of Technology (QUT) have developed a statistical model they hope can send online abuse towards women out of the Twittersphere.
QUT said online abuse targeting women, including threats of harm or sexual violence, has proliferated across all social media platforms. By using a “sophisticated and accurate” algorithm to detect such posts on Twitter, the university is touting the ability to cut through the “raucous rabble” of millions of tweets and identify misogynistic content.
The algorithm, developed by associate professor Richi Nayak, professor Nicolas Suzor, and research fellow Dr Md Abul Bashar in a collaboration between QUT’s Faculty of Science and Engineering, Faculty of Law, and Digital Media Research Centre, was created by mining a dataset of 1 million tweets.
The dataset was then refined by filtering for tweets containing one of three abusive keywords: whore, slut, and rape.
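As a rough illustration of that filtering step, the sketch below keeps only tweets whose text contains one of the three seed keywords. The pandas DataFrame, CSV file name, and “text” column are assumptions made for the example, not details of QUT’s pipeline.

```python
import pandas as pd

# Seed keywords used to narrow the collected tweets down to candidates for labelling.
ABUSIVE_KEYWORDS = ["whore", "slut", "rape"]

def filter_candidate_tweets(path: str) -> pd.DataFrame:
    """Keep only tweets containing at least one seed keyword (case-insensitive substring match)."""
    tweets = pd.read_csv(path)  # assumes a CSV with a "text" column
    pattern = "|".join(ABUSIVE_KEYWORDS)
    mask = tweets["text"].str.contains(pattern, case=False, na=False)
    return tweets[mask]

if __name__ == "__main__":
    candidates = filter_candidate_tweets("tweets.csv")  # hypothetical file name
    print(f"{len(candidates)} candidate tweets retained")
```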
The algorithm learns the language as it goes, Nayak explained, first developing a base-level understanding and then augmenting that knowledge with both tweet-specific and abusive language.
“We implemented a deep learning algorithm called Long Short-Term Memory with Transfer Learning, which means that the machine could look back at its previous understanding of terminology and change the model as it goes, learning and developing its contextual and semantic understanding over time,” she said.
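To make that idea concrete, here is a minimal sketch of the approach in PyTorch: an LSTM language model is pre-trained on generic text, and its embedding and LSTM layers are then carried over into a tweet classifier that is fine-tuned on the labelled data. The architecture, layer sizes, and class names are illustrative assumptions, not QUT’s published model.

```python
import torch
import torch.nn as nn

class LSTMLanguageModel(nn.Module):
    """LSTM language model pre-trained on a large generic corpus (the source task)."""
    def __init__(self, vocab_size: int, embed_dim: int = 128, hidden_dim: int = 256):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.next_word = nn.Linear(hidden_dim, vocab_size)

    def forward(self, token_ids):
        hidden, _ = self.lstm(self.embedding(token_ids))
        return self.next_word(hidden)

class MisogynyClassifier(nn.Module):
    """Classifier that transfers the pre-trained embedding and LSTM weights to the target task."""
    def __init__(self, pretrained: LSTMLanguageModel, num_classes: int = 2):
        super().__init__()
        self.embedding = pretrained.embedding  # transferred weights
        self.lstm = pretrained.lstm            # transferred weights
        self.classify = nn.Linear(pretrained.lstm.hidden_size, num_classes)

    def forward(self, token_ids):
        hidden, _ = self.lstm(self.embedding(token_ids))
        return self.classify(hidden[:, -1, :])  # classify from the final time step

# Illustrative usage: pre-train the language model, then fine-tune the classifier on labelled tweets.
lm = LSTMLanguageModel(vocab_size=20_000)
# ... pre-training loop on unlabelled text would run here ...
clf = MisogynyClassifier(lm)
dummy_batch = torch.randint(0, 20_000, (4, 30))  # 4 tweets, 30 token ids each
print(clf(dummy_batch).shape)  # torch.Size([4, 2])
```

During fine-tuning the transferred layers keep updating on the labelled tweets, which is what lets the model revise its earlier understanding of terminology as the quote describes.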
The onus is on the user to report abuse they receive, but the team believes this isn’t good enough.
“We hope our machine-learning solution can be adopted by social media platforms to automatically identify and report this content to protect women and other user groups online,” Nayak said.
Nayak said the key challenge in misogynistic tweet detection is understanding the context of a tweet.
“The complex and noisy nature of tweets makes it difficult,” she added. “On top of that, teaching a machine to understand natural language is one of the more complicated ends of data science: Language changes and evolves constantly, and much of its meaning depends on context and tone.”
The system started with a base dictionary and built its vocabulary from there, but context and intent had to be monitored to ensure the algorithm could differentiate between abuse, sarcasm, and friendly use of aggressive terminology.
“Take the phrase ‘get back to the kitchen’ as an example — devoid of context of structural inequality, a machine’s literal interpretation could miss the misogynistic meaning,” Nayak explained. “But seen with the understanding of what constitutes abusive or misogynistic language, it can be identified as a misogynistic tweet.”
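On the vocabulary side, that bootstrapping from a base dictionary can be sketched roughly as below: start from a small seed set of terms and grow the vocabulary from words observed in the tweet corpus. The tokenisation and frequency thresholds are assumptions for illustration, not QUT’s implementation.

```python
from collections import Counter
import re

SEED_DICTIONARY = {"whore", "slut", "rape"}  # the base dictionary the system starts from

def build_vocabulary(tweets, min_count=5, max_size=20_000):
    """Grow a vocabulary from the seed dictionary using terms seen in the tweet corpus."""
    counts = Counter()
    for tweet in tweets:
        counts.update(re.findall(r"[a-z']+", tweet.lower()))  # naive word tokeniser
    vocab = set(SEED_DICTIONARY)
    for word, count in counts.most_common(max_size):
        if count >= min_count:
            vocab.add(word)
    return vocab

# Toy corpus standing in for the million-tweet dataset.
sample_tweets = ["Get back to the kitchen", "Great talk today, thanks!"] * 5
print(sorted(build_vocabulary(sample_tweets, min_count=1)))
```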
She said teaching a machine to differentiate context, without the help of tone and through text alone, was key to the success of the project. QUT said the model identifies misogynistic content with 75% accuracy.
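For context, accuracy here is simply the share of held-out, human-labelled tweets the model classifies correctly. The toy numbers below only illustrate the calculation and do not reproduce QUT’s evaluation.

```python
from sklearn.metrics import accuracy_score

# Hypothetical labels for a held-out set: 1 = misogynistic, 0 = not.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]  # human annotations
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]  # model predictions
print(f"Held-out accuracy: {accuracy_score(y_true, y_pred):.0%}")  # 75% on this toy sample
```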
“Sadly, there’s no shortage of misogynistic data out there to work with, but labelling the data was quite labour-intensive,” Nayak added.
“This modelling could also be expanded upon and used in other contexts in the future, such as identifying racism, homophobia, or abuse toward people with disabilities.
“Our end goal is to take the model to social media platforms and trial it in place. If we can make identifying and removing this content easier, that can help create a safer online space for all users.”