Developing a Multilingual Annotated Corpus of Misogyny and Aggression

Shiladitya Bhattacharya, Siddharth Singh, Ritesh Kumar, Akanksha Bansal, Akash Bhagat, Yogesh Dawer, Bornini Lahiri, Atul Kr. Ojha

Abstract

In this paper, we discuss the development of a multilingual annotated corpus of misogyny and aggression in Indian English, Hindi, and Indian Bangla. The corpus is part of a project on studying and automatically identifying misogyny and communalism on social media (the ComMA Project). The dataset is collected from comments on YouTube videos and currently contains over 20,000 comments. The comments are annotated at two levels: aggression (overtly aggressive, covertly aggressive, and non-aggressive) and misogyny (gendered and non-gendered). We describe the data collection process, the tagset used for annotation, and the issues and challenges faced during annotation. Finally, we discuss the results of baseline experiments conducted to develop a misogyny classifier for each of the three languages.

Anthology ID:: 2020.trac-1.25
Volume:: Proceedings of the Second Workshop on Trolling, Aggression and Cyberbullying
Month:: May
Year:: 2020
Address:: Marseille, France
Editors:: Ritesh Kumar, Atul Kr. Ojha, Bornini Lahiri, Marcos Zampieri, Shervin Malmasi, Vanessa Murdock, Daniel Kadar
Venue:: TRAC
Publisher:: European Language Resources Association (ELRA)
Pages:: 158–168
Language:: English
URL:: https://aclanthology.org/2020.trac-1.25/
Cite (ACL):: Shiladitya Bhattacharya, Siddharth Singh, Ritesh Kumar, Akanksha Bansal, Akash Bhagat, Yogesh Dawer, Bornini Lahiri, and Atul Kr. Ojha. 2020. Developing a Multilingual Annotated Corpus of Misogyny and Aggression. In Proceedings of the Second Workshop on Trolling, Aggression and Cyberbullying, pages 158–168, Marseille, France. European Language Resources Association (ELRA).
Cite (Informal):: Developing a Multilingual Annotated Corpus of Misogyny and Aggression (Bhattacharya et al., TRAC 2020)
PDF:: https://aclanthology.org/2020.trac-1.25.pdf