Abstract
Social networking platforms provide a conduit for disseminating ideas, views, and thoughts, and for spreading information. Their reach has led to the mixing of English with natively spoken languages. Hindi-English code-mixed text (Hinglish) is increasingly prevalent among urban populations all over the world. Hate speech detection algorithms deployed by most social networking platforms are unable to filter out offensive and abusive content posted in such code-mixed languages. Thus, the worldwide hate speech detection rate of around 44% drops even further once content in Indian colloquial languages and slang is considered. In this paper, we propose a methodology for efficient detection of hate speech in unstructured code-mixed Hinglish text. We employ fine-tuning-based approaches for Hindi-English code-mixed language using contextual embeddings such as Embeddings from Language Models (ELMo), FLAIR, and the transformer-based Bidirectional Encoder Representations from Transformers (BERT). Our proposed approach is compared against pre-existing methods on various datasets. Our model outperforms the other methods and frameworks.
These authors contributed equally to this work.
Copyright information
© 2021 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Srivastava, A., Hasan, M., Yagnik, B., Walambe, R., Kotecha, K. (2021). Role of Artificial Intelligence in Detection of Hateful Speech for Hinglish Data on Social Media. In: Choudhary, A., Agrawal, A.P., Logeswaran, R., Unhelkar, B. (eds) Applications of Artificial Intelligence and Machine Learning. Lecture Notes in Electrical Engineering, vol 778. Springer, Singapore. https://doi.org/10.1007/978-981-16-3067-5_8
DOI: https://doi.org/10.1007/978-981-16-3067-5_8
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-3066-8
Online ISBN: 978-981-16-3067-5
eBook Packages: Computer Science, Computer Science (R0)