Role of Artificial Intelligence in Detection of Hateful Speech for Hinglish Data on Social Media

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 778)

Abstract

Social networking platforms provide a conduit for disseminating ideas, views, and thoughts and for spreading information. This has led to the blending of English with natively spoken languages, and Hindi-English code-mixed text (Hinglish) is increasingly common among urban populations around the world. The hate speech detection algorithms deployed by most social networking platforms are unable to filter out offensive and abusive content posted in such code-mixed languages. As a result, the worldwide hate speech detection rate, already only around 44%, drops further for content in Indian colloquial languages and slang. In this paper, we propose a methodology for efficiently detecting hate speech in unstructured, code-mixed Hinglish text. We employ fine-tuning-based approaches for Hindi-English code-mixed text using contextual embeddings: Embeddings from Language Models (ELMo), FLAIR, and the transformer-based Bidirectional Encoder Representations from Transformers (BERT). We compare the proposed approach against existing methods on several datasets, and our model outperforms the other methods and frameworks evaluated.
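The abstract does not detail how raw, unstructured Hinglish posts are cleaned before they reach contextual embedding models such as ELMo, FLAIR, or BERT. The sketch below shows one common style of normalization for romanized, code-mixed tweets, using only the Python standard library. It is an illustrative assumption, not the authors' preprocessing: the function name `normalize_hinglish`, the `<url>` and `<user>` placeholders, and the specific rules are hypothetical.

```python
import re
import unicodedata

# Hypothetical normalization rules for romanized code-mixed (Hinglish) tweets.
# These are illustrative assumptions, not the chapter's actual pipeline.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#(\w+)")
# A Latin letter repeated three or more times (applied after lowercasing).
REPEAT_RE = re.compile(r"([a-z])\1{2,}")


def normalize_hinglish(text: str) -> str:
    """Normalize a romanized Hinglish post before tokenization."""
    # NFKC folds compatibility forms (e.g. fullwidth letters) to plain ones
    # without stripping marks that Devanagari script depends on.
    text = unicodedata.normalize("NFKC", text)
    text = text.lower()
    # Replace URLs first so '@' or '#' inside a link is not misread.
    text = URL_RE.sub(" <url> ", text)
    text = MENTION_RE.sub(" <user> ", text)
    # Keep the hashtag word itself, since it may carry abusive content.
    text = HASHTAG_RE.sub(r" \1 ", text)
    # Collapse elongated spellings such as "bahuuut" -> "bahuut".
    text = REPEAT_RE.sub(r"\1\1", text)
    # Collapse all runs of whitespace to single spaces.
    return " ".join(text.split())


if __name__ == "__main__":
    print(normalize_hinglish("Yeh movie BAHUUUT acchi thi @rahul #Bollywood https://t.co/xyz"))
    # prints "yeh movie bahuut acchi thi <user> bollywood <url>"
```

Lowercasing is a simplification in this sketch: a cased pretrained model may benefit from the original casing, and the placeholder tokens would need to be handled consistently by whichever tokenizer is used downstream.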

These authors contributed equally to this work.

Author information

Authors and Affiliations

Symbiosis Institute of Technology, Pune, India
Ananya Srivastava, Mohammed Hasan & Bhargav Yagnik
Symbiosis Centre for Applied Artificial Intelligence, Pune, India
Rahee Walambe & Ketan Kotecha

Corresponding author

Correspondence to Rahee Walambe.

Editor information

Editors and Affiliations

Department of Computer Science and Engineering, Sharda University, Greater Noida, Uttar Pradesh, India
Ankur Choudhary
Department of Computer Science and Engineering, Sharda University, Greater Noida, Uttar Pradesh, India
Arun Prakash Agrawal
Asia Pacific Centre for Analytics (APCA), Asia Pacific University of Technology and Innovation (APU), Kuala Lumpur, Malaysia
Rajasvaran Logeswaran
Information Technology, University of South Florida Sarasota–Manatee Campus, Sarasota, FL, USA
Bhuvan Unhelkar

About this paper

Cite this paper

Srivastava, A., Hasan, M., Yagnik, B., Walambe, R., Kotecha, K. (2021). Role of Artificial Intelligence in Detection of Hateful Speech for Hinglish Data on Social Media. In: Choudhary, A., Agrawal, A.P., Logeswaran, R., Unhelkar, B. (eds) Applications of Artificial Intelligence and Machine Learning. Lecture Notes in Electrical Engineering, vol 778. Springer, Singapore. https://doi.org/10.1007/978-981-16-3067-5_8

DOI: https://doi.org/10.1007/978-981-16-3067-5_8
Published: 27 July 2021
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-3066-8
Online ISBN: 978-981-16-3067-5
eBook Packages: Computer Science, Computer Science (R0)