Role of Artificial Intelligence in Detection of Hateful Speech for Hinglish Data on Social Media

  • Conference paper
Applications of Artificial Intelligence and Machine Learning

Abstract

Social networking platforms provide a conduit for disseminating ideas, views, and thoughts and for proliferating information. This has led to the amalgamation of English with natively spoken languages. Hindi-English code-mixed data (Hinglish) is increasingly prevalent among urban populations all over the world. The hate speech detection algorithms deployed by most social networking platforms are unable to filter out offensive and abusive content posted in these code-mixed languages. Thus, the worldwide hate speech detection rate of around 44% drops even further for content in Indian colloquial languages and slang. In this paper, we propose a methodology for efficient detection of hate speech in unstructured code-mixed Hinglish text. Fine-tuning-based approaches for Hindi-English code-mixed language are employed, utilizing contextual embeddings such as Embeddings from Language Models (ELMo) and FLAIR, and the transformer-based Bidirectional Encoder Representations from Transformers (BERT). The proposed approach is compared against pre-existing methods on various datasets. Our model outperforms the other methods and frameworks.

These authors contributed equally to this work.

Corresponding author

Correspondence to Rahee Walambe.

Copyright information

© 2021 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Srivastava, A., Hasan, M., Yagnik, B., Walambe, R., Kotecha, K. (2021). Role of Artificial Intelligence in Detection of Hateful Speech for Hinglish Data on Social Media. In: Choudhary, A., Agrawal, A.P., Logeswaran, R., Unhelkar, B. (eds) Applications of Artificial Intelligence and Machine Learning. Lecture Notes in Electrical Engineering, vol 778. Springer, Singapore. https://doi.org/10.1007/978-981-16-3067-5_8

  • DOI: https://doi.org/10.1007/978-981-16-3067-5_8

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-16-3066-8

  • Online ISBN: 978-981-16-3067-5

  • eBook Packages: Computer Science, Computer Science (R0)
