DOI: 10.1145/3632754.3633278
Extended abstract

Overview of the HASOC Subtracks at FIRE 2023: Hate Speech and Offensive Content Identification in Assamese, Bengali, Bodo, Gujarati and Sinhala

Published: 12 February 2024

Abstract

The evaluation of content moderation systems requires reliable benchmark data. This requirement is particularly hard to meet for low-resource languages, where obtaining or curating such data poses significant challenges. Addressing this issue, HASOC 2023 organised various shared tasks focused on identifying offensive content in low-resource languages. This paper reports on tasks for hate speech detection in several Indo-Aryan languages (Assamese, Bengali, Gujarati, and Sinhala) as well as a Sino-Tibetan language, Bodo, for which limited linguistic resources currently exist. The shared task involved the compilation of multiple datasets. In total, more than 30 teams submitted nearly 200 runs, which are presented and analysed in this report.

      Published In

      FIRE '23: Proceedings of the 15th Annual Meeting of the Forum for Information Retrieval Evaluation
      December 2023
      170 pages
      Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. Assamese
      2. Bengali
      3. Bodo
      4. Gujarati
      5. Hate speech
      6. Multilingual Datasets
      7. Sinhala
      8. Social media
      9. Under-resourced languages

      Qualifiers

      • Extended-abstract
      • Research
      • Refereed limited

      Conference

      FIRE 2023

      Acceptance Rates

      Overall Acceptance Rate 19 of 64 submissions, 30%

      Article Metrics

      • Total Citations: 0
      • Total Downloads: 52
      • Downloads (Last 12 months): 52
      • Downloads (Last 6 weeks): 4

      Reflects downloads up to 02 Dec 2024
