Fight Against Misinformation on Social Media: Detecting Attention-Worthy and Harmful Tweets and Verifiable and Check-Worthy Claims

  • Conference paper
Experimental IR Meets Multilinguality, Multimodality, and Interaction (CLEF 2023)

Abstract

In this paper, we present our participation in CLEF 2022 CheckThat! Lab’s Task 1 on detecting check-worthy and verifiable claims and attention-worthy and harmful tweets. We participated in all subtasks of Task 1 for the Arabic, Bulgarian, Dutch, English, and Turkish datasets. We investigate the impact of fine-tuning various transformer models and of increasing the training data size using machine translation. We also use feed-forward networks with Manifold Mixup regularization for the respective tasks. We ranked first in detecting factual claims in Arabic and harmful tweets in Dutch. In addition, we ranked second in detecting check-worthy claims in Arabic and Bulgarian.
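To illustrate the Manifold Mixup regularization mentioned in the abstract, here is a minimal sketch in plain Python. It is not the authors' implementation. It only shows the core idea described by Verma et al.: pick a random layer of a feed-forward network, linearly interpolate the hidden states of two training examples at that layer with a Beta-distributed coefficient, and interpolate their one-hot labels with the same coefficient. All names (`manifold_mixup_forward`, `layers`, `alpha`) and the toy weights are assumptions made for this example.

```python
import random


def relu(v):
    return [max(0.0, x) for x in v]


def linear(v, weights, bias):
    # weights holds one row per output unit
    return [sum(w * x for w, x in zip(row, v)) + b
            for row, b in zip(weights, bias)]


def mix(a, b, lam):
    # convex combination of two equal-length vectors
    return [lam * x + (1.0 - lam) * y for x, y in zip(a, b)]


def manifold_mixup_forward(x_a, x_b, y_a, y_b, layers, alpha=2.0, rng=random):
    """Forward pass with mixing at a randomly chosen layer (hypothetical helper).

    layers is a list of (weights, bias) pairs; ReLU follows every layer
    except the last. Returns (logits, mixed_label, k, lam).
    """
    lam = rng.betavariate(alpha, alpha)   # mixing coefficient
    k = rng.randrange(len(layers))        # k == 0 mixes the raw inputs
    h_a, h_b = x_a, x_b
    # Run both examples separately through the first k layers.
    for weights, bias in layers[:k]:
        h_a = relu(linear(h_a, weights, bias))
        h_b = relu(linear(h_b, weights, bias))
    # Interpolate the hidden states, then continue with the mixed state.
    h = mix(h_a, h_b, lam)
    for i, (weights, bias) in enumerate(layers[k:], start=k):
        h = linear(h, weights, bias)
        if i < len(layers) - 1:
            h = relu(h)
    # The training target is the same interpolation of the two labels.
    return h, mix(y_a, y_b, lam), k, lam


# Toy two-layer network: 2 inputs -> 3 hidden units -> 2 classes.
toy_layers = [
    ([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [0.0, 0.0, 0.0]),
    ([[1.0, -1.0, 0.5], [0.0, 1.0, 1.0]], [0.0, 0.0]),
]
logits, soft_label, k, lam = manifold_mixup_forward(
    [1.0, 2.0], [3.0, 0.0], [1.0, 0.0], [0.0, 1.0],
    toy_layers, rng=random.Random(0),
)
```

In training, the loss would be computed between the returned logits and the soft label. A real setup would apply the same mixing to batches of tensors in a deep learning framework rather than to Python lists.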

Notes

  1. https://covid19asi.saglik.gov.tr/?_Dil=2.

  2. https://github.com/Carnagie/manifold-mixup-text-classification.

  3. https://pytorch.org/.

  4. https://www.tensorflow.org/.

  5. https://huggingface.co/docs/transformers/index.

  6. https://github.com/google/sentencepiece.

  7. https://huggingface.co/iarfmoose/roberta-base-bulgarian.

  8. https://huggingface.co/bert-base-uncased.

  9. https://huggingface.co/dbmdz/distilbert-base-turkish-cased.

  10. https://huggingface.co/iarfmoose/roberta-small-bulgarian.

  11. https://huggingface.co/savasy/bert-base-turkish-sentiment-cased.

  12. https://www.unhcr.org/figures-at-a-glance.html.

  13. We were not able to use Spanish for other languages because there was insufficient time before the lab's deadlines.

  14. Some of the results were not yet available at the time of submission. Therefore, we chose our submission based on incomplete results.

Author information

Corresponding author

Correspondence to Mucahid Kutlu.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Eyuboglu, A.B., Altun, B., Arslan, M.B., Sonmezer, E., Kutlu, M. (2023). Fight Against Misinformation on Social Media: Detecting Attention-Worthy and Harmful Tweets and Verifiable and Check-Worthy Claims. In: Arampatzis, A., et al. Experimental IR Meets Multilinguality, Multimodality, and Interaction. CLEF 2023. Lecture Notes in Computer Science, vol 14163. Springer, Cham. https://doi.org/10.1007/978-3-031-42448-9_14

  • DOI: https://doi.org/10.1007/978-3-031-42448-9_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-42447-2

  • Online ISBN: 978-3-031-42448-9

  • eBook Packages: Computer Science (R0)
