Abstract
In this paper, we present our participation in Task 1 of the CLEF 2022 CheckThat! Lab, which covers detecting check-worthy and verifiable claims as well as attention-worthy and harmful tweets. We participated in all subtasks of Task 1 for the Arabic, Bulgarian, Dutch, English, and Turkish datasets. We investigate the impact of fine-tuning various transformer models and of enlarging the training data via machine translation. We also use feed-forward networks with Manifold Mixup regularization for the respective tasks. Our models ranked first in detecting factual claims in Arabic and harmful tweets in Dutch, and second in detecting check-worthy claims in Arabic and Bulgarian.
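As background for the Manifold Mixup regularization mentioned above, the following is a minimal sketch of how hidden-state interpolation can be applied in a small feed-forward classifier over tweet embeddings. The layer sizes, the Beta parameter alpha, and all names are illustrative assumptions, not the configuration used in the paper.

import numpy as np
import torch
import torch.nn as nn

class MixupFFN(nn.Module):
    """Feed-forward classifier with Manifold Mixup applied during training."""

    def __init__(self, in_dim=768, hidden=256, n_classes=2, alpha=2.0):
        super().__init__()
        self.blocks = nn.ModuleList([
            nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU()),
            nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU()),
        ])
        self.head = nn.Linear(hidden, n_classes)
        self.alpha = alpha  # Beta-distribution parameter (assumed value)

    def forward(self, x, y=None):
        if y is None:  # inference: ordinary forward pass
            for block in self.blocks:
                x = block(x)
            return self.head(x)
        # Training: pick a random depth, then mix hidden states of paired
        # examples (and their labels) with coefficient lam ~ Beta(alpha, alpha).
        k = np.random.randint(0, len(self.blocks) + 1)
        lam = float(np.random.beta(self.alpha, self.alpha))
        perm = torch.randperm(x.size(0))
        h = x
        for i, block in enumerate(self.blocks):
            if i == k:  # interpolate hidden states before this block
                h = lam * h + (1.0 - lam) * h[perm]
            h = block(h)
        if k == len(self.blocks):  # mixing may also happen after the last block
            h = lam * h + (1.0 - lam) * h[perm]
        logits = self.head(h)
        ce = nn.functional.cross_entropy
        return lam * ce(logits, y) + (1.0 - lam) * ce(logits, y[perm])

The loss interpolates the two labels with the same coefficient used for the hidden states; at inference time the model behaves like a plain feed-forward network.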
Notes
- 13. We were not able to use the Spanish data for the other languages due to insufficient time to meet the lab's deadlines; a minimal sketch of this translation-based augmentation follows these notes.
- 14. Some of the results were unavailable at the time of submission; therefore, we selected our submissions based on the incomplete results.
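Note 13 refers to translating labelled data across languages to enlarge the training sets, as mentioned in the abstract. Below is a minimal sketch of such machine-translation augmentation using an off-the-shelf MarianMT checkpoint from Hugging Face; the model name, the Spanish-to-English direction, and the helper function are illustrative assumptions, not the authors' exact pipeline.

from transformers import MarianMTModel, MarianTokenizer

def translate(texts, model_name="Helsinki-NLP/opus-mt-es-en"):
    # Load an off-the-shelf Spanish-to-English translation model.
    tokenizer = MarianTokenizer.from_pretrained(model_name)
    model = MarianMTModel.from_pretrained(model_name)
    batch = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
    generated = model.generate(**batch)
    return tokenizer.batch_decode(generated, skip_special_tokens=True)

# Labels are assumed to carry over unchanged, so each translated tweet
# is appended to the target-language training set with its original label.
es_tweets = ["Esta afirmación necesita verificación."]
en_augmented = translate(es_tweets)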
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Eyuboglu, A.B., Altun, B., Arslan, M.B., Sonmezer, E., Kutlu, M. (2023). Fight Against Misinformation on Social Media: Detecting Attention-Worthy and Harmful Tweets and Verifiable and Check-Worthy Claims. In: Arampatzis, A., et al. (eds.) Experimental IR Meets Multilinguality, Multimodality, and Interaction. CLEF 2023. Lecture Notes in Computer Science, vol. 14163. Springer, Cham. https://doi.org/10.1007/978-3-031-42448-9_14