Abstract
In this paper, we present our participation in Task 1 of the CLEF 2022 CheckThat! Lab, which covers detecting check-worthy and verifiable claims as well as attention-worthy and harmful tweets. We participated in all subtasks of Task 1 for the Arabic, Bulgarian, Dutch, English, and Turkish datasets. We investigate the impact of fine-tuning various transformer models and of enlarging the training data via machine translation. We also use feed-forward networks with Manifold Mixup regularization for the respective tasks. Our models ranked first in detecting factual claims in Arabic and harmful tweets in Dutch, and second in detecting check-worthy claims in Arabic and Bulgarian.
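As background for the Manifold Mixup regularization mentioned above, the following is a minimal sketch of how hidden-state interpolation can be applied in a small feed-forward classifier over tweet embeddings. The layer sizes, the Beta parameter alpha, and all names are illustrative assumptions, not the configuration used in the paper.

import numpy as np
import torch
import torch.nn as nn

class MixupFFN(nn.Module):
    """Feed-forward classifier with Manifold Mixup applied during training."""

    def __init__(self, in_dim=768, hidden=256, n_classes=2, alpha=2.0):
        super().__init__()
        self.blocks = nn.ModuleList([
            nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU()),
            nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU()),
        ])
        self.head = nn.Linear(hidden, n_classes)
        self.alpha = alpha  # Beta-distribution parameter (assumed value)

    def forward(self, x, y=None):
        if y is None:  # inference: ordinary forward pass
            for block in self.blocks:
                x = block(x)
            return self.head(x)
        # Training: pick a random depth, then mix hidden states of paired
        # examples (and their labels) with coefficient lam ~ Beta(alpha, alpha).
        k = np.random.randint(0, len(self.blocks) + 1)
        lam = float(np.random.beta(self.alpha, self.alpha))
        perm = torch.randperm(x.size(0))
        h = x
        for i, block in enumerate(self.blocks):
            if i == k:  # interpolate hidden states before this block
                h = lam * h + (1.0 - lam) * h[perm]
            h = block(h)
        if k == len(self.blocks):  # mixing may also happen after the last block
            h = lam * h + (1.0 - lam) * h[perm]
        logits = self.head(h)
        ce = nn.functional.cross_entropy
        return lam * ce(logits, y) + (1.0 - lam) * ce(logits, y[perm])

The loss interpolates the two labels with the same coefficient used for the hidden states; at inference time the model behaves like a plain feed-forward network.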
Notes
- 13. We were not able to use the Spanish data for the other languages due to insufficient time to meet the lab's deadlines; a minimal sketch of this translation-based augmentation follows these notes.
- 14. Some of the results were unavailable at the time of submission; therefore, we selected our submissions based on the incomplete results.
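Note 13 refers to translating labelled data across languages to enlarge the training sets, as mentioned in the abstract. Below is a minimal sketch of such machine-translation augmentation using an off-the-shelf MarianMT checkpoint from Hugging Face; the model name, the Spanish-to-English direction, and the helper function are illustrative assumptions, not the authors' exact pipeline.

from transformers import MarianMTModel, MarianTokenizer

def translate(texts, model_name="Helsinki-NLP/opus-mt-es-en"):
    # Load an off-the-shelf Spanish-to-English translation model.
    tokenizer = MarianTokenizer.from_pretrained(model_name)
    model = MarianMTModel.from_pretrained(model_name)
    batch = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
    generated = model.generate(**batch)
    return tokenizer.batch_decode(generated, skip_special_tokens=True)

# Labels are assumed to carry over unchanged, so each translated tweet
# is appended to the target-language training set with its original label.
es_tweets = ["Esta afirmación necesita verificación."]
en_augmented = translate(es_tweets)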
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Eyuboglu, A.B., Altun, B., Arslan, M.B., Sonmezer, E., Kutlu, M. (2023). Fight Against Misinformation on Social Media: Detecting Attention-Worthy and Harmful Tweets and Verifiable and Check-Worthy Claims. In: Arampatzis, A., et al. (eds.) Experimental IR Meets Multilinguality, Multimodality, and Interaction. CLEF 2023. Lecture Notes in Computer Science, vol. 14163. Springer, Cham. https://doi.org/10.1007/978-3-031-42448-9_14