Computer Science > Social and Information Networks

arXiv:2209.03162 (cs)

COVID-19 e-print

Important: e-prints posted on arXiv are not peer-reviewed by arXiv; they should not be relied upon without context to guide clinical practice or health-related behavior and should not be reported in news media as established information without consulting multiple experts in the field.

[Submitted on 7 Sep 2022]

Title:Machine Learning-based Automatic Annotation and Detection of COVID-19 Fake News

Authors:Mohammad Majid Akhtar, Bibhas Sharma, Ishan Karunanayake, Rahat Masood, Muhammad Ikram, Salil S. Kanhere

View PDF

Abstract:COVID-19 impacted every part of the world, although the misinformation about the outbreak traveled faster than the virus. Misinformation spread through online social networks (OSN) often misled people from following correct medical practices. In particular, OSN bots have been a primary source of disseminating false information and initiating cyber propaganda. Existing work neglects the presence of bots that act as a catalyst in the spread and focuses on fake news detection in 'articles shared in posts' rather than the post (textual) content. Most work on misinformation detection uses manually labeled datasets that are hard to scale for building their predictive models. In this research, we overcome this challenge of data scarcity by proposing an automated approach for labeling data using verified fact-checked statements on a Twitter dataset. In addition, we combine textual features with user-level features (such as followers count and friends count) and tweet-level features (such as number of mentions, hashtags and urls in a tweet) to act as additional indicators to detect misinformation. Moreover, we analyzed the presence of bots in tweets and show that bots change their behavior over time and are most active during the misinformation campaign. We collected 10.22 Million COVID-19 related tweets and used our annotation model to build an extensive and original ground truth dataset for classification purposes. We utilize various machine learning models to accurately detect misinformation and our best classification model achieves precision (82%), recall (96%), and false positive rate (3.58%). Also, our bot analysis indicates that bots generated approximately 10% of misinformation tweets. Our methodology results in substantial exposure of false information, thus improving the trustworthiness of information disseminated through social media platforms.

Subjects:	Social and Information Networks (cs.SI); Computers and Society (cs.CY); Machine Learning (cs.LG)
Cite as:	arXiv:2209.03162 [cs.SI]
	(or arXiv:2209.03162v1 [cs.SI] for this version)
	https://doi.org/10.48550/arXiv.2209.03162

Submission history

From: Mohammad Majid Akhtar [view email]
[v1] Wed, 7 Sep 2022 13:55:59 UTC (294 KB)

Computer Science > Social and Information Networks

Title:Machine Learning-based Automatic Annotation and Detection of COVID-19 Fake News

Submission history

Access Paper:

References & Citations

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators

Computer Science > Social and Information Networks

Title:Machine Learning-based Automatic Annotation and Detection of COVID-19 Fake News

Submission history

Access Paper:

References & Citations

BibTeX formatted citation

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators