Abstract
Fake news detection is a challenging problem in online social media, with considerable social and political impact. Several methods have already been proposed for the automatic detection of fake news, most of which rely on statistical features of the content or context of the news. In this paper, we propose a novel fake news detection method based on a Natural Language Inference (NLI) approach. Instead of using only statistical features of the content or context of the news, the proposed method takes a human-like approach: it infers veracity from a set of reliable news. Related and similar news items published by reputable news sources are used as auxiliary knowledge to infer the veracity of a given news item. We also collect and publish the first inference-based fake news detection dataset, called FNID, in two formats: a two-class version (FNID-FakeNewsNet) and a six-class version (FNID-LIAR). We use the NLI approach to boost several classical and deep learning models, including Decision Tree, Naïve Bayes, Random Forest, Logistic Regression, k-Nearest Neighbors, Support Vector Machine, BiGRU, and BiLSTM, combined with different word embedding methods, including Word2vec, GloVe, fastText, and BERT. The experiments show that the proposed method achieves accuracies of 85.58% and 41.31% on the FNID-FakeNewsNet and FNID-LIAR datasets, respectively, which are absolute improvements of 10.44% and 13.19%, respectively.
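As a rough illustration of the inference framing described above, the sketch below pairs a news item (the hypothesis) with related text from reputable sources (the premise) and computes a single token-overlap feature. This is a toy stand-in, not the models or features used in the paper: the names `NLIExample`, `build_example`, and `overlap_feature`, the example sentences, and the choice of overlap as a feature are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class NLIExample:
    """Hypothetical container for one premise-hypothesis pair."""
    premise: str     # text from reputable sources (auxiliary knowledge)
    hypothesis: str  # the news item whose veracity is to be inferred
    label: str       # e.g. "real" or "fake" in a two-class setting


def tokenize(text: str) -> Set[str]:
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    tokens = {tok.strip(".,!?;:\"'()").lower() for tok in text.split()}
    return tokens - {""}


def build_example(claim: str, reliable_news: List[str], label: str) -> NLIExample:
    """Join the related reliable news into one premise for the given claim."""
    return NLIExample(premise=" ".join(reliable_news), hypothesis=claim, label=label)


def overlap_feature(example: NLIExample) -> float:
    """Fraction of hypothesis tokens that also appear in the premise.

    Assumed toy feature; a learned NLI model would replace this step.
    """
    hyp = tokenize(example.hypothesis)
    prem = tokenize(example.premise)
    return len(hyp & prem) / len(hyp) if hyp else 0.0


# Made-up sentences, used only to show the data flow.
example = build_example(
    claim="The city council approved the new budget.",
    reliable_news=[
        "The council approved the budget on Monday.",
        "The new city budget takes effect in July.",
    ],
    label="real",
)
print(overlap_feature(example))
```

In a fuller pipeline, the premise-hypothesis pair would instead be embedded (for example with one of the word embedding methods named in the abstract) and passed to a classifier; the overlap score here only shows where the auxiliary knowledge enters.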
Cite this article
Sadeghi, F., Bidgoly, A.J. & Amirkhani, H. Fake news detection on social media using a natural language inference approach. Multimed Tools Appl 81, 33801–33821 (2022). https://doi.org/10.1007/s11042-022-12428-8
DOI: https://doi.org/10.1007/s11042-022-12428-8