research-article

Fake News Detection on Social Media: A Data Mining Perspective

Published: 01 September 2017

Abstract

Social media for news consumption is a double-edged sword. On the one hand, its low cost, easy access, and rapid dissemination of information lead people to seek out and consume news from social media. On the other hand, it enables the wide spread of "fake news", i.e., low-quality news with intentionally false information. The extensive spread of fake news has the potential for extremely negative impacts on individuals and society. Therefore, fake news detection on social media has recently become an emerging research area that is attracting tremendous attention. Fake news detection on social media presents unique characteristics and challenges that make existing detection algorithms from traditional news media ineffective or not applicable. First, fake news is intentionally written to mislead readers to believe false information, which makes it difficult and nontrivial to detect based on news content; therefore, we need to include auxiliary information, such as user social engagements on social media, to help make a determination. Second, exploiting this auxiliary information is challenging in and of itself as users' social engagements with fake news produce data that is big, incomplete, unstructured, and noisy. Because the issue of fake news detection on social media is both challenging and relevant, we conducted this survey to further facilitate research on the problem. In this survey, we present a comprehensive review of detecting fake news on social media, including fake news characterizations on psychology and social theories, existing algorithms from a data mining perspective, evaluation metrics and representative datasets. We also discuss related research areas, open problems, and future research directions for fake news detection on social media.

Information & Contributors

Information

Published In

ACM SIGKDD Explorations Newsletter  Volume 19, Issue 1
June 2017
59 pages
ISSN:1931-0145
EISSN:1931-0153
DOI:10.1145/3137597

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 September 2017
Published in SIGKDD Volume 19, Issue 1

Qualifiers

  • Research-article

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 5,123
  • Downloads (Last 6 weeks): 523
Reflects downloads up to 15 Jan 2025

Citations

Cited By

  • (2025) Fake News Detection and Classification: A Comparative Study of Convolutional Neural Networks, Large Language Models, and Natural Language Processing Models. Future Internet, 17:1 (28). DOI: 10.3390/fi17010028. Online publication date: 9-Jan-2025
  • (2025) Veracity‐Oriented Context‐Aware Large Language Models–Based Prompting Optimization for Fake News Detection. International Journal of Intelligent Systems, 2025:1. DOI: 10.1155/int/5920142. Online publication date: 15-Jan-2025
  • (2025) Rethinking Unsupervised Graph Anomaly Detection With Deep Learning: Residuals and Objectives. IEEE Transactions on Knowledge and Data Engineering, 37:2 (881-895). DOI: 10.1109/TKDE.2024.3501307. Online publication date: Feb-2025
  • (2025) Silver Lining in the Fake News Cloud: Can Large Language Models Help Detect Misinformation? IEEE Transactions on Artificial Intelligence, 6:1 (14-24). DOI: 10.1109/TAI.2024.3440248. Online publication date: Jan-2025
  • (2025) Tackling misinformation in mobile social networks a BERT-LSTM approach for enhancing digital literacy. Scientific Reports, 15:1. DOI: 10.1038/s41598-025-85308-4. Online publication date: 7-Jan-2025
  • (2025) Disaster Health Care and Resiliency: A Systematic Review of the Application of Social Network Data Analytics. Disaster Medicine and Public Health Preparedness, 18. DOI: 10.1017/dmp.2024.294. Online publication date: 3-Jan-2025
  • (2025) Harnessing prompt-based large language models for disaster monitoring and automated reporting from social media feedback. Online Social Networks and Media, 45 (100295). DOI: 10.1016/j.osnem.2024.100295. Online publication date: Jan-2025
  • (2025) A unified multimodal classification framework based on deep metric learning. Neural Networks, 181 (106747). DOI: 10.1016/j.neunet.2024.106747. Online publication date: Jan-2025
  • (2025) Adversarial contrastive representation training with external knowledge injection for zero-shot stance detection. Neurocomputing, 614 (128849). DOI: 10.1016/j.neucom.2024.128849. Online publication date: Jan-2025
  • (2025) RPCP-PURI: A robust and precise computational predictor for Phishing Uniform Resource Identification. Journal of Information Security and Applications, 89 (103953). DOI: 10.1016/j.jisa.2024.103953. Online publication date: Mar-2025
