DOI: 10.1145/3366030.3366039
Research article

Fake News Classification Based on Subjective Language

Published: 22 February 2020

Abstract

While many works investigate the spread patterns of fake news in social networks, we focus on the textual content. Instead of relying on syntactic representations of documents (i.e., Bag of Words), as many works do, we seek more robust representations that may better differentiate fake from legitimate news. We propose to consider the subjectivity of news, under the assumption that the subjectivity levels of legitimate and fake news differ significantly. To compute the subjectivity level of a news article, we rely on a set of subjectivity lexicons built by Brazilian linguists. We then build a subjectivity feature vector for each article by calculating the Word Mover's Distance (WMD) between the article and each of these lexicons in the embedding space of the article's words, and we use these vectors to classify the documents. The results demonstrate that our method is more robust than classical text classification approaches, especially in scenarios where the training and test domains differ.
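
The feature construction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: to stay dependency-free it uses the relaxed WMD lower bound (each word sends all of its mass to its nearest counterpart) instead of exact WMD, and the 2-D embeddings, the two lexicons, and all function names are hypothetical placeholders chosen for illustration.

```python
import math
from collections import Counter

# Toy 2-D "embeddings": made-up values for illustration, not trained vectors.
TOY_EMBEDDINGS = {
    "report": (1.0, 0.0),
    "official": (0.9, 0.1),
    "said": (0.8, 0.2),
    "shocking": (0.0, 1.0),
    "outrageous": (0.1, 0.9),
    "maybe": (0.5, 0.5),
}

# Hypothetical lexicons; the Portuguese lexicons used in the paper are not reproduced.
TOY_LEXICONS = {
    "sentiment": ["shocking", "outrageous"],
    "hedging": ["maybe"],
}


def nbow(tokens, embeddings):
    """Normalized bag-of-words weights over in-vocabulary tokens."""
    counts = Counter(t for t in tokens if t in embeddings)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}


def one_sided_cost(src, dst, embeddings):
    """Each source word sends all of its mass to its nearest destination word."""
    return sum(
        weight * min(math.dist(embeddings[w], embeddings[v]) for v in dst)
        for w, weight in src.items()
    )


def relaxed_wmd(tokens_a, tokens_b, embeddings):
    """Relaxed WMD: the larger of the two one-sided costs (a lower bound on WMD)."""
    a, b = nbow(tokens_a, embeddings), nbow(tokens_b, embeddings)
    if not a or not b:
        return float("inf")  # no in-vocabulary words to compare
    return max(one_sided_cost(a, b, embeddings), one_sided_cost(b, a, embeddings))


def subjectivity_features(tokens, lexicons, embeddings):
    """One distance per lexicon, in a fixed (sorted) lexicon order."""
    return [relaxed_wmd(tokens, lexicons[name], embeddings) for name in sorted(lexicons)]


# Feature order is sorted by lexicon name: [hedging, sentiment].
neutral = subjectivity_features(["official", "report", "said"], TOY_LEXICONS, TOY_EMBEDDINGS)
loaded = subjectivity_features(["shocking", "outrageous", "report"], TOY_LEXICONS, TOY_EMBEDDINGS)
```

In this toy setup the article containing emotionally loaded words lies closer to the "sentiment" lexicon than the neutral one does. Vectors of this kind could then be passed to any standard classifier; which classifier the paper uses is not stated in the abstract.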

    Information & Contributors

    Information

    Published In

    iiWAS2019: Proceedings of the 21st International Conference on Information Integration and Web-based Applications & Services
    December 2019
    709 pages
    © 2019 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

    In-Cooperation

    • JKU: Johannes Kepler Universität Linz
    • @WAS: International Organization of Information Integration and Web-based Applications and Services

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Fake news
    2. Misleading content detection
    3. Subjective Language

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    iiWAS2019

    Article Metrics

    • Downloads (last 12 months): 51
    • Downloads (last 6 weeks): 3
    Reflects downloads up to 09 Jan 2025

    Cited By

    • (2024) Extracting Features from Text Flows based on Semantic Similarity for Text Classification: an Approach Inspired by Audio Analysis. Journal of the Brazilian Computer Society 30:1 (297-314). DOI: 10.5753/jbcs.2024.3759. Online publication date: 25-Sep-2024
    • (2024) A Comparative Study of Machine Learning Models for Detecting Fake News Content in Bahasa Indonesia Online Media. 2024 International Conference on Smart Computing, IoT and Machine Learning (SIML) (43-48). DOI: 10.1109/SIML61815.2024.10578272. Online publication date: 6-Jun-2024
    • (2024) Artificial Intelligence in Fake News Detection and Analysis for Low-Resource Languages. Congress on Smart Computing Technologies (29-45). DOI: 10.1007/978-981-97-5081-8_3. Online publication date: 30-Oct-2024
    • (2024) Overview of the CLEF-2024 CheckThat! Lab: Check-Worthiness, Subjectivity, Persuasion, Roles, Authorities, and Adversarial Robustness. Experimental IR Meets Multilinguality, Multimodality, and Interaction (28-52). DOI: 10.1007/978-3-031-71908-0_2. Online publication date: 19-Sep-2024
    • (2024) The CLEF-2024 CheckThat! Lab: Check-Worthiness, Subjectivity, Persuasion, Roles, Authorities, and Adversarial Robustness. Advances in Information Retrieval (449-458). DOI: 10.1007/978-3-031-56069-9_62. Online publication date: 23-Mar-2024
    • (2023) XAI in Automated Fact-Checking? The Benefits Are Modest and There's No One-Explanation-Fits-All. Proceedings of the 35th Australian Computer-Human Interaction Conference (624-638). DOI: 10.1145/3638380.3638388. Online publication date: 2-Dec-2023
    • (2023) Misinformation detection based on news dispersion. 2023 24th International Conference on Digital Signal Processing (DSP) (1-5). DOI: 10.1109/DSP58604.2023.10167997. Online publication date: 11-Jun-2023
    • (2023) Tasaheel: An Arabic Automative Textual Analysis Tool—All in One. IEEE Access 11 (139979-139992). DOI: 10.1109/ACCESS.2023.3340520. Online publication date: 2023
    • (2023) The CLEF-2023 CheckThat! Lab: Checkworthiness, Subjectivity, Political Bias, Factuality, and Authority. Advances in Information Retrieval (506-517). DOI: 10.1007/978-3-031-28241-6_59. Online publication date: 16-Mar-2023
    • (2023) Ethical Challenges in the Use of Digital Technologies: AI and Big Data. Digital Transformation in Policing: The Promise, Perils and Solutions (33-58). DOI: 10.1007/978-3-031-09691-4_3. Online publication date: 3-Jan-2023
