DOI: 10.1145/3340531.3412698
research-article
Public Access

Detection of Novel Social Bots by Ensembles of Specialized Classifiers

Published: 19 October 2020

Abstract

Malicious actors create inauthentic social media accounts controlled in part by algorithms, known as social bots, to disseminate misinformation and agitate online discussion. While researchers have developed sophisticated methods to detect abuse, novel bots with diverse behaviors evade detection. We show that different types of bots are characterized by different behavioral features. As a result, supervised learning techniques suffer severe performance deterioration when attempting to detect behaviors not observed in the training data. Moreover, tuning these models to recognize novel bots requires retraining with a significant amount of new annotations, which are expensive to obtain. To address these issues, we propose a new supervised learning method that trains classifiers specialized for each class of bots and combines their decisions through the maximum rule. The ensemble of specialized classifiers (ESC) can better generalize, leading to an average improvement of 56% in F1 score for unseen accounts across datasets. Furthermore, novel bot behaviors are learned with fewer labeled examples during retraining. We deployed ESC in the newest version of Botometer, a popular tool to detect social bots in the wild, with a cross-validation AUC of 0.99.
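
The combination step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract states only that one classifier is trained per bot class and that their decisions are combined through the maximum rule. The nearest-centroid scorer, the two-dimensional toy features, the class names `spam` and `fake_follower`, and all function names below are assumptions made for illustration.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]
Scorer = Callable[[Vector], float]


def centroid(rows: List[Vector]) -> List[float]:
    """Component-wise mean of a list of feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_specialized(humans: List[Vector], bots: List[Vector]) -> Scorer:
    """Toy stand-in for one specialized classifier (assumption, not the
    paper's base model): humans versus a single bot class."""
    h_c, b_c = centroid(humans), centroid(bots)

    def score(x: Vector) -> float:
        # Positive margin: x is closer to this bot class than to humans.
        margin = sq_dist(x, h_c) - sq_dist(x, b_c)
        return 1.0 / (1.0 + math.exp(-margin))  # squash into (0, 1)

    return score


def train_esc(humans: List[Vector],
              bots_by_class: Dict[str, List[Vector]]) -> Dict[str, Scorer]:
    """Train one specialized scorer per bot class."""
    return {name: train_specialized(humans, bots)
            for name, bots in bots_by_class.items()}


def esc_score(ensemble: Dict[str, Scorer], x: Vector) -> float:
    """Maximum rule: the ensemble score is the highest specialized score."""
    return max(clf(x) for clf in ensemble.values())


# Hypothetical toy data: two made-up features per account.
HUMANS = [[0.0, 0.0], [0.2, 0.1], [0.1, 0.2]]
BOTS_BY_CLASS = {
    "spam": [[5.0, 0.0], [5.2, 0.1], [4.8, -0.1]],
    "fake_follower": [[0.0, 5.0], [0.1, 5.2], [-0.1, 4.8]],
}

ensemble = train_esc(HUMANS, BOTS_BY_CLASS)
for account in ([5.0, 0.1], [0.1, 5.0], [0.1, 0.1]):
    print(account, round(esc_score(ensemble, account), 3))
```

Under the maximum rule, a high score from any one specialized classifier is enough to raise the ensemble score. In this sketch, supporting a new bot class means adding one more entry to the dictionary, without retraining the existing scorers.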

Supplementary Material

MP4 File (3340531.3412698.mp4)
In this video we present the limitations of current bot detection tools on Twitter. We then propose a new method that generalizes better when tested on new classes of bots. We employed the new method in the new version of Botometer (v4), which was launched recently and is publicly available.

Information & Contributors

Information

Published In

CIKM '20: Proceedings of the 29th ACM International Conference on Information & Knowledge Management
October 2020
3619 pages
ISBN: 9781450368599
DOI: 10.1145/3340531
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. cross-domain
  2. machine learning
  3. recall
  4. social bots
  5. social media

Qualifiers

  • Research-article

Conference

CIKM '20

Acceptance Rates

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 511
  • Downloads (Last 6 weeks): 56
Reflects downloads up to 01 Jan 2025

Citations

Cited By

  • (2024) "What is the best time to tweet a journal article? Quasi-randomized controlled trial" ["Qual é a melhor hora para tuitar um artigo de revista? Ensaio controlado quasi-randomizado"]. AtoZ: novas práticas em informação e conhecimento 13 (1-12). DOI: 10.5380/atoz.v13i0.89296. Online publication date: 3-Jan-2024
  • (2024) "role of social bots in the Brazilian environmental debate:" The International Review of Information Ethics 33:1. DOI: 10.29173/irie510. Online publication date: 1-Apr-2024
  • (2024) "Social Media as an Agent of Influence: Twitter Bots in Russia - Ukraine War". Güvenlik Stratejileri Dergisi 20:47 (99-122). DOI: 10.17752/guvenlikstrtj.1396705. Online publication date: 26-Apr-2024
  • (2024) "Identifying and characterizing superspreaders of low-credibility content on Twitter". PLOS ONE 19:5 (e0302201). DOI: 10.1371/journal.pone.0302201. Online publication date: 22-May-2024
  • (2024) "Negative affect variability differs between anxiety and depression on social media". PLOS ONE 19:2 (e0272107). DOI: 10.1371/journal.pone.0272107. Online publication date: 21-Feb-2024
  • (2024) "Empowered or Constrained in Platform Governance? An Analysis of Twitter Users’ Responses to Elon Musk’s Takeover". Social Media + Society 10:3. DOI: 10.1177/20563051241277606. Online publication date: 12-Sep-2024
  • (2024) "Mechanisms Driving Online Vaccine Debate During the COVID-19 Pandemic". Social Media + Society 10:1. DOI: 10.1177/20563051241229657. Online publication date: 12-Feb-2024
  • (2024) "Investigating responses to US drone strikes in Yemen using Twitter data". Media, War & Conflict. DOI: 10.1177/17506352231219694. Online publication date: 8-Jan-2024
  • (2024) "Unveiling the Veiled Threat: The Impact of Bots on COVID-19 Health Communication". Social Science Computer Review. DOI: 10.1177/08944393241275641. Online publication date: 8-Sep-2024
  • (2024) "Statin Twitter: Human and Automated Bot Contributions, 2010 to 2022". Journal of the American Heart Association 13:7. DOI: 10.1161/JAHA.123.032678. Online publication date: 2-Apr-2024
