Entropy-Based Automation Detection on Twitter Using DNA Profiling

  • Original Research
  • SN Computer Science

Abstract

Twitter is a popular microblogging online social network (OSN) that lets users express themselves and build public relationships. By 2023, its monthly active users had reached approximately 237.8 million. As the platform's popularity has grown, so has the number of automated accounts. Some bots perform productive tasks such as posting news and delivering disaster alerts. Others, however, are used as vectors to mislead legitimate users by spreading misinformation or distributing malware. Detecting malicious bots is therefore crucial for maintaining a safe and secure Twitter environment. This paper proposes a novel technique that identifies bots by analyzing the degree of regularity in user behavior. Users' real-time tweets are mined and their online behaviors are characterized as DNA sequences. Approximate entropy is then used to assess the degree of regularity in the numerically encoded DNA sequences. Accounts whose entropy values fall below a fixed threshold are classified as bots. Experiments on real-time Twitter data show that the proposed detection technique achieves a precision of 0.9573, recall of 0.9961, F1 score of 0.9609, and accuracy of 0.9563.
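The pipeline described in the abstract (behavioral DNA sequence, numerical encoding, approximate entropy, threshold) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the three-letter alphabet (A = tweet, C = reply, T = retweet, in the style of the digital-DNA literature), the numeric codes in `ENCODING`, the parameters `m` and `r`, and the `threshold` value of 0.2. The entropy routine follows the standard approximate entropy definition with self-matches counted.

```python
import math
import random

# Hypothetical numeric encoding of a behavioral DNA alphabet (assumed, not the paper's):
# A = tweet, C = reply, T = retweet.
ENCODING = {"A": 1.0, "C": 2.0, "T": 3.0}


def encode(dna):
    """Map a DNA string such as 'ACTTA' to a list of numeric codes."""
    return [ENCODING[base] for base in dna]


def approximate_entropy(series, m=2, r=0.5):
    """Approximate entropy ApEn(m, r) of a numeric series.

    m is the window length and r the tolerance. With the integer codes above,
    r = 0.5 means two windows match only if they are symbol-for-symbol identical.
    """
    n = len(series)
    if n <= m + 1:
        raise ValueError("series is too short for the chosen m")

    def phi(k):
        # All overlapping windows of length k.
        windows = [series[i:i + k] for i in range(n - k + 1)]
        total = 0.0
        for w in windows:
            # Count windows within Chebyshev distance r of w (self-match included,
            # so the count is never zero and the logarithm is always defined).
            matches = sum(
                1 for v in windows
                if max(abs(a - b) for a, b in zip(w, v)) <= r
            )
            total += math.log(matches / len(windows))
        return total / len(windows)

    return phi(m) - phi(m + 1)


def is_bot(dna, threshold=0.2, m=2, r=0.5):
    """Flag an account as a bot when its entropy is below the (assumed) threshold."""
    return approximate_entropy(encode(dna), m, r) < threshold


# A strictly periodic account versus one with irregular, seeded-random behavior.
regular_dna = "ACT" * 20
rng = random.Random(0)
irregular_dna = "".join(rng.choice("ACT") for _ in range(200))

regular_apen = approximate_entropy(encode(regular_dna))
irregular_apen = approximate_entropy(encode(irregular_dna))
```

In this sketch the periodic sequence yields an entropy close to zero and is flagged by `is_bot`, while the irregular sequence scores well above the assumed threshold. A real deployment would need a threshold tuned on labeled accounts, which the abstract mentions but does not specify.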

Acknowledgements

The authors would like to thank Dr. Govindasamy Vaiyapuri for his comments and the anonymous reviewers for their insightful suggestions and careful reading of the manuscript. This work was supported by Research Grant No. SPG/2020/000594 under the SERB POWER grant scheme of the Science and Engineering Research Board, Government of India, awarded to Akila Venkatesan, Puducherry Technological University, India.

Funding

This work was supported by the SERB POWER grant scheme, Science and Engineering Research Board under Research Grant No. SPG/2020/000594.

Author information

Corresponding author

Correspondence to Rosario Gilmary.

Ethics declarations

Conflict of Interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Gilmary, R., Venkatesan, A. Entropy-Based Automation Detection on Twitter Using DNA Profiling. SN COMPUT. SCI. 4, 847 (2023). https://doi.org/10.1007/s42979-023-02324-9

  • DOI: https://doi.org/10.1007/s42979-023-02324-9
