research-article

Twitter spam detection based on deep learning

Authors:

Yang Xiang

ACSW '17: Proceedings of the Australasian Computer Science Week Multiconference

Article No.: 3, Pages 1 - 8

https://doi.org/10.1145/3014812.3014815

Published: 31 January 2017

Abstract

Twitter spam has long been a critical but difficult problem. Researchers have developed a series of machine learning-based methods and blacklisting techniques to detect spamming activity on Twitter. According to our investigation, current methods and techniques achieve an accuracy of around 80%. However, because of spam drift and information fabrication, these machine learning-based methods cannot efficiently detect spam in real-life scenarios. Moreover, blacklisting cannot keep up with variations in spamming activity, because manually inspecting suspicious URLs is extremely time-consuming. In this paper, we propose a novel deep learning-based technique to address these challenges. The syntax of each tweet is learned through WordVector Training Mode. We then construct a binary classifier on the resulting representation dataset. To evaluate the proposed method, we collected and used a 10-day dataset of real tweets. We first studied the performance of different classifiers, and then compared our method with existing text-based methods, which it largely outperformed. We further compared our method with non-text-based detection techniques. According to the experimental results, our proposed method was more accurate.
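
The two-stage idea in the abstract (turn each tweet into a fixed-length vector, then train a binary classifier on those vectors) can be illustrated with a minimal sketch. This is not the paper's implementation: the hashed pseudo-random word vectors below merely stand in for learned embeddings, simple averaging stands in for the WordVector Training Mode, and logistic regression is an assumed choice of classifier. The names `DIM`, `word_vector`, `tweet_vector`, `train_logistic`, and `predict`, as well as the toy tweets and labels, are hypothetical and invented for illustration.

```python
import hashlib
import math
import random

DIM = 16  # hypothetical embedding size, chosen only for this sketch


def word_vector(word, dim=DIM):
    """Deterministic pseudo-random vector standing in for a learned embedding."""
    seed = int(hashlib.sha256(word.encode("utf-8")).hexdigest(), 16)
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


def tweet_vector(text, dim=DIM):
    """Average the word vectors of a tweet into one fixed-length vector."""
    words = text.lower().split()
    if not words:
        return [0.0] * dim
    vectors = [word_vector(w, dim) for w in words]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def sigmoid(z):
    # Clamp the input so math.exp cannot overflow.
    z = max(-60.0, min(60.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def score(weights, bias, x):
    """Estimated probability that feature vector x belongs to the spam class."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train_logistic(X, y, epochs=200, lr=0.5):
    """Fit logistic regression with plain stochastic gradient descent."""
    weights = [0.0] * len(X[0])
    bias = 0.0
    for _ in range(epochs):
        for x, label in zip(X, y):
            error = score(weights, bias, x) - label
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias


def predict(weights, bias, text):
    """Return 1 (spam) or 0 (non-spam) for a raw tweet string."""
    return 1 if score(weights, bias, tweet_vector(text)) >= 0.5 else 0


# Invented toy tweets and labels, for illustration only (1 = spam, 0 = non-spam).
tweets = [
    "win a free phone click this link now",
    "free followers click here now",
    "claim your free prize at this link",
    "having coffee with friends this morning",
    "great talk at the conference today",
    "reading a good book on the train",
]
labels = [1, 1, 1, 0, 0, 0]

features = [tweet_vector(t) for t in tweets]
weights, bias = train_logistic(features, labels)
print([predict(weights, bias, t) for t in tweets])
```

In a realistic setting the stand-in `word_vector` would be replaced by embeddings trained on a tweet corpus, and the classifier would be evaluated on held-out data rather than on the tweets it was trained on.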

Index Terms

Twitter spam detection based on deep learning
1. Security and privacy
  1. Intrusion/anomaly detection and malware mitigation
    1. Social engineering attacks
      1. Phishing
      2. Spoofing attacks

Information & Contributors

Information

Published In

ACSW '17: Proceedings of the Australasian Computer Science Week Multiconference

January 2017

615 pages

ISBN:9781450347686

DOI:10.1145/3014812

Copyright © 2017 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

ACSW 2017

ACSW 2017: Australasian Computer Science Week 2017

January 30 - February 3, 2017

Geelong, Australia

Acceptance Rates

ACSW '17 Paper Acceptance Rate 78 of 156 submissions, 50%;

Overall Acceptance Rate 204 of 424 submissions, 48%

Bibliometrics & Citations

Article Metrics

Total Citations: 101
Total Downloads: 1,938
Downloads (Last 12 months): 49
Downloads (Last 6 weeks): 2

Reflects downloads up to 05 Mar 2025

Citations

Cited By

Su S (2025) Research on Spam Filters Based on NB Algorithm. ITM Web of Conferences 70 (01016). Online publication date: 23-Jan-2025
https://doi.org/10.1051/itmconf/20257001016
Sindiramutty S, Tan C, Lau S, Thangaveloo R, Gharib A, Manchuri A, Khan N, Tee W, Muniandy L (2024) Explainable AI for Cybersecurity. Advances in Explainable AI Applications for Smart Cities (31-97). Online publication date: 18-Jan-2024
https://doi.org/10.4018/978-1-6684-6361-1.ch002
Erbay H, Bakır R, Bakır H (2024) ALBERT4Spam: A Novel Approach for Spam Detection on Social Networks. Bilişim Teknolojileri Dergisi 17:2 (81-94). Online publication date: 30-Apr-2024
https://doi.org/10.17671/gazibtd.1426230
Gao H, Zhang H, Wang J, Zhang X, Wang H, Li W, Tu T (2024) NUAT-GAN: Generating Black-Box Natural Universal Adversarial Triggers for Text Classifiers Using Generative Adversarial Networks. IEEE Transactions on Information Forensics and Security 19 (6484-6498). Online publication date: 2024
https://doi.org/10.1109/TIFS.2024.3416849
Tusher E, Ismail M, Rahman M, Alenezi A, Uddin M (2024) Email Spam: A Comprehensive Review of Optimize Detection Methods, Challenges, and Open Research Problems. IEEE Access 12 (143627-143657). Online publication date: 2024
https://doi.org/10.1109/ACCESS.2024.3467996
Rashidi A, Salehi M, Najari S (2024) CGANS: a code-based GAN for spam detection in social media. Social Network Analysis and Mining 14:1. Online publication date: 17-Nov-2024
https://doi.org/10.1007/s13278-024-01379-7
Alves V, Ribeiro J (2024) Detection and Classification of Spam in Social Media Comments Using Artificial Intelligence – A Case Study. Progress in Artificial Intelligence (311-323). Online publication date: 16-Nov-2024
https://doi.org/10.1007/978-3-031-73500-4_26
Jamal S, Wimmer H, Sarker I (2024) An improved transformer-based model for detecting phishing, spam and ham emails: A large language model approach. Security and Privacy. Online publication date: 24-Apr-2024
https://doi.org/10.1002/spy2.402
Dar M, Iqbal F, Latif R, Altaf A, Jamail N (2023) Policy-Based Spam Detection of Tweets Dataset. Electronics 12:12 (2662). Online publication date: 14-Jun-2023
https://doi.org/10.3390/electronics12122662
Liu H, Huang X, Zhang X, Li Q, Ma F, Wang W, Chen H, Yu H, Zhang X, Elkind E (2023) Boosting decision-based black-box adversarial attack with gradient priors. Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence (1195-1203). Online publication date: 19-Aug-2023
https://dl.acm.org/doi/10.24963/ijcai.2023/133
