DOI: 10.1145/3014812.3014815
research-article

Twitter spam detection based on deep learning

Published: 31 January 2017

Abstract

Twitter spam has long been a critical but difficult problem to address. So far, researchers have developed a series of machine learning-based methods and blacklisting techniques to detect spamming activities on Twitter. According to our investigation, current methods and techniques achieve an accuracy of around 80%. However, because of spam drift and information fabrication, these machine learning-based methods cannot efficiently detect spam activities in real-life scenarios. Moreover, blacklisting cannot keep up with variations in spamming activities, because manually inspecting suspicious URLs is extremely time-consuming. In this paper, we propose a novel technique based on deep learning to address these challenges. The syntax of each tweet is learned through WordVector Training Mode, and we then construct a binary classifier on the resulting representation dataset. In our experiments, we collected a real-world tweet dataset spanning 10 days and used it to evaluate the proposed method. We first studied the performance of different classifiers and then compared our method to existing text-based methods; our method substantially outperformed them. We further compared our method to non-text-based detection techniques, and the experimental results show that our proposed method was more accurate.
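The abstract describes a two-stage pipeline: learn vector representations of tweet text, then train a binary spam/non-spam classifier on those representations. The sketch below illustrates that general idea in plain Python under loudly stated assumptions. The two-dimensional `WORD_VECTORS` table and the `TRAIN` tweets are hypothetical toy data standing in for vectors a word2vec-style training step would learn from a real corpus; a tweet is represented by averaging its word vectors (a common baseline, not necessarily what WordVector Training Mode does); and logistic regression trained by stochastic gradient descent is swapped in as the binary classifier, since the abstract does not say which of the classifiers it studied performed best. All function names are illustrative.

```python
import math
from typing import Dict, List, Sequence, Tuple

# HYPOTHETICAL toy embeddings: stand-ins for word vectors that a
# word2vec-style training step would learn from a real tweet corpus.
WORD_VECTORS: Dict[str, List[float]] = {
    "free": [0.9, 0.1],
    "win": [0.8, 0.2],
    "click": [0.85, 0.05],
    "prize": [0.9, 0.0],
    "lunch": [0.1, 0.9],
    "meeting": [0.05, 0.8],
    "friends": [0.1, 0.85],
    "today": [0.3, 0.6],
}
DIM = 2

# HYPOTHETICAL labelled tweets: 1 = spam, 0 = non-spam.
TRAIN: List[Tuple[str, int]] = [
    ("win a free prize click now", 1),
    ("click to win free", 1),
    ("free prize today", 1),
    ("lunch with friends today", 0),
    ("meeting friends for lunch", 0),
    ("team meeting today", 0),
]


def tweet_vector(tweet: str) -> List[float]:
    """Represent a tweet as the mean of the vectors of its known words."""
    known = [t for t in tweet.lower().split() if t in WORD_VECTORS]
    if not known:
        return [0.0] * DIM  # no known words: fall back to the zero vector
    return [sum(WORD_VECTORS[t][i] for t in known) / len(known) for i in range(DIM)]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(
    X: Sequence[Sequence[float]], y: Sequence[int], lr: float = 0.5, epochs: int = 500
) -> Tuple[List[float], float]:
    """Fit logistic-regression weights and bias with plain SGD on log loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi  # gradient of the log loss with respect to the logit
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    """Return 1 (spam) if the estimated P(spam | x) >= 0.5, else 0."""
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0


def accuracy(preds: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of predictions that match the labels."""
    return sum(p == t for p, t in zip(preds, labels)) / len(labels)


if __name__ == "__main__":
    X = [tweet_vector(text) for text, _ in TRAIN]
    y = [label for _, label in TRAIN]
    w, b = train_logistic(X, y)
    print("training accuracy:", accuracy([predict(w, b, x) for x in X], y))
    print("'free prize click' ->", predict(w, b, tweet_vector("free prize click")))
```

Averaging word vectors discards word order, which keeps the sketch short; paragraph-vector (doc2vec-style) models instead learn a vector for each whole document, and any of the other classifiers compared in such studies could replace `train_logistic` without changing the representation step.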


Cited By

  • (2024) Explainable AI for Cybersecurity. Advances in Explainable AI Applications for Smart Cities, pp. 31-97. DOI: 10.4018/978-1-6684-6361-1.ch002. Online publication date: 18-Jan-2024.
  • (2024) ALBERT4Spam: A Novel Approach for Spam Detection on Social Networks. Bilişim Teknolojileri Dergisi, 17(2):81-94. DOI: 10.17671/gazibtd.1426230. Online publication date: 30-Apr-2024.
  • (2024) NUAT-GAN: Generating Black-Box Natural Universal Adversarial Triggers for Text Classifiers Using Generative Adversarial Networks. IEEE Transactions on Information Forensics and Security, 19:6484-6498. DOI: 10.1109/TIFS.2024.3416849. Online publication date: 2024.

Information

Published In

ACM Other conferences
ACSW '17: Proceedings of the Australasian Computer Science Week Multiconference
January 2017
615 pages
ISBN: 9781450347686
DOI: 10.1145/3014812
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. Twitter spam detection
  2. deep learning
  3. social network security

Qualifiers

  • Research-article

Conference

ACSW 2017
ACSW 2017: Australasian Computer Science Week 2017
January 30 - February 3, 2017
Geelong, Australia

Acceptance Rates

  • ACSW '17 paper acceptance rate: 78 of 156 submissions (50%)
  • Overall acceptance rate: 204 of 424 submissions (48%)


Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 54
  • Downloads (last 6 weeks): 5
Reflects downloads up to 12 Dec 2024.

Citations
  • (2024) Email Spam: A Comprehensive Review of Optimize Detection Methods, Challenges, and Open Research Problems. IEEE Access, 12:143627-143657. DOI: 10.1109/ACCESS.2024.3467996. Online publication date: 2024.
  • (2024) CGANS: a code-based GAN for spam detection in social media. Social Network Analysis and Mining, 14(1). DOI: 10.1007/s13278-024-01379-7. Online publication date: 17-Nov-2024.
  • (2024) Detection and Classification of Spam in Social Media Comments Using Artificial Intelligence – A Case Study. Progress in Artificial Intelligence, pp. 311-323. DOI: 10.1007/978-3-031-73500-4_26. Online publication date: 16-Nov-2024.
  • (2024) An improved transformer-based model for detecting phishing, spam and ham emails: A large language model approach. Security and Privacy. DOI: 10.1002/spy2.402. Online publication date: 24-Apr-2024.
  • (2023) Policy-Based Spam Detection of Tweets Dataset. Electronics, 12(12):2662. DOI: 10.3390/electronics12122662. Online publication date: 14-Jun-2023.
  • (2023) Boosting decision-based black-box adversarial attack with gradient priors. Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, pp. 1195-1203. DOI: 10.24963/ijcai.2023/133. Online publication date: 19-Aug-2023.
  • (2023) A Hybrid Spam Detection Framework for Social Networks (original Turkish title: Sosyal Ağlar için Hibrit Bir Spam Algılama Framework). Politeknik Dergisi, 26(2):823-837. DOI: 10.2339/politeknik.933785. Online publication date: 5-Jul-2023.
