research-article

Uncovering social spammers: social honeypots + machine learning

Authors:

James Caverlee,

Steve Webb

SIGIR '10: Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval

Pages 435 - 442

https://doi.org/10.1145/1835449.1835522

Published: 19 July 2010

Abstract

Web-based social systems enable new community-based opportunities for participants to engage, share, and interact. This community value and related services like search and advertising are threatened by spammers, content polluters, and malware disseminators. In an effort to preserve community value and ensure long-term success, we propose and evaluate a honeypot-based approach for uncovering social spammers in online social systems. Two of the key components of the proposed approach are: (1) the deployment of social honeypots for harvesting deceptive spam profiles from social networking communities; and (2) statistical analysis of the properties of these spam profiles for creating spam classifiers to actively filter out existing and new spammers. We describe the conceptual framework and design considerations of the proposed approach, and we present concrete observations from the deployment of social honeypots in MySpace and Twitter. We find that the deployed social honeypots identify social spammers with low false positive rates and that the harvested spam data contains signals that are strongly correlated with observable profile features (e.g., content, friend information, and posting patterns). Based on these profile features, we develop machine-learning-based classifiers for identifying previously unknown spammers with high precision and a low rate of false positives.
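The abstract does not say which learning algorithm or feature set the classifiers use, so the following is only an illustrative sketch of the general idea: profiles harvested by honeypots serve as spam examples, ordinary profiles serve as legitimate examples, and a classifier trained on profile text is judged by precision and false positive rate. The choice of multinomial naive Bayes, the class name `NaiveBayesSpamFilter`, the helper `precision_and_fpr`, and all profile strings are assumptions made for illustration; none of them come from the paper.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption; real systems do more)."""
    return text.lower().split()


class NaiveBayesSpamFilter:
    """Multinomial naive Bayes with add-one smoothing over profile text."""

    LABELS = ("spam", "legit")

    def __init__(self):
        self.word_counts = {label: Counter() for label in self.LABELS}
        self.doc_counts = Counter()
        self.vocab = set()

    def fit(self, profiles, labels):
        for text, label in zip(profiles, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.doc_counts[label] += 1
            self.vocab.update(tokens)
        return self

    def _log_score(self, tokens, label):
        # Log prior plus smoothed log likelihood of each known token.
        total_docs = sum(self.doc_counts.values())
        score = math.log(self.doc_counts[label] / total_docs)
        denom = sum(self.word_counts[label].values()) + len(self.vocab)
        for tok in tokens:
            if tok in self.vocab:  # tokens never seen in training are skipped
                score += math.log((self.word_counts[label][tok] + 1) / denom)
        return score

    def predict(self, text):
        tokens = tokenize(text)
        return max(self.LABELS, key=lambda lab: self._log_score(tokens, lab))


def precision_and_fpr(y_true, y_pred, positive="spam"):
    """Return (precision, false positive rate) for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return precision, fpr


# Toy data (made up): "harvested" stands in for honeypot-collected spam profiles.
harvested = ["click here for free prizes",
             "free webcam click my link",
             "win free money now click"]
ordinary = ["love hiking and photography",
            "college student music fan",
            "my dog and my family photos"]

clf = NaiveBayesSpamFilter().fit(
    harvested + ordinary,
    ["spam"] * len(harvested) + ["legit"] * len(ordinary),
)

print(clf.predict("free prizes click now"))
print(clf.predict("hiking with my family"))
```

Reporting both numbers matters because they answer different questions: precision measures how many flagged profiles are truly spam, while the false positive rate measures how many legitimate users would be wrongly flagged.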

Index Terms

1. Applied computing
  1. Law, social and behavioral sciences
2. Information systems
  1. World Wide Web
    1. Web applications
    2. Web services

Information & Contributors

Information

Published In

SIGIR '10: Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval

July 2010

944 pages

ISBN:9781450301534

DOI:10.1145/1835449

General Chairs: Fabio Crestani (University of Lugano, CH), Stéphane Marchand-Maillet (University of Geneva, CH)

Program Chairs: Hsin-Hsi Chen (National Taiwan University, TW), Efthimis N. Efthimiadis (University of Washington, USA), Jacques Savoy (University of Neuchatel, CH)

Copyright © 2010 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGIR: ACM Special Interest Group on Information Retrieval

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

SIGIR '10

Sponsor:

SIGIR

SIGIR '10: The 33rd International ACM SIGIR conference on research and development in Information Retrieval

July 19 - 23, 2010

Geneva, Switzerland

Acceptance Rates

SIGIR '10 Paper Acceptance Rate 87 of 520 submissions, 17%;

Overall Acceptance Rate 792 of 3,983 submissions, 20%

Bibliometrics

Total Citations: 433

Total Downloads: 3,325

Downloads (Last 12 months): 105

Downloads (Last 6 weeks): 15

Reflects downloads up to 26 Jan 2025

Cited By

Gurrapu N, Baydeti N (2025). Rumor detection from online social media through feature selection and machine learning. Multimedia Tools and Applications. Online publication date: 25-Jan-2025.
https://doi.org/10.1007/s11042-024-20139-5
Kihal M, Hamza L (2025). Efficient arabic and english social spam detection using a transformer and 2D convolutional neural network-based deep learning filter. International Journal of Information Security, 24:1. Online publication date: 7-Jan-2025.
https://doi.org/10.1007/s10207-024-00975-0
Nyassi V, Tchakounté F, Yenké B, Danga D, Ngoran M, Fendji J (2024). Emoti-Shing: Detecting Vishing Attacks by Learning Emotion Dynamics through Hidden Markov Models. Journal of Intelligent Learning Systems and Applications, 16:03 (274-315). Online publication date: 2024.
https://doi.org/10.4236/jilsa.2024.163015
Lepipas A, Borovykh A, Demetriou S, Quek T, Gao D, Zhou J, Cardenas A (2024). Username Squatting on Online Social Networks: A Study on X. Proceedings of the 19th ACM Asia Conference on Computer and Communications Security (621-637). Online publication date: 1-Jul-2024.
https://dl.acm.org/doi/10.1145/3634737.3637637
Khuntia S, Fan S, Juan P, Liou C, Huang Y, Singh K, Ogwo C, Tai L (2024). Empowering Portable Optoelectronics With Computer Vision for Intraoral Cavity Detection. IEEE Sensors Journal, 24:16 (25911-25919). Online publication date: 15-Aug-2024.
https://doi.org/10.1109/JSEN.2024.3413025
Nittur P, Jadav B, Sharma A, Teja J, Mani S, Sahni N (2024). A Method to Detect Threat in Advertisement URL and its Content. 2024 IEEE International Conference on Electronics, Computing and Communication Technologies (CONECCT) (1-6). Online publication date: 12-Jul-2024.
https://doi.org/10.1109/CONECCT62155.2024.10677073
Castaldo M, Frasca P, Venturini T, Gargiulo F (2024). Fake views removal and popularity on YouTube. Scientific Reports, 14:1. Online publication date: 4-Jul-2024.
https://doi.org/10.1038/s41598-024-63649-w
Mendoza M, Providel E, Santos M, Valenzuela S (2024). Detection and impact estimation of social bots in the Chilean Twitter network. Scientific Reports, 14:1. Online publication date: 19-Mar-2024.
https://doi.org/10.1038/s41598-024-57227-3
Dimitriadis I, Dialektakis G, Vakali A (2024). CALEB: A Conditional Adversarial Learning Framework to enhance bot detection. Data & Knowledge Engineering, 149 (102245). Online publication date: Jan-2024.
https://doi.org/10.1016/j.datak.2023.102245
Riza L, Putra Z, Zain M, Trihutama F, Utama J, Samah K, Herdiwijaya D, NQZ R, Mumpuni E, Priyatikanto R (2024). Real-time anomaly detection in sky quality meter data using probabilistic exponential weighted moving average. International Journal of Data Science and Analytics. Online publication date: 20-Apr-2024.
https://doi.org/10.1007/s41060-024-00535-8
