Research article. DOI: 10.1145/1835449.1835522

Uncovering social spammers: social honeypots + machine learning

Published: 19 July 2010

Abstract

Web-based social systems enable new community-based opportunities for participants to engage, share, and interact. This community value and related services like search and advertising are threatened by spammers, content polluters, and malware disseminators. In an effort to preserve community value and ensure long-term success, we propose and evaluate a honeypot-based approach for uncovering social spammers in online social systems. Two key components of the proposed approach are (1) the deployment of social honeypots for harvesting deceptive spam profiles from social networking communities; and (2) statistical analysis of the properties of these spam profiles for creating spam classifiers to actively filter out existing and new spammers. We describe the conceptual framework and design considerations of the proposed approach, and we present concrete observations from the deployment of social honeypots in MySpace and Twitter. We find that the deployed social honeypots identify social spammers with low false positive rates and that the harvested spam data contains signals that are strongly correlated with observable profile features (e.g., content, friend information, posting patterns). Based on these profile features, we develop machine-learning-based classifiers for identifying previously unknown spammers with high precision and a low rate of false positives.
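The abstract evaluates classifiers by precision and false positive rate. As a minimal sketch of how those two quantities are computed from labeled predictions (this is not the authors' code; the function name and the toy labels below are hypothetical and chosen only for illustration):

```python
from typing import List, Tuple


def precision_and_fpr(y_true: List[int], y_pred: List[int]) -> Tuple[float, float]:
    """Return (precision, false positive rate) for binary labels.

    Labels are assumed to be 1 for "spammer" and 0 for "legitimate user".
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # spammers flagged as spam
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # legitimate users flagged as spam
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # legitimate users left alone

    # Precision: of all profiles flagged as spam, the fraction that really are spam.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # False positive rate: of all legitimate profiles, the fraction wrongly flagged.
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return precision, fpr


# Hypothetical toy data: 3 spammers and 5 legitimate users.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
precision, fpr = precision_and_fpr(y_true, y_pred)
print(f"precision={precision:.3f}, false positive rate={fpr:.3f}")
```

High precision with a low false positive rate means that few legitimate users are mislabeled as spammers, which matters when a filter acts on profiles automatically.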


        Published In

        SIGIR '10: Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval
        July 2010
        944 pages
        ISBN: 9781450301534
        DOI: 10.1145/1835449
        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Author Tags

        1. social honeypots
        2. social media
        3. spam

        Qualifiers

        • Research-article

        Conference

        SIGIR '10
        Acceptance Rates

        SIGIR '10 paper acceptance rate: 87 of 520 submissions (17%)
        Overall acceptance rate: 792 of 3,983 submissions (20%)

        Article Metrics

        • Downloads (last 12 months): 105
        • Downloads (last 6 weeks): 15
        Reflects downloads up to 26 Jan 2025

        Cited By

        • (2025) Rumor detection from online social media through feature selection and machine learning. Multimedia Tools and Applications. DOI: 10.1007/s11042-024-20139-5. Online publication date: 25-Jan-2025
        • (2025) Efficient arabic and english social spam detection using a transformer and 2D convolutional neural network-based deep learning filter. International Journal of Information Security, 24:1. DOI: 10.1007/s10207-024-00975-0. Online publication date: 7-Jan-2025
        • (2024) Emoti-Shing: Detecting Vishing Attacks by Learning Emotion Dynamics through Hidden Markov Models. Journal of Intelligent Learning Systems and Applications, 16:03 (274-315). DOI: 10.4236/jilsa.2024.163015. Online publication date: 2024
        • (2024) Username Squatting on Online Social Networks: A Study on X. Proceedings of the 19th ACM Asia Conference on Computer and Communications Security (621-637). DOI: 10.1145/3634737.3637637. Online publication date: 1-Jul-2024
        • (2024) Empowering Portable Optoelectronics With Computer Vision for Intraoral Cavity Detection. IEEE Sensors Journal, 24:16 (25911-25919). DOI: 10.1109/JSEN.2024.3413025. Online publication date: 15-Aug-2024
        • (2024) A Method to Detect Threat in Advertisement URL and its Content. 2024 IEEE International Conference on Electronics, Computing and Communication Technologies (CONECCT) (1-6). DOI: 10.1109/CONECCT62155.2024.10677073. Online publication date: 12-Jul-2024
        • (2024) Fake views removal and popularity on YouTube. Scientific Reports, 14:1. DOI: 10.1038/s41598-024-63649-w. Online publication date: 4-Jul-2024
        • (2024) Detection and impact estimation of social bots in the Chilean Twitter network. Scientific Reports, 14:1. DOI: 10.1038/s41598-024-57227-3. Online publication date: 19-Mar-2024
        • (2024) CALEB: A Conditional Adversarial Learning Framework to enhance bot detection. Data & Knowledge Engineering, 149 (102245). DOI: 10.1016/j.datak.2023.102245. Online publication date: Jan-2024
        • (2024) Real-time anomaly detection in sky quality meter data using probabilistic exponential weighted moving average. International Journal of Data Science and Analytics. DOI: 10.1007/s41060-024-00535-8. Online publication date: 20-Apr-2024
