[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.1145/2872427.2882972acmotherconferencesArticle/Chapter ViewAbstractPublication PagesthewebconfConference Proceedingsconference-collections
research-article
Public Access

In a World That Counts: Clustering and Detecting Fake Social Engagement at Scale

Published: 11 April 2016 Publication History

Abstract

How can web services that depend on user generated content discern fake social engagement activities by spammers from legitimate ones? In this paper, we focus on the social site of YouTube and the problem of identifying bad actors posting inorganic contents and inflating the count of social engagement metrics. We propose an effective method, Leas (Local Expansion at Scale), and show how the fake engagement activities on YouTube can be tracked over time by analyzing the temporal graph based on the engagement behavior pattern between users and YouTube videos. With the domain knowledge of spammer seeds, we formulate and tackle the problem in a semi-supervised manner --- with the objective of searching for individuals that have similar pattern of behavior as the known seeds --- based on a graph diffusion process via local spectral subspace. We offer a fast, scalable MapReduce deployment adapted from the localized spectral clustering algorithm. We demonstrate the effectiveness of our deployment at Google by achieving a manual review accuracy of 98% on YouTube Comments graph in practice. Comparing with the state-of-the-art algorithm CopyCatch, Leas achieves 10 times faster running time on average. Leas is now actively in use at Google, searching for daily deceptive practices on YouTube's engagement graph spanning over a billion users.

References

[1]
GNU linear programming kit. http://www.gnu.org/software/glpk.
[2]
Stanford Network Analysis Project. http://snap.stanford.edu.
[3]
B. Abrahao, S. Soundarajan, J. Hopcroft, and R. Kleinberg. A separability framework for analyzing community structure. ACM Transactions on Knowledge Discovery from Data (TKDD), 8(1):5, 2014.
[4]
Y.-Y. Ahn, J. P. Bagrow, and S. Lehmann. Link communities reveal multiscale complexity in networks. Nature, 466(7307):761--764, 2010.
[5]
R. Andersen, F. Chung, and K. Lang. Local graph partitioning using pagerank vectors. In FOCS, pages 475--486. IEEE, 2006.
[6]
R. Andersen and K. J. Lang. Communities from seed sets. In WWW, pages 223--232. ACM, 2006.
[7]
M.-F. Balcan, C. Borgs, M. Braverman, J. Chayes, and S.-H. Teng. Finding endogenously formed communities. In SIAM, pages 767--783. ACM, 2013.
[8]
A. Clauset, C. R. Shalizi, and M. E. Newman. Power-law distributions in empirical data. SIAM review, 51(4):661--703, 2009.
[9]
M. Coscia, G. Rossetti, F. Giannotti, and D. Pedreschi. Demon: a local-first discovery method for overlapping communities. In KDD, pages 615--623. ACM, 2012.
[10]
S. Fortunato. Community detection in graphs. Physics Reports, 486(3):75--174, 2010.
[11]
T. J. Hansen and M. W. Mahoney. Semi-supervised eigenvectors for large-scale locally-biased learning. Journal of Machine Learning Research, 15:3691--3734, 2014.
[12]
D. Jin, B. Yang, C. Baquero, D. Liu, D. He, and J. Liu. A markov random walk under constraint for discovering overlapping communities in complex networks. Journal of Statistical Mechanics: Theory and Experiment, 2011(05):P05031, 2011.
[13]
K. Kloster and D. F. Gleich. Heat kernel based community detection. In KDD. ACM, 2014.
[14]
I. M. Kloumann and J. M. Kleinberg. Community membership identification from small seed sets. In KDD. ACM, 2014.
[15]
A. Lancichinetti, S. Fortunato, and F. Radicchi. Benchmark graphs for testing community detection algorithms. Physical Review E, 78(4):046110, 2008.
[16]
A. Lancichinetti, F. Radicchi, J. J. Ramasco, and S. Fortunato. Finding statistically significant communities in networks. PloS one, 6(4):e18961, 2011.
[17]
J. Leskovec, K. J. Lang, A. Dasgupta, and M. W. Mahoney. Statistical properties of community structure in large social and information networks. In WWW, pages 695--704. ACM, 2008.
[18]
M. Meila and J. Shi. A random walks view of spectral segmentation. 2001.
[19]
M. Molloy and B. Reed. A critical point for random graphs with a given degree sequence. Random structures & algorithms, 6(2--3):161--180, 1995.
[20]
A. Y. Ng, M. I. Jordan, Y. Weiss, et al. On spectral clustering: Analysis and an algorithm. Advances in neural information processing systems, 2:849--856, 2002.
[21]
P. Pons and M. Latapy. Computing communities in large networks using random walks. In Computer and Information Sciences-ISCIS 2005, pages 284--293. Springer, 2005.
[22]
U. N. Raghavan, R. Albert, and S. Kumara. Near linear time algorithm to detect community structures in large-scale networks. Physical Review E, 76(3):036106, 2007.
[23]
M. Rosvall and C. T. Bergstrom. Multilevel compression of random walks on networks reveals hierarchical organization in large integrated systems. PloS one, 6(4):e18209, 2011.
[24]
D. A. Spielman and S.-H. Teng. Nearly-linear time algorithms for graph partitioning, graph sparsification, and solving linear systems. In STOC, pages 81--90. ACM, 2004.
[25]
J. J. Whang, D. F. Gleich, and I. S. Dhillon. Overlapping community detection using seed set expansion. In CIKM, pages 2099--2108. ACM, 2013.
[26]
J. Xie, S. Kelley, and B. K. Szymanski. Overlapping community detection in networks: The state-of-the-art and comparative study. ACM Computing Surveys (CSUR), 45(4):43, 2013.
[27]
J. Yang and J. Leskovec. Defining and evaluating network communities based on ground-truth. In ICDM, pages 10--13, 2012.
[28]
V. Zlatić, A. Gabrielli, and G. Caldarelli. Topologically biased random walk and community finding in networks. Physical Review E, 82(6):066109, 2010.

Cited By

View all
  • (2024)Leveraging Machine Learning algorithms for Fake Profile Detection on Instagram2024 7th International Conference on Circuit Power and Computing Technologies (ICCPCT)10.1109/ICCPCT61902.2024.10673398(869-876)Online publication date: 8-Aug-2024
  • (2023)Graph Mining for Cybersecurity: A SurveyACM Transactions on Knowledge Discovery from Data10.1145/361022818:2(1-52)Online publication date: 13-Nov-2023
  • (2023)Understanding Communication Strategies and Viewer Engagement with Science Knowledge Videos on BilibiliProceedings of the 2023 CHI Conference on Human Factors in Computing Systems10.1145/3544548.3581476(1-18)Online publication date: 19-Apr-2023
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Other conferences
WWW '16: Proceedings of the 25th International Conference on World Wide Web
April 2016
1482 pages
ISBN:9781450341431

Sponsors

  • IW3C2: International World Wide Web Conference Committee

In-Cooperation

Publisher

International World Wide Web Conferences Steering Committee

Republic and Canton of Geneva, Switzerland

Publication History

Published: 11 April 2016

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. anomaly detection
  2. fake social engagement
  3. local spectral clustering
  4. seed expansion
  5. social networks

Qualifiers

  • Research-article

Funding Sources

  • US Army Research Office

Conference

WWW '16
Sponsor:
  • IW3C2
WWW '16: 25th International World Wide Web Conference
April 11 - 15, 2016
Québec, Montréal, Canada

Acceptance Rates

WWW '16 Paper Acceptance Rate 115 of 727 submissions, 16%;
Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)101
  • Downloads (Last 6 weeks)20
Reflects downloads up to 09 Dec 2024

Other Metrics

Citations

Cited By

View all
  • (2024)Leveraging Machine Learning algorithms for Fake Profile Detection on Instagram2024 7th International Conference on Circuit Power and Computing Technologies (ICCPCT)10.1109/ICCPCT61902.2024.10673398(869-876)Online publication date: 8-Aug-2024
  • (2023)Graph Mining for Cybersecurity: A SurveyACM Transactions on Knowledge Discovery from Data10.1145/361022818:2(1-52)Online publication date: 13-Nov-2023
  • (2023)Understanding Communication Strategies and Viewer Engagement with Science Knowledge Videos on BilibiliProceedings of the 2023 CHI Conference on Human Factors in Computing Systems10.1145/3544548.3581476(1-18)Online publication date: 19-Apr-2023
  • (2023)An analysis of fake social media engagement servicesComputers and Security10.1016/j.cose.2022.103013124:COnline publication date: 1-Jan-2023
  • (2022)A View into YouTube View FraudProceedings of the ACM Web Conference 202210.1145/3485447.3512216(555-563)Online publication date: 25-Apr-2022
  • (2022)to Protect the Public Opinion Against New Types of Bots?2022 IEEE International Conference on Big Data (Big Data)10.1109/BigData55660.2022.10020758(1671-1680)Online publication date: 17-Dec-2022
  • (2021)Correct and stable sorting for overflow streaming data with a limited storage size and a uniprocessorPeerJ Computer Science10.7717/peerj-cs.3557(e355)Online publication date: 12-Feb-2021
  • (2021)From Symbols to Embeddings: A Tale of Two Representations in Computational Social ScienceJournal of Social Computing10.23919/JSC.2021.00112:2(103-156)Online publication date: Jun-2021
  • (2021)Detecting and Analyzing Collusive Entities on YouTubeACM Transactions on Intelligent Systems and Technology10.1145/347730012:5(1-28)Online publication date: 24-Nov-2021
  • (2020)TIESProceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining10.1145/3394486.3403364(3128-3135)Online publication date: 23-Aug-2020
  • Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Login options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media