[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.1145/2783258.2783267acmconferencesArticle/Chapter ViewAbstractPublication PageskddConference Proceedingsconference-collections
research-article

Panther: Fast Top-k Similarity Search on Large Networks

Published: 10 August 2015 Publication History

Abstract

Estimating similarity between vertices is a fundamental issue in network analysis across various domains, such as social networks and biological networks. Methods based on common neighbors and structural contexts have received much attention. However, both categories of methods are difficult to scale up to handle large networks (with billions of nodes). In this paper, we propose a sampling method that provably and accurately estimates the similarity between vertices. The algorithm is based on a novel idea of random path. Specifically, given a network, we perform R random walks, each starting from a randomly picked vertex and walking T steps. Theoretically, the algorithm guarantees that the sampling size R = O(2ε-2 log2 T) depends on the error-bound ε, the confidence level (1 -- δ), and the path length T of each random walk.
We perform extensive empirical study on a Tencent microblogging network of 1,000,000,000 edges. We show that our algorithm can return top-k similar vertices for any vertex in a network 300× faster than the state-of-the-art methods. We also use two applications-identity resolution and structural hole spanner finding--to evaluate the accuracy of the estimated similarities. Our results demonstrate that the proposed algorithm achieves clearly better performance than several alternative methods.

Supplementary Material

MP4 File (p1445.mp4)

References

[1]
K. Aoyama, K. Saito, H. Sawada, and N. Ueda. Fast approximate similarity search based on degree-reduced neighborhood graphs. In KDD'11, pages 1055--1063, 2011.
[2]
R. Baeza-Yates, B. Ribeiro-Neto, et al. Modern information retrieval, volume 463. ACM press, 1999.
[3]
V. D. Blondel, A. Gajardo, M. Heymans, P. Senellart, and P. Van Dooren. A measure of similarity between graph vertices: Applications to synonym extraction and web searching. SIAM review, 46(4):647--666, 2004.
[4]
R. S. Burt. Detecting role equivalence. Social Networks, 12(1):83--97, 1990.
[5]
R. S. Burt. Structural holes: The social structure of competition. Harvard university press, 2009.
[6]
Y. Dong, Y. Yang, J. Tang, Y. Yang, and N. V. Chawla. Inferring user demographics and social strategies in mobile social networks. In KDD'14, pages 15--24, 2014.
[7]
W. Feller. An introduction to probability theory and its applications, volume 2. John Wiley & Sons, 2008.
[8]
L. C. Freeman. A set of measures of centrality based on betweenness. Sociometry, pages 35--41, 1977.
[9]
L. C. Freeman. Centrality in social networks conceptual clarification. Social networks, 1(3):215--239, 1979.
[10]
Y. Fujiwara, M. Nakatsuji, H. Shiokawa, T. Mishima, and M. Onizuka. Efficient ad-hoc search for personalized pagerank. In SIGMOD'13, pages 445--456, 2013.
[11]
S. Gilpin, T. Eliassi-Rad, and I. Davidson. Guided learning for role discovery (glrd): framework, algorithms, and applications. In KDD'13, pages 113--121, 2013.
[12]
K. Henderson, B. Gallagher, T. Eliassi-Rad, H. Tong, S. Basu, L. Akoglu, D. Koutra, C. Faloutsos, and L. Li. Rolx: structural role extraction & mining in large graphs. In KDD'12, pages 1231--1239, 2012.
[13]
K. Henderson, B. Gallagher, L. Li, L. Akoglu, T. Eliassi-Rad, H. Tong, and C. Faloutsos. It's who you know: graph mining using recursive structural features. In KDD'11, pages 663--671, 2011.
[14]
P. W. Holland and S. Leinhardt. An exponential family of probability distributions for directed graphs. Journal of the american Statistical association, 76(373):33--50, 1981.
[15]
J. Hopcroft, T. Lou, and J. Tang. Who will follow you back? reciprocal relationship prediction. In CIKM'11, pages 1137--1146, 2011.
[16]
P. Jaccard. Étude comparative de le distribution florale dans une portion de alpes et du jura. Bulletin de la Société Vaudoise des Sciences Naturelles, 37:547--579, 1901.
[17]
G. Jeh and J. Widom. Simrank: a measure of structural-context similarity. In KDD'02, pages 538--543, 2002.
[18]
R. Jin, V. E. Lee, and H. Hong. Axiomatic ranking of network role similarity. In KDD'11, pages 922--930, 2011.
[19]
L. Katz. A new status index derived from sociometric analysis. Psychometrika, 18(1):39--43, 1953.
[20]
M. M. Kessler. Bibliographic coupling between scientific papers. American Documentation, 14(1):10--25, 1963.
[21]
M. Kusumoto, T. Maehara, and K.-i. Kawarabayashi. Scalable similarity search for simrank. In SIGMOD'14, pages 325--336, 2014.
[22]
P. Lee, L. V. Lakshmanan, and J. X. Yu. On top-k structural similarity search. In ICDE'12, pages 774--785, 2012.
[23]
E. Leicht, P. Holme, and M. E. Newman. Vertex similarity in networks. Physical Review E, 73(2):026120, 2006.
[24]
F. Lorrain and H. C. White. Structural equivalence of individuals in social networks. The Journal of mathematical sociology, 1(1):49--80, 1971.
[25]
T. Lou and J. Tang. Mining structural hole spanners through information diffusion in social networks. In WWW'13, pages 837--848, 2013.
[26]
M. E. Newman. Finding community structure in networks using the eigenvectors of matrices. Physical review E, 74(3):036104, 2006.
[27]
J.-Y. Pan, H.-J. Yang, C. Faloutsos, and P. Duygulu. Automatic multimedia cross-modal correlation discovery. In KDD'04, pages 653--658, 2004.
[28]
M. Riondato and E. M. Kornaropoulos. Fast approximation of betweenness centrality through sampling. In WSD'14, pages 413--422, 2014.
[29]
R. A. Rossi and N. K. Ahmed. Role discovery in networks. IEEE TKDE, 2015.
[30]
P. Sarkar and A. W. Moore. Fast nearest-neighbor search in disk-resident graphs. In KDD'10, pages 513--522, 2010.
[31]
H. Small. Co-citation in the scientific literature: A new measure of the relationship between two documents. Journal of the American Society for information Science, 24(4):265--269, 1973.
[32]
Y. Sun, J. Han, X. Yan, P. S. Yu, and T. Wu. Pathsim: Meta path-based top-k similarity search in heterogeneous information networks. VLDB'11, pages 992--1003, 2011.
[33]
J. Tang, J. Zhang, L. Yao, J. Li, L. Zhang, and Z. Su. Arnetminer: Extraction and mining of academic social networks. In KDD'08, pages 990--998, 2008.
[34]
C. E. Tsourakakis. Toward quantifying vertex similarity in networks. Internet Mathematics, 10(3--4):263--286, 2014.
[35]
V. N. Vapnik and A. Y. Chervonenkis. On the uniform convergence of relative frequencies of events to their probabilities. Theory of Probability & Its Applications, 16(2):264--280, 1971.
[36]
I. Wald and V. Havran. On building fast kd-trees for ray tracing, and on doing that in o (n log n). In Interactive Ray Tracing 2006, IEEE Symposium on, pages 61--69, 2006.
[37]
Y. Yang, J. Tang, C. W.-k. Leung, Y. Sun, Q. Chen, J. Li, and Q. Yang. Rain: Social role-aware information diffusion. In AAAI'14, 2014.

Cited By

View all
  • (2024)PyMulSim: a method for computing node similarities between multilayer networks via graph isomorphism networksBMC Bioinformatics10.1186/s12859-024-05830-625:1Online publication date: 13-Jun-2024
  • (2024)Manipulating Recommender Systems: A Survey of Poisoning Attacks and CountermeasuresACM Computing Surveys10.1145/367732857:1(1-39)Online publication date: 7-Oct-2024
  • (2024)Exploiting Temporal Vulnerabilities for Unauthorized Access in Intent-based NetworkingProceedings of the 2024 on ACM SIGSAC Conference on Computer and Communications Security10.1145/3658644.3670301(3630-3644)Online publication date: 2-Dec-2024
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
KDD '15: Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
August 2015
2378 pages
ISBN:9781450336642
DOI:10.1145/2783258
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 10 August 2015

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. random path
  2. social network
  3. vertex similarity

Qualifiers

  • Research-article

Funding Sources

  • Natural Science Foundation of China
  • Tsinghua University Initiative Scientific Research Program
  • National High-tech R&D Program
  • National Basic Research Program of China

Conference

KDD '15
Sponsor:

Acceptance Rates

KDD '15 Paper Acceptance Rate 160 of 819 submissions, 20%;
Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

Upcoming Conference

KDD '25

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)62
  • Downloads (Last 6 weeks)7
Reflects downloads up to 19 Dec 2024

Other Metrics

Citations

Cited By

View all
  • (2024)PyMulSim: a method for computing node similarities between multilayer networks via graph isomorphism networksBMC Bioinformatics10.1186/s12859-024-05830-625:1Online publication date: 13-Jun-2024
  • (2024)Manipulating Recommender Systems: A Survey of Poisoning Attacks and CountermeasuresACM Computing Surveys10.1145/367732857:1(1-39)Online publication date: 7-Oct-2024
  • (2024)Exploiting Temporal Vulnerabilities for Unauthorized Access in Intent-based NetworkingProceedings of the 2024 on ACM SIGSAC Conference on Computer and Communications Security10.1145/3658644.3670301(3630-3644)Online publication date: 2-Dec-2024
  • (2023)dsJSON: A Distributed SQL JSON ProcessorProceedings of the ACM on Management of Data10.1145/35889571:1(1-25)Online publication date: 30-May-2023
  • (2023)NeuroSketch: Fast and Approximate Evaluation of Range Aggregate Queries with Neural NetworksProceedings of the ACM on Management of Data10.1145/35889541:1(1-26)Online publication date: 30-May-2023
  • (2023)Hierarchical Residual Encoding for Multiresolution Time Series CompressionProceedings of the ACM on Management of Data10.1145/35889531:1(1-26)Online publication date: 30-May-2023
  • (2023)When Private Blockchain Meets Deterministic DatabaseProceedings of the ACM on Management of Data10.1145/35889521:1(1-28)Online publication date: 30-May-2023
  • (2023)AutoCTS+: Joint Neural Architecture and Hyperparameter Search for Correlated Time Series ForecastingProceedings of the ACM on Management of Data10.1145/35889511:1(1-26)Online publication date: 30-May-2023
  • (2023)Efficient Tree-SVD for Subset Node Embedding over Large Dynamic GraphsProceedings of the ACM on Management of Data10.1145/35889501:1(1-26)Online publication date: 30-May-2023
  • (2023)LiteHST: A Tree Embedding based Method for Similarity SearchProceedings of the ACM on Management of Data10.1145/35887151:1(1-26)Online publication date: 30-May-2023
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media