[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.1145/3299869.3319873acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
research-article

PRSim: Sublinear Time SimRank Computation on Large Power-Law Graphs

Published: 25 June 2019 Publication History

Abstract

SimRank is a classic measure of the similarities of nodes in a graph. Given a node u in graph $G =(V, E)$, a \em single-source SimRank query returns the SimRank similarities $s(u, v)$ between node u and each node $v \in V$. This type of queries has numerous applications in web search and social networks analysis, such as link prediction, web mining, and spam detection. Existing methods for single-source SimRank queries, however, incur query cost at least linear to the number of nodes n, which renders them inapplicable for real-time and interactive analysis. This paper proposes \prsim, an algorithm that exploits the structure of graphs to efficiently answer single-source SimRank queries. \prsim uses an index of size $O(m)$, where m is the number of edges in the graph, and guarantees a query time that depends on the \em reverse PageRank distribution of the input graph. In particular, we prove that \prsim runs in sub-linear time if the degree distribution of the input graph follows the power-law distribution, a property possessed by many real-world graphs. Based on the theoretical analysis, we show that the empirical query time of all existing SimRank algorithms also depends on the reverse PageRank distribution of the graph. Finally, we present the first experimental study that evaluates the absolute errors of various SimRank algorithms on large graphs, and we show that \prsim outperforms the state of the art in terms of query time, accuracy, index size, and scalability.

References

[1]
http://snap.stanford.edu/data/index.html.
[2]
http://law.di.unimi.it/datasets.php.
[3]
Rodrigo Aldecoa, Chiara Orsini, and Dmitri Krioukov. Hyperbolic graph generator. Computer Physics Communications, 196:492--496, 2015.
[4]
Ioannis Antonellis, Hector Garcia Molina, and Chi Chao Chang. Simrank
[5]
: query rewriting through link analysis of the click graph. PVLDB, 1(1):408--421, 2008.
[6]
Bahman Bahmani, Abdur Chowdhury, and Ashish Goel. Fast incremental and personalized pagerank. VLDB, 4(3):173--184, 2010.
[7]
Mansurul Bhuiyan and Mohammad Al Hasan. Representing graphs as bag of vertices and partitions for graph classification. Data Science and Engineering, 3(2):150--165, 2018.
[8]
Bé la Bollobá s, Christian Borgs, Jennifer T. Chayes, and Oliver Riordan. Directed scale-free graphs. In SODA, pages 132--139, 2003.
[9]
Pawel Brach, Marek Cygan, Jakub Lkacki, and Piotr Sankowski. Algorithmic complexity of power law networks. In SODA, pages 1306--1325. Society for Industrial and Applied Mathematics, 2016.
[10]
Moses Charikar, Kevin Chen, and Martin Farach-Colton. Finding frequent items in data streams. In ICALP, pages 693--703. Springer, 2002.
[11]
Fan R. K. Chung and Lincoln Lu. Concentration inequalities and martingale inequalities: A survey. Internet Mathematics, 3(1):79--127, 2006.
[12]
Dá niel Fogaras and Balá zs Rá cz. Scaling link-based similarity search. In WWW, pages 641--650, 2005.
[13]
Dá niel Fogaras, Balá zs Rá cz, Ká roly Csalogá ny, and Tamá s Sarló s. Towards scaling fully personalized pagerank: Algorithms, lower bounds, and experiments. Internet Mathematics, 2(3):333--358, 2005.
[14]
Yuichiro Fujiwara, Makoto Nakatsuji, Hiroaki Shiokawa, and Makoto Onizuka. Efficient search algorithm for simrank. In ICDE, pages 589--600, 2013.
[15]
Guoming He, Haijun Feng, Cuiping Li, and Hong Chen. Parallel simrank computation on large graphs with iterative aggregation. In KDD, pages 543--552, 2010.
[16]
Glen Jeh and Jennifer Widom. Simrank: a measure of structural-context similarity. In SIGKDD, pages 538--543, 2002.
[17]
Minhao Jiang, Ada Wai-Chee Fu, and Raymond Chi-Wing Wong. Reads: a random walk approach for efficient and accurate dynamic simrank. PPVLDB, 10(9):937--948, 2017.
[18]
Ruoming Jin, Victor E Lee, and Hui Hong. Axiomatic ranking of network role similarity. In KDD, pages 922--930, 2011.
[19]
Mitsuru Kusumoto, Takanori Maehara, and Ken-ichi Kawarabayashi. Scalable similarity search for simrank. In SIGMOD, pages 325--336, 2014.
[20]
JooYoung Lee and Rustam Tukhvatov. Evaluations of similarity measures on vk for link prediction. Data Science and Engineering, 3(3):277--289, 2018.
[21]
Pei Lee, Laks V. S. Lakshmanan, and Jeffrey Xu Yu. On top-k structural similarity search. In ICDE, pages 774--785, 2012.
[22]
Cuiping Li, Jiawei Han, Guoming He, Xin Jin, Yizhou Sun, Yintao Yu, and Tianyi Wu. Fast computation of simrank for static and dynamic information networks. In EDBT, pages 465--476, 2010.
[23]
Zhenguo Li, Yixiang Fang, Qin Liu, Jiefeng Cheng, Reynold Cheng, and John Lui. Walking in the cloud: Parallel simrank at scale. PVLDB, 9(1):24--35, 2015.
[24]
David Liben-Nowell and Jon M. Kleinberg. The link-prediction problem for social networks. JASIST, 58(7):1019--1031, 2007.
[25]
Yu Liu, Jiaheng Lu, Hua Yang, Xiaokui Xiao, and Zhewei Wei. Towards maximum independent sets on massive graphs. VLDB, 8(13):2122--2133, 2015.
[26]
Yu Liu, Bolong Zheng, Xiaodong He, Zhewei Wei, Xiaokui Xiao, Kai Zheng, and Jiaheng Lu. Probesim: scalable single-source and top-k simrank computations on dynamic graphs. PVLDB, 11(1):14--26, 2017.
[27]
Dmitry Lizorkin, Pavel Velikhov, Maxim N. Grinev, and Denis Turdakov. Accuracy estimate and optimization techniques for simrank computation. VLDB J., 19(1):45--66, 2010.
[28]
Peter Lofgren, Siddhartha Banerjee, and Ashish Goel. Personalized pagerank estimation and search: A bidirectional approach. In WSDM, pages 163--172, 2016.
[29]
Takanori Maehara, Mitsuru Kusumoto, and Ken-ichi Kawarabayashi. Efficient simrank computation via linearization. CoRR, abs/1411.7228, 2014.
[30]
Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd. The pagerank citation ranking: bringing order to the web. 1999.
[31]
Yingxia Shao, Bin Cui, Lei Chen, Mingming Liu, and Xing Xie. An efficient similarity search framework for simrank over large dynamic graphs. PVLDB, 8(8):838--849, 2015.
[32]
Nikita Spirin and Jiawei Han. Survey on web spam detection: principles and algorithms. SIGKDD Explorations, 13(2):50--64, 2011.
[33]
Boyu Tian and Xiaokui Xiao. SLING: A near-optimal index structure for simrank. In SIGMOD, pages 1859--1874, 2016.
[34]
Sibo Wang and Yufei Tao. Efficient algorithms for finding approximate heavy hitters in personalized pageranks. In SIGMOD, pages 1113--1127. ACM, 2018.
[35]
Yue Wang, Xiang Lian, and Lei Chen. Efficient simrank tracking in dynamic graphs. In ICDE, pages 545--556. IEEE, 2018.
[36]
Zhewei Wei, Xiaodong He, Xiaokui Xiao, Sibo Wang, Shuo Shang, and Ji-Rong Wen. Topppr: top-k personalized pagerank queries with precision guarantees on large graphs. In SIGMOD, pages 441--456. ACM, 2018.
[37]
Wensi Xi, Edward A Fox, Weiguo Fan, Benyu Zhang, Zheng Chen, Jun Yan, and Dong Zhuang. Simfusion: measuring similarity using unified relationship matrix. In SIGIR, pages 130--137. ACM, 2005.
[38]
Qi Ye, Changlei Zhu, Gang Li, Zhimin Liu, and Feng Wang. Using node identifiers and community prior for graph-based classification. Data Science and Engineering, 3(1):68--83, 2018.
[39]
Weiren Yu, Xuemin Lin, and Wenjie Zhang. Fast incremental simrank on link-evolving graphs. In ICDE, pages 304--315, 2014.
[40]
Weiren Yu, Xuemin Lin, Wenjie Zhang, Lijun Chang, and Jian Pei. More is simpler: Effectively and efficiently assessing node-pair similarities based on hyperlinks. PVLDB, 7(1):13--24, 2013.
[41]
Weiren Yu and Julie McCann. Gauging correct relative rankings for similarity search. In CIKM, pages 1791--1794, 2015.
[42]
Weiren Yu and Julie A. McCann. Efficient partial-pairs simrank search for large networks. PVLDB, 8(5):569--580, 2015.
[43]
Weiren Yu and Julie Ann McCann. High quality graph-based similarity search. In SIGIR, pages 83--92, 2015.
[44]
Weiren Yu, Wenjie Zhang, Xuemin Lin, Qing Zhang, and Jiajin Le. A space and time efficient algorithm for simrank computation. World Wide Web, 15(3):327--353, 2012.
[45]
Hongyang Zhang, Peter Lofgren, and Ashish Goel. Approximate personalized pagerank on dynamic graphs. In KDD, pages 1315--1324, 2016.
[46]
Jing Zhang, Jie Tang, Cong Ma, Hanghang Tong, Yu Jing, and Juanzi Li. Panther: Fast top-k similarity search on large networks. In SIGKDD, pages 1445--1454. ACM, 2015.
[47]
Zhipeng Zhang, Yingxia Shao, Bin Cui, and Ce Zhang. An experimental evaluation of simrank-based similarity search algorithms. PVLDB, 10(5):601--612, 2017.
[48]
Peixiang Zhao, Jiawei Han, and Yizhou Sun. P-rank: a comprehensive structural similarity measure over information networks. In CIKM, pages 553--562. ACM, 2009.
[49]
Peixiang Zhao, Jiawei Han, and Yizhou Sun. P-rank: a comprehensive structural similarity measure over information networks. In CIKM, pages 553--562, 2009.

Cited By

View all
  • (2024)A Unified Graph Framework for Storage-Compute Coupled Cluster and High-Density Computing ClusterProceedings of the International Workshop on Big Data in Emergent Distributed Environments10.1145/3663741.3664790(1-6)Online publication date: 9-Jun-2024
  • (2024)Revisiting Local Computation of PageRank: Simple and OptimalProceedings of the 56th Annual ACM Symposium on Theory of Computing10.1145/3618260.3649661(911-922)Online publication date: 10-Jun-2024
  • (2024)Efficient Algorithms for Group Hitting Probability Queries on Large GraphsIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2023.3349164(1-13)Online publication date: 2024
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
SIGMOD '19: Proceedings of the 2019 International Conference on Management of Data
June 2019
2106 pages
ISBN:9781450356435
DOI:10.1145/3299869
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 25 June 2019

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. personalized pagerank
  2. power-law graphs
  3. simrank

Qualifiers

  • Research-article

Conference

SIGMOD/PODS '19
Sponsor:
SIGMOD/PODS '19: International Conference on Management of Data
June 30 - July 5, 2019
Amsterdam, Netherlands

Acceptance Rates

SIGMOD '19 Paper Acceptance Rate 88 of 430 submissions, 20%;
Overall Acceptance Rate 785 of 4,003 submissions, 20%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)27
  • Downloads (Last 6 weeks)3
Reflects downloads up to 19 Dec 2024

Other Metrics

Citations

Cited By

View all
  • (2024)A Unified Graph Framework for Storage-Compute Coupled Cluster and High-Density Computing ClusterProceedings of the International Workshop on Big Data in Emergent Distributed Environments10.1145/3663741.3664790(1-6)Online publication date: 9-Jun-2024
  • (2024)Revisiting Local Computation of PageRank: Simple and OptimalProceedings of the 56th Annual ACM Symposium on Theory of Computing10.1145/3618260.3649661(911-922)Online publication date: 10-Jun-2024
  • (2024)Efficient Algorithms for Group Hitting Probability Queries on Large GraphsIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2023.3349164(1-13)Online publication date: 2024
  • (2024)Resource Allocation with Service Affinity in Large-Scale Cloud Environments2024 IEEE 40th International Conference on Data Engineering (ICDE)10.1109/ICDE60146.2024.00397(5280-5293)Online publication date: 13-May-2024
  • (2023)Estimating Single-Node PageRank in Õ (min{d, √m}) TimeProceedings of the VLDB Endowment10.14778/3611479.361150016:11(2949-2961)Online publication date: 24-Aug-2023
  • (2023)ClipSim: A GPU-friendly Parallel Framework for Single-Source SimRank with Accuracy GuaranteeProceedings of the ACM on Management of Data10.1145/35887071:1(1-26)Online publication date: 30-May-2023
  • (2023)Effective and Scalable Manifold Ranking-Based Image Retrieval with Output BoundACM Transactions on Knowledge Discovery from Data10.1145/356557417:5(1-31)Online publication date: 7-Apr-2023
  • (2023)All-Pairs SimRank Updates on Dynamic Graphs2023 IEEE Intl Conf on Parallel & Distributed Processing with Applications, Big Data & Cloud Computing, Sustainable Computing & Communications, Social Computing & Networking (ISPA/BDCloud/SocialCom/SustainCom)10.1109/ISPA-BDCloud-SocialCom-SustainCom59178.2023.00050(131-138)Online publication date: 21-Dec-2023
  • (2023)Hierarchical All-Pairs SimRank CalculationDatabase Systems for Advanced Applications10.1007/978-3-031-30675-4_17(252-268)Online publication date: 15-Apr-2023
  • (2022)Scaling High-Quality Pairwise Link-Based Similarity Retrieval on Billion-Edge GraphsACM Transactions on Information Systems10.1145/349520940:4(1-45)Online publication date: 11-Jan-2022
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media