[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
research-article

Efficient partial-pairs simrank search on large networks

Published: 01 January 2015 Publication History

Abstract

The assessment of node-to-node similarities based on graph topology arises in a myriad of applications, e.g., web search. SimRank is a notable measure of this type, with the intuition that "two nodes are similar if their in-neighbors are similar". While most existing work retrieving SimRank only considers all-pairs SimRank s(*, *) and single-source SimRank s(*, j) (scores between every node and query j), there are appealing applications for partial-pairs SimRank, e.g., similarity join. Given two node subsets A and B in a graph, partial-pairs SimRank assessment aims to retrieve only {s(a, b)}aεA,∀bεB. However, the best-known solution appears not self-contained since it hinges on the premise that the SimRank scores with node-pairs in an h-go cover set must be given beforehand.
This paper focuses on efficient assessment of partial-pairs SimRank in a self-contained manner. (1) We devise a novel "seed germination" model that computes partial-pairs SimRank in O(k|E| min{|A|, |B|}) time and O(|E| + k|V|) memory for k iterations on a graph of |V| nodes and |E| edges. (2) We further eliminate unnecessary edge access to improve the time of partial-pairs SimRank to O(m min{|A|, |B|}), where m ≤ min{k|E|, Δ2k}, and Δ is the maximum degree. (3) We show that our partial-pairs SimRank model also can handle the computations of all-pairs and single-source SimRanks. (4) We empirically verify that our algorithms are (a) 38x faster than the best-known competitors, and (b) memory-efficient, allowing scores to be assessed accurately on graphs with tens of millions of links.

References

[1]
I. Antonellis, H. G. Molina, and C. Chang. SimRank++: Query rewriting through link analysis of the click graph. PVLDB, 1(1): 408--421, 2008.
[2]
P. Berkhin. Survey: A survey on PageRank computing. Internet Mathematics, 2(1): 73--120, 2005.
[3]
D. Fogaras and B. Rácz. Scaling link-based similarity search. In WWW, pages 641--650, 2005.
[4]
Y. Fujiwara, M. Nakatsuji, H. Shiokawa, and M. Onizuka. Efficient search algorithm for SimRank. In ICDE, pages 589--600, 2013.
[5]
G. He, H. Feng, C. Li, and H. Chen. Parallel SimRank computation on large graphs with iterative aggregation. In KDD, pages 543--552, 2010.
[6]
J. He, H. Liu, J. X. Yu, P. Li, W. He, and X. Du. Assessing single-pair similarity over graphs by aggregating first-meeting probabilities. Inf. Syst., 42: 107--122, 2014.
[7]
G. Jeh and J. Widom. SimRank: A measure of structural-context similarity. In KDD, pages 538--543, 2002.
[8]
R. Jin, V. E. Lee, and H. Hong. Axiomatic ranking of network role similarity. In KDD, pages 922--930, 2011.
[9]
M. Kusumoto, T. Maehara, and K. ichi Kawarabayashi. Scalable similarity search for SimRank. In SIGMOD Conference, pages 325--336, 2014.
[10]
P. Lee, L. V. S. Lakshmanan, and J. X. Yu. On top-k structural similarity search. In ICDE, pages 774--785, 2012.
[11]
C. Li, J. Han, G. He, X. Jin, Y. Sun, Y. Yu, and T. Wu. Fast computation of SimRank for static and dynamic information networks. In EDBT, pages 465--476, 2010.
[12]
P. Li, H. Liu, J. X. Yu, J. He, and X. Du. Fast single-pair SimRank computation. In SDM, pages 571--582, 2010.
[13]
D. Lizorkin, P. Velikhov, M. N. Grinev, and D. Turdakov. Accuracy estimate and optimization techniques for SimRank computation. VLDB J., 19(1): 45--66, 2010.
[14]
L. Sun, R. Cheng, X. Li, D. W. Cheung, and J. Han. On link-based similarity join. PVLDB, 4(11): 714--725, 2011.
[15]
W. Yu, X. Lin, and J. Le. Taming computational complexity: Efficient and parallel SimRank optimizations on undirected graphs. In WAIM, pages 280--296, 2010.
[16]
W. Yu, X. Lin, and W. Zhang. Towards efficient SimRank computation on large networks. In ICDE, pages 601--612, 2013.
[17]
W. Yu, X. Lin, and W. Zhang. Fast incremental SimRank on link-evolving graphs. In ICDE, pages 304--315, 2014.
[18]
W. Yu, X. Lin, W. Zhang, L. Chang, and J. Pei. More is simpler: Effectively and efficiently assessing node-pair similarities based on hyperlinks. PVLDB, 7: 13--24, 2013.
[19]
W. Yu and J. A. McCann. Sig-SR: SimRank search over singular graphs. In SIGIR, pages 859--862, 2014.
[20]
W. Zheng, L. Zou, Y. Feng, L. Chen, and D. Zhao. Efficient SimRank-based similarity join over large graphs. PVLDB, 6(7): 493--504, 2013.

Cited By

View all
  • (2023)Efficient and Accurate SimRank-Based Similarity Joins: Experiments, Analysis, and ImprovementProceedings of the VLDB Endowment10.14778/3636218.363621917:4(617-629)Online publication date: 1-Dec-2023
  • (2023)ClipSim: A GPU-friendly Parallel Framework for Single-Source SimRank with Accuracy GuaranteeProceedings of the ACM on Management of Data10.1145/35887071:1(1-26)Online publication date: 30-May-2023
  • (2023)Efficient Single-Source SimRank Query by Path AggregationProceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining10.1145/3580305.3599328(3342-3352)Online publication date: 6-Aug-2023
  • Show More Cited By
  1. Efficient partial-pairs simrank search on large networks

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    cover image Proceedings of the VLDB Endowment
    Proceedings of the VLDB Endowment  Volume 8, Issue 5
    January 2015
    181 pages
    ISSN:2150-8097
    Issue’s Table of Contents

    Publisher

    VLDB Endowment

    Publication History

    Published: 01 January 2015
    Published in PVLDB Volume 8, Issue 5

    Qualifiers

    • Research-article

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)5
    • Downloads (Last 6 weeks)0
    Reflects downloads up to 19 Dec 2024

    Other Metrics

    Citations

    Cited By

    View all
    • (2023)Efficient and Accurate SimRank-Based Similarity Joins: Experiments, Analysis, and ImprovementProceedings of the VLDB Endowment10.14778/3636218.363621917:4(617-629)Online publication date: 1-Dec-2023
    • (2023)ClipSim: A GPU-friendly Parallel Framework for Single-Source SimRank with Accuracy GuaranteeProceedings of the ACM on Management of Data10.1145/35887071:1(1-26)Online publication date: 30-May-2023
    • (2023)Efficient Single-Source SimRank Query by Path AggregationProceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining10.1145/3580305.3599328(3342-3352)Online publication date: 6-Aug-2023
    • (2023)SimSky: An Accuracy-Aware Algorithm for Single-Source SimRank SearchMachine Learning and Knowledge Discovery in Databases: Research Track10.1007/978-3-031-43418-1_14(226-241)Online publication date: 18-Sep-2023
    • (2022)Scaling High-Quality Pairwise Link-Based Similarity Retrieval on Billion-Edge GraphsACM Transactions on Information Systems10.1145/349520940:4(1-45)Online publication date: 11-Jan-2022
    • (2021)DiskProceedings of the VLDB Endowment10.14778/3430915.343092514:3(351-363)Online publication date: 9-Dec-2021
    • (2021)RoleSim*: Scaling axiomatic role-based similarity ranking on large graphsWorld Wide Web10.1007/s11280-021-00925-z25:2(785-829)Online publication date: 11-Aug-2021
    • (2021)ExactSim: benchmarking single-source SimRank algorithms with high-precision ground truthsThe VLDB Journal — The International Journal on Very Large Data Bases10.1007/s00778-021-00672-730:6(989-1015)Online publication date: 5-Jun-2021
    • (2021)Efficient structural node similarity computation on billion-scale graphsThe VLDB Journal — The International Journal on Very Large Data Bases10.1007/s00778-021-00654-930:3(471-493)Online publication date: 23-Feb-2021
    • (2020)SimTabProceedings of the VLDB Endowment10.14778/3407790.340781913:12(2202-2214)Online publication date: 14-Sep-2020
    • Show More Cited By

    View Options

    Login options

    Full Access

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media