[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
research-article
Open access

ClipSim: A GPU-friendly Parallel Framework for Single-Source SimRank with Accuracy Guarantee

Published: 30 May 2023 Publication History

Abstract

SimRank is an important metric to measure the topological similarity between two nodes in a graph. In particular, single-source and top-k SimRank has numerous applications in recommendation systems, network analysis, and web mining, etc. Mathematically, given a vertex, the computation of single-machine and single-source SimRank mainly lies in matrix-matrix operations. However, it is almost impossible to directly compute on large graphs. Thus, existing works yield to two main operations: a series of random walks, and sparse matrix and dense vector multiplication operations. This brings about high computation cost for SimRank on large graphs. In real-world applications, there is always the query time and accuracy trade-off, which hinders the computation of high-precision SimRank on large-scale graphs.
To handle this problem, this paper proposesClipSim, the first GPU-friendly parallel framework that accelerates the single-source SimRank on GPU with accuracy guarantee. We design a novel data structure and GPU-friendly parallel algorithms for efficient computation of all the operations of SimRank on GPU. Moreover, our theoretical derivation enables ClipSim to largely reduce the number of random walks required for each node, while maintaining the same theoretical accuracy as the state-of-the-art algorithm, ExactSim. We conduct extensive experiments on real-world and synthetic datasets to demonstrate the accuracy and efficiency of ClipSim. The results show that compared with ExactSim, ClipSim obtains single-source SimRank vectors with the same accuracy and up to 160× faster computation time.

Supplemental Material

MP4 File
Presentation video for SIGMOD 2023

References

[1]
Ioannis Antonellis, Hector Garcia-Molina, and Chi-Chao Chang. 2008. SimRank: Query rewriting through link analysis of the clickgraph (poster). In Proceedings of the 17th International Conference on World Wide Web. Beijing, China, 1177--1178.
[2]
Piotr Bialas and Adam Strzelecki. 2015. Benchmarking the cost of thread divergence in CUDA. In International Conference on Parallel Processing and Applied Mathematics. Springer, Krakow, Poland, 570--579.
[3]
Aydin Buluc and John R Gilbert. 2008. On the representation and multiplication of hypersparse matrices. In Proceedings of the 2008 IEEE International Symposium on Parallel and Distributed Processing. IEEE, Miami, Florida, USA, 1--11.
[4]
Rohit Chandra, Leo Dagum, David Kohr, Ramesh Menon, Dror Maydan, and Jeff McDonald. 2001. Parallel programming in OpenMP. Morgan kaufmann.
[5]
Nasrollah Etemadi. 1981. An elementary proof of the strong law of large numbers. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete, Vol. 55, 1 (1981), 119--122.
[6]
Dániel Fogaras and Balázs Rácz. 2005. Scaling link-based similarity search. In Proceedings of the 14th International Conference on World Wide Web. Chiba, Japan, 641--650.
[7]
Joseph L Greathouse and Mayank Daga. 2014. Efficient sparse matrix-vector multiplication on GPUs using the CSR storage format. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. IEEE, New Orleans, Louisiana, USA, 769--780.
[8]
Taher H Haveliwala. 2003. Topic-sensitive pagerank: A context-sensitive ranking algorithm for web search. IEEE Transactions on Knowledge and Data Engineering, Vol. 15, 4 (2003), 784--796.
[9]
W. Hoeffding. 1963. Probability inequalities for sums of bounded random variables. In J. Amer. Statist. Assoc. 13--30.
[10]
William George Horner. 1819. XXI. A new method of solving numerical equations of all orders, by continuous approximation. Philosophical Transactions of the Royal Society of London 109 ( 1819), 308--335.
[11]
Ankush Jain, Surendra Nagar, Pramod Kumar Singh, and Joydip Dhar. 2020. EMUCF: Enhanced multistage user-based collaborative filtering through non-linear similarity for recommendation systems. Expert Systems with Applications, Vol. 161 (2020), 113724.
[12]
Glen Jeh and Jennifer Widom. 2002. SimRank: A measure of structural-context similarity. In Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Edmonton Alberta, Canada, 538--543.
[13]
Ruoming Jin, Victor E Lee, and Hui Hong. 2011. Axiomatic ranking of network role similarity. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. San Diego, California, USA, 922--930.
[14]
Mitsuru Kusumoto, Takanori Maehara, and Ken-ichi Kawarabayashi. 2014. Scalable similarity search for SimRank. In Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data. Snowbird, Utah, USA, 325--336.
[15]
Haewoon Kwak, Changhyun Lee, Hosung Park, and Sue Moon. 2010. What is Twitter, a social network or a news media?. In Proceedings of the 19th International Conference on World Wide Web. Raleigh, North Carolina, USA, 591--600.
[16]
Yu Liu, Bolong Zheng, Xiaodong He, Zhewei Wei, Xiaokui Xiao, Kai Zheng, and Jiaheng Lu. 2017. ProbeSim: scalable single-source and top-k simrank computations on dynamic graphs. Proceedings of the VLDB Endowment, Vol. 11, 1 (2017), 14--26.
[17]
Shengliang Lu, Bingsheng He, Yuchen Li, and Hao Fu. 2020. Accelerating exact constrained shortest paths on GPUs. Proceedings of the VLDB Endowment, Vol. 14, 4 (2020), 547--559.
[18]
Takanori Maehara, Mitsuru Kusumoto, and Ken-ichi Kawarabayashi. 2014. Efficient SimRank computation via linearization. arXiv preprint arXiv:1411.7228 (2014).
[19]
Sara Makki, Rafiqul Haque, Yehia Taher, Zainab Assaghir, Mohand-Said Hacid, and Hassan Zeineddine. 2018. A cost-sensitive cosine similarity K-nearest neighbor for credit card fraud detection. In Big Data and Cyber-Security Intelligence. Hadath, Lebanon, 1--6.
[20]
Maxim Naumov, L Chien, Philippe Vandermersch, and Ujval Kapasi. 2010. Cusparse library. In GPU Technology Conference. San Jose, California, USA, 1--96.
[21]
Santosh Pandey, Lingda Li, Adolfy Hoisie, Xiaoye S Li, and Hang Liu. 2020. C-SAW: A framework for graph sampling and random walk on GPUs. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. IEEE, Atlanta, Georgia, USA, 1--15.
[22]
Ryan A Rossi and Nesreen K Ahmed. 2015. The network data repository with interactive graph analytics and visualization. In Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence. Austin Texas, USA, 4292--4293.
[23]
Jieming Shi, Tianyuan Jin, Renchi Yang, Xiaokui Xiao, and Yin Yang. 2020. Realtime index-free single source SimRank processing on web-scale graphs. Proceedings of the VLDB Endowment, Vol. 13, 7 (2020), 966--980.
[24]
Jieming Shi, Renchi Yang, Tianyuan Jin, Xiaokui Xiao, and Yin Yang. 2019. Realtime top-k personalized pagerank over large graphs on GPUs. Proceedings of the VLDB Endowment, Vol. 13, 1 (2019), 15--28.
[25]
Julian Shun and Guy E Blelloch. 2013. Ligra: a lightweight graph processing framework for shared memory. In Proceedings of the 18th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. Shenzhen, China, 135--146.
[26]
Boyu Tian and Xiaokui Xiao. 2016. SLING: A near-optimal index structure for SimRank. In Proceedings of the 2016 ACM SIGMOD International Conference on Management of Data. San Francisco, California, USA, 1859--1874.
[27]
Xiang Tian and Khaled Benkrid. 2009. Mersenne twister random number generation on FPGA, CPU and GPU. In NASA/ESA Conference on Adaptive Hardware and Systems. IEEE, San Francisco, California, USA, 460--464.
[28]
Darya Gennadievna Vorotnikova and Dmitriy L Golovashkin. 2014. CUBLAS-aided long vector algorithms. Journal of Mathematical Modelling and Algorithms in Operations Research, Vol. 13, 4 (2014), 425--431.
[29]
Hanzhi Wang, Zhewei Wei, Yu Liu, Ye Yuan, Xiaoyong Du, and Ji-Rong Wen. 2021. ExactSim: benchmarking single-source SimRank algorithms with high-precision ground truths. The VLDB Journal, Vol. 30, 6 (2021), 989--1015.
[30]
Hanzhi Wang, Zhewei Wei, Ye Yuan, Xiaoyong Du, and Ji-Rong Wen. 2020b. Exact single-source SimRank computation on large graphs. In Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data. Portland, Oregon, USA, 653--663.
[31]
Yue Wang, Yulin Che, Xiang Lian, Lei Chen, and Qiong Luo. 2020a. Fast and accurate simrank computation via forward local push and its parallelization. IEEE Transactions on Knowledge and Data Engineering, Vol. 33, 12 (2020), 3686--3700.
[32]
Yangzihao Wang, Andrew Davidson, Yuechao Pan, Yuduo Wu, Andy Riffel, and John D Owens. 2016. Gunrock: A high-performance graph processing library on the GPU. In Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. 1--12.
[33]
Yue Wang, Ruiqi Xu, Zonghao Feng, Yulin Che, Lei Chen, Qiong Luo, and Rui Mao. 2020c. DISK: a distributed framework for single-source simrank with accuracy guarantee. Proceedings of the VLDB Endowment, Vol. 14, 3 (2020), 351--363.
[34]
Zhewei Wei, Xiaodong He, Xiaokui Xiao, Sibo Wang, Yu Liu, Xiaoyong Du, and Ji-Rong Wen. 2019. PRSim: Sublinear time simrank computation on large power-law graphs. In Proceedings of the 2019 ACM SIGMOD International Conference on Management of Data. Amsterdam, Netherlands, 1042--1059.
[35]
Weiren Yu, Sima Iranmanesh, Aparajita Haldar, Maoyin Zhang, and Hakan Ferhatosmanoglu. 2021. RoleSim*: Scaling axiomatic role-based similarity ranking on large graphs. World Wide Web (2021), 1--45.
[36]
Weiren Yu, Xuemin Lin, Wenjie Zhang, Jian Pei, and Julie A McCann. 2019. SimRank*: Effective and scalable pairwise similarity search based on graph topology. The VLDB Journal, Vol. 28, 3 (2019), 401--426.
[37]
Weiren Yu and Julie A McCann. 2015. Efficient partial-pairs SimRank search on large networks. Proceedings of the VLDB Endowment, Vol. 8, 5 (2015), 569--580.
[38]
Fan Zhang, Xuemin Lin, Ying Zhang, Lu Qin, and Wenjie Zhang. 2019. Efficient community discovery with user engagement and similarity. The VLDB Journal, Vol. 28, 6 (2019), 987--1012.
[39]
Jing Zhang, Jie Tang, Cong Ma, Hanghang Tong, Yu Jing, and Juanzi Li. 2015. Panther: Fast top-k similarity search on large networks. In Proceedings of the 21th ACM SIGKDD international conference on knowledge discovery and data mining. Sydney, Australia, 1445--1454.

Cited By

View all
  • (2024)TFB: Towards Comprehensive and Fair Benchmarking of Time Series Forecasting MethodsProceedings of the VLDB Endowment10.14778/3665844.366586317:9(2363-2377)Online publication date: 6-Aug-2024
  • (2024)Precision Meets Resilience: Cross-Database Generalization with Uncertainty Quantification for Robust Cost EstimationProceedings of the 33rd ACM International Conference on Information and Knowledge Management10.1145/3627673.3679632(581-590)Online publication date: 21-Oct-2024
  • (2024)Automating localized learning for cardinality estimation based on XGBoostKnowledge and Information Systems10.1007/s10115-024-02142-266:7(3825-3854)Online publication date: 1-Jun-2024
  • Show More Cited By

Index Terms

  1. ClipSim: A GPU-friendly Parallel Framework for Single-Source SimRank with Accuracy Guarantee

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    cover image Proceedings of the ACM on Management of Data
    Proceedings of the ACM on Management of Data  Volume 1, Issue 1
    PACMMOD
    May 2023
    2807 pages
    EISSN:2836-6573
    DOI:10.1145/3603164
    Issue’s Table of Contents
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 30 May 2023
    Published in PACMMOD Volume 1, Issue 1

    Permissions

    Request permissions for this article.

    Author Tags

    1. GPU acceleration
    2. SimRank
    3. exact computation
    4. random walk

    Qualifiers

    • Research-article

    Funding Sources

    • National Natural Science Foundation of China

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)540
    • Downloads (Last 6 weeks)87
    Reflects downloads up to 19 Dec 2024

    Other Metrics

    Citations

    Cited By

    View all
    • (2024)TFB: Towards Comprehensive and Fair Benchmarking of Time Series Forecasting MethodsProceedings of the VLDB Endowment10.14778/3665844.366586317:9(2363-2377)Online publication date: 6-Aug-2024
    • (2024)Precision Meets Resilience: Cross-Database Generalization with Uncertainty Quantification for Robust Cost EstimationProceedings of the 33rd ACM International Conference on Information and Knowledge Management10.1145/3627673.3679632(581-590)Online publication date: 21-Oct-2024
    • (2024)Automating localized learning for cardinality estimation based on XGBoostKnowledge and Information Systems10.1007/s10115-024-02142-266:7(3825-3854)Online publication date: 1-Jun-2024
    • (2024)AutoCTS++: zero-shot joint neural architecture and hyperparameter search for correlated time series forecastingThe VLDB Journal — The International Journal on Very Large Data Bases10.1007/s00778-024-00872-x33:5(1743-1770)Online publication date: 30-Jul-2024
    • (2023)Certified minimax unlearning with generalization rates and deletion capacityProceedings of the 37th International Conference on Neural Information Processing Systems10.5555/3666122.3668866(62821-62852)Online publication date: 10-Dec-2023
    • (2023)Blocker and Matcher Can Mutually Benefit: A Co-Learning Framework for Low-Resource Entity ResolutionProceedings of the VLDB Endowment10.14778/3632093.363209617:3(292-304)Online publication date: 1-Nov-2023

    View Options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Login options

    Full Access

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media