More Web Proxy on the site http://driver.im/

research-article

Open access

ClipSim: A GPU-friendly Parallel Framework for Single-Source SimRank with Accuracy Guarantee

Authors:

Bo BaiAuthors Info & Claims

Proceedings of the ACM on Management of Data, Volume 1, Issue 1

Article No.: 27, Pages 1 - 26

https://doi.org/10.1145/3588707

Published: 30 May 2023 Publication History

Abstract

SimRank is an important metric to measure the topological similarity between two nodes in a graph. In particular, single-source and top-k SimRank has numerous applications in recommendation systems, network analysis, and web mining, etc. Mathematically, given a vertex, the computation of single-machine and single-source SimRank mainly lies in matrix-matrix operations. However, it is almost impossible to directly compute on large graphs. Thus, existing works yield to two main operations: a series of random walks, and sparse matrix and dense vector multiplication operations. This brings about high computation cost for SimRank on large graphs. In real-world applications, there is always the query time and accuracy trade-off, which hinders the computation of high-precision SimRank on large-scale graphs.

To handle this problem, this paper proposesClipSim, the first GPU-friendly parallel framework that accelerates the single-source SimRank on GPU with accuracy guarantee. We design a novel data structure and GPU-friendly parallel algorithms for efficient computation of all the operations of SimRank on GPU. Moreover, our theoretical derivation enables ClipSim to largely reduce the number of random walks required for each node, while maintaining the same theoretical accuracy as the state-of-the-art algorithm, ExactSim. We conduct extensive experiments on real-world and synthetic datasets to demonstrate the accuracy and efficiency of ClipSim. The results show that compared with ExactSim, ClipSim obtains single-source SimRank vectors with the same accuracy and up to 160× faster computation time.

Supplemental Material

MP4 File

Presentation video for SIGMOD 2023

Download
68.13 MB

References

[1]

Ioannis Antonellis, Hector Garcia-Molina, and Chi-Chao Chang. 2008. SimRank: Query rewriting through link analysis of the clickgraph (poster). In Proceedings of the 17th International Conference on World Wide Web. Beijing, China, 1177--1178.

Digital Library

[2]

Piotr Bialas and Adam Strzelecki. 2015. Benchmarking the cost of thread divergence in CUDA. In International Conference on Parallel Processing and Applied Mathematics. Springer, Krakow, Poland, 570--579.

[3]

Aydin Buluc and John R Gilbert. 2008. On the representation and multiplication of hypersparse matrices. In Proceedings of the 2008 IEEE International Symposium on Parallel and Distributed Processing. IEEE, Miami, Florida, USA, 1--11.

[4]

Rohit Chandra, Leo Dagum, David Kohr, Ramesh Menon, Dror Maydan, and Jeff McDonald. 2001. Parallel programming in OpenMP. Morgan kaufmann.

[5]

Nasrollah Etemadi. 1981. An elementary proof of the strong law of large numbers. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete, Vol. 55, 1 (1981), 119--122.

[6]

Dániel Fogaras and Balázs Rácz. 2005. Scaling link-based similarity search. In Proceedings of the 14th International Conference on World Wide Web. Chiba, Japan, 641--650.

Digital Library

[7]

Joseph L Greathouse and Mayank Daga. 2014. Efficient sparse matrix-vector multiplication on GPUs using the CSR storage format. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. IEEE, New Orleans, Louisiana, USA, 769--780.

Digital Library

[8]

Taher H Haveliwala. 2003. Topic-sensitive pagerank: A context-sensitive ranking algorithm for web search. IEEE Transactions on Knowledge and Data Engineering, Vol. 15, 4 (2003), 784--796.

Digital Library

[9]

W. Hoeffding. 1963. Probability inequalities for sums of bounded random variables. In J. Amer. Statist. Assoc. 13--30.

[10]

William George Horner. 1819. XXI. A new method of solving numerical equations of all orders, by continuous approximation. Philosophical Transactions of the Royal Society of London 109 ( 1819), 308--335.

[11]

Ankush Jain, Surendra Nagar, Pramod Kumar Singh, and Joydip Dhar. 2020. EMUCF: Enhanced multistage user-based collaborative filtering through non-linear similarity for recommendation systems. Expert Systems with Applications, Vol. 161 (2020), 113724.

[12]

Glen Jeh and Jennifer Widom. 2002. SimRank: A measure of structural-context similarity. In Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Edmonton Alberta, Canada, 538--543.

Digital Library

[13]

Ruoming Jin, Victor E Lee, and Hui Hong. 2011. Axiomatic ranking of network role similarity. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. San Diego, California, USA, 922--930.

Digital Library

[14]

Mitsuru Kusumoto, Takanori Maehara, and Ken-ichi Kawarabayashi. 2014. Scalable similarity search for SimRank. In Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data. Snowbird, Utah, USA, 325--336.

Digital Library

[15]

Haewoon Kwak, Changhyun Lee, Hosung Park, and Sue Moon. 2010. What is Twitter, a social network or a news media?. In Proceedings of the 19th International Conference on World Wide Web. Raleigh, North Carolina, USA, 591--600.

Digital Library

[16]

Yu Liu, Bolong Zheng, Xiaodong He, Zhewei Wei, Xiaokui Xiao, Kai Zheng, and Jiaheng Lu. 2017. ProbeSim: scalable single-source and top-k simrank computations on dynamic graphs. Proceedings of the VLDB Endowment, Vol. 11, 1 (2017), 14--26.

Digital Library

[17]

Shengliang Lu, Bingsheng He, Yuchen Li, and Hao Fu. 2020. Accelerating exact constrained shortest paths on GPUs. Proceedings of the VLDB Endowment, Vol. 14, 4 (2020), 547--559.

Digital Library

[18]

Takanori Maehara, Mitsuru Kusumoto, and Ken-ichi Kawarabayashi. 2014. Efficient SimRank computation via linearization. arXiv preprint arXiv:1411.7228 (2014).

[19]

Sara Makki, Rafiqul Haque, Yehia Taher, Zainab Assaghir, Mohand-Said Hacid, and Hassan Zeineddine. 2018. A cost-sensitive cosine similarity K-nearest neighbor for credit card fraud detection. In Big Data and Cyber-Security Intelligence. Hadath, Lebanon, 1--6.

[20]

Maxim Naumov, L Chien, Philippe Vandermersch, and Ujval Kapasi. 2010. Cusparse library. In GPU Technology Conference. San Jose, California, USA, 1--96.

[21]

Santosh Pandey, Lingda Li, Adolfy Hoisie, Xiaoye S Li, and Hang Liu. 2020. C-SAW: A framework for graph sampling and random walk on GPUs. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. IEEE, Atlanta, Georgia, USA, 1--15.

[22]

Ryan A Rossi and Nesreen K Ahmed. 2015. The network data repository with interactive graph analytics and visualization. In Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence. Austin Texas, USA, 4292--4293.

Digital Library

[23]

Jieming Shi, Tianyuan Jin, Renchi Yang, Xiaokui Xiao, and Yin Yang. 2020. Realtime index-free single source SimRank processing on web-scale graphs. Proceedings of the VLDB Endowment, Vol. 13, 7 (2020), 966--980.

Digital Library

[24]

Jieming Shi, Renchi Yang, Tianyuan Jin, Xiaokui Xiao, and Yin Yang. 2019. Realtime top-k personalized pagerank over large graphs on GPUs. Proceedings of the VLDB Endowment, Vol. 13, 1 (2019), 15--28.

Digital Library

[25]

Julian Shun and Guy E Blelloch. 2013. Ligra: a lightweight graph processing framework for shared memory. In Proceedings of the 18th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. Shenzhen, China, 135--146.

Digital Library

[26]

Boyu Tian and Xiaokui Xiao. 2016. SLING: A near-optimal index structure for SimRank. In Proceedings of the 2016 ACM SIGMOD International Conference on Management of Data. San Francisco, California, USA, 1859--1874.

Digital Library

[27]

Xiang Tian and Khaled Benkrid. 2009. Mersenne twister random number generation on FPGA, CPU and GPU. In NASA/ESA Conference on Adaptive Hardware and Systems. IEEE, San Francisco, California, USA, 460--464.

[28]

Darya Gennadievna Vorotnikova and Dmitriy L Golovashkin. 2014. CUBLAS-aided long vector algorithms. Journal of Mathematical Modelling and Algorithms in Operations Research, Vol. 13, 4 (2014), 425--431.

[29]

Hanzhi Wang, Zhewei Wei, Yu Liu, Ye Yuan, Xiaoyong Du, and Ji-Rong Wen. 2021. ExactSim: benchmarking single-source SimRank algorithms with high-precision ground truths. The VLDB Journal, Vol. 30, 6 (2021), 989--1015.

Digital Library

[30]

Hanzhi Wang, Zhewei Wei, Ye Yuan, Xiaoyong Du, and Ji-Rong Wen. 2020b. Exact single-source SimRank computation on large graphs. In Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data. Portland, Oregon, USA, 653--663.

Digital Library

[31]

Yue Wang, Yulin Che, Xiang Lian, Lei Chen, and Qiong Luo. 2020a. Fast and accurate simrank computation via forward local push and its parallelization. IEEE Transactions on Knowledge and Data Engineering, Vol. 33, 12 (2020), 3686--3700.

[32]

Yangzihao Wang, Andrew Davidson, Yuechao Pan, Yuduo Wu, Andy Riffel, and John D Owens. 2016. Gunrock: A high-performance graph processing library on the GPU. In Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. 1--12.

Digital Library

[33]

Yue Wang, Ruiqi Xu, Zonghao Feng, Yulin Che, Lei Chen, Qiong Luo, and Rui Mao. 2020c. DISK: a distributed framework for single-source simrank with accuracy guarantee. Proceedings of the VLDB Endowment, Vol. 14, 3 (2020), 351--363.

Digital Library

[34]

Zhewei Wei, Xiaodong He, Xiaokui Xiao, Sibo Wang, Yu Liu, Xiaoyong Du, and Ji-Rong Wen. 2019. PRSim: Sublinear time simrank computation on large power-law graphs. In Proceedings of the 2019 ACM SIGMOD International Conference on Management of Data. Amsterdam, Netherlands, 1042--1059.

Digital Library

[35]

Weiren Yu, Sima Iranmanesh, Aparajita Haldar, Maoyin Zhang, and Hakan Ferhatosmanoglu. 2021. RoleSim*: Scaling axiomatic role-based similarity ranking on large graphs. World Wide Web (2021), 1--45.

[36]

Weiren Yu, Xuemin Lin, Wenjie Zhang, Jian Pei, and Julie A McCann. 2019. SimRank*: Effective and scalable pairwise similarity search based on graph topology. The VLDB Journal, Vol. 28, 3 (2019), 401--426.

Digital Library

[37]

Weiren Yu and Julie A McCann. 2015. Efficient partial-pairs SimRank search on large networks. Proceedings of the VLDB Endowment, Vol. 8, 5 (2015), 569--580.

Digital Library

[38]

Fan Zhang, Xuemin Lin, Ying Zhang, Lu Qin, and Wenjie Zhang. 2019. Efficient community discovery with user engagement and similarity. The VLDB Journal, Vol. 28, 6 (2019), 987--1012.

[39]

Jing Zhang, Jie Tang, Cong Ma, Hanghang Tong, Yu Jing, and Juanzi Li. 2015. Panther: Fast top-k similarity search on large networks. In Proceedings of the 21th ACM SIGKDD international conference on knowledge discovery and data mining. Sydney, Australia, 1445--1454.

Digital Library

Cited By

Qiu XHu JZhou LWu XDu JZhang BGuo CZhou AJensen CSheng ZYang B(2024)TFB: Towards Comprehensive and Fair Benchmarking of Time Series Forecasting MethodsProceedings of the VLDB Endowment10.14778/3665844.366586317:9(2363-2377)Online publication date: 6-Aug-2024
https://dl.acm.org/doi/10.14778/3665844.3665863
Fan SHou MXi RMa WSerra ESpezzano F(2024)Precision Meets Resilience: Cross-Database Generalization with Uncertainty Quantification for Robust Cost EstimationProceedings of the 33rd ACM International Conference on Information and Knowledge Management10.1145/3627673.3679632(581-590)Online publication date: 21-Oct-2024
https://dl.acm.org/doi/10.1145/3627673.3679632
Feng JLi ZChen QLiu H(2024)Automating localized learning for cardinality estimation based on XGBoostKnowledge and Information Systems10.1007/s10115-024-02142-266:7(3825-3854)Online publication date: 1-Jun-2024
https://dl.acm.org/doi/10.1007/s10115-024-02142-2
Show More Cited By

Index Terms

ClipSim: A GPU-friendly Parallel Framework for Single-Source SimRank with Accuracy Guarantee
1. Mathematics of computing
  1. Discrete mathematics
    1. Graph theory
      1. Graph algorithms

Recommendations

Exact Single-Source SimRank Computation on Large Graphs
SIGMOD '20: Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data

SimRank is a popular measurement for evaluating the node-to-node similarities based on the graph topology. In recent years, single-source and top-k SimRank queries have received increasing attention due to their applications in web mining, social ...
ExactSim: benchmarking single-source SimRank algorithms with high-precision ground truths
Abstract
SimRank is a popular measurement for evaluating the node-to-node similarities based on the graph topology. In recent years, single-source and top-k SimRank queries have received increasing attention due to their applications in web mining, social ... $^{}$
Accelerating pairwise SimRank estimation over static and dynamic graphs

Measuring similarities among different vertices is a fundamental problem in graph analysis. Among different similarity measurements, SimRank is one of the most promising and popular. In reality, instead of computing the whole similarity matrix, people ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image Proceedings of the ACM on Management of Data

Proceedings of the ACM on Management of Data Volume 1, Issue 1

PACMMOD

May 2023

2807 pages

EISSN:2836-6573

DOI:10.1145/3603164

Editor:
Divyakant Agrawal
UC Santa Barbara, United States

Issue’s Table of Contents

Copyright © 2023 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 30 May 2023

Published in PACMMOD Volume 1, Issue 1

Permissions

Request permissions for this article.

Request Permissions

Author Tags

Qualifiers

Research-article

Funding Sources

National Natural Science Foundation of China

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

6
Total Citations
View Citations
789
Total Downloads

Downloads (Last 12 months)540
Downloads (Last 6 weeks)87

Reflects downloads up to 19 Dec 2024

Other Metrics

View Author Metrics

Citations

Cited By

Qiu XHu JZhou LWu XDu JZhang BGuo CZhou AJensen CSheng ZYang B(2024)TFB: Towards Comprehensive and Fair Benchmarking of Time Series Forecasting MethodsProceedings of the VLDB Endowment10.14778/3665844.366586317:9(2363-2377)Online publication date: 6-Aug-2024
https://dl.acm.org/doi/10.14778/3665844.3665863
Fan SHou MXi RMa WSerra ESpezzano F(2024)Precision Meets Resilience: Cross-Database Generalization with Uncertainty Quantification for Robust Cost EstimationProceedings of the 33rd ACM International Conference on Information and Knowledge Management10.1145/3627673.3679632(581-590)Online publication date: 21-Oct-2024
https://dl.acm.org/doi/10.1145/3627673.3679632
Feng JLi ZChen QLiu H(2024)Automating localized learning for cardinality estimation based on XGBoostKnowledge and Information Systems10.1007/s10115-024-02142-266:7(3825-3854)Online publication date: 1-Jun-2024
https://dl.acm.org/doi/10.1007/s10115-024-02142-2
Wu XWu XYang BZhou LGuo CQiu XHu JSheng ZJensen C(2024)AutoCTS++: zero-shot joint neural architecture and hyperparameter search for correlated time series forecastingThe VLDB Journal — The International Journal on Very Large Data Bases10.1007/s00778-024-00872-x33:5(1743-1770)Online publication date: 30-Jul-2024
https://dl.acm.org/doi/10.1007/s00778-024-00872-x
Liu JLou JQin ZRen KOh ANaumann TGloberson ASaenko KHardt MLevine S(2023)Certified minimax unlearning with generalization rates and deletion capacityProceedings of the 37th International Conference on Neural Information Processing Systems10.5555/3666122.3668866(62821-62852)Online publication date: 10-Dec-2023
https://dl.acm.org/doi/10.5555/3666122.3668866
Wu SWu QDong HHua WZhou X(2023)Blocker and Matcher Can Mutually Benefit: A Co-Learning Framework for Low-Resource Entity ResolutionProceedings of the VLDB Endowment10.14778/3632093.363209617:3(292-304)Online publication date: 1-Nov-2023
https://dl.acm.org/doi/10.14778/3632093.3632096

View Options

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Media

Figures

Other

Tables

View Issue’s Table of Contents