DOI: 10.1145/3511808.3557443 · CIKM Conference Proceedings · Research article

Scalable Graph Sampling on GPUs with Compressed Graph

Published: 17 October 2022

Abstract

GPUs are powerful accelerators for parallel computation, and graph sampling is a fundamental technique for large-scale graph analysis and learning. To accelerate graph sampling on GPUs, several solutions, such as NextDoor and C-SAW, have recently been proposed. However, these solutions cannot handle large graphs efficiently because of their massive memory footprint and the expensive transfer cost between CPU and GPU. In this work, we introduce a Chunk-wise Graph Compression format (CGC) that effectively reduces the graph size and saves graph transfer cost. Meanwhile, CGC supports fast access to any single neighbor of a vertex and is therefore well suited to the graph sampling task. Specifically, CGC first balances the compression ratio against decompression efficiency by dividing each neighbor list into chunks. It then applies a new compression strategy, called linear estimation, to compress each chunk, which allows users to access a single neighbor in O(1) time. Finally, on top of CGC, we develop GraSS, a scalable GPU-based graph sampling framework, and evaluate its efficiency and scalability on both real-world and synthetic graphs. The empirical results demonstrate that GraSS supports various graph sampling methods on large graphs with high efficiency in cases where state-of-the-art solutions run out of memory or exceed the time limit.
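The chunk-wise linear-estimation idea in the abstract can be read as follows: because a neighbor list is sorted, each chunk's IDs are roughly linear in their index, so a chunk can be stored as a fitted line plus small residuals, and any single neighbor can be reconstructed in O(1) time without decompressing the rest. The sketch below is a hypothetical CPU-side illustration of that reading, not the paper's actual on-GPU format; the chunk size, the endpoint-based slope fit, the residual encoding, and the function names `compress`, `compress_chunk`, and `access` are all assumptions for illustration.

```python
CHUNK = 4  # assumed chunk size; the paper tunes this to balance ratio vs. speed


def compress_chunk(chunk):
    """Fit neighbor IDs with a line id ~ base + slope*i and keep residuals."""
    base = chunk[0]
    # Integer slope estimated from the chunk endpoints; residuals stay small
    # when the sorted IDs are close to an arithmetic progression.
    slope = (chunk[-1] - base) // (len(chunk) - 1) if len(chunk) > 1 else 0
    residuals = [v - (base + slope * i) for i, v in enumerate(chunk)]
    return base, slope, residuals


def compress(neighbors):
    """Split a sorted neighbor list into chunks and compress each one."""
    return [compress_chunk(neighbors[k:k + CHUNK])
            for k in range(0, len(neighbors), CHUNK)]


def access(compressed, i):
    """O(1) access to the i-th neighbor: locate the chunk, evaluate the line,
    then correct with the stored residual."""
    base, slope, residuals = compressed[i // CHUNK]
    j = i % CHUNK
    return base + slope * j + residuals[j]
```

Under this reading, a sampler can draw random neighbors directly from the compressed representation, which is what makes the format friendly to GPU-based graph sampling: each thread touches only the chunk containing its target index.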

References

[1]
Alberto Apostolico and Guido Drovandi. 2009. Graph compression by BFS. Algorithms, Vol. 2, 3 (2009), 1031--1044.
[2]
Paolo Boldi, Marco Rosa, Massimo Santini, and Sebastiano Vigna. 2011. Layered label propagation: A multiresolution coordinate-free ordering for compressing social networks. In Proceedings of the 20th international conference on World Wide Web. 587--596.
[3]
Paolo Boldi and Sebastiano Vigna. 2004a. The WebGraph framework I: Compression techniques. In Proceedings of the 13th international conference on World Wide Web. 595--602.
[4]
Paolo Boldi and Sebastiano Vigna. 2004b. The WebGraph framework II: Codes for the World-Wide Web. In Data Compression Conference, 2004. Proceedings. DCC 2004. IEEE, 528.
[5]
Gregory Buehrer and Kumar Chellapilla. 2008. A scalable pattern mining approach to web graph compression with communities. In Proceedings of the 2008 international conference on web search and data mining. 95--106.
[6]
Gregory Buehrer, Roberto L de Oliveira, David Fuhry, and Srinivasan Parthasarathy. 2015. Towards a parameter-free and parallel itemset mining algorithm in linearithmic time. In 2015 IEEE 31st International Conference on Data Engineering. IEEE, 1071--1082.
[7]
Jie Chen, Tengfei Ma, and Cao Xiao. 2018. FastGCN: Fast Learning with Graph Convolutional Networks via Importance Sampling. In International Conference on Learning Representations. https://openreview.net/forum?id=rytstxWAW
[8]
Francisco Claude and Gonzalo Navarro. 2010. Fast and compact web graph representations. ACM Transactions on the Web (TWEB), Vol. 4, 4 (2010), 1--31.
[9]
Jialin Dong, Da Zheng, Lin F. Yang, and George Karypis. 2021. Global Neighbor Sampling for Mixed CPU-GPU Training on Giant Graphs. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining (Virtual Event, Singapore) (KDD'21). Association for Computing Machinery, New York, NY, USA, 289--299. https://doi.org/10.1145/3447548.3467437
[10]
Tomás Feder and Rajeev Motwani. 1995. Clique partitions, graph compression and speeding-up algorithms. J. Comput. System Sci., Vol. 51, 2 (1995), 261--272.
[11]
Matthias Fey and Jan E. Lenssen. 2019. Fast Graph Representation Learning with PyTorch Geometric. In ICLR Workshop on Representation Learning on Graphs and Manifolds.
[12]
Zhisong Fu, Michael Personick, and Bryan Thompson. 2014. MapGraph: A high level API for fast development of high performance graph analytics on GPUs. In Proceedings of workshop on GRAph data management experiences and systems. 1--6.
[13]
Aditya Grover and Jure Leskovec. 2016. node2vec: Scalable feature learning for networks. In Proceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining. 855--864.
[14]
William L Hamilton, Rex Ying, and Jure Leskovec. 2017. Inductive representation learning on large graphs. In Proceedings of the 31st International Conference on Neural Information Processing Systems. 1025--1035.
[15]
Qi He, Han Wang, and Yue Zhang. 2020. Enhancing Generalization in Natural Language Inference by Syntax. In Findings of the Association for Computational Linguistics: EMNLP 2020. Association for Computational Linguistics, Online, 4973--4978. https://doi.org/10.18653/v1/2020.findings-emnlp.447
[16]
Abhinav Jangda, Sandeep Polisetty, Arjun Guha, and Marco Serafini. 2021. Accelerating graph sampling for graph machine learning using gpus. In Proceedings of the Sixteenth European Conference on Computer Systems. 311--326.
[17]
Krzysztof Kaczmarski, Piotr Przymus, and Paweł Rzążewski. 2015. Improving high-performance GPU graph traversal with compression. In New Trends in Database and Information Systems II. Springer, 201--214.
[18]
Chinmay Karande, Kumar Chellapilla, and Reid Andersen. 2009. Speeding up algorithms on compressed web graphs. Internet Mathematics, Vol. 6, 3 (2009), 373--398.
[19]
Farzad Khorasani, Keval Vora, Rajiv Gupta, and Laxmi N Bhuyan. 2014. CuSha: Vertex-centric graph processing on GPUs. In Proceedings of the 23rd international symposium on High-performance parallel and distributed computing. 239--252.
[20]
N.J. Larsson and A. Moffat. 1999. Offline dictionary-based compression. In Proceedings DCC'99 Data Compression Conference. 296--305.
[21]
Hongzheng Li, Yingxia Shao, Junping Du, Bin Cui, and Lei Chen. 2022. An I/O-Efficient Disk-Based Graph System for Scalable Second-Order Random Walk of Large Graphs. Proc. VLDB Endow., Vol. 15, 8 (apr 2022), 1619--1631. https://doi.org/10.14778/3529337.3529346
[22]
Yongsub Lim, U Kang, and Christos Faloutsos. 2014. SlashBurn: Graph Compression and Mining beyond Caveman Communities. IEEE Transactions on Knowledge and Data Engineering, Vol. 26, 12 (2014), 3077--3089.
[23]
Hang Liu and H Howie Huang. 2019. SIMD-X: Programming and processing of graph algorithms on GPUs. In 2019 USENIX Annual Technical Conference. 411--428.
[24]
Jiaxu Liu, Yingxia Shao, and Sen Su. 2021. Multiple Local Community Detection via High-Quality Seed Identification over Both Static and Dynamic Networks. Data Science and Engineering, Vol. 6, 3 (2021), 249--264.
[25]
Seung Won Min, Kun Wu, Sitao Huang, Mert Hidayetoglu, Jinjun Xiong, Eiman Ebrahimi, Deming Chen, and Wen-mei Hwu. 2021. Large graph convolutional network training with GPU-oriented data communication architecture. Proceedings of the VLDB Endowment, Vol. 14 (07 2021), 2087--2100. https://doi.org/10.14778/3476249.3476264
[26]
Amir Hossein Nodehi Sabet, Junqiao Qiu, and Zhijia Zhao. 2018. Tigr: Transforming irregular graphs for gpu-friendly graph processing. ACM SIGPLAN Notices, Vol. 53, 2 (2018), 622--636.
[27]
Santosh Pandey, Lingda Li, Adolfy Hoisie, Xiaoye S Li, and Hang Liu. 2020. C-SAW: A framework for graph sampling and random walk on GPUs. In SC20: International Conference for High Performance Computing, Networking, Storage and Analysis. 1--15.
[28]
Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. 2014. Deepwalk: Online learning of social representations. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining. 701--710.
[29]
Ryan A Rossi and Rong Zhou. 2018. Graphzip: a clique-based sparse graph compression method. Journal of Big Data, Vol. 5, 1 (2018), 1--14.
[30]
Mo Sha, Yuchen Li, and Kian-Lee Tan. 2019. Gpu-based graph traversal on compressed graphs. In Proceedings of the 2019 International Conference on Management of Data. 775--792.
[31]
Yingxia Shao, Shiyue Huang, Xupeng Miao, Bin Cui, and Lei Chen. 2020. Memory-Aware Framework for Efficient Second-Order Random Walk on Large Graphs. In Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data (Portland, OR, USA) (SIGMOD '20). Association for Computing Machinery, New York, NY, USA, 1797--1812. https://doi.org/10.1145/3318464.3380562
[32]
Jian Tang, Meng Qu, Mingzhe Wang, Ming Zhang, Jun Yan, and Qiaozhu Mei. 2015. Line: Large-scale information network embedding. In Proceedings of the 24th international conference on world wide web. 1067--1077.
[33]
Minjie Wang, Da Zheng, Zihao Ye, Quan Gan, Mufei Li, Xiang Song, Jinjing Zhou, Chao Ma, Lingfan Yu, Yu Gai, Tianjun Xiao, Tong He, George Karypis, Jinyang Li, and Zheng Zhang. 2019. Deep Graph Library: A Graph-Centric, Highly-Performant Package for Graph Neural Networks. arXiv preprint arXiv:1909.01315 (2019).
[34]
Pengyu Wang, Chao Li, Jing Wang, Taolei Wang, Lu Zhang, Jingwen Leng, Quan Chen, and Minyi Guo. 2021. Skywalker: Efficient Alias-Method-Based Graph Sampling and Random Walk on GPUs. In 2021 30th International Conference on Parallel Architectures and Compilation Techniques. 304--317.
[35]
Yangzihao Wang, Andrew Davidson, Yuechao Pan, Yuduo Wu, Andy Riffel, and John D Owens. 2016. Gunrock: A high-performance graph processing library on the GPU. In Proceedings of the 21st ACM SIGPLAN symposium on principles and practice of parallel programming. 1--12.
[36]
Hao Wei, Jeffrey Xu Yu, Can Lu, and Xuemin Lin. 2016. Speedup graph processing by graph ordering. In Proceedings of the 2016 International Conference on Management of Data. 1813--1828.
[37]
Shiwen Wu, Yuanxing Zhang, Chengliang Gao, Kaigui Bian, and Bin Cui. 2020. GARG: Anonymous Recommendation of Point-of-Interest in Mobile Networks by Graph Convolution Network. Data Science and Engineering, Vol. 5 (12 2020). https://doi.org/10.1007/s41019-020-00135-z
[38]
Yang Yang and Hongyun Ning. 2017. Block linked list index structure for large data full text retrieval. In 2017 13th International Conference on Natural Computation, Fuzzy Systems and Knowledge Discovery (ICNC-FSKD). IEEE, 2123--2128.
[39]
Xingyu Yao, Yingxia Shao, Bin Cui, and Lei Chen. 2021. Uninet: Scalable network representation learning with metropolis-hastings sampling. In 2021 IEEE 37th International Conference on Data Engineering (ICDE). IEEE, 516--527.
[40]
Hanqing Zeng, Hongkuan Zhou, Ajitesh Srivastava, Rajgopal Kannan, and Viktor Prasanna. 2020. GraphSAINT: Graph Sampling Based Inductive Learning Method. In International Conference on Learning Representations.
[41]
Jianlong Zhong and Bingsheng He. 2013. Medusa: Simplified graph processing on GPUs. IEEE Transactions on Parallel and Distributed Systems, Vol. 25, 6 (2013), 1543--1552.


Published In

CIKM '22: Proceedings of the 31st ACM International Conference on Information & Knowledge Management
October 2022
5274 pages
ISBN:9781450392365
DOI:10.1145/3511808
  • General Chairs:
  • Mohammad Al Hasan,
  • Li Xiong


Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. gpu
  2. graph compressing
  3. graph sampling


Conference

CIKM '22

Acceptance Rates

CIKM '22 Paper Acceptance Rate 621 of 2,257 submissions, 28%;
Overall Acceptance Rate 1,861 of 8,427 submissions, 22%


Article Metrics

  • Downloads (last 12 months): 103
  • Downloads (last 6 weeks): 8

Reflects downloads up to 15 Jan 2025

Cited By

  • (2024) FlowWalker: A Memory-Efficient and High-Performance GPU-Based Dynamic Graph Random Walk Framework. Proceedings of the VLDB Endowment 17(8), 1788--1801. https://doi.org/10.14778/3659437.3659438. Online publication date: 31-May-2024.
  • (2024) Distributed Graph Neural Network Training: A Survey. ACM Computing Surveys 56(8), 1--39. https://doi.org/10.1145/3648358. Online publication date: 10-Apr-2024.
  • (2024) Large-Scale Graph Label Propagation on GPUs. IEEE Transactions on Knowledge and Data Engineering 36(10), 5234--5248. https://doi.org/10.1109/TKDE.2023.3336329. Online publication date: Oct-2024.
  • (2024) RapidGKC: GPU-Accelerated K-Mer Counting. 2024 IEEE 40th International Conference on Data Engineering (ICDE), 3810--3822. https://doi.org/10.1109/ICDE60146.2024.00292. Online publication date: 13-May-2024.
  • (2023) Beyond Homophily. Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, 2104--2113. https://doi.org/10.24963/ijcai.2023/234. Online publication date: 19-Aug-2023.
  • (2023) NeutronStream: A Dynamic GNN Training Framework with Sliding Window for Graph Streams. Proceedings of the VLDB Endowment 17(3), 455--468. https://doi.org/10.14778/3632093.3632108. Online publication date: 1-Nov-2023.
