DOI: 10.1145/3511808.3557443 · CIKM Conference Proceedings · Research article

Scalable Graph Sampling on GPUs with Compressed Graph

Published: 17 October 2022

Abstract

GPUs are powerful accelerators for parallel computation, and graph sampling is a fundamental technique for large-scale graph analysis and learning. To accelerate graph sampling on GPUs, several solutions, such as NextDoor and C-SAW, have recently been proposed. However, these solutions cannot handle large graphs efficiently because of their massive memory footprint and the expensive transfer cost between CPU and GPU. In this work, we introduce a Chunk-wise Graph Compression format (CGC) that effectively reduces the graph size and saves graph transfer cost. Meanwhile, CGC supports fast access to any single neighbor of a vertex and is therefore well suited to the graph sampling task. Specifically, CGC first balances the compression ratio against decompression efficiency by dividing each neighbor list into chunks. It then applies a new compression strategy, called linear estimation, to compress each chunk, which allows users to access a single neighbor in O(1) time. Finally, on top of CGC, we develop GraSS, a scalable GPU-based graph sampling framework, and evaluate its efficiency and scalability on both real-world and synthetic graphs. The empirical results demonstrate that GraSS supports various graph sampling methods on large graphs with high efficiency in cases where state-of-the-art solutions run out of memory or exceed the time limit.
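The chunk-wise linear-estimation idea in the abstract can be read as follows: because a neighbor list is sorted, each chunk's IDs are roughly linear in their index, so a chunk can be stored as a fitted line plus small residuals, and any single neighbor can be reconstructed in O(1) time without decompressing the rest. The sketch below is a hypothetical CPU-side illustration of that reading, not the paper's actual on-GPU format; the chunk size, the endpoint-based slope fit, the residual encoding, and the function names `compress`, `compress_chunk`, and `access` are all assumptions for illustration.

```python
CHUNK = 4  # assumed chunk size; the paper tunes this to balance ratio vs. speed


def compress_chunk(chunk):
    """Fit neighbor IDs with a line id ~ base + slope*i and keep residuals."""
    base = chunk[0]
    # Integer slope estimated from the chunk endpoints; residuals stay small
    # when the sorted IDs are close to an arithmetic progression.
    slope = (chunk[-1] - base) // (len(chunk) - 1) if len(chunk) > 1 else 0
    residuals = [v - (base + slope * i) for i, v in enumerate(chunk)]
    return base, slope, residuals


def compress(neighbors):
    """Split a sorted neighbor list into chunks and compress each one."""
    return [compress_chunk(neighbors[k:k + CHUNK])
            for k in range(0, len(neighbors), CHUNK)]


def access(compressed, i):
    """O(1) access to the i-th neighbor: locate the chunk, evaluate the line,
    then correct with the stored residual."""
    base, slope, residuals = compressed[i // CHUNK]
    j = i % CHUNK
    return base + slope * j + residuals[j]
```

Under this reading, a sampler can draw random neighbors directly from the compressed representation, which is what makes the format friendly to GPU-based graph sampling: each thread touches only the chunk containing its target index.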

References

[1]
Alberto Apostolico and Guido Drovandi. 2009. Graph compression by BFS. Algorithms, Vol. 2, 3 (2009), 1031--1044.
[2]
Paolo Boldi, Marco Rosa, Massimo Santini, and Sebastiano Vigna. 2011. Layered label propagation: A multiresolution coordinate-free ordering for compressing social networks. In Proceedings of the 20th international conference on World Wide Web. 587--596.
[3]
Paolo Boldi and Sebastiano Vigna. 2004a. The WebGraph framework I: Compression techniques. In Proceedings of the 13th international conference on World Wide Web. 595--602.
[4]
Paolo Boldi and Sebastiano Vigna. 2004b. The WebGraph framework II: Codes for the World-Wide Web. In Data Compression Conference, 2004. Proceedings. DCC 2004. IEEE, 528.
[5]
Gregory Buehrer and Kumar Chellapilla. 2008. A scalable pattern mining approach to web graph compression with communities. In Proceedings of the 2008 international conference on web search and data mining. 95--106.
[6]
Gregory Buehrer, Roberto L de Oliveira, David Fuhry, and Srinivasan Parthasarathy. 2015. Towards a parameter-free and parallel itemset mining algorithm in linearithmic time. In 2015 IEEE 31st International Conference on Data Engineering. IEEE, 1071--1082.
[7]
Jie Chen, Tengfei Ma, and Cao Xiao. 2018. FastGCN: Fast Learning with Graph Convolutional Networks via Importance Sampling. In International Conference on Learning Representations. https://openreview.net/forum?id=rytstxWAW
[8]
Francisco Claude and Gonzalo Navarro. 2010. Fast and compact web graph representations. ACM Transactions on the Web (TWEB), Vol. 4, 4 (2010), 1--31.
[9]
Jialin Dong, Da Zheng, Lin F. Yang, and George Karypis. 2021. Global Neighbor Sampling for Mixed CPU-GPU Training on Giant Graphs. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining (Virtual Event, Singapore) (KDD'21). Association for Computing Machinery, New York, NY, USA, 289--299. https://doi.org/10.1145/3447548.3467437
[10]
Tomás Feder and Rajeev Motwani. 1995. Clique partitions, graph compression and speeding-up algorithms. J. Comput. System Sci., Vol. 51, 2 (1995), 261--272.
[11]
Matthias Fey and Jan E. Lenssen. 2019. Fast Graph Representation Learning with PyTorch Geometric. In ICLR Workshop on Representation Learning on Graphs and Manifolds.
[12]
Zhisong Fu, Michael Personick, and Bryan Thompson. 2014. MapGraph: A high level API for fast development of high performance graph analytics on GPUs. In Proceedings of workshop on GRAph data management experiences and systems. 1--6.
[13]
Aditya Grover and Jure Leskovec. 2016. node2vec: Scalable feature learning for networks. In Proceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining. 855--864.
[14]
William L Hamilton, Rex Ying, and Jure Leskovec. 2017. Inductive representation learning on large graphs. In Proceedings of the 31st International Conference on Neural Information Processing Systems. 1025--1035.
[15]
Qi He, Han Wang, and Yue Zhang. 2020. Enhancing Generalization in Natural Language Inference by Syntax. In Findings of the Association for Computational Linguistics: EMNLP 2020. Association for Computational Linguistics, Online, 4973--4978. https://doi.org/10.18653/v1/2020.findings-emnlp.447
[16]
Abhinav Jangda, Sandeep Polisetty, Arjun Guha, and Marco Serafini. 2021. Accelerating graph sampling for graph machine learning using gpus. In Proceedings of the Sixteenth European Conference on Computer Systems. 311--326.
[17]
Krzysztof Kaczmarski, Piotr Przymus, and Paweł Rzążewski. 2015. Improving high-performance GPU graph traversal with compression. In New Trends in Database and Information Systems II. Springer, 201--214.
[18]
Chinmay Karande, Kumar Chellapilla, and Reid Andersen. 2009. Speeding up algorithms on compressed web graphs. Internet Mathematics, Vol. 6, 3 (2009), 373--398.
[19]
Farzad Khorasani, Keval Vora, Rajiv Gupta, and Laxmi N Bhuyan. 2014. CuSha: Vertex-centric graph processing on GPUs. In Proceedings of the 23rd international symposium on High-performance parallel and distributed computing. 239--252.
[20]
N.J. Larsson and A. Moffat. 1999. Offline dictionary-based compression. In Proceedings DCC'99 Data Compression Conference. 296--305.
[21]
Hongzheng Li, Yingxia Shao, Junping Du, Bin Cui, and Lei Chen. 2022. An I/O-Efficient Disk-Based Graph System for Scalable Second-Order Random Walk of Large Graphs. Proc. VLDB Endow., Vol. 15, 8 (apr 2022), 1619--1631. https://doi.org/10.14778/3529337.3529346
[22]
Yongsub Lim, U Kang, and Christos Faloutsos. 2014. SlashBurn: Graph Compression and Mining beyond Caveman Communities. IEEE Transactions on Knowledge and Data Engineering, Vol. 26, 12 (2014), 3077--3089.
[23]
Hang Liu and H Howie Huang. 2019. SIMD-X: Programming and processing of graph algorithms on GPUs. In 2019 USENIX Annual Technical Conference. 411--428.
[24]
Jiaxu Liu, Yingxia Shao, and Sen Su. 2021. Multiple Local Community Detection via High-Quality Seed Identification over Both Static and Dynamic Networks. Data Science and Engineering, Vol. 6, 3 (2021), 249--264.
[25]
Seung Won Min, Kun Wu, Sitao Huang, Mert Hidayetoglu, Jinjun Xiong, Eiman Ebrahimi, Deming Chen, and Wen-mei Hwu. 2021. Large graph convolutional network training with GPU-oriented data communication architecture. Proceedings of the VLDB Endowment, Vol. 14 (07 2021), 2087--2100. https://doi.org/10.14778/3476249.3476264
[26]
Amir Hossein Nodehi Sabet, Junqiao Qiu, and Zhijia Zhao. 2018. Tigr: Transforming irregular graphs for gpu-friendly graph processing. ACM SIGPLAN Notices, Vol. 53, 2 (2018), 622--636.
[27]
Santosh Pandey, Lingda Li, Adolfy Hoisie, Xiaoye S Li, and Hang Liu. 2020. C-SAW: A framework for graph sampling and random walk on GPUs. In SC20: International Conference for High Performance Computing, Networking, Storage and Analysis. 1--15.
[28]
Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. 2014. Deepwalk: Online learning of social representations. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining. 701--710.
[29]
Ryan A Rossi and Rong Zhou. 2018. Graphzip: a clique-based sparse graph compression method. Journal of Big Data, Vol. 5, 1 (2018), 1--14.
[30]
Mo Sha, Yuchen Li, and Kian-Lee Tan. 2019. Gpu-based graph traversal on compressed graphs. In Proceedings of the 2019 International Conference on Management of Data. 775--792.
[31]
Yingxia Shao, Shiyue Huang, Xupeng Miao, Bin Cui, and Lei Chen. 2020. Memory-Aware Framework for Efficient Second-Order Random Walk on Large Graphs. In Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data (Portland, OR, USA) (SIGMOD '20). Association for Computing Machinery, New York, NY, USA, 1797--1812. https://doi.org/10.1145/3318464.3380562
[32]
Jian Tang, Meng Qu, Mingzhe Wang, Ming Zhang, Jun Yan, and Qiaozhu Mei. 2015. Line: Large-scale information network embedding. In Proceedings of the 24th international conference on world wide web. 1067--1077.
[33]
Minjie Wang, Da Zheng, Zihao Ye, Quan Gan, Mufei Li, Xiang Song, Jinjing Zhou, Chao Ma, Lingfan Yu, Yu Gai, Tianjun Xiao, Tong He, George Karypis, Jinyang Li, and Zheng Zhang. 2019. Deep Graph Library: A Graph-Centric, Highly-Performant Package for Graph Neural Networks. arXiv preprint arXiv:1909.01315 (2019).
[34]
Pengyu Wang, Chao Li, Jing Wang, Taolei Wang, Lu Zhang, Jingwen Leng, Quan Chen, and Minyi Guo. 2021. Skywalker: Efficient Alias-Method-Based Graph Sampling and Random Walk on GPUs. In 2021 30th International Conference on Parallel Architectures and Compilation Techniques. 304--317.
[35]
Yangzihao Wang, Andrew Davidson, Yuechao Pan, Yuduo Wu, Andy Riffel, and John D Owens. 2016. Gunrock: A high-performance graph processing library on the GPU. In Proceedings of the 21st ACM SIGPLAN symposium on principles and practice of parallel programming. 1--12.
[36]
Hao Wei, Jeffrey Xu Yu, Can Lu, and Xuemin Lin. 2016. Speedup graph processing by graph ordering. In Proceedings of the 2016 International Conference on Management of Data. 1813--1828.
[37]
Shiwen Wu, Yuanxing Zhang, Chengliang Gao, Kaigui Bian, and Bin Cui. 2020. GARG: Anonymous Recommendation of Point-of-Interest in Mobile Networks by Graph Convolution Network. Data Science and Engineering, Vol. 5 (12 2020). https://doi.org/10.1007/s41019-020-00135-z
[38]
Yang Yang and Hongyun Ning. 2017. Block linked list index structure for large data full text retrieval. In 2017 13th International Conference on Natural Computation, Fuzzy Systems and Knowledge Discovery (ICNC-FSKD). IEEE, 2123--2128.
[39]
Xingyu Yao, Yingxia Shao, Bin Cui, and Lei Chen. 2021. Uninet: Scalable network representation learning with metropolis-hastings sampling. In 2021 IEEE 37th International Conference on Data Engineering (ICDE). IEEE, 516--527.
[40]
Hanqing Zeng, Hongkuan Zhou, Ajitesh Srivastava, Rajgopal Kannan, and Viktor Prasanna. 2020. GraphSAINT: Graph Sampling Based Inductive Learning Method. In International Conference on Learning Representations.
[41]
Jianlong Zhong and Bingsheng He. 2013. Medusa: Simplified graph processing on GPUs. IEEE Transactions on Parallel and Distributed Systems, Vol. 25, 6 (2013), 1543--1552.


Published In

CIKM '22: Proceedings of the 31st ACM International Conference on Information & Knowledge Management
October 2022
5274 pages
ISBN:9781450392365
DOI:10.1145/3511808
  • General Chairs:
  • Mohammad Al Hasan,
  • Li Xiong


Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. gpu
  2. graph compressing
  3. graph sampling


Conference

CIKM '22

Acceptance Rates

CIKM '22 Paper Acceptance Rate 621 of 2,257 submissions, 28%;
Overall Acceptance Rate 1,861 of 8,427 submissions, 22%


Article Metrics

  • Downloads (last 12 months): 103
  • Downloads (last 6 weeks): 8

Reflects downloads up to 15 Jan 2025

Cited By

  • (2024) FlowWalker: A Memory-Efficient and High-Performance GPU-Based Dynamic Graph Random Walk Framework. Proceedings of the VLDB Endowment 17(8), 1788--1801. https://doi.org/10.14778/3659437.3659438. Online publication date: 31-May-2024.
  • (2024) Distributed Graph Neural Network Training: A Survey. ACM Computing Surveys 56(8), 1--39. https://doi.org/10.1145/3648358. Online publication date: 10-Apr-2024.
  • (2024) Large-Scale Graph Label Propagation on GPUs. IEEE Transactions on Knowledge and Data Engineering 36(10), 5234--5248. https://doi.org/10.1109/TKDE.2023.3336329. Online publication date: Oct-2024.
  • (2024) RapidGKC: GPU-Accelerated K-Mer Counting. 2024 IEEE 40th International Conference on Data Engineering (ICDE), 3810--3822. https://doi.org/10.1109/ICDE60146.2024.00292. Online publication date: 13-May-2024.
  • (2023) Beyond Homophily. Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, 2104--2113. https://doi.org/10.24963/ijcai.2023/234. Online publication date: 19-Aug-2023.
  • (2023) NeutronStream: A Dynamic GNN Training Framework with Sliding Window for Graph Streams. Proceedings of the VLDB Endowment 17(3), 455--468. https://doi.org/10.14778/3632093.3632108. Online publication date: 1-Nov-2023.
