More Web Proxy on the site http://driver.im/

research-article

Public Access

Accelerating graph sampling for graph machine learning using GPUs

Authors:

Abhinav Jangda,

Sandeep Polisetty,

Marco SerafiniAuthors Info & Claims

EuroSys '21: Proceedings of the Sixteenth European Conference on Computer Systems

Pages 311 - 326

https://doi.org/10.1145/3447786.3456244

Published: 21 April 2021 Publication History

Abstract

Representation learning algorithms automatically learn the features of data. Several representation learning algorithms for graph data, such as DeepWalk, node2vec, and Graph-SAGE, sample the graph to produce mini-batches that are suitable for training a DNN. However, sampling time can be a significant fraction of training time, and existing systems do not efficiently parallelize sampling.

Sampling is an "embarrassingly parallel" problem and may appear to lend itself to GPU acceleration, but the irregularity of graphs makes it hard to use GPU resources effectively. This paper presents NextDoor, a system designed to effectively perform graph sampling on GPUs. NextDoor employs a new approach to graph sampling that we call transit-parallelism, which allows load balancing and caching of edges. NextDoor provides end-users with a high-level abstraction for writing a variety of graph sampling algorithms. We implement several graph sampling applications, and show that NextDoor runs them orders of magnitude faster than existing systems.

References

[1]

Accessed in Feb 2021. NVIDIA CUB. https://nvlabs.github.io/cub/

[2]

Hongzhi Chen, Xiaoxi Wang, Chenghuan Huang, Juncheng Fang, Yifan Hou, Changji Li, and James Cheng. 2019. Large Scale Graph Mining with G-Miner. In Proceedings of the 2019 International Conference on Management of Data (SIGMOD '19).

Digital Library

[3]

Jie Chen, Tengfei Ma, and Cao Xiao. 2018. FastGCN: Fast Learning with Graph Convolutional Networks via Importance Sampling. In International Conference on Learning Representations (ICLR'18).

[4]

Rong Chen, Jiaxin Shi, Yanzhe Chen, Binyu Zang, Haibing Guan, and Haibo Chen. 2019. PowerLyra: Differentiated Graph Computation and Partitioning on Skewed Graphs. ACM Trans. Parallel Comput. (2019).

[5]

Xuhao Chen, Roshan Dathathri, Gurbinder Gill, and Keshav Pingali. 2020. Pangolin: An Efficient and Flexible Graph Mining System on CPU and GPU. Proc. VLDB Endow. (2020).

Digital Library

[6]

Wei-Lin Chiang, Xuanqing Liu, Si Si, Yang Li, Samy Bengio, and Cho-Jui Hsieh. 2019. Cluster-GCN: An Efficient Algorithm for Training Deep and Large Graph Convolutional Networks. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD '19).

Digital Library

[7]

Weilin Cong, Rana Forsati, Mahmut Kandemir, and Mehrdad Mahdavi. 2020. Minimal Variance Sampling with Provable Guarantees for Fast Training of Graph Neural Networks. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD '20).

Digital Library

[8]

Vinicius Dias, Carlos H. C. Teixeira, Dorgival Guedes, Wagner Meira, and Srinivasan Parthasarathy. 2019. Fractal: A General-Purpose Graph Pattern Mining System. In Proceedings of the 2019 International Conference on Management of Data (SIGMOD '19).

Digital Library

[9]

Zhisong Fu, Michael Personick, and Bryan Thompson. 2014. Map-Graph: A High Level API for Fast Development of High Performance Graph Analytics on GPUs. In Proceedings of Workshop on GRAph Data Management Experiences and Systems (GRADES'14).

Digital Library

[10]

Hongyang Gao, Zhengyang Wang, and Shuiwang Ji. 2018. Large-Scale Learnable Graph Convolutional Networks. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD' 18).

Digital Library

[11]

Joseph E. Gonzalez, Yucheng Low, Haijie Gu, Danny Bickson, and Carlos Guestrin. 2012. PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs. In Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation (OSDI'12).

Digital Library

[12]

Aditya Grover and Jure Leskovec. 2016. Node2vec: Scalable Feature Learning for Networks. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '16).

Digital Library

[13]

William L. Hamilton, Rex Ying, and Jure Leskovec. 2017. Inductive Representation Learning on Large Graphs. In Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS'17).

Digital Library

[14]

Taher H. Haveliwala. 2002. Topic-Sensitive PageRank. In Proceedings of the 11th International Conference on World Wide Web (WWW '02).

[15]

Anand Padmanabha Iyer, Zaoxing Liu, Xin Jin, Shivaram Venkataraman, Vladimir Braverman, and Ion Stoica. 2018. ASAP: Fast, Approximate Graph Pattern Mining at Scale. In Proceedings of the 12th USENIX Conference on Operating Systems Design and Implementation (OSDI'18).

[16]

Zhihao Jia, Sina Lin, Mingyu Gao, Matei Zaharia, and Alex Aiken. 2020. Improving the accuracy, scalability, and performance of graph neural networks with roc. Proceedings of Machine Learning and Systems (2020).

[17]

Farzad Khorasani, Keval Vora, Rajiv Gupta, and Laxmi N. Bhuyan. 2014. CuSha: Vertex-Centric Graph Processing on GPUs. In Proceedings of the 23rd International Symposium on High-Performance Parallel and Distributed Computing (HPDC '14).

[18]

Jure Leskovec and Andrej Krevl. 2014. SNAP Datasets: Stanford Large Network Dataset Collection. http://snap.stanford.edu/data.

[19]

Hang Liu and H. Howie Huang. 2019. SIMD-X: Programming and Processing of Graph Algorithms on GPUs. In Proceedings of the 2019 USENIX Conference on Usenix Annual Technical Conference (USENIX ATC '19).

[20]

Yucheng Low, Joseph Gonzalez, Aapo Kyrola, Danny Bickson, Carlos Guestrin, and Joseph Hellerstein. 2010. GraphLab: A New Framework for Parallel Machine Learning. In Proceedings of the Twenty-Sixth Conference on Uncertainty in Artificial Intelligence (UAI'10).

[21]

Lingxiao Ma, Zhi Yang, Youshan Miao, Jilong Xue, Ming Wu, Lidong Zhou, and Yafei Dai. 2019. Neugraph: parallel deep neural network computation on large graphs. In 2019 USENIX Annual Technical Conference (USENIX ATC 19).

[22]

Grzegorz Malewicz, Matthew H. Austern, Aart J.C Bik, James C. Dehnert, Ilan Horn, Naty Leiser, and Grzegorz Czajkowski. 2010. Pregel: A System for Large-Scale Graph Processing. In Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data (SIGMOD '10).

Digital Library

[23]

Daniel Mawhirter and Bo Wu. 2019. AutoMine: Harmonizing High-Level Abstraction and High Performance for Graph Mining. In Proceedings of the 27th ACM Symposium on Operating Systems Principles (SOSP '19).

Digital Library

[24]

Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781 (2013).

[25]

Donald Nguyen, Andrew Lenharth, and Keshav Pingali. 2013. A Lightweight Infrastructure for Graph Analytics. In Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles (SOSP '13).

Digital Library

[26]

Amir Hossein Nodehi Sabet, Junqiao Qiu, and Zhijia Zhao. 2018. Tigr: Transforming Irregular Graphs for GPU-Friendly Graph Processing. In Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS '18).

Digital Library

[27]

Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd. 1999. The PageRank Citation Ranking: Bringing Order to the Web. Technical Report. Stanford InfoLab.

[28]

Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. 2014. DeepWalk: Online Learning of Social Representations. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '14).

Digital Library

[29]

Abdul Quamar, Amol Deshpande, and Jimmy Lin. 2016. NScale: neighborhood-centric large-scale graph analytics in the cloud. The VLDB Journal (2016).

[30]

Bruno Ribeiro and Don Towsley. 2010. Estimating and Sampling Graphs with Multidimensional Random Walks. In Proceedings of the 10th ACM SIGCOMM Conference on Internet Measurement (IMC '10).

Digital Library

[31]

Amir Hossein Nodehi Sabet, Zhijia Zhao, and Rajiv Gupta. 2020. Subway: Minimizing Data Transfer during out-of-GPU-Memory Graph Processing. In Proceedings of the Fifteenth European Conference on Computer Systems (EuroSys '20).

Digital Library

[32]

Xuanhua Shi, Zhigao Zheng, Yongluan Zhou, Hai Jin, Ligang He, Bo Liu, and Qiang-Sheng Hua. 2018. Graph processing on GPUs: A survey. ACM Computing Surveys (CSUR) (2018).

Digital Library

[33]

Julian Shun and Guy E. Blelloch. 2013. Ligra: A Lightweight Graph Processing Framework for Shared Memory. SIGPLAN Not. (2013).

Digital Library

[34]

Elias Stehle and Hans-Arno Jacobsen. 2017. A Memory Bandwidth-Efficient Hybrid Radix Sort on GPUs. In Proceedings of the 2017 ACM International Conference on Management of Data (SIGMOD '17).

Digital Library

[35]

Carlos H. C. Teixeira, Alexandre J. Fonseca, Marco Serafini, Georgos Siganos, Mohammed J. Zaki, and Ashraf Aboulnaga. 2015. Arabesque: A System for Distributed Graph Mining. In Proceedings of the 25th Symposium on Operating Systems Principles (SOSP '15).

Digital Library

[36]

Kai Wang, Zhiqiang Zuo, John Thorpe, Tien Quang Nguyen, and Guoqing Harry Xu. 2018. RStream: Marrying Relational Algebra with Streaming for Efficient Graph Mining on a Single Machine. In Proceedings of the 12th USENIX Conference on Operating Systems Design and Implementation (OSDI'18).

[37]

Yangzihao Wang, Andrew Davidson, Yuechao Pan, Yuduo Wu, Andy Riffel, and John D. Owens. 2016. Gunrock: A High-Performance Graph Processing Library on the GPU. SIGPLAN Not. (2016).

Digital Library

[38]

Ke Yang, MingXing Zhang, Kang Chen, Xiaosong Ma, Yang Bai, and Yong Jiang. 2019. KnightKing: A Fast Distributed Graph Random Walk Engine. In Proceedings of the 27th ACM Symposium on Operating Systems Principles (SOSP '19).

Digital Library

[39]

Rex Ying, Ruining He, Kaifeng Chen, Pong Eksombatchai, William L. Hamilton, and Jure Leskovec. 2018. Graph Convolutional Neural Networks for Web-Scale Recommender Systems. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD '18).

Digital Library

[40]

Hanqing Zeng, Hongkuan Zhou, Ajitesh Srivastava, Rajgopal Kannan, and Viktor Prasanna. 2020. GraphSAINT: Graph Sampling Based Inductive Learning Method. In International Conference on Learning Representations (ICLR '20).

[41]

J. Zhong and B. He. 2014. Medusa: Simplified Graph Processing on GPUs. IEEE Transactions on Parallel and Distributed Systems (2014).

Digital Library

[42]

Xiaowei Zhu, Wenguang Chen, Weimin Zheng, and Xiaosong Ma. 2016. Gemini: A computation-centric distributed graph processing system. In 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16).

Digital Library

[43]

Youwei Zhuo, Jingji Chen, Qinyi Luo, Yanzhi Wang, Hailong Yang, Depei Qian, and Xuehai Qian. 2020. SympleGraph: Distributed Graph Processing with Precise Loop-Carried Dependency Guarantee. In Proceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2020).

Digital Library

[44]

Difan Zou, Ziniu Hu, Yewen Wang, Song Jiang, Yizhou Sun, and Quanquan Gu. 2019. Layer-Dependent Importance Sampling for Training Deep and Large Graph Convolutional Networks. In Advances in Neural Information Processing Systems (Nuerips '19).

Cited By

Ma YGong PWu TYi JYang CLi CPeng QXie GBao YLiu HXu Y(2024)Eliminating Data Processing Bottlenecks in GNN Training over Large Graphs via Two-level Feature CompressionProceedings of the VLDB Endowment10.14778/3681954.368196817:11(2854-2866)Online publication date: 1-Jul-2024
https://dl.acm.org/doi/10.14778/3681954.3681968
Ai XWang QCao CZhang YChen CYuan HGu YYu G(2024)NeutronOrch: Rethinking Sample-Based GNN Training under CPU-GPU Heterogeneous EnvironmentsProceedings of the VLDB Endowment10.14778/3659437.365945317:8(1995-2008)Online publication date: 31-May-2024
https://dl.acm.org/doi/10.14778/3659437.3659453
Deng YYan MTang B(2024)Accelerating Merkle Patricia Trie with GPUProceedings of the VLDB Endowment10.14778/3659437.365944317:8(1856-1869)Online publication date: 31-May-2024
https://dl.acm.org/doi/10.14778/3659437.3659443
Show More Cited By

Recommendations

Scalable Graph Sampling on GPUs with Compressed Graph
CIKM '22: Proceedings of the 31st ACM International Conference on Information & Knowledge Management

GPU is a powerful accelerator for parallel computation. Graph sampling is a fundamental technology for large-scale graph analysis and learning. To accelerate graph sampling using GPUs, recently some solutions like NextDoor, C-SAW have been proposed. ...
gSampler: General and Efficient GPU-based Graph Sampling for Graph Learning
SOSP '23: Proceedings of the 29th Symposium on Operating Systems Principles

Graph sampling prepares training samples for graph learning and can dominate the training time. Due to the increasing algorithm diversity and complexity, existing sampling frameworks are insufficient in the generality of expression and the efficiency ...
Unbiased Sampling of Bipartite Graph
CYBERC '11: Proceedings of the 2011 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery

Increasing size of online social networks (OSNs) has given rise to sampling method studies that provide a relatively small but representative sample of large-scale OSNs so that the measurement and analysis burden can be affordable. So far, a number of ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences

EuroSys '21: Proceedings of the Sixteenth European Conference on Computer Systems

April 2021

631 pages

ISBN:9781450383349

DOI:10.1145/3447786

General Chairs:
Antonio Barbalace
The University of Edinburgh
,
Pramod Bhatotia
Technical University of Munich
,
Program Chairs:
Lorenzo Alvisi
Cornell University
,
Cristian Cadar
Imperial College London

Copyright © 2021 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGOPS: ACM Special Interest Group on Operating Systems

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 21 April 2021

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Badges

Qualifiers

Research-article

Funding Sources

Amazon Web Services
Facebook Systems for Machine Learning
National Science Foundation

Conference

EuroSys '21

Sponsor:

SIGOPS

EuroSys '21: Sixteenth European Conference on Computer Systems

April 26 - 28, 2021

Online Event, United Kingdom

Acceptance Rates

EuroSys '21 Paper Acceptance Rate 38 of 181 submissions, 21%;

Overall Acceptance Rate 241 of 1,308 submissions, 18%

Upcoming Conference

EuroSys '25

Sponsor:
sigops

Twentieth European Conference on Computer Systems

March 30 - April 3, 2025

Rotterdam , Netherlands

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

37
Total Citations
View Citations
1,694
Total Downloads

Downloads (Last 12 months)440
Downloads (Last 6 weeks)50

Reflects downloads up to 17 Dec 2024

Other Metrics

View Author Metrics

Citations

Cited By

Ma YGong PWu TYi JYang CLi CPeng QXie GBao YLiu HXu Y(2024)Eliminating Data Processing Bottlenecks in GNN Training over Large Graphs via Two-level Feature CompressionProceedings of the VLDB Endowment10.14778/3681954.368196817:11(2854-2866)Online publication date: 1-Jul-2024
https://dl.acm.org/doi/10.14778/3681954.3681968
Ai XWang QCao CZhang YChen CYuan HGu YYu G(2024)NeutronOrch: Rethinking Sample-Based GNN Training under CPU-GPU Heterogeneous EnvironmentsProceedings of the VLDB Endowment10.14778/3659437.365945317:8(1995-2008)Online publication date: 31-May-2024
https://dl.acm.org/doi/10.14778/3659437.3659453
Deng YYan MTang B(2024)Accelerating Merkle Patricia Trie with GPUProceedings of the VLDB Endowment10.14778/3659437.365944317:8(1856-1869)Online publication date: 31-May-2024
https://dl.acm.org/doi/10.14778/3659437.3659443
Mei JSun SLi CXu CChen CLiu YWang JZhao CHou XGuo MHe BCong X(2024)FlowWalker: A Memory-Efficient and High-Performance GPU-Based Dynamic Graph Random Walk FrameworkProceedings of the VLDB Endowment10.14778/3659437.365943817:8(1788-1801)Online publication date: 31-May-2024
https://dl.acm.org/doi/10.14778/3659437.3659438
Huang KJiang HWang MXiao GWipf DSong XGan QHuang ZZhai JZhang Z(2024)FreshGNN: Reducing Memory Access via Stable Historical Embeddings for Graph Neural Network TrainingProceedings of the VLDB Endowment10.14778/3648160.364818417:6(1473-1486)Online publication date: 3-May-2024
https://dl.acm.org/doi/10.14778/3648160.3648184
Mahmud SShen HIyer A(2024)PACER: Accelerating Distributed GNN Training Using Communication-Efficient Partition Refinement and CachingProceedings of the ACM on Networking10.1145/36978052:CoNEXT4(1-18)Online publication date: 25-Nov-2024
https://dl.acm.org/doi/10.1145/3697805
Song YChen PLu YAbrar NKalavri V(2024)In situ neighborhood sampling for large-scale GNN trainingProceedings of the 20th International Workshop on Data Management on New Hardware10.1145/3662010.3663443(1-5)Online publication date: 10-Jun-2024
https://dl.acm.org/doi/10.1145/3662010.3663443
Shao YLi HGu XYin HLi YMiao XZhang WCui BChen L(2024)Distributed Graph Neural Network Training: A SurveyACM Computing Surveys10.1145/364835856:8(1-39)Online publication date: 10-Apr-2024
https://dl.acm.org/doi/10.1145/3648358
Ye CLi YSun SGuo W(2024)gSWORD: GPU-accelerated Sampling for Subgraph CountingProceedings of the ACM on Management of Data10.1145/36392882:1(1-26)Online publication date: 26-Mar-2024
https://doi.org/10.1145/3639288
Gupta VChen XHuang RMeng FChen JYan YSerra ESpezzano F(2024)GraphScale: A Framework to Enable Machine Learning over Billion-node GraphsProceedings of the 33rd ACM International Conference on Information and Knowledge Management10.1145/3627673.3680021(4514-4521)Online publication date: 21-Oct-2024
https://dl.acm.org/doi/10.1145/3627673.3680021
Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Media

Figures

Other

Tables

View Table of Contents