[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.1145/3447786.3456244acmconferencesArticle/Chapter ViewAbstractPublication PageseurosysConference Proceedingsconference-collections
research-article
Public Access

Accelerating graph sampling for graph machine learning using GPUs

Published: 21 April 2021 Publication History

Abstract

Representation learning algorithms automatically learn the features of data. Several representation learning algorithms for graph data, such as DeepWalk, node2vec, and Graph-SAGE, sample the graph to produce mini-batches that are suitable for training a DNN. However, sampling time can be a significant fraction of training time, and existing systems do not efficiently parallelize sampling.
Sampling is an "embarrassingly parallel" problem and may appear to lend itself to GPU acceleration, but the irregularity of graphs makes it hard to use GPU resources effectively. This paper presents NextDoor, a system designed to effectively perform graph sampling on GPUs. NextDoor employs a new approach to graph sampling that we call transit-parallelism, which allows load balancing and caching of edges. NextDoor provides end-users with a high-level abstraction for writing a variety of graph sampling algorithms. We implement several graph sampling applications, and show that NextDoor runs them orders of magnitude faster than existing systems.

References

[1]
Accessed in Feb 2021. NVIDIA CUB. https://nvlabs.github.io/cub/
[2]
Hongzhi Chen, Xiaoxi Wang, Chenghuan Huang, Juncheng Fang, Yifan Hou, Changji Li, and James Cheng. 2019. Large Scale Graph Mining with G-Miner. In Proceedings of the 2019 International Conference on Management of Data (SIGMOD '19).
[3]
Jie Chen, Tengfei Ma, and Cao Xiao. 2018. FastGCN: Fast Learning with Graph Convolutional Networks via Importance Sampling. In International Conference on Learning Representations (ICLR'18).
[4]
Rong Chen, Jiaxin Shi, Yanzhe Chen, Binyu Zang, Haibing Guan, and Haibo Chen. 2019. PowerLyra: Differentiated Graph Computation and Partitioning on Skewed Graphs. ACM Trans. Parallel Comput. (2019).
[5]
Xuhao Chen, Roshan Dathathri, Gurbinder Gill, and Keshav Pingali. 2020. Pangolin: An Efficient and Flexible Graph Mining System on CPU and GPU. Proc. VLDB Endow. (2020).
[6]
Wei-Lin Chiang, Xuanqing Liu, Si Si, Yang Li, Samy Bengio, and Cho-Jui Hsieh. 2019. Cluster-GCN: An Efficient Algorithm for Training Deep and Large Graph Convolutional Networks. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD '19).
[7]
Weilin Cong, Rana Forsati, Mahmut Kandemir, and Mehrdad Mahdavi. 2020. Minimal Variance Sampling with Provable Guarantees for Fast Training of Graph Neural Networks. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD '20).
[8]
Vinicius Dias, Carlos H. C. Teixeira, Dorgival Guedes, Wagner Meira, and Srinivasan Parthasarathy. 2019. Fractal: A General-Purpose Graph Pattern Mining System. In Proceedings of the 2019 International Conference on Management of Data (SIGMOD '19).
[9]
Zhisong Fu, Michael Personick, and Bryan Thompson. 2014. Map-Graph: A High Level API for Fast Development of High Performance Graph Analytics on GPUs. In Proceedings of Workshop on GRAph Data Management Experiences and Systems (GRADES'14).
[10]
Hongyang Gao, Zhengyang Wang, and Shuiwang Ji. 2018. Large-Scale Learnable Graph Convolutional Networks. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD' 18).
[11]
Joseph E. Gonzalez, Yucheng Low, Haijie Gu, Danny Bickson, and Carlos Guestrin. 2012. PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs. In Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation (OSDI'12).
[12]
Aditya Grover and Jure Leskovec. 2016. Node2vec: Scalable Feature Learning for Networks. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '16).
[13]
William L. Hamilton, Rex Ying, and Jure Leskovec. 2017. Inductive Representation Learning on Large Graphs. In Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS'17).
[14]
Taher H. Haveliwala. 2002. Topic-Sensitive PageRank. In Proceedings of the 11th International Conference on World Wide Web (WWW '02).
[15]
Anand Padmanabha Iyer, Zaoxing Liu, Xin Jin, Shivaram Venkataraman, Vladimir Braverman, and Ion Stoica. 2018. ASAP: Fast, Approximate Graph Pattern Mining at Scale. In Proceedings of the 12th USENIX Conference on Operating Systems Design and Implementation (OSDI'18).
[16]
Zhihao Jia, Sina Lin, Mingyu Gao, Matei Zaharia, and Alex Aiken. 2020. Improving the accuracy, scalability, and performance of graph neural networks with roc. Proceedings of Machine Learning and Systems (2020).
[17]
Farzad Khorasani, Keval Vora, Rajiv Gupta, and Laxmi N. Bhuyan. 2014. CuSha: Vertex-Centric Graph Processing on GPUs. In Proceedings of the 23rd International Symposium on High-Performance Parallel and Distributed Computing (HPDC '14).
[18]
Jure Leskovec and Andrej Krevl. 2014. SNAP Datasets: Stanford Large Network Dataset Collection. http://snap.stanford.edu/data.
[19]
Hang Liu and H. Howie Huang. 2019. SIMD-X: Programming and Processing of Graph Algorithms on GPUs. In Proceedings of the 2019 USENIX Conference on Usenix Annual Technical Conference (USENIX ATC '19).
[20]
Yucheng Low, Joseph Gonzalez, Aapo Kyrola, Danny Bickson, Carlos Guestrin, and Joseph Hellerstein. 2010. GraphLab: A New Framework for Parallel Machine Learning. In Proceedings of the Twenty-Sixth Conference on Uncertainty in Artificial Intelligence (UAI'10).
[21]
Lingxiao Ma, Zhi Yang, Youshan Miao, Jilong Xue, Ming Wu, Lidong Zhou, and Yafei Dai. 2019. Neugraph: parallel deep neural network computation on large graphs. In 2019 USENIX Annual Technical Conference (USENIX ATC 19).
[22]
Grzegorz Malewicz, Matthew H. Austern, Aart J.C Bik, James C. Dehnert, Ilan Horn, Naty Leiser, and Grzegorz Czajkowski. 2010. Pregel: A System for Large-Scale Graph Processing. In Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data (SIGMOD '10).
[23]
Daniel Mawhirter and Bo Wu. 2019. AutoMine: Harmonizing High-Level Abstraction and High Performance for Graph Mining. In Proceedings of the 27th ACM Symposium on Operating Systems Principles (SOSP '19).
[24]
Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781 (2013).
[25]
Donald Nguyen, Andrew Lenharth, and Keshav Pingali. 2013. A Lightweight Infrastructure for Graph Analytics. In Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles (SOSP '13).
[26]
Amir Hossein Nodehi Sabet, Junqiao Qiu, and Zhijia Zhao. 2018. Tigr: Transforming Irregular Graphs for GPU-Friendly Graph Processing. In Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS '18).
[27]
Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd. 1999. The PageRank Citation Ranking: Bringing Order to the Web. Technical Report. Stanford InfoLab.
[28]
Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. 2014. DeepWalk: Online Learning of Social Representations. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '14).
[29]
Abdul Quamar, Amol Deshpande, and Jimmy Lin. 2016. NScale: neighborhood-centric large-scale graph analytics in the cloud. The VLDB Journal (2016).
[30]
Bruno Ribeiro and Don Towsley. 2010. Estimating and Sampling Graphs with Multidimensional Random Walks. In Proceedings of the 10th ACM SIGCOMM Conference on Internet Measurement (IMC '10).
[31]
Amir Hossein Nodehi Sabet, Zhijia Zhao, and Rajiv Gupta. 2020. Subway: Minimizing Data Transfer during out-of-GPU-Memory Graph Processing. In Proceedings of the Fifteenth European Conference on Computer Systems (EuroSys '20).
[32]
Xuanhua Shi, Zhigao Zheng, Yongluan Zhou, Hai Jin, Ligang He, Bo Liu, and Qiang-Sheng Hua. 2018. Graph processing on GPUs: A survey. ACM Computing Surveys (CSUR) (2018).
[33]
Julian Shun and Guy E. Blelloch. 2013. Ligra: A Lightweight Graph Processing Framework for Shared Memory. SIGPLAN Not. (2013).
[34]
Elias Stehle and Hans-Arno Jacobsen. 2017. A Memory Bandwidth-Efficient Hybrid Radix Sort on GPUs. In Proceedings of the 2017 ACM International Conference on Management of Data (SIGMOD '17).
[35]
Carlos H. C. Teixeira, Alexandre J. Fonseca, Marco Serafini, Georgos Siganos, Mohammed J. Zaki, and Ashraf Aboulnaga. 2015. Arabesque: A System for Distributed Graph Mining. In Proceedings of the 25th Symposium on Operating Systems Principles (SOSP '15).
[36]
Kai Wang, Zhiqiang Zuo, John Thorpe, Tien Quang Nguyen, and Guoqing Harry Xu. 2018. RStream: Marrying Relational Algebra with Streaming for Efficient Graph Mining on a Single Machine. In Proceedings of the 12th USENIX Conference on Operating Systems Design and Implementation (OSDI'18).
[37]
Yangzihao Wang, Andrew Davidson, Yuechao Pan, Yuduo Wu, Andy Riffel, and John D. Owens. 2016. Gunrock: A High-Performance Graph Processing Library on the GPU. SIGPLAN Not. (2016).
[38]
Ke Yang, MingXing Zhang, Kang Chen, Xiaosong Ma, Yang Bai, and Yong Jiang. 2019. KnightKing: A Fast Distributed Graph Random Walk Engine. In Proceedings of the 27th ACM Symposium on Operating Systems Principles (SOSP '19).
[39]
Rex Ying, Ruining He, Kaifeng Chen, Pong Eksombatchai, William L. Hamilton, and Jure Leskovec. 2018. Graph Convolutional Neural Networks for Web-Scale Recommender Systems. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD '18).
[40]
Hanqing Zeng, Hongkuan Zhou, Ajitesh Srivastava, Rajgopal Kannan, and Viktor Prasanna. 2020. GraphSAINT: Graph Sampling Based Inductive Learning Method. In International Conference on Learning Representations (ICLR '20).
[41]
J. Zhong and B. He. 2014. Medusa: Simplified Graph Processing on GPUs. IEEE Transactions on Parallel and Distributed Systems (2014).
[42]
Xiaowei Zhu, Wenguang Chen, Weimin Zheng, and Xiaosong Ma. 2016. Gemini: A computation-centric distributed graph processing system. In 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16).
[43]
Youwei Zhuo, Jingji Chen, Qinyi Luo, Yanzhi Wang, Hailong Yang, Depei Qian, and Xuehai Qian. 2020. SympleGraph: Distributed Graph Processing with Precise Loop-Carried Dependency Guarantee. In Proceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2020).
[44]
Difan Zou, Ziniu Hu, Yewen Wang, Song Jiang, Yizhou Sun, and Quanquan Gu. 2019. Layer-Dependent Importance Sampling for Training Deep and Large Graph Convolutional Networks. In Advances in Neural Information Processing Systems (Nuerips '19).

Cited By

View all
  • (2024)Eliminating Data Processing Bottlenecks in GNN Training over Large Graphs via Two-level Feature CompressionProceedings of the VLDB Endowment10.14778/3681954.368196817:11(2854-2866)Online publication date: 1-Jul-2024
  • (2024)NeutronOrch: Rethinking Sample-Based GNN Training under CPU-GPU Heterogeneous EnvironmentsProceedings of the VLDB Endowment10.14778/3659437.365945317:8(1995-2008)Online publication date: 31-May-2024
  • (2024)Accelerating Merkle Patricia Trie with GPUProceedings of the VLDB Endowment10.14778/3659437.365944317:8(1856-1869)Online publication date: 31-May-2024
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
EuroSys '21: Proceedings of the Sixteenth European Conference on Computer Systems
April 2021
631 pages
ISBN:9781450383349
DOI:10.1145/3447786
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 21 April 2021

Permissions

Request permissions for this article.

Check for updates

Badges

Qualifiers

  • Research-article

Funding Sources

Conference

EuroSys '21
Sponsor:
EuroSys '21: Sixteenth European Conference on Computer Systems
April 26 - 28, 2021
Online Event, United Kingdom

Acceptance Rates

EuroSys '21 Paper Acceptance Rate 38 of 181 submissions, 21%;
Overall Acceptance Rate 241 of 1,308 submissions, 18%

Upcoming Conference

EuroSys '25
Twentieth European Conference on Computer Systems
March 30 - April 3, 2025
Rotterdam , Netherlands

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)440
  • Downloads (Last 6 weeks)50
Reflects downloads up to 17 Dec 2024

Other Metrics

Citations

Cited By

View all
  • (2024)Eliminating Data Processing Bottlenecks in GNN Training over Large Graphs via Two-level Feature CompressionProceedings of the VLDB Endowment10.14778/3681954.368196817:11(2854-2866)Online publication date: 1-Jul-2024
  • (2024)NeutronOrch: Rethinking Sample-Based GNN Training under CPU-GPU Heterogeneous EnvironmentsProceedings of the VLDB Endowment10.14778/3659437.365945317:8(1995-2008)Online publication date: 31-May-2024
  • (2024)Accelerating Merkle Patricia Trie with GPUProceedings of the VLDB Endowment10.14778/3659437.365944317:8(1856-1869)Online publication date: 31-May-2024
  • (2024)FlowWalker: A Memory-Efficient and High-Performance GPU-Based Dynamic Graph Random Walk FrameworkProceedings of the VLDB Endowment10.14778/3659437.365943817:8(1788-1801)Online publication date: 31-May-2024
  • (2024)FreshGNN: Reducing Memory Access via Stable Historical Embeddings for Graph Neural Network TrainingProceedings of the VLDB Endowment10.14778/3648160.364818417:6(1473-1486)Online publication date: 3-May-2024
  • (2024)PACER: Accelerating Distributed GNN Training Using Communication-Efficient Partition Refinement and CachingProceedings of the ACM on Networking10.1145/36978052:CoNEXT4(1-18)Online publication date: 25-Nov-2024
  • (2024)In situ neighborhood sampling for large-scale GNN trainingProceedings of the 20th International Workshop on Data Management on New Hardware10.1145/3662010.3663443(1-5)Online publication date: 10-Jun-2024
  • (2024)Distributed Graph Neural Network Training: A SurveyACM Computing Surveys10.1145/364835856:8(1-39)Online publication date: 10-Apr-2024
  • (2024)gSWORD: GPU-accelerated Sampling for Subgraph CountingProceedings of the ACM on Management of Data10.1145/36392882:1(1-26)Online publication date: 26-Mar-2024
  • (2024)GraphScale: A Framework to Enable Machine Learning over Billion-node GraphsProceedings of the 33rd ACM International Conference on Information and Knowledge Management10.1145/3627673.3680021(4514-4521)Online publication date: 21-Oct-2024
  • Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Login options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media