More Web Proxy on the site http://driver.im/

research-article

Open access

LargeGraph: An Efficient Dependency-Aware GPU-Accelerated Large-Scale Graph Processing

Authors:

Bingsheng HeAuthors Info & Claims

ACM Transactions on Architecture and Code Optimization (TACO), Volume 18, Issue 4

Article No.: 58, Pages 1 - 24

https://doi.org/10.1145/3477603

Published: 29 September 2021 Publication History

All formats PDF

Abstract

Many out-of-GPU-memory systems are recently designed to support iterative processing of large-scale graphs. However, these systems still suffer from long time to converge because of inefficient propagation of active vertices’ new states along graph paths. To efficiently support out-of-GPU-memory graph processing, this work designs a system LargeGraph. Different from existing out-of-GPU-memory systems, LargeGraph proposes a dependency-aware data-driven execution approach, which can significantly accelerate active vertices’ state propagations along graph paths with low data access cost and also high parallelism. Specifically, according to the dependencies between the vertices, it only loads and processes the graph data associated with dependency chains originated from active vertices for smaller access cost. Because most active vertices frequently use a small evolving set of paths for their new states’ propagation because of power-law property, this small set of paths are dynamically identified and maintained and efficiently handled on the GPU to accelerate most propagations for faster convergence, whereas the remaining graph data are handled over the CPU. For out-of-GPU-memory graph processing, LargeGraph outperforms four cutting-edge systems: Totem (5.19–11.62×), Graphie (3.02–9.41×), Garaph (2.75–8.36×), and Subway (2.45–4.15×).

References

[1]

Lemur. 2020. ClueWeb12 Web Graph. Retrieved August 17, 2021 fromhttp://www.lemurproject.org/clueweb12/webgraph.php/.

[2]

Web Data Commons. 2020. Hyperlink Graphs. Retrieved August 17, 2021 fromhttp://webdatacommons.org/hyperlinkgraph/.

[3]

Laboratory for Web Algorithmics. 2020. Datasets. Retrieved August 17, 2021 fromhttp://law.di.unimi.it/datasets.php.

[4]

Stanford. 2020. Stanford Large Network Dataset Collection.http://snap.stanford.edu/data/index.html.

[5]

Tal Ben-Nun, Michael Sutton, Sreepathi Pai, and Keshav Pingali. 2017. Groute: An asynchronous multi-GPU programming model for irregular computations. In Proceedings of the 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. 235–248.

Digital Library

[6]

Hanhua Chen, Hai Jin, and Xiaolong Cui. 2017. Hybrid followee recommendation in microblogging systems. Science China Information Sciences 60, 1 (2017), 1–14.

[7]

Yu Chen, Ivy B. Peng, Zhen Peng, Xu Liu, and Bin Ren. 2020. ATMem: Adaptive data placement in graph applications on heterogeneous memories. In Proceedings of the 18th ACM/IEEE International Symposium on Code Generation and Optimization. 293–304.

Digital Library

[8]

Yuze Chi, Guohao Dai, Yu Wang, Guangyu Sun, Guoliang Li, and Huazhong Yang. 2016. NXgraph: An efficient graph processing system on a single machine. In Proceedings of the 2016 IEEE International Conference on Data Engineering. 409–420.

[9]

Roshan Dathathri, Gurbinder Gill, Loc Hoang, Hoang Vu Dang, Alex Brooks, Nikoli Dryden, Marc Snir, and Keshav Pingali. 2018. Gluon: A communication-optimizing substrate for distributed heterogeneous graph analytics. In Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation. 752–768.

Digital Library

[10]

R. Dathathri, G. Gill, L. Hoang, V. Jatala, K. Pingali, V. K. Nandivada, H. Dang, and M. Snir. 2019. Gluon-async: A bulk-asynchronous system for distributed and heterogeneous graph analytics. In Proceedings of the 28th International Conference on Parallel Architectures and Compilation Techniques. 15–28.

[11]

Debashis Ganguly, Ziyu Zhang, Jun Yang, and Rami Melhem. 2020. Adaptive page migration for irregular data-intensive applications under GPU memory oversubscription. In Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium. 451–461.

[12]

Prasun Gera, Hyojong Kim, Piyush Sao, Hyesoon Kim, and David A. Bader. 2020. Traversing large graphs on GPUs with unified memory. Proceedings of the VLDB Endowment 13, 7 (2020), 1119–1133.

Digital Library

[13]

Abdullah Gharaibeh, Lauro Beltro Costa, Elizeu Santos-Neto, and Matei Ripeanu. 2012. A yoke of oxen and a thousand chickens for heavy lifting graph processing. In Proceedings of the 21st International Conference on Parallel Architectures and Compilation Techniques. 345–354.

Digital Library

[14]

Abdullah Gharaibeh, Tahsin Reza, Elizeu Santosneto, Lauro Beltrao Costa, Scott Sallinen, and Matei Ripeanu. 2014. Efficient large-scale graph processing on hybrid CPU and GPU systems. arXiv:1312.3018.

[15]

Joseph E. Gonzalez, Yucheng Low, Haijie Gu, Danny Bickson, and Carlos Guestrin. 2012. PowerGraph: Distributed graph-parallel computation on natural graphs. In Proceedings of the 10th USENIX Symposium on Operating Systems Design and Implementation. 17–30.

Digital Library

[16]

Wei Han, Daniel Mawhirter, Bo Wu, and Matthew Buland. 2017. Graphie: Large-scale asynchronous graph traversals on just a GPU. In Proceedings of the 26th International Conference on Parallel Architectures and Compilation Techniques. 233–245.

[17]

Changwan Hong, Aravind Sukumaranrajam, Jinsung Kim, and P. Sadayappan. 2017. MultiGraph: Efficient graph processing on GPUs. In Proceedings of the 26th International Conference on Parallel Architectures and Compilation Techniques. 27–40.

[18]

Amir Hossein Nodehi Sabet, Junqiao Qiu, and Zhijia Zhao. 2018. Tigr: Transforming irregular graphs for GPU-friendly graph processing. In Proceedings of the 23rd International Conference on Architectural Support for Programming Languages and Operating Systems. 622–636.

Digital Library

[19]

Zhihao Jia, Yongkee Kwon, Galen Shipman, Pat McCormick, Mattan Erez, and Alex Aiken. 2017. A distributed multi-GPU system for fast graph processing. Proceedings of the VLDB Endowment 11, 3 (2017), 297–310.

Digital Library

[20]

Farzad Khorasani, Keval Vora, Rajiv Gupta, and Laxmi N. Bhuyan. 2014. CuSha: Vertex-centric graph processing on GPUs. In Proceedings of the 23rd International Symposium on High-Performance Parallel and Distributed Computing. 239–252.

[21]

Hyojong Kim, Jaewoong Sim, Prasun Gera, Ramyad Hadidi, and Hyesoon Kim. 2020. Batch-aware unified memory management in GPUs for irregular workloads. In Proceedings of the 25th International Conference on Architectural Support for Programming Languages and Operating Systems. 1357–1370.

Digital Library

[22]

Min Soo Kim, Kyuhyeon An, Himchan Park, Hyunseok Seo, and Jinwook Kim. 2016. GTS: A fast and scalable graph processing method based on streaming topology to GPUs. In Proceedings of the 2016 International Conference on Management of Data. 447–461.

Digital Library

[23]

Aapo Kyrola, Guy E. Blelloch, and Carlos Guestrin. 2012. GraphChi: Large-scale graph computation on just a PC. In Proceedings of the 10th USENIX Symposium on Operating Systems Design and Implementation. 31–46.

[24]

Xiaofei Liao, Yicheng Chen, Yu Zhang, Hai Jin, Haikun Liu, and Jin Zhao. 2019. An efficient incremental strongly connected components algorithm for evolving directed graphs. Scientia Sinica Informationis 49, 8 (2019), 988–1004.

[25]

Xiaofei Liao, Jin Zhao, Yu Zhang, Bingsheng He, Ligang He, Hai Jin, and Lin Gu. 2021. A structure-aware storage optimization for out-of-core concurrent graph processing. IEEE Transactions on Computers. Early access, July 26, 2021.

[26]

Hang Liu and H. Howie Huang. 2019. SIMD-X: Programming and processing of graph algorithms on GPUs. In Proceedings of the 2019 USENIX Annual Technical Conference. 411–428.

[27]

Wei Liu, Haikun Liu, Xiaofei Liao, Hai Jin, and Yu Zhang. 2019. NGraph: Parallel graph processing in hybrid memory systems. IEEE Access 7 (2019), 103517–103529.

[28]

Xinqiao Lv, Wei Xiao, Yu Zhang, Xiaofei Liao, Hai Jin, and Qiangsheng Hua. 2019. An effective framework for asynchronous incremental graph processing. Frontiers of Computer Science 13, 3 (2019), 539–551.

Digital Library

[29]

Lingxiao Ma, Zhi Yang, Han Chen, Jilong Xue, and Yafei Dai. 2017. Garaph: Efficient GPU-accelerated graph processing on a single machine with balanced replication. In Proceedings of the 2017 USENIX Annual Technical Conference. 195–207.

[30]

Yuechao Pan, Yangzihao Wang, Yuduo Wu, Carl Yang, and John D. Owens. 2017. Multi-GPU graph analytics. In Proceedings of the 31st IEEE International Parallel and Distributed Processing Symposium. 479–490.

[31]

Amir Hossein Nodehi Sabet, Zhijia Zhao, and Rajiv Gupta. 2020. Subway: Minimizing data transfer during out-of-GPU-memory graph processing. In Proceedings of the 15th European Conference on Computer Systems. Article 12, 16 pages.

[32]

Dipanjan Sengupta, Shuaiwen Leon Song, Kapil Agarwal, and Karsten Schwan. 2015. GraphReduce: Processing large-scale graphs on accelerator-based systems. In Proceedings of the 2015 International Conference for High Performance Computing, Networking, Storage, and Analysis. Article 28, 12 pages.

Digital Library

[33]

Zhiyuan Shao, Lin Hou, Yan Ai, Yu Zhang, and Hai Jin. 2015. Is your graph algorithm eligible for nondeterministic execution? In Proceedings of the 44th International Conference on Parallel Processing. 430–439.

Digital Library

[34]

Beibei Si, Yuxuan Liang, Jin Zhao, Yu Zhang, Xiaofei Liao, Hai Jin, Haikun Liu, and Lin Gu. 2020. GGraph: An efficient structure-aware approach for iterative graph processing. IEEE Transactions on Big Data. Early access, August 26, 2020.

[35]

Zhao Sun, Hongzhi Wang, Haixun Wang, Bin Shao, and Jianzhong Li. 2012. Efficient subgraph matching on billion node graphs. Proceedings of the VLDB Endowment 5, 9 (2012), 788–799.

Digital Library

[36]

Ha-Nguyen Tran, Jung Jae Kim, and Bingsheng He. 2015. Fast subgraph matching on large graphs using graphics processors. In Proceedings of the 20th International Conference on Database Systems for Advanced Applications. 299–315.

[37]

Yangzihao Wang, Andrew Davidson, Yuechao Pan, Yuduo Wu, Andy Riffel, and John D. Owens. 2016. Gunrock: A high-performance graph processing library on the GPU. In Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. Article 11, 12 pages.

[38]

Yanfeng Zhang, Qixin Gao, Lixin Gao, and Cuirong Wang. 2014. Maiter: An asynchronous graph processing framework for delta-based accumulative iterative computation. IEEE Transactions on Parallel and Distributed Systems 25, 8 (2014), 2091–2100.

[39]

Yu Zhang, Lin Gu, Xiaofei Liao, Hai Jin, Deze Zeng, and Bing Bing Zhou. 2017. FRANK: A fast node ranking approach in large-scale networks. IEEE Network 31, 1 (2017), 36–43.

Digital Library

[40]

Yu Zhang, Xiaofei Liao, Lin Gu, Hai Jin, Kan Hu, Haikun Liu, and Bingsheng He. 2020. AsynGraph: Maximizing data parallelism for efficient iterative graph processing on GPUs. ACM Transactions on Architecture and Code Optimization 17, 4 (2020), Article 29, 21 pages.

Digital Library

[41]

Yu Zhang, Xiaofei Liao, Hai Jin, Lin Gu, Ligang He, Bingsheng He, and Haikun Liu. 2018. CGraph: A correlations-aware approach for efficient concurrent iterative graph processing. In Proceedings of the 2018 USENIX Annual Technical Conference. 441–452.

[42]

Yu Zhang, Xiaofei Liao, Hai Jin, Lin Gu, Guang Tan, and Bing Bing Zhou. 2017. HotGraph: Efficient asynchronous processing for real-world graphs. IEEE Transactions on Computers 66, 5 (2017), 799–809.

Digital Library

[43]

Yu Zhang, Xiaofei Liao, Hai Jin, Lin Gu, and Bing Bing Zhou. 2018. FBSGraph: Accelerating asynchronous graph processing via forward and backward sweeping. IEEE Transactions on Knowledge and Data Engineering 30, 5 (2018), 895–907.

[44]

Yu Zhang, Xiaofei Liao, Hai Jin, Bingsheng He, Haikun Liu, and Lin Gu. 2019. DiGraph: An efficient path-based iterative directed graph processing system on multiple GPUs. In Proceedings of the 2019 24th International Conference on Architectural Support for Programming Languages and Operating Systems. 601–614.

Digital Library

[45]

Yu Zhang, Xiaofei Liao, Hai Jin, Ligang He, Bingsheng He, Haikun Liu, and Lin Gu. 2021. DepGraph: A dependency-driven accelerator for efficient iterative graph processing. In Proceedings of the IEEE International Symposium on High-Performance Computer Architecture. 371–384.

[46]

Yu Zhang, Xiaofei Liao, Hai Jin, Li Lin, and Feng Lu. 2014. An adaptive switching scheme for iterative computing in the cloud. Frontiers of Computer Science 8, 6 (2014), 872–884.

Digital Library

[47]

Yu Zhang, Xiaofei Liao, Hai Jin, and Geyong Min. 2015. Resisting skew-accumulation for time-stepped applications in the cloud via exploiting parallelism. IEEE Transactions on Cloud Computing 3, 1 (2015), 54–65.

[48]

Yu Zhang, Xiaofei Liao, Hai Jin, and Guang Tan. 2017. SAE: Toward efficient cloud data analysis service for large-scale social networks. IEEE Transactions on Cloud Computing 5, 3 (2017), 563–575.

[49]

Yu Zhang, Xiaofei Liao, Hai Jin, Guang Tan, and Geyong Min. 2015. Inc-Part: Incremental partitioning for load balancing in large-scale behavioral simulations. IEEE Transactions on Parallel and Distributed Systems 26, 7 (2015), 1900–1909.

Digital Library

[50]

Yu Zhang, Xiaofei Liao, Hai Jin, and Bing Bing Zhou. 2014. AsyIter: Tolerating computational skew of synchronous iterative applications via computing decomposition. Knowledge and Information Systems 41, 2 (2014), 379–400.

Digital Library

[51]

Yu Zhang, Xiaofei Liao, Xiang Shi, Hai Jin, and Bingsheng He. 2018. Efficient disk-based directed graph processing: A strongly connected component approach. IEEE Transactions on Parallel and Distributed Systems 29, 4 (2018), 830–842.

[52]

Yu Zhang, Jin Zhao, Xiaofei Liao, Hai Jin, Lin Gu, Haikun Liu, Bingsheng He, and Ligang He. 2019. CGraph: A distributed storage and processing system for concurrent iterative graph analysis jobs. ACM Transactions on Storage 15, 2 (2019), Article 10, 26 pages.

Digital Library

[53]

Jin Zhao, Yu Zhang, Xiaofei Liao, Ligang He, Bingsheng He, Hai Jin, Haikun Liu, and Yicheng Chen. 2019. GraphM: An efficient storage system for high throughput of concurrent graph processing. In Proceedings of the 2019 International Conference for High Performance Computing, Networking, Storage, and Analysis. Article 3, 14 pages.

Digital Library

[54]

Jianlong Zhong and Bingsheng He. 2014. Medusa: Simplified graph processing on GPUs. IEEE Transactions on Parallel and Distributed Systems 25, 6 (2014), 1543–1552.

Digital Library

[55]

Willy Zwaenepoel, Willy Zwaenepoel, and Willy Zwaenepoel. 2013. X-stream: Edge-centric graph processing using streaming partitions. In Proceedings of the 24th ACM Symposium on Operating Systems Principles. 472–488.

Cited By

Cui PLiu HTang BYuan Y(2024)CGgraph: An Ultra-Fast Graph Processing System on Modern Commodity CPU-GPU Co-processorProceedings of the VLDB Endowment10.14778/3648160.364817917:6(1405-1417)Online publication date: 3-May-2024
https://dl.acm.org/doi/10.14778/3648160.3648179
Tang YLi QYin LLi DZhang YWang CZhang XQiao LZhang ZLu K(2024)DELTA: Memory-Efficient Training via Dynamic Fine-Grained Recomputation and SwappingACM Transactions on Architecture and Code Optimization10.1145/368933821:4(1-25)Online publication date: 20-Aug-2024
https://dl.acm.org/doi/10.1145/3689338
Tian YWang YWang YTong XLiu XQu W(2024)Device-Free Human Tracking and Gait Recognition Based on the Smart SpeakerIEEE Transactions on Mobile Computing10.1109/TMC.2024.337964723:11(10610-10627)Online publication date: 1-Nov-2024
https://dl.acm.org/doi/10.1109/TMC.2024.3379647
Show More Cited By

Index Terms

LargeGraph: An Efficient Dependency-Aware GPU-Accelerated Large-Scale Graph Processing
1. Computer systems organization
  1. Architectures
    1. Other architectures
      1. Special purpose systems
    2. Parallel architectures
      1. Single instruction, multiple data

Recommendations

GraphPEG: Accelerating Graph Processing on GPUs

Due to massive thread-level parallelism, GPUs have become an attractive platform for accelerating large-scale data parallel computations, such as graph processing. However, achieving high performance for graph processing with GPUs is non-trivial. ...
AsynGraph: Maximizing Data Parallelism for Efficient Iterative Graph Processing on GPUs

Recently, iterative graph algorithms are proposed to be handled by GPU-accelerated systems. However, in iterative graph processing, the parallelism of GPU is still underutilized by existing GPU-based solutions. In fact, because of the power-law property ...
Graph Processing on GPUs: A Survey

In the big data era, much real-world data can be naturally represented as graphs. Consequently, many application domains can be modeled as graph processing. Graph processing, especially the processing of the large-scale graphs with the number of ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Transactions on Architecture and Code Optimization

ACM Transactions on Architecture and Code Optimization Volume 18, Issue 4

December 2021

497 pages

ISSN:1544-3566

EISSN:1544-3973

DOI:10.1145/3476575

Editor:
David Kaeli
Northeastern University, USA

Issue’s Table of Contents

Copyright © 2021 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 29 September 2021

Accepted: 01 July 2021

Revised: 01 June 2021

Received: 01 January 2021

Published in TACO Volume 18, Issue 4

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Research-article
Research
Refereed

Funding Sources

National Natural Science Foundation of China
Zhejiang Lab
Fundamental Research Funds for the Central Universities

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

11
Total Citations
View Citations
1,319
Total Downloads

Downloads (Last 12 months)293
Downloads (Last 6 weeks)40

Reflects downloads up to 05 Jan 2025

Other Metrics

View Author Metrics

Citations

Cited By

Cui PLiu HTang BYuan Y(2024)CGgraph: An Ultra-Fast Graph Processing System on Modern Commodity CPU-GPU Co-processorProceedings of the VLDB Endowment10.14778/3648160.364817917:6(1405-1417)Online publication date: 3-May-2024
https://dl.acm.org/doi/10.14778/3648160.3648179
Tang YLi QYin LLi DZhang YWang CZhang XQiao LZhang ZLu K(2024)DELTA: Memory-Efficient Training via Dynamic Fine-Grained Recomputation and SwappingACM Transactions on Architecture and Code Optimization10.1145/368933821:4(1-25)Online publication date: 20-Aug-2024
https://dl.acm.org/doi/10.1145/3689338
Tian YWang YWang YTong XLiu XQu W(2024)Device-Free Human Tracking and Gait Recognition Based on the Smart SpeakerIEEE Transactions on Mobile Computing10.1109/TMC.2024.337964723:11(10610-10627)Online publication date: 1-Nov-2024
https://dl.acm.org/doi/10.1109/TMC.2024.3379647
Zhang LJiang YMa YMao SHuang WYu ZZheng XShu LFan XXu GDong C(2024)Toward Robust and Effective Behavior Based User Authentication With Off-the-Shelf Wi-FiIEEE Transactions on Information Forensics and Security10.1109/TIFS.2024.342836719(8731-8746)Online publication date: 1-Jan-2024
https://dl.acm.org/doi/10.1109/TIFS.2024.3428367
Wang YLi BJaleel AYang JTang X(2024)GRIT: Enhancing Multi-GPU Performance with Fine-Grained Dynamic Page Placement2024 IEEE International Symposium on High-Performance Computer Architecture (HPCA)10.1109/HPCA57654.2024.00085(1080-1094)Online publication date: 2-Mar-2024
https://doi.org/10.1109/HPCA57654.2024.00085
Yao MZhang LLai RHan MGuo LZhang JSun J(2024)ID-Gait: Fine-Grained Human Gait State Recognition Using Wi-Fi SignalWireless Artificial Intelligent Computing Systems and Applications10.1007/978-3-031-71464-1_6(66-78)Online publication date: 21-Jun-2024
https://dl.acm.org/doi/10.1007/978-3-031-71464-1_6
Zhang YCao HZhang JSun YDun MHuang JAn XYe X(2023)A Bucket-aware Asynchronous Single-Source Shortest Path Algorithm on GPUProceedings of the 52nd International Conference on Parallel Processing Workshops10.1145/3605731.3605746(50-60)Online publication date: 7-Aug-2023
https://dl.acm.org/doi/10.1145/3605731.3605746
Guo ZYuan WGui LSheng BXiao F(2023)BreatheBand: A Fine-grained and Robust Respiration Monitor System Using WiFi SignalsACM Transactions on Sensor Networks10.1145/358207919:4(1-18)Online publication date: 16-May-2023
https://dl.acm.org/doi/10.1145/3582079
Li YLiu LQin HDeng SEl-Yacoubi MZhou G(2023)Adaptive Deep Feature Fusion for Continuous Authentication With Data AugmentationIEEE Transactions on Mobile Computing10.1109/TMC.2022.318661422:10(5690-5705)Online publication date: 1-Oct-2023
https://dl.acm.org/doi/10.1109/TMC.2022.3186614
Corbalán-Navarro DAragón JAnglada MParcerisa JGonzález A(2022)Triangle Dropping: An Occluded-geometry Predictor for Energy-efficient Mobile GPUsACM Transactions on Architecture and Code Optimization10.1145/352786119:3(1-20)Online publication date: 25-May-2022
https://dl.acm.org/doi/10.1145/3527861
Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

HTML Format

View this article in HTML Format.

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Media

Figures

Other

Tables

View Issue’s Table of Contents