[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
research-article
Open access

LargeGraph: An Efficient Dependency-Aware GPU-Accelerated Large-Scale Graph Processing

Published: 29 September 2021 Publication History

Abstract

Many out-of-GPU-memory systems are recently designed to support iterative processing of large-scale graphs. However, these systems still suffer from long time to converge because of inefficient propagation of active vertices’ new states along graph paths. To efficiently support out-of-GPU-memory graph processing, this work designs a system LargeGraph. Different from existing out-of-GPU-memory systems, LargeGraph proposes a dependency-aware data-driven execution approach, which can significantly accelerate active vertices’ state propagations along graph paths with low data access cost and also high parallelism. Specifically, according to the dependencies between the vertices, it only loads and processes the graph data associated with dependency chains originated from active vertices for smaller access cost. Because most active vertices frequently use a small evolving set of paths for their new states’ propagation because of power-law property, this small set of paths are dynamically identified and maintained and efficiently handled on the GPU to accelerate most propagations for faster convergence, whereas the remaining graph data are handled over the CPU. For out-of-GPU-memory graph processing, LargeGraph outperforms four cutting-edge systems: Totem (5.19–11.62×), Graphie (3.02–9.41×), Garaph (2.75–8.36×), and Subway (2.45–4.15×).

References

[1]
Lemur. 2020. ClueWeb12 Web Graph. Retrieved August 17, 2021 fromhttp://www.lemurproject.org/clueweb12/webgraph.php/.
[2]
Web Data Commons. 2020. Hyperlink Graphs. Retrieved August 17, 2021 fromhttp://webdatacommons.org/hyperlinkgraph/.
[3]
Laboratory for Web Algorithmics. 2020. Datasets. Retrieved August 17, 2021 fromhttp://law.di.unimi.it/datasets.php.
[4]
Stanford. 2020. Stanford Large Network Dataset Collection.http://snap.stanford.edu/data/index.html.
[5]
Tal Ben-Nun, Michael Sutton, Sreepathi Pai, and Keshav Pingali. 2017. Groute: An asynchronous multi-GPU programming model for irregular computations. In Proceedings of the 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. 235–248.
[6]
Hanhua Chen, Hai Jin, and Xiaolong Cui. 2017. Hybrid followee recommendation in microblogging systems. Science China Information Sciences 60, 1 (2017), 1–14.
[7]
Yu Chen, Ivy B. Peng, Zhen Peng, Xu Liu, and Bin Ren. 2020. ATMem: Adaptive data placement in graph applications on heterogeneous memories. In Proceedings of the 18th ACM/IEEE International Symposium on Code Generation and Optimization. 293–304.
[8]
Yuze Chi, Guohao Dai, Yu Wang, Guangyu Sun, Guoliang Li, and Huazhong Yang. 2016. NXgraph: An efficient graph processing system on a single machine. In Proceedings of the 2016 IEEE International Conference on Data Engineering. 409–420.
[9]
Roshan Dathathri, Gurbinder Gill, Loc Hoang, Hoang Vu Dang, Alex Brooks, Nikoli Dryden, Marc Snir, and Keshav Pingali. 2018. Gluon: A communication-optimizing substrate for distributed heterogeneous graph analytics. In Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation. 752–768.
[10]
R. Dathathri, G. Gill, L. Hoang, V. Jatala, K. Pingali, V. K. Nandivada, H. Dang, and M. Snir. 2019. Gluon-async: A bulk-asynchronous system for distributed and heterogeneous graph analytics. In Proceedings of the 28th International Conference on Parallel Architectures and Compilation Techniques. 15–28.
[11]
Debashis Ganguly, Ziyu Zhang, Jun Yang, and Rami Melhem. 2020. Adaptive page migration for irregular data-intensive applications under GPU memory oversubscription. In Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium. 451–461.
[12]
Prasun Gera, Hyojong Kim, Piyush Sao, Hyesoon Kim, and David A. Bader. 2020. Traversing large graphs on GPUs with unified memory. Proceedings of the VLDB Endowment 13, 7 (2020), 1119–1133.
[13]
Abdullah Gharaibeh, Lauro Beltro Costa, Elizeu Santos-Neto, and Matei Ripeanu. 2012. A yoke of oxen and a thousand chickens for heavy lifting graph processing. In Proceedings of the 21st International Conference on Parallel Architectures and Compilation Techniques. 345–354.
[14]
Abdullah Gharaibeh, Tahsin Reza, Elizeu Santosneto, Lauro Beltrao Costa, Scott Sallinen, and Matei Ripeanu. 2014. Efficient large-scale graph processing on hybrid CPU and GPU systems. arXiv:1312.3018.
[15]
Joseph E. Gonzalez, Yucheng Low, Haijie Gu, Danny Bickson, and Carlos Guestrin. 2012. PowerGraph: Distributed graph-parallel computation on natural graphs. In Proceedings of the 10th USENIX Symposium on Operating Systems Design and Implementation. 17–30.
[16]
Wei Han, Daniel Mawhirter, Bo Wu, and Matthew Buland. 2017. Graphie: Large-scale asynchronous graph traversals on just a GPU. In Proceedings of the 26th International Conference on Parallel Architectures and Compilation Techniques. 233–245.
[17]
Changwan Hong, Aravind Sukumaranrajam, Jinsung Kim, and P. Sadayappan. 2017. MultiGraph: Efficient graph processing on GPUs. In Proceedings of the 26th International Conference on Parallel Architectures and Compilation Techniques. 27–40.
[18]
Amir Hossein Nodehi Sabet, Junqiao Qiu, and Zhijia Zhao. 2018. Tigr: Transforming irregular graphs for GPU-friendly graph processing. In Proceedings of the 23rd International Conference on Architectural Support for Programming Languages and Operating Systems. 622–636.
[19]
Zhihao Jia, Yongkee Kwon, Galen Shipman, Pat McCormick, Mattan Erez, and Alex Aiken. 2017. A distributed multi-GPU system for fast graph processing. Proceedings of the VLDB Endowment 11, 3 (2017), 297–310.
[20]
Farzad Khorasani, Keval Vora, Rajiv Gupta, and Laxmi N. Bhuyan. 2014. CuSha: Vertex-centric graph processing on GPUs. In Proceedings of the 23rd International Symposium on High-Performance Parallel and Distributed Computing. 239–252.
[21]
Hyojong Kim, Jaewoong Sim, Prasun Gera, Ramyad Hadidi, and Hyesoon Kim. 2020. Batch-aware unified memory management in GPUs for irregular workloads. In Proceedings of the 25th International Conference on Architectural Support for Programming Languages and Operating Systems. 1357–1370.
[22]
Min Soo Kim, Kyuhyeon An, Himchan Park, Hyunseok Seo, and Jinwook Kim. 2016. GTS: A fast and scalable graph processing method based on streaming topology to GPUs. In Proceedings of the 2016 International Conference on Management of Data. 447–461.
[23]
Aapo Kyrola, Guy E. Blelloch, and Carlos Guestrin. 2012. GraphChi: Large-scale graph computation on just a PC. In Proceedings of the 10th USENIX Symposium on Operating Systems Design and Implementation. 31–46.
[24]
Xiaofei Liao, Yicheng Chen, Yu Zhang, Hai Jin, Haikun Liu, and Jin Zhao. 2019. An efficient incremental strongly connected components algorithm for evolving directed graphs. Scientia Sinica Informationis 49, 8 (2019), 988–1004.
[25]
Xiaofei Liao, Jin Zhao, Yu Zhang, Bingsheng He, Ligang He, Hai Jin, and Lin Gu. 2021. A structure-aware storage optimization for out-of-core concurrent graph processing. IEEE Transactions on Computers. Early access, July 26, 2021.
[26]
Hang Liu and H. Howie Huang. 2019. SIMD-X: Programming and processing of graph algorithms on GPUs. In Proceedings of the 2019 USENIX Annual Technical Conference. 411–428.
[27]
Wei Liu, Haikun Liu, Xiaofei Liao, Hai Jin, and Yu Zhang. 2019. NGraph: Parallel graph processing in hybrid memory systems. IEEE Access 7 (2019), 103517–103529.
[28]
Xinqiao Lv, Wei Xiao, Yu Zhang, Xiaofei Liao, Hai Jin, and Qiangsheng Hua. 2019. An effective framework for asynchronous incremental graph processing. Frontiers of Computer Science 13, 3 (2019), 539–551.
[29]
Lingxiao Ma, Zhi Yang, Han Chen, Jilong Xue, and Yafei Dai. 2017. Garaph: Efficient GPU-accelerated graph processing on a single machine with balanced replication. In Proceedings of the 2017 USENIX Annual Technical Conference. 195–207.
[30]
Yuechao Pan, Yangzihao Wang, Yuduo Wu, Carl Yang, and John D. Owens. 2017. Multi-GPU graph analytics. In Proceedings of the 31st IEEE International Parallel and Distributed Processing Symposium. 479–490.
[31]
Amir Hossein Nodehi Sabet, Zhijia Zhao, and Rajiv Gupta. 2020. Subway: Minimizing data transfer during out-of-GPU-memory graph processing. In Proceedings of the 15th European Conference on Computer Systems. Article 12, 16 pages.
[32]
Dipanjan Sengupta, Shuaiwen Leon Song, Kapil Agarwal, and Karsten Schwan. 2015. GraphReduce: Processing large-scale graphs on accelerator-based systems. In Proceedings of the 2015 International Conference for High Performance Computing, Networking, Storage, and Analysis. Article 28, 12 pages.
[33]
Zhiyuan Shao, Lin Hou, Yan Ai, Yu Zhang, and Hai Jin. 2015. Is your graph algorithm eligible for nondeterministic execution? In Proceedings of the 44th International Conference on Parallel Processing. 430–439.
[34]
Beibei Si, Yuxuan Liang, Jin Zhao, Yu Zhang, Xiaofei Liao, Hai Jin, Haikun Liu, and Lin Gu. 2020. GGraph: An efficient structure-aware approach for iterative graph processing. IEEE Transactions on Big Data. Early access, August 26, 2020.
[35]
Zhao Sun, Hongzhi Wang, Haixun Wang, Bin Shao, and Jianzhong Li. 2012. Efficient subgraph matching on billion node graphs. Proceedings of the VLDB Endowment 5, 9 (2012), 788–799.
[36]
Ha-Nguyen Tran, Jung Jae Kim, and Bingsheng He. 2015. Fast subgraph matching on large graphs using graphics processors. In Proceedings of the 20th International Conference on Database Systems for Advanced Applications. 299–315.
[37]
Yangzihao Wang, Andrew Davidson, Yuechao Pan, Yuduo Wu, Andy Riffel, and John D. Owens. 2016. Gunrock: A high-performance graph processing library on the GPU. In Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. Article 11, 12 pages.
[38]
Yanfeng Zhang, Qixin Gao, Lixin Gao, and Cuirong Wang. 2014. Maiter: An asynchronous graph processing framework for delta-based accumulative iterative computation. IEEE Transactions on Parallel and Distributed Systems 25, 8 (2014), 2091–2100.
[39]
Yu Zhang, Lin Gu, Xiaofei Liao, Hai Jin, Deze Zeng, and Bing Bing Zhou. 2017. FRANK: A fast node ranking approach in large-scale networks. IEEE Network 31, 1 (2017), 36–43.
[40]
Yu Zhang, Xiaofei Liao, Lin Gu, Hai Jin, Kan Hu, Haikun Liu, and Bingsheng He. 2020. AsynGraph: Maximizing data parallelism for efficient iterative graph processing on GPUs. ACM Transactions on Architecture and Code Optimization 17, 4 (2020), Article 29, 21 pages.
[41]
Yu Zhang, Xiaofei Liao, Hai Jin, Lin Gu, Ligang He, Bingsheng He, and Haikun Liu. 2018. CGraph: A correlations-aware approach for efficient concurrent iterative graph processing. In Proceedings of the 2018 USENIX Annual Technical Conference. 441–452.
[42]
Yu Zhang, Xiaofei Liao, Hai Jin, Lin Gu, Guang Tan, and Bing Bing Zhou. 2017. HotGraph: Efficient asynchronous processing for real-world graphs. IEEE Transactions on Computers 66, 5 (2017), 799–809.
[43]
Yu Zhang, Xiaofei Liao, Hai Jin, Lin Gu, and Bing Bing Zhou. 2018. FBSGraph: Accelerating asynchronous graph processing via forward and backward sweeping. IEEE Transactions on Knowledge and Data Engineering 30, 5 (2018), 895–907.
[44]
Yu Zhang, Xiaofei Liao, Hai Jin, Bingsheng He, Haikun Liu, and Lin Gu. 2019. DiGraph: An efficient path-based iterative directed graph processing system on multiple GPUs. In Proceedings of the 2019 24th International Conference on Architectural Support for Programming Languages and Operating Systems. 601–614.
[45]
Yu Zhang, Xiaofei Liao, Hai Jin, Ligang He, Bingsheng He, Haikun Liu, and Lin Gu. 2021. DepGraph: A dependency-driven accelerator for efficient iterative graph processing. In Proceedings of the IEEE International Symposium on High-Performance Computer Architecture. 371–384.
[46]
Yu Zhang, Xiaofei Liao, Hai Jin, Li Lin, and Feng Lu. 2014. An adaptive switching scheme for iterative computing in the cloud. Frontiers of Computer Science 8, 6 (2014), 872–884.
[47]
Yu Zhang, Xiaofei Liao, Hai Jin, and Geyong Min. 2015. Resisting skew-accumulation for time-stepped applications in the cloud via exploiting parallelism. IEEE Transactions on Cloud Computing 3, 1 (2015), 54–65.
[48]
Yu Zhang, Xiaofei Liao, Hai Jin, and Guang Tan. 2017. SAE: Toward efficient cloud data analysis service for large-scale social networks. IEEE Transactions on Cloud Computing 5, 3 (2017), 563–575.
[49]
Yu Zhang, Xiaofei Liao, Hai Jin, Guang Tan, and Geyong Min. 2015. Inc-Part: Incremental partitioning for load balancing in large-scale behavioral simulations. IEEE Transactions on Parallel and Distributed Systems 26, 7 (2015), 1900–1909.
[50]
Yu Zhang, Xiaofei Liao, Hai Jin, and Bing Bing Zhou. 2014. AsyIter: Tolerating computational skew of synchronous iterative applications via computing decomposition. Knowledge and Information Systems 41, 2 (2014), 379–400.
[51]
Yu Zhang, Xiaofei Liao, Xiang Shi, Hai Jin, and Bingsheng He. 2018. Efficient disk-based directed graph processing: A strongly connected component approach. IEEE Transactions on Parallel and Distributed Systems 29, 4 (2018), 830–842.
[52]
Yu Zhang, Jin Zhao, Xiaofei Liao, Hai Jin, Lin Gu, Haikun Liu, Bingsheng He, and Ligang He. 2019. CGraph: A distributed storage and processing system for concurrent iterative graph analysis jobs. ACM Transactions on Storage 15, 2 (2019), Article 10, 26 pages.
[53]
Jin Zhao, Yu Zhang, Xiaofei Liao, Ligang He, Bingsheng He, Hai Jin, Haikun Liu, and Yicheng Chen. 2019. GraphM: An efficient storage system for high throughput of concurrent graph processing. In Proceedings of the 2019 International Conference for High Performance Computing, Networking, Storage, and Analysis. Article 3, 14 pages.
[54]
Jianlong Zhong and Bingsheng He. 2014. Medusa: Simplified graph processing on GPUs. IEEE Transactions on Parallel and Distributed Systems 25, 6 (2014), 1543–1552.
[55]
Willy Zwaenepoel, Willy Zwaenepoel, and Willy Zwaenepoel. 2013. X-stream: Edge-centric graph processing using streaming partitions. In Proceedings of the 24th ACM Symposium on Operating Systems Principles. 472–488.

Cited By

View all
  • (2024)CGgraph: An Ultra-Fast Graph Processing System on Modern Commodity CPU-GPU Co-processorProceedings of the VLDB Endowment10.14778/3648160.364817917:6(1405-1417)Online publication date: 3-May-2024
  • (2024)DELTA: Memory-Efficient Training via Dynamic Fine-Grained Recomputation and SwappingACM Transactions on Architecture and Code Optimization10.1145/368933821:4(1-25)Online publication date: 20-Aug-2024
  • (2024)Device-Free Human Tracking and Gait Recognition Based on the Smart SpeakerIEEE Transactions on Mobile Computing10.1109/TMC.2024.337964723:11(10610-10627)Online publication date: 1-Nov-2024
  • Show More Cited By

Index Terms

  1. LargeGraph: An Efficient Dependency-Aware GPU-Accelerated Large-Scale Graph Processing

      Recommendations

      Comments

      Please enable JavaScript to view thecomments powered by Disqus.

      Information & Contributors

      Information

      Published In

      cover image ACM Transactions on Architecture and Code Optimization
      ACM Transactions on Architecture and Code Optimization  Volume 18, Issue 4
      December 2021
      497 pages
      ISSN:1544-3566
      EISSN:1544-3973
      DOI:10.1145/3476575
      Issue’s Table of Contents
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 29 September 2021
      Accepted: 01 July 2021
      Revised: 01 June 2021
      Received: 01 January 2021
      Published in TACO Volume 18, Issue 4

      Permissions

      Request permissions for this article.

      Check for updates

      Author Tags

      1. GPU
      2. Graph processing
      3. access cost
      4. out-of-GPU-memory

      Qualifiers

      • Research-article
      • Research
      • Refereed

      Funding Sources

      • National Natural Science Foundation of China
      • Zhejiang Lab
      • Fundamental Research Funds for the Central Universities

      Contributors

      Other Metrics

      Bibliometrics & Citations

      Bibliometrics

      Article Metrics

      • Downloads (Last 12 months)293
      • Downloads (Last 6 weeks)40
      Reflects downloads up to 05 Jan 2025

      Other Metrics

      Citations

      Cited By

      View all
      • (2024)CGgraph: An Ultra-Fast Graph Processing System on Modern Commodity CPU-GPU Co-processorProceedings of the VLDB Endowment10.14778/3648160.364817917:6(1405-1417)Online publication date: 3-May-2024
      • (2024)DELTA: Memory-Efficient Training via Dynamic Fine-Grained Recomputation and SwappingACM Transactions on Architecture and Code Optimization10.1145/368933821:4(1-25)Online publication date: 20-Aug-2024
      • (2024)Device-Free Human Tracking and Gait Recognition Based on the Smart SpeakerIEEE Transactions on Mobile Computing10.1109/TMC.2024.337964723:11(10610-10627)Online publication date: 1-Nov-2024
      • (2024)Toward Robust and Effective Behavior Based User Authentication With Off-the-Shelf Wi-FiIEEE Transactions on Information Forensics and Security10.1109/TIFS.2024.342836719(8731-8746)Online publication date: 1-Jan-2024
      • (2024)GRIT: Enhancing Multi-GPU Performance with Fine-Grained Dynamic Page Placement2024 IEEE International Symposium on High-Performance Computer Architecture (HPCA)10.1109/HPCA57654.2024.00085(1080-1094)Online publication date: 2-Mar-2024
      • (2024)ID-Gait: Fine-Grained Human Gait State Recognition Using Wi-Fi SignalWireless Artificial Intelligent Computing Systems and Applications10.1007/978-3-031-71464-1_6(66-78)Online publication date: 21-Jun-2024
      • (2023)A Bucket-aware Asynchronous Single-Source Shortest Path Algorithm on GPUProceedings of the 52nd International Conference on Parallel Processing Workshops10.1145/3605731.3605746(50-60)Online publication date: 7-Aug-2023
      • (2023)BreatheBand: A Fine-grained and Robust Respiration Monitor System Using WiFi SignalsACM Transactions on Sensor Networks10.1145/358207919:4(1-18)Online publication date: 16-May-2023
      • (2023)Adaptive Deep Feature Fusion for Continuous Authentication With Data AugmentationIEEE Transactions on Mobile Computing10.1109/TMC.2022.318661422:10(5690-5705)Online publication date: 1-Oct-2023
      • (2022)Triangle Dropping: An Occluded-geometry Predictor for Energy-efficient Mobile GPUsACM Transactions on Architecture and Code Optimization10.1145/352786119:3(1-20)Online publication date: 25-May-2022
      • Show More Cited By

      View Options

      View options

      PDF

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader

      HTML Format

      View this article in HTML Format.

      HTML Format

      Login options

      Full Access

      Media

      Figures

      Other

      Tables

      Share

      Share

      Share this Publication link

      Share on social media