More Web Proxy on the site http://driver.im/

research-article

Open access

JetStream: Graph Analytics on Streaming Data with Event-Driven Hardware Accelerator

Authors:

Shafiur Rahman,

Nael Abu-Ghazaleh,

Rajiv GuptaAuthors Info & Claims

MICRO '21: MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture

Pages 1091 - 1105

https://doi.org/10.1145/3466752.3480126

Published: 17 October 2021 Publication History

All formats PDF

Abstract

Graph Processing is at the core of many critical emerging workloads operating on unstructured data, including social network analysis, bioinformatics, and many others. Many applications operate on graphs that are constantly changing, i.e., new nodes and edges are added or removed over time. In this paper, we present JetStream, a hardware accelerator for evaluating queries over streaming graphs and capable of handling additions, deletions, and updates of edges. JetStream extends a recently proposed event-based accelerator for graph workloads to support streaming updates. It handles both accumulative and monotonic graph algorithms via an event-driven computation model that limits accesses to a smaller subset of the graph vertices, efficiently reuses the prior query results to eliminate redundancy, and optimizes the memory access pattern for enhanced memory bandwidth utilization. To the best of our knowledge, JetStream is the first graph accelerator that supports streaming graphs, reducing the computation time by 90% compared with cold-start computation using an existing accelerator. In addition, JetStream achieves about 18 × speedup over KickStarter and GraphBolt software frameworks at the large baseline batch sizes that these systems use with significantly higher speedup at smaller batch sizes.

References

[1]

Maleen Abeydeera and Daniel Sanchez. 2020. Chronos: Efficient Speculative Parallelism for Accelerators. In Proc. International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS).

Digital Library

[2]

Junwhan Ahn, Sungpack Hong, Sungjoo Yoo, Onur Mutlu, and Kiyoung Choi. 2015. A Scalable Processing-in-memory Accelerator for Parallel Graph Processing. SIGARCH Computer Architecture News 43, 3 (June 2015).

[3]

A. Ayupov, S. Yesil, M. M. Ozdal, T. Kim, S. Burns, and O. Ozturk. 2018. A Template-Based Design Methodology for Graph-Parallel Hardware Accelerators. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 37, 2 (Feb 2018), 420–430. https://doi.org/10.1109/TCAD.2017.2706562

Digital Library

[4]

Lars Backstrom, Dan Huttenlocher, Jon Kleinberg, and Xiangyang Lan. 2006. Group Formation in Large Social Networks: Membership, Growth, and Evolution. In Proc. International Conference on Knowledge Discovery and Data Mining (SIGKDD). 44–54.

Digital Library

[5]

Rajeev Balasubramonian, Andrew B. Kahng, Naveen Muralimanohar, Ali Shafiee, and Vaishnav Srinivas. 2017. CACTI 7: New Tools for Interconnect Exploration in Innovative Off-Chip Memories. ACM Trans. Archit. Code Optim. 14, 2 (June 2017).

Digital Library

[6]

Paolo Boldi and Sebastiano Vigna. 2004. The WebGraph Framework I: Compression Techniques. In Proc. of the Thirteenth International World Wide Web Conference (WWW). 595–601.

Digital Library

[7]

Rong Chen, Jiaxin Shi, Yanzhe Chen, Binyu Zang, Haibing Guan, and Haibo Chen. 2019. Powerlyra: Differentiated graph computation and partitioning on skewed graphs. ACM Trans. on Parallel Computing (TOPC) 5, 3 (2019).

[8]

Raymond Cheng, Ji Hong, Aapo Kyrola, Youshan Miao, Xuetian Weng, Ming Wu, Fan Yang, Lidong Zhou, Feng Zhao, and Enhong Chen. 2012. Kineograph: taking the pulse of a fast-changing and connected world. In Proc. 7th ACM european conference on Computer Systems. 85–98.

Digital Library

[9]

Guohao Dai, Tianhao Huang, Yuze Chi, Ningyi Xu, Yu Wang, and Huazhong Yang. 2017. ForeGraph: Exploring Large-Scale Graph Processing on Multi-FPGA Architecture. In Proc. International Symposium on Field-Programmable Gate Arrays (FPGA). 217–226.

Digital Library

[10]

Timothy A. Davis and Yifan Hu. 2011. The University of Florida Sparse Matrix Collection. ACM Trans. Mathematical Software 38, 1, Article 1 (Dec. 2011), 25 pages. https://doi.org/10.1145/2049662.2049663

Digital Library

[11]

Laxman Dhulipala, Guy E Blelloch, and Julian Shun. 2019. Low-latency graph streaming using compressed purely-functional trees. In Proc. 40th ACM SIGPLAN Conference on Programming Language Design and Implementation. 918–934.

Digital Library

[12]

David Ediger, Rob McColl, Jason Riedy, and David A Bader. 2012. Stinger: High performance data structure for streaming graphs. In Proc. IEEE Conference on High Performance Extreme Computing. 1–5.

[13]

T. J. Ham, L. Wu, N. Sundaram, N. Satish, and M. Martonosi. 2016. Graphicionado: A high-performance and energy-efficient accelerator for graph analytics. In Proc. 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO). 1–13. https://doi.org/10.1109/MICRO.2016.7783759

[14]

Wentao Han, Youshan Miao, Kaiwei Li, Ming Wu, Fan Yang, Lidong Zhou, Vijayan Prabhakaran, Wenguang Chen, and Enhong Chen. 2014. Chronos: a graph engine for temporal graph analysis. In Proc. European Conference on Computer Systems (Eurosys).

Digital Library

[15]

Jiewen Huang and Daniel J. Abadi. 2016. Leopard: Lightweight Edge-Oriented Partitioning and Replication for Dynamic Graphs. Proc. VLDB Endowment 9, 7 (March 2016).

Digital Library

[16]

Anand Padmanabha Iyer, Li Erran Li, Tathagata Das, and Ion Stoica. 2016. Time-evolving graph processing at scale. In Proc. Fourth International Workshop on Graph Data Management Experiences and Systems. 1–6.

Digital Library

[17]

Mark C. Jeffrey, Suvinay Subramanian, Maleen Abeydeera, Joel Emer, and Daniel Sanchez. 2016. Data-Centric Execution of Speculative Parallel Programs. In The 49th Annual IEEE/ACM International Symposium on Microarchitecture (Taipei, Taiwan) (MICRO-49). IEEE Press, Article 5, 13 pages.

[18]

M. C. Jeffrey, S. Subramanian, C. Yan, J. Emer, and D. Sanchez. 2015. A scalable architecture for ordered parallelism. In Proc. 48th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO). 228–241. https://doi.org/10.1145/2830772.2830777

Digital Library

[19]

Xiaolin Jiang, Chengshuo Xu, Xizhe Yin, Zhijia Zhao, and Rajiv Gupta. 2021. Tripoline: generalized incremental graph processing via graph triangle inequality. In Proc. Sixteenth European Conference on Computer Systems. 17–32.

Digital Library

[20]

Xiaoen Ju, Dan Williams, Hani Jamjoom, and Kang G. Shin. 2016. Version Traveler: Fast and Memory-Efficient Version Switching in Graph Processing Systems. In Proceedings of the 2016 USENIX Conference on Usenix Annual Technical Conference (Denver, CO, USA) (USENIX ATC ’16). USENIX Association, USA, 523–536.

[21]

Pradeep Kumar and H. Howie Huang. 2019. GraphOne: A Data Store for Real-time Analytics on Evolving Graphs. In Proc. USENIX Conference on File and Storage Technologies (FAST).

[22]

Haewoon Kwak, Changhyun Lee, Hosung Park, and Sue Moon. 2010. What is Twitter, a Social Network or a News Media?. In Proc. 19th International Conference on World Wide Web (Raleigh, North Carolina, USA) (WWW ’10). ACM, New York, NY, USA, 591–600. https://doi.org/10.1145/1772690.1772751

Digital Library

[23]

Yucheng Low, Joseph E Gonzalez, Aapo Kyrola, Danny Bickson, Carlos E Guestrin, and Joseph Hellerstein. 2014. Graphlab: A new framework for parallel machine learning. arXiv preprint arXiv:1408.2041(2014).

[24]

Grzegorz Malewicz, Matthew H Austern, Aart JC Bik, James C Dehnert, Ilan Horn, Naty Leiser, and Grzegorz Czajkowski. 2010. Pregel: a system for large-scale graph processing. In Proc. ACM SIGMOD International Conference on Management of data. 135–146.

Digital Library

[25]

Mugilan Mariappan, Joanna Che, and Keval Vora. 2021. DZiG: Sparsity-Aware Incremental Processing of Streaming Graphs. In Proc. Sixteenth European Conference on Computer Systems. 83–98.

Digital Library

[26]

Mugilan Mariappan and Keval Vora. 2019. Graphbolt: Dependency-driven synchronous processing of streaming graphs. In Proc. Fourteenth European Conference on Computer Systems. 1–16.

Digital Library

[27]

Anurag Mukkara, Nathan Beckmann, Maleen Abeydeera, Xiaosong Ma, and Daniel Sanchez. 2018. Exploiting Locality in Graph Analytics through Hardware-Accelerated Traversal Scheduling. In Proc. International Symposium on Microarchitecture (MICRO).

Digital Library

[28]

Anurag Mukkara, Nathan Beckmann, and Daniel Sanchez. 2019. PHI: Architectural Support for Synchronization- and Bandwidth-Efficient Commutative Scatter Updates. In Proc. International Symposium on Microarchitecture (Micro). 1009–1022.

Digital Library

[29]

Derek G Murray, Frank McSherry, Rebecca Isaacs, Michael Isard, Paul Barham, and Martín Abadi. 2013. Naiad: a timely dataflow system. In Proc. Twenty-Fourth ACM Symposium on Operating Systems Principles. 439–455.

Digital Library

[30]

L. Nai, R. Hadidi, J. Sim, H. Kim, P. Kumar, and H. Kim. 2017. GraphPIM: Enabling Instruction-Level PIM Offloading in Graph Computing Frameworks. In IEEE International Symposium on High Performance Computer Architecture (HPCA). 457–468. https://doi.org/10.1109/HPCA.2017.54

[31]

Donald Nguyen, Andrew Lenharth, and Keshav Pingali. 2013. A Lightweight Infrastructure for Graph Analytics. In Proc. ACM Symposium on Operating Systems Principles (SOSP). 456–471.

Digital Library

[32]

M. M. Ozdal, S. Yesil, T. Kim, A. Ayupov, J. Greth, S. Burns, and O. Ozturk. 2016. Energy Efficient Architecture for Graph Analytics Accelerators. In Proc. ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA). 166–177. https://doi.org/10.1109/ISCA.2016.24

Digital Library

[33]

S. Rahman, N. Abu-Ghazaleh, and R. Gupta. 2020. GraphPulse: An Event-Driven Hardware Accelerator for Asynchronous Graph Processing. In Proc. 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO). 908–921. https://doi.org/10.1109/MICRO50266.2020.00078

[34]

Shafiur Rahman, Nael Abu-Ghazaleh, and Walid Najjar. 2017. PDES-A: A parallel discrete event simulation accelerator for FPGAs. In Proc. ACM SIGSIM Conference on Principles of Advanced Discrete Simulation. 133–144.

Digital Library

[35]

A. F. Rodrigues, K. S. Hemmert, B. W. Barrett, C. Kersey, R. Oldfield, M. Weston, R. Risen, J. Cook, P. Rosenfeld, E. Cooper-Balis, and B. Jacob. 2011. The Structural Simulation Toolkit. SIGMETRICS Perform. Eval. Rev. 38, 4 (March 2011), 37–42.

Digital Library

[36]

P. Rosenfeld, E. Cooper-Balis, and B. Jacob. 2011. DRAMSim2: A Cycle Accurate Memory System Simulator. IEEE Computer Architecture Letters 10, 1 (2011), 16–19. https://doi.org/10.1109/L-CA.2011.4

Digital Library

[37]

Dipanjan Sengupta, Narayanan Sundaram, Xia Zhu, Theodore L. Willke, Jeffrey Young, Matthew Wolf, and Karsten Schwan. 2016. GraphIn: An Online High Performance Incremental Graph Processing Framework. In Proc. European Conference on Parallel Processing (EuroPar). 319–333.

Digital Library

[38]

Xiaogang Shi, Bin Cui, Yingxia Shao, and Yunhai Tong. 2016. Tornado: A system for real-time iterative analysis over evolving data. In Proc. International Conference on Management of Data. 417–430.

Digital Library

[39]

Julian Shun and Guy E Blelloch. 2013. Ligra: a lightweight graph processing framework for shared memory. In Proc. SIGPLAN Symposium on Principles and Practice of Parallel Programming. 135–146.

Digital Library

[40]

G. M. Slota, K. Madduri, and S. Rajamanickam. 2014. PuLP: Scalable multi-objective multi-constraint partitioning for small-world networks. In IEEE International Conference on Big Data (Big Data). 481–490. https://doi.org/10.1109/BigData.2014.7004265

[41]

Amanda L Traud, Peter J Mucha, and Mason A Porter. 2012. Social structure of Facebook networks. Phys. A 391, 16 (Aug 2012).

[42]

Leslie G Valiant. 1990. A bridging model for parallel computation. Commun. ACM 33, 8 (1990), 103–111.

Digital Library

[43]

Luis M. Vaquero, Felix Cuadrado, Dionysios Logothetis, and Claudio Martella. 2014. Adaptive Partitioning for Large-Scale Dynamic Graphs. In Proc. International Conference on Distributed Computing Systems. 144–153.

[44]

Keval Vora, Rajiv Gupta, and Guoqing Xu. 2016. Synergistic analysis of evolving graphs. ACM Transactions on Architecture and Code Optimization (TACO) 13, 4(2016), 1–27.

Digital Library

[45]

Keval Vora, Rajiv Gupta, and Guoqing Xu. 2017. Kickstarter: Fast and accurate computations on streaming graphs via trimmed approximations. In Proc. 22nd International Conference on Architectural Support for Programming Languages and Operating Systems. 237–251.

Digital Library

[46]

Qinggang Wang, Long Zheng, Yu Huang, Pengcheng Yao, Chuangyi Gui, Xiaofei Liao, Hai Jin, Wenbin Jiang, and Fubing Mao. 2021. GraSU: A Fast Graph Update Library for FPGA-Based Dynamic Graph Processing. In Proc. International Symposium on Field-Programmable Gate Arrays. 149–159.

Digital Library

[47]

Yangzihao Wang, Andrew Davidson, Yuechao Pan, Yuduo Wu, Andy Riffel, and John D Owens. 2016. Gunrock: A high-performance graph processing library on the GPU. In Proc. ACM Symposium on Principles and Practice of Parallel Programming (PPoPP).

Digital Library

[48]

G. Zhang, W. Horn, and D. Sanchez. 2015. Exploiting commutativity to reduce the cost of updates to shared data in cache-coherent systems. In Proc. 48th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO). 13–25. https://doi.org/10.1145/2830772.2830774

Digital Library

[49]

Mingxing Zhang, Youwei Zhuo, Chao Wang, Mingyu Gao, Yongwei Wu, Kang Chen, Christos Kozyrakis, and Xuehai Qian. 2018. GraphP: Reducing Communication for PIM-Based Graph Processing with Efficient Data Partition. In IEEE International Symposium on High Performance Computer Architecture (HPCA). 544–557. https://doi.org/10.1109/HPCA.2018.00053

[50]

Yanfeng Zhang, Qixin Gao, Lixin Gao, and Cuirong Wang. 2017. Maiter: An Asynchronous Graph Processing Framework for Delta-based Accumulative Iterative Computation. CoRR abs/1710.05785(2017). arxiv:1710.05785

[51]

S. Zhou, C. Chelmis, and V. K. Prasanna. 2016. High-Throughput and Energy-Efficient Graph Processing on FPGA. In Proc. IEEE 24th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM). 103–110. https://doi.org/10.1109/FCCM.2016.35

[52]

Shijie Zhou, Rajgopal Kannan, Hanqing Zeng, and Viktor K. Prasanna. 2018. An FPGA Framework for Edge-centric Graph Processing. In Proc. 15th ACM International Conference on Computing Frontiers (Ischia, Italy). 69–77.

Digital Library

[53]

Youwei Zhuo, Chao Wang, Mingxing Zhang, Rui Wang, Dimin Niu, Yanzhi Wang, and Xuehai Qian. 2019. GraphQ: Scalable PIM-Based Graph Processing. In Proc. International Symposium on Microarchitecture (Micro). 712–725.

Digital Library

Cited By

Yin XZhao ZGupta R(2024)IncBoost: Scaling Incremental Graph Processing for Edge Deletions and Weight UpdatesProceedings of the 2024 ACM Symposium on Cloud Computing10.1145/3698038.3698524(915-932)Online publication date: 20-Nov-2024
https://dl.acm.org/doi/10.1145/3698038.3698524
Mao FLiu XZhang YLiu HLiao XJin HZhang WZhou JWu YNie LGuo YJiang ZLiu J(2024)PMGraph: Accelerating Concurrent Graph Queries over Streaming GraphsACM Transactions on Architecture and Code Optimization10.1145/368933721:4(1-25)Online publication date: 20-Nov-2024
https://dl.acm.org/doi/10.1145/3689337
Li JKang Y(2024)GraphSER: Distance-Aware Stream-Based Edge Repartition for Many-Core SystemsACM Transactions on Architecture and Code Optimization10.1145/366199821:3(1-25)Online publication date: 26-Apr-2024
https://dl.acm.org/doi/10.1145/3661998
Show More Cited By

Recommendations

Cross-Accelerator Performance Profiling
XSEDE16: Proceedings of the XSEDE16 Conference on Diversity, Big Data, and Science at Scale

The computing requirements of scientific applications have influenced processor design, and have motivated the introduction and use of many-core processors, i.e., accelerators, for high performance computing (HPC). Consequently, it is now common for the ...
Modeling and predicting performance of high performance computing applications on hardware accelerators

Hybrid-core systems speedup applications by offloading certain compute operations that can run faster on hardware accelerators. However, such systems require significant programming and porting effort to gain a performance benefit from the accelerators. ...
An OpenACC-based unified programming model for multi-accelerator systems
PPoPP 2015: Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming

This paper proposes a novel SPMD programming model of OpenACC. Our model integrates the different granularities of parallelism from vector-level parallelism to node-level parallelism into a single, unified model based on OpenACC. It allows programmers ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences

MICRO '21: MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture

October 2021

1322 pages

ISBN:9781450385572

DOI:10.1145/3466752

Copyright © 2021 Owner/Author.

This work is licensed under a Creative Commons Attribution International 4.0 License.

Sponsors

SIGMICRO: ACM Special Interest Group on Microarchitectural Research and Processing

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 17 October 2021

Check for updates

Author Tags

Qualifiers

Research-article
Research
Refereed limited

Funding Sources

NSF (National Science Foundation)

Conference

MICRO '21

Sponsor:

SIGMICRO

MICRO '21: 54th Annual IEEE/ACM International Symposium on Microarchitecture

October 18 - 22, 2021

Virtual Event, Greece

Acceptance Rates

Overall Acceptance Rate 484 of 2,242 submissions, 22%

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

15
Total Citations
View Citations
1,465
Total Downloads

Downloads (Last 12 months)405
Downloads (Last 6 weeks)53

Reflects downloads up to 12 Dec 2024

Other Metrics

View Author Metrics

Citations

Cited By

Yin XZhao ZGupta R(2024)IncBoost: Scaling Incremental Graph Processing for Edge Deletions and Weight UpdatesProceedings of the 2024 ACM Symposium on Cloud Computing10.1145/3698038.3698524(915-932)Online publication date: 20-Nov-2024
https://dl.acm.org/doi/10.1145/3698038.3698524
Mao FLiu XZhang YLiu HLiao XJin HZhang WZhou JWu YNie LGuo YJiang ZLiu J(2024)PMGraph: Accelerating Concurrent Graph Queries over Streaming GraphsACM Transactions on Architecture and Code Optimization10.1145/368933721:4(1-25)Online publication date: 20-Nov-2024
https://dl.acm.org/doi/10.1145/3689337
Li JKang Y(2024)GraphSER: Distance-Aware Stream-Based Edge Repartition for Many-Core SystemsACM Transactions on Architecture and Code Optimization10.1145/366199821:3(1-25)Online publication date: 26-Apr-2024
https://dl.acm.org/doi/10.1145/3661998
Mahapatra RGhodrati SAhn BKinzer SWang SXu HKarthikeyan LSharma HYazdanbakhsh AAlian MEsmaeilzadeh HTsafrir DMusuvathi MGupta RAbu-Ghazaleh N(2024)In-Storage Domain-Specific Acceleration for Serverless ComputingProceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 210.1145/3620665.3640413(530-548)Online publication date: 27-Apr-2024
https://dl.acm.org/doi/10.1145/3620665.3640413
Zhang XLiu CNi JCheng YZhang LLi HLi X(2024)PDG: A Prefetcher for Dynamic Graph UpdatingIEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems10.1109/TCAD.2023.333588043:4(1246-1259)Online publication date: Apr-2024
https://doi.org/10.1109/TCAD.2023.3335880
Liu XPu ZQu PZheng WZhang Y(2024)ActiveN: A Scalable and Flexibly-Programmable Event-Driven Neuromorphic Processor2024 57th IEEE/ACM International Symposium on Microarchitecture (MICRO)10.1109/MICRO61859.2024.00085(1122-1137)Online publication date: 2-Nov-2024
https://doi.org/10.1109/MICRO61859.2024.00085
Wang SXu HMamandipoor AMahapatra RAhn BGhodrati SKailas KAlian MEsmaeilzadeh H(2024)Data Motion Acceleration: Chaining Cross-Domain Multi Accelerators2024 IEEE International Symposium on High-Performance Computer Architecture (HPCA)10.1109/HPCA57654.2024.00083(1043-1062)Online publication date: 2-Mar-2024
https://doi.org/10.1109/HPCA57654.2024.00083
Fu GXia TLuo ZChen RZhao WRen P(2024)Differential-Matching Prefetcher for Indirect Memory Access2024 IEEE International Symposium on High-Performance Computer Architecture (HPCA)10.1109/HPCA57654.2024.00040(439-453)Online publication date: 2-Mar-2024
https://doi.org/10.1109/HPCA57654.2024.00040
YANG YYU HZHAO JZHANG YLIAO XJIANG XJIN HLIU HMAO FZHANG JWANG B(2023)An efficient hardware accelerator for monotonic graph algorithms on dynamic directed graphsSCIENTIA SINICA Informationis10.1360/SSI-2022-019153:8(1575)Online publication date: 15-Aug-2023
https://doi.org/10.1360/SSI-2022-0191
Yu HZhang YZhao JLiao YHuang ZHe DGu LJin HLiao XLiu HHe BYue J(2023)RACE: An Efficient Redundancy-aware Accelerator for Dynamic Graph Neural NetworkACM Transactions on Architecture and Code Optimization10.1145/361768520:4(1-26)Online publication date: 14-Dec-2023
https://dl.acm.org/doi/10.1145/3617685
Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

HTML Format

View this article in HTML Format.

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Media

Figures

Other

Tables

View Table of Contents