[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.1145/3466752.3480126acmconferencesArticle/Chapter ViewAbstractPublication PagesmicroConference Proceedingsconference-collections
research-article
Open access

JetStream: Graph Analytics on Streaming Data with Event-Driven Hardware Accelerator

Published: 17 October 2021 Publication History

Abstract

Graph Processing is at the core of many critical emerging workloads operating on unstructured data, including social network analysis, bioinformatics, and many others. Many applications operate on graphs that are constantly changing, i.e., new nodes and edges are added or removed over time. In this paper, we present JetStream, a hardware accelerator for evaluating queries over streaming graphs and capable of handling additions, deletions, and updates of edges. JetStream extends a recently proposed event-based accelerator for graph workloads to support streaming updates. It handles both accumulative and monotonic graph algorithms via an event-driven computation model that limits accesses to a smaller subset of the graph vertices, efficiently reuses the prior query results to eliminate redundancy, and optimizes the memory access pattern for enhanced memory bandwidth utilization. To the best of our knowledge, JetStream is the first graph accelerator that supports streaming graphs, reducing the computation time by 90% compared with cold-start computation using an existing accelerator. In addition, JetStream achieves about 18 × speedup over KickStarter and GraphBolt software frameworks at the large baseline batch sizes that these systems use with significantly higher speedup at smaller batch sizes.

References

[1]
Maleen Abeydeera and Daniel Sanchez. 2020. Chronos: Efficient Speculative Parallelism for Accelerators. In Proc. International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS).
[2]
Junwhan Ahn, Sungpack Hong, Sungjoo Yoo, Onur Mutlu, and Kiyoung Choi. 2015. A Scalable Processing-in-memory Accelerator for Parallel Graph Processing. SIGARCH Computer Architecture News 43, 3 (June 2015).
[3]
A. Ayupov, S. Yesil, M. M. Ozdal, T. Kim, S. Burns, and O. Ozturk. 2018. A Template-Based Design Methodology for Graph-Parallel Hardware Accelerators. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 37, 2 (Feb 2018), 420–430. https://doi.org/10.1109/TCAD.2017.2706562
[4]
Lars Backstrom, Dan Huttenlocher, Jon Kleinberg, and Xiangyang Lan. 2006. Group Formation in Large Social Networks: Membership, Growth, and Evolution. In Proc. International Conference on Knowledge Discovery and Data Mining (SIGKDD). 44–54.
[5]
Rajeev Balasubramonian, Andrew B. Kahng, Naveen Muralimanohar, Ali Shafiee, and Vaishnav Srinivas. 2017. CACTI 7: New Tools for Interconnect Exploration in Innovative Off-Chip Memories. ACM Trans. Archit. Code Optim. 14, 2 (June 2017).
[6]
Paolo Boldi and Sebastiano Vigna. 2004. The WebGraph Framework I: Compression Techniques. In Proc. of the Thirteenth International World Wide Web Conference (WWW). 595–601.
[7]
Rong Chen, Jiaxin Shi, Yanzhe Chen, Binyu Zang, Haibing Guan, and Haibo Chen. 2019. Powerlyra: Differentiated graph computation and partitioning on skewed graphs. ACM Trans. on Parallel Computing (TOPC) 5, 3 (2019).
[8]
Raymond Cheng, Ji Hong, Aapo Kyrola, Youshan Miao, Xuetian Weng, Ming Wu, Fan Yang, Lidong Zhou, Feng Zhao, and Enhong Chen. 2012. Kineograph: taking the pulse of a fast-changing and connected world. In Proc. 7th ACM european conference on Computer Systems. 85–98.
[9]
Guohao Dai, Tianhao Huang, Yuze Chi, Ningyi Xu, Yu Wang, and Huazhong Yang. 2017. ForeGraph: Exploring Large-Scale Graph Processing on Multi-FPGA Architecture. In Proc. International Symposium on Field-Programmable Gate Arrays (FPGA). 217–226.
[10]
Timothy A. Davis and Yifan Hu. 2011. The University of Florida Sparse Matrix Collection. ACM Trans. Mathematical Software 38, 1, Article 1 (Dec. 2011), 25 pages. https://doi.org/10.1145/2049662.2049663
[11]
Laxman Dhulipala, Guy E Blelloch, and Julian Shun. 2019. Low-latency graph streaming using compressed purely-functional trees. In Proc. 40th ACM SIGPLAN Conference on Programming Language Design and Implementation. 918–934.
[12]
David Ediger, Rob McColl, Jason Riedy, and David A Bader. 2012. Stinger: High performance data structure for streaming graphs. In Proc. IEEE Conference on High Performance Extreme Computing. 1–5.
[13]
T. J. Ham, L. Wu, N. Sundaram, N. Satish, and M. Martonosi. 2016. Graphicionado: A high-performance and energy-efficient accelerator for graph analytics. In Proc. 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO). 1–13. https://doi.org/10.1109/MICRO.2016.7783759
[14]
Wentao Han, Youshan Miao, Kaiwei Li, Ming Wu, Fan Yang, Lidong Zhou, Vijayan Prabhakaran, Wenguang Chen, and Enhong Chen. 2014. Chronos: a graph engine for temporal graph analysis. In Proc. European Conference on Computer Systems (Eurosys).
[15]
Jiewen Huang and Daniel J. Abadi. 2016. Leopard: Lightweight Edge-Oriented Partitioning and Replication for Dynamic Graphs. Proc. VLDB Endowment 9, 7 (March 2016).
[16]
Anand Padmanabha Iyer, Li Erran Li, Tathagata Das, and Ion Stoica. 2016. Time-evolving graph processing at scale. In Proc. Fourth International Workshop on Graph Data Management Experiences and Systems. 1–6.
[17]
Mark C. Jeffrey, Suvinay Subramanian, Maleen Abeydeera, Joel Emer, and Daniel Sanchez. 2016. Data-Centric Execution of Speculative Parallel Programs. In The 49th Annual IEEE/ACM International Symposium on Microarchitecture (Taipei, Taiwan) (MICRO-49). IEEE Press, Article 5, 13 pages.
[18]
M. C. Jeffrey, S. Subramanian, C. Yan, J. Emer, and D. Sanchez. 2015. A scalable architecture for ordered parallelism. In Proc. 48th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO). 228–241. https://doi.org/10.1145/2830772.2830777
[19]
Xiaolin Jiang, Chengshuo Xu, Xizhe Yin, Zhijia Zhao, and Rajiv Gupta. 2021. Tripoline: generalized incremental graph processing via graph triangle inequality. In Proc. Sixteenth European Conference on Computer Systems. 17–32.
[20]
Xiaoen Ju, Dan Williams, Hani Jamjoom, and Kang G. Shin. 2016. Version Traveler: Fast and Memory-Efficient Version Switching in Graph Processing Systems. In Proceedings of the 2016 USENIX Conference on Usenix Annual Technical Conference (Denver, CO, USA) (USENIX ATC ’16). USENIX Association, USA, 523–536.
[21]
Pradeep Kumar and H. Howie Huang. 2019. GraphOne: A Data Store for Real-time Analytics on Evolving Graphs. In Proc. USENIX Conference on File and Storage Technologies (FAST).
[22]
Haewoon Kwak, Changhyun Lee, Hosung Park, and Sue Moon. 2010. What is Twitter, a Social Network or a News Media?. In Proc. 19th International Conference on World Wide Web (Raleigh, North Carolina, USA) (WWW ’10). ACM, New York, NY, USA, 591–600. https://doi.org/10.1145/1772690.1772751
[23]
Yucheng Low, Joseph E Gonzalez, Aapo Kyrola, Danny Bickson, Carlos E Guestrin, and Joseph Hellerstein. 2014. Graphlab: A new framework for parallel machine learning. arXiv preprint arXiv:1408.2041(2014).
[24]
Grzegorz Malewicz, Matthew H Austern, Aart JC Bik, James C Dehnert, Ilan Horn, Naty Leiser, and Grzegorz Czajkowski. 2010. Pregel: a system for large-scale graph processing. In Proc. ACM SIGMOD International Conference on Management of data. 135–146.
[25]
Mugilan Mariappan, Joanna Che, and Keval Vora. 2021. DZiG: Sparsity-Aware Incremental Processing of Streaming Graphs. In Proc. Sixteenth European Conference on Computer Systems. 83–98.
[26]
Mugilan Mariappan and Keval Vora. 2019. Graphbolt: Dependency-driven synchronous processing of streaming graphs. In Proc. Fourteenth European Conference on Computer Systems. 1–16.
[27]
Anurag Mukkara, Nathan Beckmann, Maleen Abeydeera, Xiaosong Ma, and Daniel Sanchez. 2018. Exploiting Locality in Graph Analytics through Hardware-Accelerated Traversal Scheduling. In Proc. International Symposium on Microarchitecture (MICRO).
[28]
Anurag Mukkara, Nathan Beckmann, and Daniel Sanchez. 2019. PHI: Architectural Support for Synchronization- and Bandwidth-Efficient Commutative Scatter Updates. In Proc. International Symposium on Microarchitecture (Micro). 1009–1022.
[29]
Derek G Murray, Frank McSherry, Rebecca Isaacs, Michael Isard, Paul Barham, and Martín Abadi. 2013. Naiad: a timely dataflow system. In Proc. Twenty-Fourth ACM Symposium on Operating Systems Principles. 439–455.
[30]
L. Nai, R. Hadidi, J. Sim, H. Kim, P. Kumar, and H. Kim. 2017. GraphPIM: Enabling Instruction-Level PIM Offloading in Graph Computing Frameworks. In IEEE International Symposium on High Performance Computer Architecture (HPCA). 457–468. https://doi.org/10.1109/HPCA.2017.54
[31]
Donald Nguyen, Andrew Lenharth, and Keshav Pingali. 2013. A Lightweight Infrastructure for Graph Analytics. In Proc. ACM Symposium on Operating Systems Principles (SOSP). 456–471.
[32]
M. M. Ozdal, S. Yesil, T. Kim, A. Ayupov, J. Greth, S. Burns, and O. Ozturk. 2016. Energy Efficient Architecture for Graph Analytics Accelerators. In Proc. ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA). 166–177. https://doi.org/10.1109/ISCA.2016.24
[33]
S. Rahman, N. Abu-Ghazaleh, and R. Gupta. 2020. GraphPulse: An Event-Driven Hardware Accelerator for Asynchronous Graph Processing. In Proc. 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO). 908–921. https://doi.org/10.1109/MICRO50266.2020.00078
[34]
Shafiur Rahman, Nael Abu-Ghazaleh, and Walid Najjar. 2017. PDES-A: A parallel discrete event simulation accelerator for FPGAs. In Proc. ACM SIGSIM Conference on Principles of Advanced Discrete Simulation. 133–144.
[35]
A. F. Rodrigues, K. S. Hemmert, B. W. Barrett, C. Kersey, R. Oldfield, M. Weston, R. Risen, J. Cook, P. Rosenfeld, E. Cooper-Balis, and B. Jacob. 2011. The Structural Simulation Toolkit. SIGMETRICS Perform. Eval. Rev. 38, 4 (March 2011), 37–42.
[36]
P. Rosenfeld, E. Cooper-Balis, and B. Jacob. 2011. DRAMSim2: A Cycle Accurate Memory System Simulator. IEEE Computer Architecture Letters 10, 1 (2011), 16–19. https://doi.org/10.1109/L-CA.2011.4
[37]
Dipanjan Sengupta, Narayanan Sundaram, Xia Zhu, Theodore L. Willke, Jeffrey Young, Matthew Wolf, and Karsten Schwan. 2016. GraphIn: An Online High Performance Incremental Graph Processing Framework. In Proc. European Conference on Parallel Processing (EuroPar). 319–333.
[38]
Xiaogang Shi, Bin Cui, Yingxia Shao, and Yunhai Tong. 2016. Tornado: A system for real-time iterative analysis over evolving data. In Proc. International Conference on Management of Data. 417–430.
[39]
Julian Shun and Guy E Blelloch. 2013. Ligra: a lightweight graph processing framework for shared memory. In Proc. SIGPLAN Symposium on Principles and Practice of Parallel Programming. 135–146.
[40]
G. M. Slota, K. Madduri, and S. Rajamanickam. 2014. PuLP: Scalable multi-objective multi-constraint partitioning for small-world networks. In IEEE International Conference on Big Data (Big Data). 481–490. https://doi.org/10.1109/BigData.2014.7004265
[41]
Amanda L Traud, Peter J Mucha, and Mason A Porter. 2012. Social structure of Facebook networks. Phys. A 391, 16 (Aug 2012).
[42]
Leslie G Valiant. 1990. A bridging model for parallel computation. Commun. ACM 33, 8 (1990), 103–111.
[43]
Luis M. Vaquero, Felix Cuadrado, Dionysios Logothetis, and Claudio Martella. 2014. Adaptive Partitioning for Large-Scale Dynamic Graphs. In Proc. International Conference on Distributed Computing Systems. 144–153.
[44]
Keval Vora, Rajiv Gupta, and Guoqing Xu. 2016. Synergistic analysis of evolving graphs. ACM Transactions on Architecture and Code Optimization (TACO) 13, 4(2016), 1–27.
[45]
Keval Vora, Rajiv Gupta, and Guoqing Xu. 2017. Kickstarter: Fast and accurate computations on streaming graphs via trimmed approximations. In Proc. 22nd International Conference on Architectural Support for Programming Languages and Operating Systems. 237–251.
[46]
Qinggang Wang, Long Zheng, Yu Huang, Pengcheng Yao, Chuangyi Gui, Xiaofei Liao, Hai Jin, Wenbin Jiang, and Fubing Mao. 2021. GraSU: A Fast Graph Update Library for FPGA-Based Dynamic Graph Processing. In Proc. International Symposium on Field-Programmable Gate Arrays. 149–159.
[47]
Yangzihao Wang, Andrew Davidson, Yuechao Pan, Yuduo Wu, Andy Riffel, and John D Owens. 2016. Gunrock: A high-performance graph processing library on the GPU. In Proc. ACM Symposium on Principles and Practice of Parallel Programming (PPoPP).
[48]
G. Zhang, W. Horn, and D. Sanchez. 2015. Exploiting commutativity to reduce the cost of updates to shared data in cache-coherent systems. In Proc. 48th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO). 13–25. https://doi.org/10.1145/2830772.2830774
[49]
Mingxing Zhang, Youwei Zhuo, Chao Wang, Mingyu Gao, Yongwei Wu, Kang Chen, Christos Kozyrakis, and Xuehai Qian. 2018. GraphP: Reducing Communication for PIM-Based Graph Processing with Efficient Data Partition. In IEEE International Symposium on High Performance Computer Architecture (HPCA). 544–557. https://doi.org/10.1109/HPCA.2018.00053
[50]
Yanfeng Zhang, Qixin Gao, Lixin Gao, and Cuirong Wang. 2017. Maiter: An Asynchronous Graph Processing Framework for Delta-based Accumulative Iterative Computation. CoRR abs/1710.05785(2017). arxiv:1710.05785
[51]
S. Zhou, C. Chelmis, and V. K. Prasanna. 2016. High-Throughput and Energy-Efficient Graph Processing on FPGA. In Proc. IEEE 24th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM). 103–110. https://doi.org/10.1109/FCCM.2016.35
[52]
Shijie Zhou, Rajgopal Kannan, Hanqing Zeng, and Viktor K. Prasanna. 2018. An FPGA Framework for Edge-centric Graph Processing. In Proc. 15th ACM International Conference on Computing Frontiers (Ischia, Italy). 69–77.
[53]
Youwei Zhuo, Chao Wang, Mingxing Zhang, Rui Wang, Dimin Niu, Yanzhi Wang, and Xuehai Qian. 2019. GraphQ: Scalable PIM-Based Graph Processing. In Proc. International Symposium on Microarchitecture (Micro). 712–725.

Cited By

View all
  • (2024)IncBoost: Scaling Incremental Graph Processing for Edge Deletions and Weight UpdatesProceedings of the 2024 ACM Symposium on Cloud Computing10.1145/3698038.3698524(915-932)Online publication date: 20-Nov-2024
  • (2024)PMGraph: Accelerating Concurrent Graph Queries over Streaming GraphsACM Transactions on Architecture and Code Optimization10.1145/368933721:4(1-25)Online publication date: 20-Nov-2024
  • (2024)GraphSER: Distance-Aware Stream-Based Edge Repartition for Many-Core SystemsACM Transactions on Architecture and Code Optimization10.1145/366199821:3(1-25)Online publication date: 26-Apr-2024
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
MICRO '21: MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture
October 2021
1322 pages
ISBN:9781450385572
DOI:10.1145/3466752
This work is licensed under a Creative Commons Attribution International 4.0 License.

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 17 October 2021

Check for updates

Author Tags

  1. accelerators
  2. incremental algorithms
  3. streaming graphs

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Funding Sources

Conference

MICRO '21
Sponsor:

Acceptance Rates

Overall Acceptance Rate 484 of 2,242 submissions, 22%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)405
  • Downloads (Last 6 weeks)53
Reflects downloads up to 12 Dec 2024

Other Metrics

Citations

Cited By

View all
  • (2024)IncBoost: Scaling Incremental Graph Processing for Edge Deletions and Weight UpdatesProceedings of the 2024 ACM Symposium on Cloud Computing10.1145/3698038.3698524(915-932)Online publication date: 20-Nov-2024
  • (2024)PMGraph: Accelerating Concurrent Graph Queries over Streaming GraphsACM Transactions on Architecture and Code Optimization10.1145/368933721:4(1-25)Online publication date: 20-Nov-2024
  • (2024)GraphSER: Distance-Aware Stream-Based Edge Repartition for Many-Core SystemsACM Transactions on Architecture and Code Optimization10.1145/366199821:3(1-25)Online publication date: 26-Apr-2024
  • (2024)In-Storage Domain-Specific Acceleration for Serverless ComputingProceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 210.1145/3620665.3640413(530-548)Online publication date: 27-Apr-2024
  • (2024)PDG: A Prefetcher for Dynamic Graph UpdatingIEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems10.1109/TCAD.2023.333588043:4(1246-1259)Online publication date: Apr-2024
  • (2024)ActiveN: A Scalable and Flexibly-Programmable Event-Driven Neuromorphic Processor2024 57th IEEE/ACM International Symposium on Microarchitecture (MICRO)10.1109/MICRO61859.2024.00085(1122-1137)Online publication date: 2-Nov-2024
  • (2024)Data Motion Acceleration: Chaining Cross-Domain Multi Accelerators2024 IEEE International Symposium on High-Performance Computer Architecture (HPCA)10.1109/HPCA57654.2024.00083(1043-1062)Online publication date: 2-Mar-2024
  • (2024)Differential-Matching Prefetcher for Indirect Memory Access2024 IEEE International Symposium on High-Performance Computer Architecture (HPCA)10.1109/HPCA57654.2024.00040(439-453)Online publication date: 2-Mar-2024
  • (2023)An efficient hardware accelerator for monotonic graph algorithms on dynamic directed graphsSCIENTIA SINICA Informationis10.1360/SSI-2022-019153:8(1575)Online publication date: 15-Aug-2023
  • (2023)RACE: An Efficient Redundancy-aware Accelerator for Dynamic Graph Neural NetworkACM Transactions on Architecture and Code Optimization10.1145/361768520:4(1-26)Online publication date: 14-Dec-2023
  • Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

HTML Format

View this article in HTML Format.

HTML Format

Login options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media