[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.1145/3302424.3303974acmconferencesArticle/Chapter ViewAbstractPublication PageseurosysConference Proceedingsconference-collections
research-article

GraphBolt: Dependency-Driven Synchronous Processing of Streaming Graphs

Published: 25 March 2019 Publication History

Abstract

Efficient streaming graph processing systems leverage incremental processing by updating computed results to reflect the change in graph structure for the latest graph snapshot. Although certain monotonic path-based algorithms produce correct results by refining intermediate values via numerical comparisons, directly reusing values that were computed before mutation does not work correctly for algorithms that require BSP semantics. Since structural mutations in streaming graphs render the intermediate results unusable, exploiting incremental computation while simultaneously providing synchronous processing guarantees is challenging.
In this paper we develop GraphBolt which incrementally processes streaming graphs while guaranteeing BSP semantics. GraphBolt incorporates dependency-driven incremental processing where it first tracks dependencies to capture how intermediate values get computed, and then uses this information to incrementally propagate the impact of change across intermediate values. To support wide variety of graph-based analytics, GraphBolt provides a generalized incremental programming model that enables development of incremental versions of complex aggregations. Our evaluation shows that GraphBolt's incremental processing eliminates redundant computations and efficiently processes streaming graphs with varying mutation rates, starting from just a single edge mutation all the way up to 1 million edge mutations at a time. Furthermore, being specialized for graph computations, GraphBolt extracts high performance compared to Differential Dataflow.

References

[1]
Daniel J Abadi, Yanif Ahmad, Magdalena Balazinska, Ugur Cetintemel, Mitch Cherniack, Jeong-Hyon Hwang, Wolfgang Lindner, Anurag Maskey, Alex Rasin, Esther Ryvkina, et al. The Design of the Borealis Stream Processing Engine. In CIDR, volume 5, pages 277--289, 2005.
[2]
Rajagopal Ananthanarayanan, Venkatesh Basker, Sumit Das, Ashish Gupta, Haifeng Jiang, Tianhao Qiu, Alexey Reznichenko, Deomid Ryabkov, Manpreet Singh, and Shivakumar Venkataraman. Photon: Fault-tolerant and Scalable Joining of Continuous Data Streams. In SIGMOD, pages 577--588, New York, NY, USA, 2013.
[3]
Bahman Bahmani, Abdur Chowdhury, and Ashish Goel. Fast Incremental and Personalized PageRank. VLDB, 4(3):173--184, December 2010.
[4]
Hari Balakrishnan, Magdalena Balazinska, Don Carney, Uğur Çetintemel, Mitch Cherniack, Christian Convey, Eddie Galvez, Jon Salz, Michael Stonebraker, Nesime Tatbul, et al. Retrospective on Aurora. The VLDB Journal, 13(4):370--383, 2004.
[5]
Pramod Bhatotia, Umut A Acar, Flavio P Junqueira, and Rodrigo Rodrigues. Slider: Incremental Sliding Window Analytics. In Middleware, pages 61--72. ACM, 2014.
[6]
Jose A. Blakeley, Per-Ake Larson, and Frank Wm Tompa. Efficiently Updating Materialized Views. In SIGMOD, pages 61--71, 1986.
[7]
Paolo Boldi and Sebastiano Vigna. The WebGraph Framework I: Compression Techniques. In WWW, pages 595--601, 2004.
[8]
Meeyoung Cha, Hamed Haddadi, Fabricio Benevenuto, and Krishna P. Gummadi. Measuring User Influence in Twitter: The Million Follower Fallacy. In ICWSM, pages 10--17, 2010.
[9]
Rong Chen, Jiaxin Shi, Yanzhe Chen, and Haibo Chen. PowerLyra: Differentiated Graph Computation and Partitioning on Skewed Graphs. In EuroSys, pages 1:1--1:15, 2015.
[10]
Raymond Cheng, Ji Hong, Aapo Kyrola, Youshan Miao, Xuetian Weng, Ming Wu, Fan Yang, Lidong Zhou, Feng Zhao, and Enhong Chen. Kineograph: Taking the Pulse of a Fast-changing and Connected World. In EuroSys, pages 85--98, 2012.
[11]
Prasanna Desikan, Nishith Pathak, Jaideep Srivastava, and Vipin Kumar. Incremental Page Rank Computation on Evolving Graphs. In WWW, pages 1094--1095, 2005.
[12]
David Ediger, Karl Jiang, Jason Riedy, and David A. Bader. Massive Streaming Data Analytics: A Case Study with Clustering Coefficients. In IPDPSW, pages 1--8, 2010.
[13]
David Ediger, Rob Mccoll, Jason Riedy, and David A. Bader. STINGER: High Performance Data Structure for Streaming Graphs. In HPEC, pages 1--5, 2012.
[14]
Friendster network dataset. http://konect.uni-koblenz.de/networks/friendster.KONECT, 2015.
[15]
Joseph E. Gonzalez, Yucheng Low, Haijie Gu, Danny Bickson, and Carlos Guestrin. PowerGraph: Distributed Graph-parallel Computation on Natural Graphs. In OSDI, pages 17--30, 2012.
[16]
Joseph E. Gonzalez, Reynold S. Xin, Ankur Dave, Daniel Crankshaw, Michael J. Franklin, and Ion Stoica. GraphX: Graph Processing in a Distributed Dataflow Framework. In OSDI, pages 599--613, 2014.
[17]
Ashish Gupta, Inderpal Singh Mumick, and V. S. Subrahmanian. Maintaining Views Incrementally. In SIGMOD, pages 157--166, 1993.
[18]
Wentao Han, Youshan Miao, Kaiwei Li, Ming Wu, Fan Yang, Lidong Zhou, Vijayan Prabhakaran, Wenguang Chen, and Enhong Chen. Chronos: A Graph Engine for Temporal Graph Analysis. In EuroSys, pages 1:1--1:14, New York, NY, USA, 2014. ACM.
[19]
Anand Padmanabha Iyer, Li Erran Li, Tathagata Das, and Ion Stoica. Time-Evolving Graph Processing at Scale. In GRADES, page 5. ACM, 2016.
[20]
U Kang, Duen Horng, and Christos Faloutsos. Inference of Beliefs on Billion-Scale Graphs. In KDD-LDMTA, 2010.
[21]
Haewoon Kwak, Changhyun Lee, Hosung Park, and Sue Moon. What is Twitter, A Social Network or a News Media? In WWW, pages 591--600, 2010.
[22]
Grzegorz Malewicz, Matthew H. Austern, Aart J. C. Bik, James C. Dehnert, Ilan Horn, Naty Leiser, Grzegorz Czajkowski, and Google Inc. Pregel: A System for Large-Scale Graph Processing. In SIGMOD, pages 135--146, 2010.
[23]
Andrew McGregor, Sofya Vorotnikova, and Hoa T. Vu. Better Algorithms for Counting Triangles in Data Streams. In PODS, pages 401--411, 2016.
[24]
Frank McSherry. A Uniform Approach to Accelerated PageRank Computation. In WWW '05, pages 575--582, 2005.
[25]
Frank McSherry, Derek G. Murray, Rebecca Isaacs, and Michael Isard. Differential Dataflow. In CIDR, 2013.
[26]
Derek G. Murray, Frank McSherry, Rebecca Isaacs, Michael Isard, Paul Barham, and Martín Abadi. Naiad: A Timely Dataflow System. In SOSP, pages 439--455. ACM, 2013.
[27]
Donald Nguyen, Andrew Lenharth, and Keshav Pingali. A Lightweight Infrastructure for Graph Analytics. In SOSP, pages 456--471, 2013.
[28]
Kamal Nigam and Rayid Ghani. Analyzing the Effectiveness and Applicability of Co-training. In CIKM, pages 86--93. ACM, 2000.
[29]
Vivek Nigam, Limin Jia, Boon Thau Loo, and Andre Scedrov. Maintaining Distributed Logic Programs Incrementally. In PPDP, pages 125--136, 2011.
[30]
Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd. The PageRank Citation Ranking: Bringing Order to the Web. Technical report, Stanford University, 1998.
[31]
Frederick Reiss, Kurt Stockinger, Kesheng Wu, Arie Shoshani, and Joseph M. Hellerstein. Enabling Real-Time Querying of Live and Historical Stream Data. In SSDBM, pages 28--, 2007.
[32]
Chenghui Ren, Eric Lo, Ben Kao, Xinjie Zhu, and Reynold Cheng. On Querying Historical Evolving Graph Sequences, 2011.
[33]
Jason Riedy and Henning Meyerhenke. Scalable Algorithms for Analysis of Massive, Streaming Graphs. In SIAM PP, 2012.
[34]
Amitabha Roy, Laurent Bindschaedler, Jasmina Malicevic, and Willy Zwaenepoel. Chaos: Scale-out Graph Processing from Secondary Storage. In SOSP, pages 410--424, 2015.
[35]
Amitabha Roy, Ivo Mihailovic, and Willy Zwaenepoel. X-Stream: Edge-centric Graph Processing Using Streaming Partitions. In SOSP, pages 472--488, 2013.
[36]
Pratanu Roy, Arijit Khan, and Gustavo Alonso. Augmented Sketch: Faster and More Accurate Stream Processing. In SIGMOD, pages 1449--1463, New York, NY, USA, 2016. ACM.
[37]
Dipanjan Sengupta, Narayanan Sundaram, Xia Zhu, Theodore L Willke, Jeffrey Young, Matthew Wolf, and Karsten Schwan. GraphIn: An Online High Performance Incremental Graph Processing Framework. In Euro-Par, pages 319--333. Springer, 2016.
[38]
Xiaogang Shi, Bin Cui, Yingxia Shao, and Yunhai Tong. Tornado: A System For Real-Time Iterative Analysis Over Evolving Data. In SIGMOD, pages 417--430, New York, NY, USA, 2016. ACM.
[39]
Julian Shun and Guy E. Blelloch. Ligra: A Lightweight Graph Processing Framework for Shared Memory. In PPoPP, pages 135--146, 2013.
[40]
Toyotaro Suzumura, Shunsuke Nishii, and Masaru Ganse. Towards Large-scale Graph Stream Processing Platform. In WWW Companion, pages 1321--1326, 2014.
[41]
Kanat Tangwongsan, A. Pavan, and Srikanta Tirthapura. Parallel Triangle Counting in Massive Streaming Graphs. In CIKM, pages 781--786, 2013.
[42]
Ankit Toshniwal, Siddarth Taneja, Amit Shukla, Karthik Ramasamy, Jignesh M Patel, Sanjeev Kulkarni, Jason Jackson, Krishna Gade, Maosong Fu, Jake Donham, et al. Storm @ Twitter. In SIGMOD, pages 147--156. ACM, 2014.
[43]
Keval Vora, Rajiv Gupta, and Guoqing Xu. Synergistic Analysis of Evolving Graphs. ACM TACO, 13(4):32, 2016.
[44]
Keval Vora, Rajiv Gupta, and Guoqing Xu. KickStarter: Fast and Accurate Computations on Streaming Graphs via Trimmed Approximations. In ASPLOS, pages 237--251, 2017.
[45]
Keval Vora, Sai Charan Koduru, and Rajiv Gupta. ASPIRE: Exploiting Asynchronous Parallelism in Iterative Algorithms Using a Relaxed Consistency Based DSM. In OOPSLA, pages 861--878, 2014.
[46]
Keval Vora, Guoqing (Harry) Xu, and Rajiv Gupta. Load the Edges You Need: A Generic I/O Optimization for Disk-based Graph Processing. In USENIX ATC, pages 507--522, 2016.
[47]
Wikipedia links, english network dataset. http://konect.uni-koblenz.de/networks/wikipedia_link_en. KONECT, 2017.
[48]
Ming Wu, Fan Yang, Jilong Xue, Wencong Xiao, Youshan Miao, Lan Wei, Haoxiang Lin, Yafei Dai, and Lidong Zhou. GraM: Scaling Graph Computation to the Trillions. In SoCC, pages 408--421, New York, NY, USA, 2015. ACM.
[49]
Yahoo! Webscope Program. http://webscope.sandbox.yahoo.com/.
[50]
Matei Zaharia, Tathagata Das, Haoyuan Li, Scott Shenker, and Ion Stoica. Discretized Streams: An Efficient and Fault-Tolerant Model for Stream Processing on Large Clusters. In HotCloud, 2012.
[51]
Erik Zeitler and Tore Risch. Massive Scale-out of Expensive Continuous Queries. In VLDB, pages 1181--1188, 2011.
[52]
Yunhong Zhou, Dennis Wilkinson, Robert Schreiber, and Rong Pan. Large-Scale Parallel Collaborative Filtering for the Netflix Prize. In AAIM, pages 337--348. Springer, 2008.
[53]
Xiaojin Zhu and Zoubin Ghahramani. Learning from Labeled and Unlabeled Data with Label Propagation. In CMU Technical Report CALD-02-107, 2002.
[54]
Xiaowei Zhu, Wenguang Chen, Weimin Zheng, and Xiaosong Ma. Gemini: A Computation-Centric Distributed Graph Processing System. In OSDI, pages 301--316, 2016.

Cited By

View all
  • (2024)Incremental Sliding Window Connectivity over Streaming GraphsProceedings of the VLDB Endowment10.14778/3675034.367504017:10(2473-2486)Online publication date: 1-Jun-2024
  • (2024)BYO: A Unified Framework for Benchmarking Large-Scale Graph ContainersProceedings of the VLDB Endowment10.14778/3665844.366585917:9(2307-2320)Online publication date: 1-May-2024
  • (2024)IncBoost: Scaling Incremental Graph Processing for Edge Deletions and Weight UpdatesProceedings of the 2024 ACM Symposium on Cloud Computing10.1145/3698038.3698524(915-932)Online publication date: 20-Nov-2024
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
EuroSys '19: Proceedings of the Fourteenth EuroSys Conference 2019
March 2019
714 pages
ISBN:9781450362818
DOI:10.1145/3302424
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 25 March 2019

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. Incremental Processing
  2. Streaming Graphs

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Funding Sources

Conference

EuroSys '19
Sponsor:
EuroSys '19: Fourteenth EuroSys Conference 2019
March 25 - 28, 2019
Dresden, Germany

Acceptance Rates

Overall Acceptance Rate 241 of 1,308 submissions, 18%

Upcoming Conference

EuroSys '25
Twentieth European Conference on Computer Systems
March 30 - April 3, 2025
Rotterdam , Netherlands

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)122
  • Downloads (Last 6 weeks)11
Reflects downloads up to 11 Dec 2024

Other Metrics

Citations

Cited By

View all
  • (2024)Incremental Sliding Window Connectivity over Streaming GraphsProceedings of the VLDB Endowment10.14778/3675034.367504017:10(2473-2486)Online publication date: 1-Jun-2024
  • (2024)BYO: A Unified Framework for Benchmarking Large-Scale Graph ContainersProceedings of the VLDB Endowment10.14778/3665844.366585917:9(2307-2320)Online publication date: 1-May-2024
  • (2024)IncBoost: Scaling Incremental Graph Processing for Edge Deletions and Weight UpdatesProceedings of the 2024 ACM Symposium on Cloud Computing10.1145/3698038.3698524(915-932)Online publication date: 20-Nov-2024
  • (2024)PMGraph: Accelerating Concurrent Graph Queries over Streaming GraphsACM Transactions on Architecture and Code Optimization10.1145/368933721:4(1-25)Online publication date: 20-Nov-2024
  • (2024)GraphSER: Distance-Aware Stream-Based Edge Repartition for Many-Core SystemsACM Transactions on Architecture and Code Optimization10.1145/366199821:3(1-25)Online publication date: 26-Apr-2024
  • (2024)LSGraph: A Locality-centric High-performance Streaming Graph EngineProceedings of the Nineteenth European Conference on Computer Systems10.1145/3627703.3650076(33-49)Online publication date: 22-Apr-2024
  • (2024)TARIS: Scalable Incremental Processing of Time-Respecting Algorithms on Streaming GraphsIEEE Transactions on Parallel and Distributed Systems10.1109/TPDS.2024.347157435:12(2527-2544)Online publication date: Dec-2024
  • (2024)TeGraph+: Scalable Temporal Graph Processing Enabling Flexible Edge ModificationsIEEE Transactions on Parallel and Distributed Systems10.1109/TPDS.2024.339391435:8(1469-1487)Online publication date: Aug-2024
  • (2024)Kairos: Enabling Prompt Monitoring of Information Diffusion Over Temporal NetworksIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2023.334762136:12(8607-8621)Online publication date: Dec-2024
  • (2024)A Streaming Graph Partitioning Method to Achieve High Cohesion and Equilibrium via Multiplayer Repeated GameIEEE Transactions on Computational Social Systems10.1109/TCSS.2022.322623011:1(803-814)Online publication date: Feb-2024
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media