[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.1145/3662158.3662770acmconferencesArticle/Chapter ViewAbstractPublication PagespodcConference Proceedingsconference-collections
research-article
Open access

Streaming Graph Algorithms in the Massively Parallel Computation Model

Published: 17 June 2024 Publication History

Abstract

We initiate the study of graph algorithms in the streaming setting on massive distributed and parallel systems inspired by practical data processing systems. The objective is to design algorithms that can efficiently process evolving graphs via large batches of edge insertions and deletions using as little memory as possible.
We focus on the nowadays canonical model for the study of theoretical algorithms for massive networks, the Massively Parallel Computation (MPC) model. We design MPC algorithms that efficiently process evolving graphs: in a constant number of rounds they can handle large batches of edge updates for problems such as connectivity, minimum spanning forest, and approximate matching while adhering to the most restrictive memory regime, in which the local memory per machine is strongly sublinear in the number of vertices and the total memory is sublinear in the graph size. These results improve upon earlier works in this area which rely on using larger total space, proportional to the size of the processed graph. Our work demonstrates that parallel algorithms can process dynamically changing graphs with asymptotically optimal utilization of MPC resources: parallel time, local memory, and total memory, while processing large batches of edge updates.

References

[1]
Umut A. Acar, Daniel Anderson, Guy E. Blelloch, and Laxman Dhulipala. 2019. Parallel Batch-Dynamic Graph Connectivity. In SPAA. 381--392.
[2]
Umut A. Acar, Daniel Anderson, Guy E. Blelloch, Laxman Dhulipala, and Sam Westrick. 2020. Parallel Batch-Dynamic Trees via Change Propagation. In ESA. 2:1--2:23. Also in CoRR abs/2002.05129 (2020).
[3]
Kook Jin Ahn and Sudipto Guha. 2018. Access to Data and Number of Iterations: Dual Primal Algorithms for Maximum Matching under Resource Constraints. ACM Transactions on Parallel Computing 4, 4 (2018), 17:1--17:40.
[4]
Kook Jin Ahn, Sudipto Guha, and Andrew McGregor. 2012. Analyzing Graph Structure via Linear Measurements. In SODA. 459--467.
[5]
Daniel Anderson, Guy E. Blelloch, and Kanat Tangwongsan. 2020. Work-Efficient Batch-Incremental Minimum Spanning Trees with Applications to the Sliding-Window Model. In SPAA. 51--61.
[6]
Alexandr Andoni, Aleksandar Nikolov, Krzysztof Onak, and Grigory Yaroslavtsev. 2014. Parallel Algorithms for Geometric Graph Problems. In STOC. 574--583.
[7]
Alexandr Andoni, Zhao Song, Clifford Stein, Zhengyu Wang, and Peilin Zhong. 2018. Parallel Graph Connectivity in Log Diameter Rounds. In FOCS. 674--685.
[8]
Sepehr Assadi, MohammadHossein Bateni, Aaron Bernstein, Vahab S. Mirrokni, and Cliff Stein. 2019. Coresets Meet EDCS: Algorithms for Matching and Vertex Cover on Massive Graphs. In SODA. 1616--1635.
[9]
Sepehr Assadi, Sanjeev Khanna, and Yang Li. 2017. On Estimating Maximum Matching Size in Graph Streams. In SODA. 1723--1742.
[10]
Sepehr Assadi, Sanjeev Khanna, and Yang Li. 2021. Tight Bounds for Single-Pass Streaming Complexity of the Set Cover Problem. SIAM J. Comput. 50, 3 (2021).
[11]
Sepehr Assadi, Sanjeev Khanna, Yang Li, and Grigory Yaroslavtsev. 2016. Maximum Matchings in Dynamic Graph Streams and the Simultaneous Communication Model. In SODA. 1345--1364.
[12]
Paul Beame, Paraschos Koutris, and Dan Suciu. 2017. Communication Steps for Parallel Query Processing. J. ACM 64, 6 (2017), 40:1--40:58.
[13]
Soheil Behnezhad, Mahsa Derakhshan, and MohammadTaghi Hajiaghayi. 2018. Brief Announcement: Semi-MapReduce Meets Congested Clique. CoRR abs/1802.10297 (2018).
[14]
Soheil Behnezhad, Laxman Dhulipala, Hossein Esfandiari, Jakub Lacki, and Vahab S. Mirrokni. 2019. Near-Optimal Massively Parallel Graph Connectivity. In FOCS. 1615--1636.
[15]
Soheil Behnezhad, MohammadTaghi Hajiaghayi, and David G. Harris. 2019. Exponentially Faster Massively Parallel Maximal Matching. In FOCS. 1637--1649.
[16]
Omri Ben-Eliezer, Rajesh Jayaram, David P. Woodruff, and Eylon Yogev. 2022. A Framework for Adversarially Robust Streaming Algorithms. J. ACM 69, 2 (2022), 17:1--17:33.
[17]
Maciej Besta, Marc Fischer, Vasiliki Kalavri, Michael Kapralov, and Torsten Hoefler. 2023. Practice of Streaming Processing of Dynamic Graphs: Concepts, Models, and Systems. IEEE Transactions on Parallel and Distributed Systems 34, 6 (2023), 1860--1876.
[18]
Keren Censor-Hillel, Elad Haramaty, and Zohar S. Karnin. 2016. Optimal Dynamic Distributed MIS. In PODC. 217--226.
[19]
Avery Ching, Sergey Edunov, Maja Kabiljo, Dionysios Logothetis, and Sambavi Muthukrishnan. 2015. One Trillion Edges: Graph Processing at Facebook-Scale. Proceedings of the VLDB Endowment 8, 12 (2015), 1804--1815.
[20]
Rajesh Chitnis, Graham Cormode, Hossein Esfandiari, MohammadTaghi Hajiaghayi, Andrew McGregor, Morteza Monemizadeh, and Sofya Vorotnikova. 2016. Kernelization via Sampling with Applications to Finding Matchings and Related Problems in Dynamic Graph Streams. In SODA. 1326--1344.
[21]
Graham Cormode and Hossein Jowhari. 2019. Lp Samplers and Their Applications: A Survey. Comput. Surveys 52, 1 (2019), 16:1--16:31.
[22]
Graham Cormode and Ke Yi. 2020. Small Summaries for Big Data. Cambridge University Press.
[23]
Artur Czumaj, Peter Davies, and Merav Parter. 2021. Component Stability in Low-Space Massively Parallel Computation. In PODC. 481--491.
[24]
Artur Czumaj, Jakub Lacki, Aleksander Madry, Slobodan Mitrovic, Krzysztof Onak, and Piotr Sankowski. 2018. Round Compression for Parallel Matching Algorithms. In STOC. 471--484.
[25]
Samir Datta, Anish Mukherjee, Nils Vortmeier, and Thomas Zeume. 2018. Reachability and Distances under Multiple Changes. In ICALP. 120:1--120:14.
[26]
Jeffrey Dean and Sanjay Ghemawat. 2010. MapReduce: A Flexible Data Processing Tool. Commun. ACM 53, 1 (2010), 72--77.
[27]
Laxman Dhulipala, David Durfee, Janardhan Kulkarni, Richard Peng, Saurabh Sawlani, and Xiaorui Sun. 2020. Parallel Batch-Dynamic Graphs: Algorithms and Lower Bounds. In SODA. 1300--1319.
[28]
Laxman Dhulipala, Quanquan C. Liu, Julian Shun, and Shangdi Yu. 2021. Parallel Batch-Dynamic k-Clique Counting. In APOCS. 129--143.
[29]
Joan Feigenbaum, Sampath Kannan, Andrew McGregor, Siddharth Suri, and Jian Zhang. 2005. On Graph Problems in a Semi-Streaming Model. Theoretical Computer Science 348, 2-3 (2005), 207--216.
[30]
Mohsen Ghaffari, Themis Gouleakis, Christian Konrad, Slobodan Mitrovic, and Ronitt Rubinfeld. 2018. Improved Massively Parallel Computation Algorithms for MIS, Matching, and Vertex Cover. In PODC. 129--138.
[31]
Mohsen Ghaffari and Jara Uitto. 2019. Sparsifying Distributed Algorithms with Ramifications in Massively Parallel Computation and Centralized Local Computation. In SODA. 1636--1653.
[32]
Seth Gilbert and Lawrence Er Lu Li. 2020. How Fast Can You Update Your MST?. In SPAA. 531--533.
[33]
Michael T. Goodrich, Nodari Sitchinava, and Qin Zhang. 2011. Sorting, Searching, and Simulation in the MapReduce Framework. In ISAAC. 374--383. Also in CoRR abs/1101.1902 (2011).
[34]
Kathrin Hanauer, Monika Henzinger, and Christian Schulz. 2022. Recent Advances in Fully Dynamic Graph Algorithms --- A Quick Reference Guide. ACM Journal of Experimental Algorithmics 27 (2022), 1.11:1--1.11:45. Also in CoRR abs/2102.11169 (2021).
[35]
James W. Hegeman and Sriram V. Pemmaraju. 2015. Lessons from the Congested Clique applied to MapReduce. Theoretical Computer Science 608 (2015), 268--281.
[36]
Michael Isard, Mihai Budiu, Yuan Yu, Andrew Birrell, and Dennis Fetterly. 2007. Dryad: Distributed Data-parallel Programs from Sequential Building Blocks. SIGOPS Operating Systems Review 41, 3 (2007), 59--72.
[37]
Giuseppe F. Italiano, Silvio Lattanzi, Vahab S. Mirrokni, and Nikos Parotsidis. 2019. Dynamic Algorithms for the Massively Parallel Computation Model. In SPAA. 49--58.
[38]
Howard J. Karloff, Siddharth Suri, and Sergei Vassilvitskii. 2010. A Model of Computation for MapReduce. In SODA. 938--948.
[39]
Christian Konrad. 2015. Maximum Matching in Turnstile Streams. In ESA. 840--852.
[40]
Pradeep Kumar and H. Howie Huang. 2016. G-Store: High-Performance Graph Store for Trillion-Edge Processing. In SC. 830--841.
[41]
Jakub Lacki, Slobodan Mitrovic, Krzysztof Onak, and Piotr Sankowski. 2020. Walking Randomly, Massively, and Efficiently. In STOC. 364--377.
[42]
Silvio Lattanzi, Benjamin Moseley, Siddharth Suri, and Sergei Vassilvitskii. 2011. Filtering: A Method for Solving Graph Problems in MapReduce. In SPAA. 85--94.
[43]
Quanquan C. Liu, Jessica Shi, Shangdi Yu, Laxman Dhulipala, and Julian Shun. 2022. Parallel Batch-Dynamic Algorithms for k-Core Decomposition and Related Graph Problems. In SPAA. 191--204.
[44]
Andrew McGregor. 2014. Graph Stream Algorithms: A Survey. SIGMOD Records 43, 1 (2014), 9--20.
[45]
S. Muthukrishnan. 2005. Data Streams: Algorithms and Applications. Foundations and Trends in Theoretical Computer Science 1, 2 (2005).
[46]
Jelani Nelson and Huacheng Yu. 2019. Optimal Lower Bounds for Distributed and Streaming Spanning Forest Computation. In SODA. 1844--1860.
[47]
Krzysztof Nowicki and Krzysztof Onak. 2021. Dynamic Graph Algorithms with Batch Updates in the Massively Parallel Computation Model. In SODA. 2939--2958.
[48]
Tim Roughgarden, Sergei Vassilvitskii, and Joshua R. Wang. 2018. Shuffles and Circuits (On Lower Bounds for Modern Parallel Computation). J. ACM 65, 6 (2018), 41:1--41:24.
[49]
Amitabha Roy, Laurent Bindschaedler, Jasmina Malicevic, and Willy Zwaenepoel. 2015. Chaos: Scale-out Graph Processing from Secondary Storage. In SOSP. 410--424.
[50]
Bin Shao, Haixun Wang, and Yatao Li. 2013. Trinity: A Distributed Graph Engine on a Memory Cloud. In SIGMOD. 505--516.
[51]
Thomas Tseng, Laxman Dhulipala, and Guy E. Blelloch. 2019. Batch-Parallel Euler Tour Trees. In ALENEX. 92--106.
[52]
Tom Tseng, Laxman Dhulipala, and Julian Shun. 2022. Parallel Batch-Dynamic Minimum Spanning Forest and the Efficiency of Dynamic Agglomerative Graph Clustering. In SPAA. 233--245.
[53]
Tom White. 2015. Hadoop: The Definitive Guide: Storage and Analysis at Internet Scale (4 ed.). O'Reilly. http://www.oreilly.de/catalog/9781491901632/index.html
[54]
Ming Wu, Fan Yang, Jilong Xue, Wencong Xiao, Youshan Miao, Lan Wei, Haoxiang Lin, Yafei Dai, and Lidong Zhou. 2015. GraM: scaling graph computation to the trillions. In SoCC. 408--421.
[55]
Matei Zaharia, Mosharaf Chowdhury, Michael J. Franklin, Scott Shenker, and Ion Stoica. 2010. Spark: Cluster Computing with Working Sets. In HotCloud. https://www.usenix.org/conference/hotcloud -cluster-computing-working-sets

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
PODC '24: Proceedings of the 43rd ACM Symposium on Principles of Distributed Computing
June 2024
570 pages
ISBN:9798400706684
DOI:10.1145/3662158
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 17 June 2024

Check for updates

Author Tags

  1. MPC
  2. streaming
  3. connectivity
  4. minimum spanning tree
  5. matching

Qualifiers

  • Research-article

Conference

PODC '24
Sponsor:

Acceptance Rates

Overall Acceptance Rate 740 of 2,477 submissions, 30%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • 0
    Total Citations
  • 181
    Total Downloads
  • Downloads (Last 12 months)181
  • Downloads (Last 6 weeks)39
Reflects downloads up to 10 Dec 2024

Other Metrics

Citations

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Login options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media