[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
research-article

CuSP: A Customizable Streaming Edge Partitioner for Distributed Graph Analytics

Published: 06 June 2021 Publication History

Abstract

Graph analytics systems must analyze graphs with billions of vertices and edges which require several terabytes of storage. Distributed-memory clusters are often used for analyzing such large graphs since the main memory of a single machine is usually restricted to a few hundreds of gigabytes. This requires partitioning the graph among the machines in the cluster. Existing graph analytics systems use a built-in partitioner that incorporates a particular partitioning policy, but the best policy is dependent on the algorithm, input graph, and platform. Therefore, built-in partitioners are not sufficiently flexible. Stand-alone graph partitioners are available, but they too implement only a few policies.
CuSP is a fast streaming edge partitioning framework which permits users to specify the desired partitioning policy at a high level of abstraction and quickly generates highquality graph partitions. For example, it can partition wdc12, the largest publicly available web-crawl graph with 4 billion vertices and 129 billion edges, in under 2 minutes for clusters with 128 machines. Our experiments show that it can produce quality partitions 6× faster on average than the state-of-theart stand-alone partitioner in the literature while supporting a wider range of partitioning policies.

References

[1]
http://www.graph500.org.
[2]
Paolo Boldi, Andrea Marino, Massimo Santini, and Sebastiano Vigna. BUbiNG: Massive Crawling for the Masses. In WWW, 2014.
[3]
Paolo Boldi, Marco Rosa, Massimo Santini, and Sebastiano Vigna. Layered label propagation: A multiresolution coordinate-free ordering for compressing social networks. In WWW, 2011.
[4]
Paolo Boldi and Sebastiano Vigna. The WebGraph framework I: Compression techniques. In WWW, 2004.
[5]
E. G. Boman, K. D. Devine, and S. Rajamanickam. Scalable matrix computations on large scale-free graphs using 2D graph partitioning. In SC, 2013.
[6]
D. Buono, T. De Matteis, G. Mencagli, and M. Vanneschi. Optimizing message-passing on multicore architectures using hardware multi-threading. In 2014 22nd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing, pages 262-- 270, Feb 2014.
[7]
Ümit V. Çatalyürek, Cevdet Aykanat, and Bora Uçar. On Two-Dimensional Sparse Matrix Partitioning: Models, Methods, and a Recipe. SIAM J. Sci. Comput., 2010.
[8]
Rong Chen, Jiaxin Shi, Yanzhe Chen, and Haibo Chen. PowerLyra: Differentiated Graph Computation and Partitioning on Skewed Graphs. EuroSys '15, 2015.
[9]
Hoang-Vu Dang, Roshan Dathathri, Gurbinder Gill, Alex Brooks, Nikoli Dryden, Andrew Lenharth, Loc Hoang, Keshav Pingali, and Marc Snir. A Lightweight Communication Runtime for Distributed Graph Analytics. In IPDPS, 2018.
[10]
Roshan Dathathri, Gurbinder Gill, Loc Hoang, Hoang- Vu Dang, Alex Brooks, Nikoli Dryden, Marc Snir, and Keshav Pingali. Gluon: A Communication Optimizing Framework for Distributed Heterogeneous Graph Analytics. PLDI, 2018.
[11]
Gurbinder Gill, Roshan Dathathri, Loc Hoang, and Keshav Pingali. A Study of Partitioning Policies for Graph Analytics on Large-scale Distributed Platforms. volume 12 of PVLDB, 2018.
[12]
Apache Giraph. http://giraph.apache.org/, 2013.
[13]
Joseph E. Gonzalez, Yucheng Low, Haijie Gu, Danny Bickson, and Carlos Guestrin. PowerGraph: Distributed Graph-parallel Computation on Natural Graphs. OSDI'12, 2012.
[14]
Jiewen Huang and Daniel J. Abadi. Leopard: Lightweight edge-oriented partitioning and replication for dynamic graphs. Proc. VLDB Endow., 9(7):540--551, March 2016.
[15]
George Karypis and Vipin Kumar. A Fast and High Quality Multilevel Scheme for Partitioning Irregular Graphs. SIAM J. Sci. Comput.
[16]
Vipin Kumar, Ananth Grama, Anshul Gupta, and George Karypis. Introduction to Parallel Computing: Design and Analysis of Algorithms. Benjamin-Cummings Publishing Co., Inc., 1994.
[17]
Jure Leskovec, Deepayan Chakrabarti, Jon Kleinberg, Christos Faloutsos, and Zoubin Ghahramani. Kronecker Graphs: An Approach to Modeling Networks. J. Mach. Learn. Res., 2010.
[18]
Grzegorz Malewicz, Matthew H. Austern, Aart J.C Bik, James C. Dehnert, Ilan Horn, Naty Leiser, and Grzegorz Czajkowski. Pregel: a system for large-scale graph processing. In Proc. ACM SIGMOD Intl Conf. on Management of Data, SIGMOD '10, pages 135--146, 2010.
[19]
C. Martella, D. Logothetis, A. Loukas, and G. Siganos. Spinner: Scalable graph partitioning in the cloud. In 2017 IEEE 33rd International Conference on Data Engineering (ICDE), pages 1083--1094, April 2017.
[20]
Christian Mayer, Ruben Mayer, Muhammad Adnan Tariq, Heiko Geppert, Larissa Laich, Lukas Rieger, and Kurt Rothermel. Adwise: Adaptive window-based streaming edge partitioning for high-speed graph processing. pages 685--695, 07 2018.
[21]
Robert Meusel, Sebastiano Vigna, Oliver Lehmberg, and Christian Bizer. Web Data Commons, 2012.
[22]
Donald Nguyen, Andrew Lenharth, and Keshav Pingali. A Lightweight Infrastructure for Graph Analytics. SOSP '13.
[23]
Fabio Petroni, Leonardo Querzoni, Khuzaima Daudjee, Shahin Kamali, and Giorgio Iacoboni. HDRF: Stream- Based Partitioning for Power-Law Graphs. CIKM '15, 2015. 59
[24]
The Lemur Project. The ClueWeb12 Dataset, 2013.
[25]
X. Que, F. Checconi, F. Petrini, and J. A. Gunnels. Scalable community detection with the louvain algorithm. In IPDPS, 2015.
[26]
G. M. Slota, K. Madduri, and S. Rajamanickam. PuLP: Scalable multi-objective multi-constraint partitioning for small-world networks. In IEEE Big Data, 2014.
[27]
G. M. Slota, S. Rajamanickam, K. Devine, and K. Madduri. Partitioning Trillion-Edge Graphs in Minutes. In IPDPS, 2017.
[28]
Isabelle Stanton and Gabriel Kliot. Streaming Graph Partitioning for Large Distributed Graphs. KDD, 2012.
[29]
Dan Stanzione, Bill Barth, Niall Gaffney, Kelly Gaither, Chris Hempel, Tommy Minyard, S. Mehringer, Eric Wernert, H. Tufo, D. Panda, and P. Teller. Stampede 2: The Evolution of an XSEDE Supercomputer. PEARC17, 2017.
[30]
Charalampos Tsourakakis, Christos Gkantsidis, Bozidar Radunovic, and Milan Vojnovic. FENNEL: Streaming Graph Partitioning for Massive Scale Graphs. WSDM '14, 2014.
[31]
Johan Ugander and Lars Backstrom. Balanced Label Propagation for Partitioning Massive Graphs. WSDM '13, 2013.
[32]
L. Wang, Y. Xiao, B. Shao, and H. Wang. How to partition a billion-node graph. In ICDE, 2014.
[33]
Cong Xie, Ling Yan, Wu-Jun Li, and Zhihua Zhang. Distributed Power-law Graph Computing: Theoretical and Empirical Analysis. In NIPS. 2014.
[34]
Xiaowei Zhu, Wenguang Chen, Weimin Zheng, and Xiaosong Ma. Gemini: A Computation-centric Distributed Graph Processing System. OSDI'16, 2016.

Cited By

View all
  • (2023)RAGraph: A Region-Aware Framework for Geo-Distributed Graph ProcessingProceedings of the VLDB Endowment10.14778/3632093.363209417:3(264-277)Online publication date: 1-Nov-2023
  • (2023)An adaptive graph sampling framework for graph analyticsSocial Network Analysis and Mining10.1007/s13278-023-01157-x14:1Online publication date: 6-Dec-2023

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM SIGOPS Operating Systems Review
ACM SIGOPS Operating Systems Review  Volume 55, Issue 1
SIGOPS
July 2021
107 pages
ISSN:0163-5980
DOI:10.1145/3469379
Issue’s Table of Contents
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 06 June 2021
Published in SIGOPS Volume 55, Issue 1

Check for updates

Qualifiers

  • Research-article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)31
  • Downloads (Last 6 weeks)3
Reflects downloads up to 06 Jan 2025

Other Metrics

Citations

Cited By

View all
  • (2023)RAGraph: A Region-Aware Framework for Geo-Distributed Graph ProcessingProceedings of the VLDB Endowment10.14778/3632093.363209417:3(264-277)Online publication date: 1-Nov-2023
  • (2023)An adaptive graph sampling framework for graph analyticsSocial Network Analysis and Mining10.1007/s13278-023-01157-x14:1Online publication date: 6-Dec-2023

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media