[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.5555/2968826.2969013guideproceedingsArticle/Chapter ViewAbstractPublication PagesnipsConference Proceedingsconference-collections
Article

Distributed power-law graph computing: theoretical and empirical analysis

Published: 08 December 2014 Publication History

Abstract

With the emergence of big graphs in a variety of real applications like social networks, machine learning based on distributed graph-computing (DGC) frameworks has attracted much attention from big data machine learning community. In DGC frameworks, the graph partitioning (GP) strategy plays a key role to affect the performance, including the workload balance and communication cost. Typically, the degree distributions of natural graphs from real applications follow skewed power laws, which makes GP a challenging task. Recently, many methods have been proposed to solve the GP problem. However, the existing GP methods cannot achieve satisfactory performance for applications with power-law graphs. In this paper, we propose a novel vertex-cut method, called degree-based hashing (DBH), for GP. DBH makes effective use of the skewed degree distributions for GP. We theoretically prove that DBH can achieve lower communication cost than existing methods and can simultaneously guarantee good workload balance. Furthermore, empirical results on several large power-law graphs also show that DBH can outperform the state of the art.

References

[1]
Lada A Adamic and Bernardo A Huberman. Zipf's law and the internet. Glottometrics, 3(1):143-150, 2002.
[2]
Paolo Boldi and Sebastiano Vigna. The webgraph framework I: compression techniques. In Proceedings of the 13th international conference on World Wide Web (WWW), 2004.
[3]
Andrei Broder, Ravi Kumar, Farzin Maghoul, Prabhakar Raghavan, Sridhar Rajagopalan, Raymie Stata, Andrew Tomkins, and Janet Wiener. Graph structure in the web. Computer networks, 33(1):309-320, 2000.
[4]
Rong Chen, Jiaxin Shi, Yanzhe Chen, Haibing Guan, and Haibo Chen. Powerlyra: Differentiated graph computation and partitioning on skewed graphs. Technical Report IPADSTR-2013-001, Institute of Parallel and Distributed Systems, Shanghai Jiao Tong University, 2013.
[5]
Timothy A Davis and Yifan Hu. The University of Florida sparse matrix collection. ACM Transactions on Mathematical Software, 38(1):1, 2011.
[6]
Joseph E Gonzalez, Yucheng Low, Haijie Gu, Danny Bickson, and Carlos Guestrin. Powergraph: Distributed graph-parallel computation on natural graphs. In Proceedings of the 10th USENIX Symposium on Operating Systems Design and Implementation (OSDI), 2012.
[7]
Joseph E. Gonzalez, Reynold S. Xin, Ankur Dave, Daniel Crankshaw, Michael J. Franklin, and Ion Stoica. GraphX: Graph processing in a distributed dataflow framework. In Proceedings of the 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI), 2014.
[8]
Nilesh Jain, Guangdeng Liao, and Theodore L Willke. Graphbuilder: scalable graph etl framework. In Proceedings of the First International Workshop on Graph Data Management Experiences and Systems, 2013.
[9]
George Karypis and Vipin Kumar. Multilevel graph partitioning schemes. In Proceedings of the International Conference on Parallel Processing (ICPP), 1995.
[10]
Haewoon Kwak, Changhyun Lee, Hosung Park, and Sue Moon. What is twitter, a social network or a news media. In Proceedings of the 19th international conference on World Wide Web (WWW), 2010.
[11]
Aapo Kyrola, Guy E. Blelloch, and Carlos Guestrin. Graphchi: Large-scale graph computation on just a PC. In Proceedings of the 10th USENIX Symposium on Operating Systems Design and Implementation (OSDI), 2012.
[12]
Jure Leskovec. Stanford large network dataset collection. URL http://snap.stanford.edu/data/index.html, 2011.
[13]
Yucheng Low, Joseph Gonzalez, Aapo Kyrola, Danny Bickson, Carlos Guestrin, and Joseph M. Heller-stein. GraphLab: A new framework for parallel machine learning. In Proceedings of the Conference on Uncertainty in Artificial Intelligence (UAI), 2010.
[14]
Yucheng Low, Joseph Gonzalez, Aapo Kyrola, Danny Bickson, Carlos Guestrin, and Joseph M. Hellerstein. Distributed graphlab: A framework for machine learning in the cloud. In Proceedings of the International Conference on Very Large Data Bases (VLDB), 2012.
[15]
Grzegorz Malewicz, Matthew H Austern, Aart JC Bik, James C Dehnert, Ilan Horn, Naty Leiser, and Grzegorz Czajkowski. Pregel: a system for large-scale graph processing. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD), 2010.
[16]
Alan Mislove, Massimiliano Marcon, Krishna P Gummadi, Peter Druschel, and Bobby Bhattacharjee. Measurement and analysis of online social networks. In Proceedings of the 7th ACM SIGCOMM conference on Internet Measurement, 2007.
[17]
Martin Raab and Angelika Steger. balls into binsa simple and tight analysis. In Randomization and Approximation Techniques in Computer Science, pages 159-170. Springer, 1998.
[18]
Isabelle Stanton and Gabriel Kliot. Streaming graph partitioning for large distributed graphs. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2012.
[19]
Charalampos Tsourakakis, Christos Gkantsidis, Bozidar Radunovic, and Milan Vojnovic. Fennel: Streaming graph partitioning for massive scale graphs. In Proceedings of the 7th ACM International Conference on Web Search and Data Mining (WSDM), 2014.
[20]
Lu Wang, Yanghua Xiao, Bin Shao, and Haixun Wang. How to partition a billion-node graph. In Proceedings of the International Conference on Data Engineering (ICDE), 2014.

Cited By

View all

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image Guide Proceedings
NIPS'14: Proceedings of the 28th International Conference on Neural Information Processing Systems - Volume 1
December 2014
3697 pages

Publisher

MIT Press

Cambridge, MA, United States

Publication History

Published: 08 December 2014

Qualifiers

  • Article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)0
  • Downloads (Last 6 weeks)0
Reflects downloads up to 01 Jan 2025

Other Metrics

Citations

Cited By

View all
  • (2021)Local Graph Edge PartitioningACM Transactions on Intelligent Systems and Technology10.1145/346668512:5(1-25)Online publication date: 23-Sep-2021
  • (2019)Dynamic scaling for parallel graph computationsProceedings of the VLDB Endowment10.14778/3324301.332430512:8(877-890)Online publication date: 1-Apr-2019
  • (2019)Wandering between close and apartProceedings of the ACM Turing Celebration Conference - China10.1145/3321408.3321576(1-6)Online publication date: 17-May-2019
  • (2019)Alleviating Users' Pain of Waiting: Effective Task Grouping for Online-to-Offline Food Delivery ServicesThe World Wide Web Conference10.1145/3308558.3313464(773-783)Online publication date: 13-May-2019
  • (2019)PowerLyraACM Transactions on Parallel Computing10.1145/32989895:3(1-39)Online publication date: 22-Jan-2019
  • (2018)Streaming graph partitioningProceedings of the VLDB Endowment10.14778/3236187.323620811:11(1590-1603)Online publication date: 1-Jul-2018
  • (2018)Q-graphProceedings of the 1st ACM SIGMOD Joint International Workshop on Graph Data Management Experiences & Systems (GRADES) and Network Data Analytics (NDA)10.1145/3210259.3210265(1-10)Online publication date: 10-Jun-2018

View Options

View options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media