[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.1145/2807591.2807632acmconferencesArticle/Chapter ViewAbstractPublication PagesscConference Proceedingsconference-collections
research-article
Free access

Data partitioning strategies for graph workloads on heterogeneous clusters

Published: 15 November 2015 Publication History

Abstract

Large scale graph analytics are an important class of problem in the modern data center. However, while data centers are trending towards a large number of heterogeneous processing nodes, graph analytics frameworks still operate under the assumption of uniform compute resources. In this paper, we develop heterogeneity-aware data ingress strategies for graph analytics workloads using the popular PowerGraph framework. We illustrate how simple estimates of relative node computational throughput can guide heterogeneity-aware data partitioning algorithms to provide balanced graph cutting decisions. Our work enhances five online data ingress strategies from a variety of sources to optimize application execution for throughput differences in heterogeneous data centers. The proposed partitioning algorithms improve the runtime of several popular machine learning and data mining applications by as much as a 65% and on average by 32% as compared to the default, balanced partitioning approaches.

References

[1]
Amazon EC2. http://aws.amazon.com/ec2. Accessed: 04-16-2015.
[2]
Apache hadoop. https://hadoop.apache.org/. Accessed: 08-11-2015.
[3]
Ec2 network estimations. http://www.aerospike.com/blog/boosting-amazon-ec2-network-for-high-throughput. Accessed: 04-16-2015.
[4]
Intel xeon phi coprocessor. http://www.intel.com/content/www/us/en/processors/xeon/xeon-phi-detail.html. Accessed: 04-16-2015.
[5]
Netflix dataset. http://www.select.cs.cmu.edu/code/graphlab/datasets/. Accessed: 04-16-2015.
[6]
F. Ahmad, S. T. Chakradhar, A. Raghunathan, and T. N. Vijaykumar. Tarazu: Optimizing mapreduce on heterogeneous clusters. In Proceedings of the Seventeenth International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS XVII, pages 61--74, New York, NY, USA, 2012. ACM.
[7]
C. Avery. Giraph: Large-scale graph processing infrastructure on hadoop. Proceedings of the Hadoop Summit. Santa Clara, 2011.
[8]
R. Chen, J. Shi, Y. Chen, and H. Chen. Powerlyra: Differentiated graph computation and partitioning on skewed graphs. In EuroSys, Apr. 2015.
[9]
B.-G. Chun, G. Iannaccone, G. Iannaccone, R. Katz, G. Lee, and L. Niccolini. An energy case for hybrid datacenters. SIGOPS Oper. Syst. Rev., 44(1):76--80, Mar. 2010.
[10]
J. Dean and S. Ghemawat. Mapreduce: simplified data processing on large clusters. Communications of the ACM, 51(1):107--113, 2008.
[11]
Z. Fadika, E. Dede, J. Hartog, and M. Govindaraju. Marla: Mapreduce for heterogeneous clusters. In Proceedings of the 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, CCGRID '12, pages 49--56, Washington, DC, USA, 2012. IEEE Computer Society.
[12]
S. Garg, S. Sundaram, and H. D. Patel. Robust heterogeneous data center design: A principled approach. SIGMETRICS Perform. Eval. Rev., 39(3):28--30, Dec. 2011.
[13]
J. E. Gonzalez, Y. Low, H. Gu, D. Bickson, and C. Guestrin. Powergraph: Distributed graph-parallel computation on natural graphs. In OSDI, pages 17--30. USENIX Association, 2012.
[14]
Y. Guo, M. Biczak, A. L. Varbanescu, A. Iosup, C. Martella, and T. L. Willke. How well do graph-processing platforms perform? an empirical performance evaluation and analysis. In Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, IPDPS '14, pages 395--404, Washington, DC, USA, 2014. IEEE Computer Society.
[15]
M. Han, K. Daudjee, K. Ammar, M. T. Özsu, X. Wang, and T. Jin. An experimental comparison of pregel-like graph processing systems. Proc. VLDB Endow., 7(12):1047--1058, Aug. 2014.
[16]
N. Jain, G. Liao, and T. L. Willke. Graphbuilder: Scalable graph etl framework. GRADES, pages 4:1--4:6, 2013.
[17]
G. Karypis and V. Kumar. Metis-unstructured graph partitioning and sparse matrix ordering system, version 2.0. 1995.
[18]
Z. Khayyat, K. Awara, A. Alonazi, H. Jamjoom, D. Williams, and P. Kalnis. Mizan: A system for dynamic load balancing in large-scale graph processing. In Proceedings of the 8th ACM European Conference on Computer Systems, EuroSys '13, pages 169--182, New York, NY, USA, 2013. ACM.
[19]
Y. Koren, R. Bell, and C. Volinsky. Matrix factorization techniques for recommender systems. Computer, (8):30--37, 2009.
[20]
H. Kwak, C. Lee, H. Park, and S. Moon. What is Twitter, a social network or a news media? In WWW '10: Proceedings of the 19th international conference on World wide web, pages 591--600, New York, NY, USA, 2010. ACM.
[21]
A. Kyrola, G. Blelloch, and C. Guestrin. Graphchi: Large-scale graph computation on just a pc. In Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation, OSDI'12, pages 31--46, Berkeley, CA, USA, 2012. USENIX Association.
[22]
J. Leskovec and A. Krevl. SNAP Datasets: Stanford large network dataset collection. http://snap.stanford.edu/data. Accessed: 04-16-2015.
[23]
Y. Low, D. Bickson, J. Gonzalez, C. Guestrin, A. Kyrola, and J. M. Hellerstein. Distributed graphlab: A framework for machine learning and data mining in the cloud. Proc. VLDB Endow., 5(8):716--727, Apr. 2012.
[24]
Y. Low, J. E. Gonzalez, A. Kyrola, D. Bickson, C. Guestrin, and J. M. Hellerstein. Graphlab: A new framework for parallel machine learning. UAI, pages 340--349.
[25]
G. Malewicz, M. H. Austern, A. J. Bik, J. C. Dehnert, I. Horn, N. Leiser, and G. Czajkowski. Pregel: A system for large-scale graph processing. In Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data, SIGMOD '10, pages 135--146, New York, NY, USA, 2010. ACM.
[26]
L. Page, S. Brin, R. Motwani, and T. Winograd. The pagerank citation ranking: Bringing order to the web. 1999.
[27]
S. Salihoglu and J. Widom. Gps: A graph processing system. In Proceedings of the 25th International Conference on Scientific and Statistical Database Management, SSDBM, pages 22:1--22:12, New York, NY, USA, 2013. ACM.
[28]
T. Schank. Algorithmic aspects of triangle-based network analysis. PhD thesis, University Karlsruhe, 2007.
[29]
C. Tsourakakis, C. Gkantsidis, B. Radunovic, and M. Vojnovic. Fennel: Streaming graph partitioning for massive scale graphs. In Proceedings of the 7th ACM international conference on Web search and data mining, pages 333--342. ACM, 2014.
[30]
M. Zaharia, M. Chowdhury, M. J. Franklin, S. Shenker, and I. Stoica. Spark: Cluster computing with working sets. In Proceedings of the 2Nd USENIX Conference on Hot Topics in Cloud Computing, HotCloud'10, pages 10--10, Berkeley, CA, USA, 2010. USENIX Association.
[31]
M. Zaharia, A. Konwinski, A. D. Joseph, R. Katz, and I. Stoica. Improving mapreduce performance in heterogeneous environments. In Proceedings of the 8th USENIX Conference on Operating Systems Design and Implementation, OSDI'08, pages 29--42, Berkeley, CA, USA, 2008. USENIX Association.
[32]
Y. Zhou, D. Wilkinson, R. Schreiber, and R. Pan. Large-scale parallel collaborative filtering for the netflix prize. In Algorithmic Aspects in Information and Management, pages 337--348. Springer, 2008.

Cited By

View all
  • (2025)Optimizing Data Processing Through Edge-Cloud Integration: A Comprehensive FrameworkInternational Journal of Scientific Research in Computer Science, Engineering and Information Technology10.32628/CSEIT25111235211:1(3374-3383)Online publication date: 25-Feb-2025
  • (2024) CBANA: A Lightweight, Efficient, and Flexible C ache B ehavior Ana lysis Framework IEEE Transactions on Computers10.1109/TC.2024.341674773:9(2262-2274)Online publication date: Sep-2024
  • (2024)Characterizing the Performance of Emerging Deep Learning, Graph, and High Performance Computing Workloads Under Interference2024 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)10.1109/IPDPSW63119.2024.00098(468-477)Online publication date: 27-May-2024
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
SC '15: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis
November 2015
985 pages
ISBN:9781450337236
DOI:10.1145/2807591
  • General Chair:
  • Jackie Kern,
  • Program Chair:
  • Jeffrey S. Vetter
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 15 November 2015

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. cloud computing
  2. data center
  3. data partitioning
  4. graph algorithms
  5. heterogeneous

Qualifiers

  • Research-article

Funding Sources

Conference

SC15
Sponsor:

Acceptance Rates

SC '15 Paper Acceptance Rate 79 of 358 submissions, 22%;
Overall Acceptance Rate 1,516 of 6,373 submissions, 24%

Upcoming Conference

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)145
  • Downloads (Last 6 weeks)22
Reflects downloads up to 01 Mar 2025

Other Metrics

Citations

Cited By

View all
  • (2025)Optimizing Data Processing Through Edge-Cloud Integration: A Comprehensive FrameworkInternational Journal of Scientific Research in Computer Science, Engineering and Information Technology10.32628/CSEIT25111235211:1(3374-3383)Online publication date: 25-Feb-2025
  • (2024) CBANA: A Lightweight, Efficient, and Flexible C ache B ehavior Ana lysis Framework IEEE Transactions on Computers10.1109/TC.2024.341674773:9(2262-2274)Online publication date: Sep-2024
  • (2024)Characterizing the Performance of Emerging Deep Learning, Graph, and High Performance Computing Workloads Under Interference2024 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)10.1109/IPDPSW63119.2024.00098(468-477)Online publication date: 27-May-2024
  • (2023)PANCODE: Multilevel Partitioning of Neural Networks for Constrained Internet-of-Things DevicesIEEE Access10.1109/ACCESS.2023.323424511(2058-2077)Online publication date: 2023
  • (2023)HAEP: Heterogeneous Environment Aware Edge Partitioning for Power-Law GraphsDatabase Systems for Advanced Applications10.1007/978-3-031-30675-4_23(331-340)Online publication date: 15-Apr-2023
  • (2022)Towards distributed bitruss decomposition on bipartite graphsProceedings of the VLDB Endowment10.14778/3538598.353861015:9(1889-1901)Online publication date: 1-May-2022
  • (2021)SynCron: Efficient Synchronization Support for Near-Data-Processing Architectures2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA)10.1109/HPCA51647.2021.00031(263-276)Online publication date: Feb-2021
  • (2020)A Fast Lock for Explicit Message Passing ArchitecturesIEEE Transactions on Computers10.1109/TC.2020.3015727(1-1)Online publication date: 2020
  • (2020)Accelerating Force-directed Graph Layout with Processing-in-Memory Architecture2020 IEEE 27th International Conference on High Performance Computing, Data, and Analytics (HiPC)10.1109/HiPC50609.2020.00041(271-282)Online publication date: Dec-2020
  • (2019)A survey on data storage and placement methodologies for Cloud-Big Data ecosystemJournal of Big Data10.1186/s40537-019-0178-36:1Online publication date: 11-Feb-2019
  • Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Login options

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media