Towards characterizing cloud backend workloads: insights from Google compute clusters

Published: 27 March 2010

Abstract

The advent of cloud computing promises highly available, efficient, and flexible computing services for applications such as web search, email, voice over IP, and web search alerts. Our experience at Google is that realizing the promises of cloud computing requires an extremely scalable backend consisting of many large compute clusters that are shared by application tasks with diverse service level requirements for throughput, latency, and jitter. These considerations impact (a) capacity planning to determine which machine resources must grow and by how much and (b) task scheduling to achieve high machine utilization and to meet service level objectives.
Both capacity planning and task scheduling require a good understanding of task resource consumption (e.g., CPU and memory usage). This in turn demands simple and accurate approaches to workload classification, that is, determining how to form groups of tasks (workloads) with similar resource demands. One approach to workload classification is to make each task its own workload. However, this approach scales poorly since tens of thousands of tasks execute daily on Google compute clusters. Another approach to workload classification is to view all tasks as belonging to a single workload. Unfortunately, applying such a coarse-grained workload classification to the diversity of tasks running on Google compute clusters results in large variances in predicted resource consumption.
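As a rough illustration of why a single, coarse-grained workload predicts poorly, the sketch below compares the variance of CPU usage when all tasks are pooled with the variance inside two groups of similar tasks. The sample values and the names `small_tasks` and `large_tasks` are invented for illustration and do not come from the Google cluster data.

```python
from statistics import mean, pvariance

# Hypothetical CPU usage samples (in cores); not real measurements.
small_tasks = [0.1, 0.2, 0.1, 0.3, 0.2, 0.1]
large_tasks = [7.5, 8.0, 8.5]


def summarize(samples):
    """Return (mean, population variance) of a list of usage samples."""
    return mean(samples), pvariance(samples)


# Coarse-grained classification: every task belongs to one workload.
pooled_mean, pooled_var = summarize(small_tasks + large_tasks)

# Finer-grained classification: one workload per group of similar tasks.
small_mean, small_var = summarize(small_tasks)
large_mean, large_var = summarize(large_tasks)

print(f"single workload: mean={pooled_mean:.2f} var={pooled_var:.2f}")
print(f"small workload:  mean={small_mean:.2f} var={small_var:.2f}")
print(f"large workload:  mean={large_mean:.2f} var={large_var:.2f}")
```

With data of this shape, the pooled mean describes neither group well, which is the motivation for grouping tasks before predicting their consumption.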
This paper describes an approach to workload classification and its application to the Google Cloud Backend, arguably the largest cloud backend on the planet. Our methodology for workload classification consists of: (1) identifying the workload dimensions; (2) constructing task classes using an off-the-shelf algorithm such as k-means; (3) determining the break points for qualitative coordinates within the workload dimensions; and (4) merging adjacent task classes to reduce the number of workloads. We use the foregoing, especially the notion of qualitative coordinates, to glean several insights about the Google Cloud Backend: (a) the duration of task executions is bimodal in that tasks either have a short duration or a long duration; (b) most tasks have short durations; and (c) most resources are consumed by a few tasks with long duration that have large demands for CPU and memory.
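The four steps above can be sketched on synthetic data as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the dimension names, the synthetic task generator, the scaling by per-dimension maximum, the two-level small/large coordinates, the midpoint break-point rule, and the merge rule (classes with identical qualitative coordinates are combined, a simplification of merging adjacent classes) are all hypothetical choices made for the example.

```python
import random
from collections import defaultdict

DIMENSIONS = ("duration_h", "cpu_cores", "memory_gb")  # step 1 (assumed names)


def make_tasks(seed=1):
    """Synthetic data: 90 short, small tasks and 10 long, large ones."""
    rng = random.Random(seed)
    short = [(rng.uniform(0.05, 0.5), rng.uniform(0.05, 0.4), rng.uniform(0.1, 0.5))
             for _ in range(90)]
    long_ = [(rng.uniform(18, 24), rng.uniform(4, 8), rng.uniform(8, 16))
             for _ in range(10)]
    return short + long_


def scale(points):
    """Divide each dimension by its maximum so no dimension dominates."""
    maxima = [max(col) for col in zip(*points)]
    return [tuple(v / m for v, m in zip(p, maxima)) for p in points]


def kmeans(points, k, iterations=20):
    """Step 2: plain Lloyd's k-means with a deterministic, spread-out start."""
    ordered = sorted(points)
    centroids = [ordered[i * (len(points) - 1) // (k - 1)] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment: nearest centroid by squared Euclidean distance.
        labels = [min(range(k),
                      key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
                  for p in points]
        # Update: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a class empties out
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, labels


def qualitative(centroids):
    """Step 3: one break point per dimension, halfway between extreme centroids."""
    breaks = [(min(col) + max(col)) / 2 for col in zip(*centroids)]
    return [tuple("small" if v < b else "large" for v, b in zip(c, breaks))
            for c in centroids]


def classify(tasks, k=4):
    """Steps 2-4: cluster, label, then merge classes with equal coordinates."""
    centroids, labels = kmeans(scale(tasks), k)
    coords = qualitative(centroids)
    workloads = defaultdict(list)
    for task, lab in zip(tasks, labels):
        workloads[coords[lab]].append(task)  # step 4 (simplified merge)
    return dict(workloads)


if __name__ == "__main__":
    workloads = classify(make_tasks())
    total = sum(d * c for members in workloads.values() for d, c, _ in members)
    for coords, members in sorted(workloads.items()):
        share = sum(d * c for d, c, _ in members) / total
        print(dict(zip(DIMENSIONS, coords)),
              f"tasks={len(members)}", f"cpu_hours_share={share:.1%}")
```

Because the synthetic data is built to be bimodal, the merged workloads mirror insights (a) to (c) by construction; on real traces the number of classes, the break points, and the merge criterion would all need to be chosen from the data.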

Published In

ACM SIGMETRICS Performance Evaluation Review, Volume 37, Issue 4
March 2010
87 pages
ISSN: 0163-5999
DOI: 10.1145/1773394

Publisher

Association for Computing Machinery

New York, NY, United States
