arXiv:2207.12499 (cs)

[Submitted on 25 Jul 2022]

Title: Interference and Need Aware Workload Colocation in Hyperscale Datacenters

Authors: Sayak Chakraborti, Brian Coutinho, Sandhya Dwarkadas, Parth Malani, Bikash Sharma

Abstract: Datacenters suffer from resource utilization inefficiencies because of the conflicting goals of service owners and platform providers. Service owners who want to meet their Service Level Objectives (SLOs) typically request a conservative amount of resources, while platform providers want to increase operational efficiency to reduce capital and operating costs. Achieving operational efficiency and meeting each service's SLO at the same time is challenging because of diversity in service workload characteristics, resource usage patterns that depend on input load, heterogeneity in platform, memory, I/O, and network architecture, and resource bundling.
This paper presents a tunable approach to resource allocation that accounts for both dynamic service resource needs and platform heterogeneity. In addition, an online K-Means-based service classification method is used in conjunction with an offline sensitivity component. Our tunable approach allows trading resource utilization efficiency for absolute SLO guarantees, based on each service owner's sensitivity to its SLO. We evaluate our tunable resource allocator at scale in a private cloud environment with mostly latency-critical workloads. When tuning for operational efficiency, we demonstrate up to ~50% reduction in required machines, ~40% reduction in Total Cost of Ownership (TCO), and ~60% reduction in CPU and memory fragmentation, at the cost of increasing the number of tasks experiencing SLO degradation by up to ~25% compared to the baseline. When tuning for SLO, by introducing interference-aware colocation, we can tune the solver to reduce the number of tasks experiencing SLO degradation by up to ~22% compared to the baseline, at an additional cost of ~30% in the number of hosts. We highlight this trade-off between TCO and SLO violations and offer tuning based on the requirements of the platform owners.
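The abstract names an online K-Means-based service classification step but does not specify its features or update rule. As a rough illustration only, the sketch below shows the generic sequential (MacQueen-style) online K-Means update, in which each newly observed service nudges its nearest centroid toward itself. The two-dimensional feature vectors (read here as hypothetical normalized CPU and memory utilization), the seed centroids, and the class name `OnlineKMeans` are all assumptions for illustration, not the authors' implementation.

```python
import math
from typing import List, Sequence


class OnlineKMeans:
    """Hypothetical sketch: sequential k-means where each sample updates its nearest centroid."""

    def __init__(self, initial_centroids: Sequence[Sequence[float]]):
        # Copy the seeds so the caller's data is not mutated.
        self.centroids: List[List[float]] = [list(c) for c in initial_centroids]
        # Each seed counts as one observation of its cluster.
        self.counts: List[int] = [1] * len(self.centroids)

    def predict(self, x: Sequence[float]) -> int:
        # Index of the centroid closest to x in Euclidean distance.
        return min(range(len(self.centroids)),
                   key=lambda i: math.dist(self.centroids[i], x))

    def update(self, x: Sequence[float]) -> int:
        # Assign x to its nearest centroid, then move that centroid toward x
        # with step 1/count, so it stays the running mean of its seed and members.
        k = self.predict(x)
        self.counts[k] += 1
        eta = 1.0 / self.counts[k]
        self.centroids[k] = [c + eta * (xi - c) for c, xi in zip(self.centroids[k], x)]
        return k


# Hypothetical features per service: (normalized CPU use, normalized memory use).
model = OnlineKMeans([[0.1, 0.2], [0.8, 0.7]])
samples = [[0.15, 0.25], [0.9, 0.6], [0.05, 0.1]]
labels = [model.update(s) for s in samples]
print(labels)  # [0, 1, 0]
```

Because each update touches only one centroid, the classifier can absorb new services as they arrive without re-clustering the whole fleet, which is one plausible reason to prefer an online variant over batch K-Means in this setting.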

Subjects:	Distributed, Parallel, and Cluster Computing (cs.DC)
Cite as:	arXiv:2207.12499 [cs.DC]
	(or arXiv:2207.12499v1 [cs.DC] for this version)
	https://doi.org/10.48550/arXiv.2207.12499

Submission history

From: Sayak Chakraborti
[v1] Mon, 25 Jul 2022 20:01:50 UTC (7,224 KB)