Article

Preemptive, low latency datacenter scheduling via lightweight virtualization

Authors:

Xiaobo Zhou

USENIX ATC '17: Proceedings of the 2017 USENIX Conference on Usenix Annual Technical Conference

Pages 251 - 263

Published: 12 July 2017

Abstract

Data centers are evolving to host heterogeneous workloads on shared clusters to reduce the operational cost and achieve higher resource utilization. However, it is challenging to schedule heterogeneous workloads with diverse resource requirements and QoS constraints. On the one hand, latency-critical jobs need to be scheduled as soon as they are submitted to avoid any queuing delays. On the other hand, best-effort long jobs should be allowed to occupy the cluster when there are idle resources to improve cluster utilization. The challenge lies in how to minimize the queuing delays of short jobs while maximizing cluster utilization. Existing solutions either forcibly kill long jobs to guarantee low latency for short jobs or disable preemption to optimize utilization. Hybrid approaches with resource reservations have been proposed but need to be tuned for specific workloads.

In this paper, we propose and develop BIG-C, a container-based resource management framework for Big Data cluster computing. The key design is to leverage lightweight virtualization (a.k.a. containers) to make tasks preemptable in cluster scheduling. We devise two types of preemption strategies, immediate and graceful preemption, and show their effectiveness and tradeoffs with loosely-coupled MapReduce workloads as well as iterative, in-memory Spark workloads. Based on the mechanisms for task preemption, we further develop a preemptive fair share cluster scheduler. We have implemented BIG-C in YARN. Our evaluation with synthetic and production workloads shows that both low latency and high utilization can be attained when scheduling heterogeneous workloads on a contended cluster.
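The abstract names the two preemption strategies without describing their mechanics. As a rough illustration only, the sketch below assumes that a container's CPU allocation can be resized while the task keeps running, that immediate preemption reclaims the demanded share in one step, and that graceful preemption reclaims it in smaller increments. The `Container` class, the `min_cpu` floor, and both function names are hypothetical and are not taken from BIG-C or YARN.

```python
from dataclasses import dataclass


@dataclass
class Container:
    """Toy stand-in for a task container; only CPU cores are modeled."""
    name: str
    cpu: float            # CPU cores currently allocated
    min_cpu: float = 0.0  # hypothetical floor that keeps the task alive

    def shrink(self, amount: float) -> float:
        """Reduce the allocation by up to `amount`; return what was freed."""
        freed = min(amount, self.cpu - self.min_cpu)
        self.cpu -= freed
        return freed


def immediate_preempt(victim: Container, demand: float) -> list[float]:
    """Assumed 'immediate' strategy: reclaim the demanded CPU in one step."""
    return [victim.shrink(demand)]


def graceful_preempt(victim: Container, demand: float, step: float) -> list[float]:
    """Assumed 'graceful' strategy: reclaim CPU in increments of at most `step`."""
    freed_steps = []
    remaining = demand
    while remaining > 0:
        freed = victim.shrink(min(step, remaining))
        if freed == 0:  # victim has reached its floor; nothing more to take
            break
        freed_steps.append(freed)
        remaining -= freed
    return freed_steps


if __name__ == "__main__":
    long_a = Container("long-a", cpu=4.0, min_cpu=1.0)
    long_b = Container("long-b", cpu=4.0, min_cpu=1.0)
    print(immediate_preempt(long_a, 2.0), long_a.cpu)
    print(graceful_preempt(long_b, 2.0, 0.5), long_b.cpu)
```

In this toy model both strategies free the same total amount and the long task is never killed; they differ only in how abruptly the victim loses resources, which is one plausible reading of the tradeoff the abstract mentions.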



Information & Contributors

Information

Published In

USENIX ATC '17: Proceedings of the 2017 USENIX Conference on Usenix Annual Technical Conference

July 2017

811 pages

ISBN: 9781931971386

Program Chairs: Dilma Da Silva (Texas A&M University), Bryan Ford (École Polytechnique Fédérale de Lausanne)

Sponsors

VMware
NetApp
Microsoft
Facebook
ORACLE

Publisher

USENIX Association

United States


Bibliometrics & Citations

Article Metrics

Total Citations: 14
Total Downloads: 0
Downloads (Last 12 months): 0
Downloads (Last 6 weeks): 0

Reflects downloads up to 28 Dec 2024

Citations

Cited By

Xiao Z, Liu K, Hu M, Wu D (2024). DeepCTS: A Deep Reinforcement Learning Approach for AI Container Task Scheduling. Proceedings of the 2024 3rd Asia Conference on Algorithms, Computing and Machine Learning, pp. 342-347. Online publication date: 22-Mar-2024.
https://dl.acm.org/doi/10.1145/3654823.3654885

Zhao J, Zhou X, Chang S, Xu C, Butt A, Mi N, Chard K (2023). Let It Go: Relieving Garbage Collection Pain for Latency Critical Applications in Golang. Proceedings of the 32nd International Symposium on High-Performance Parallel and Distributed Computing, pp. 169-180. Online publication date: 7-Aug-2023.
https://dl.acm.org/doi/10.1145/3588195.3592998

Choi W, Urgaonkar B, Kandemir M, Kesidis G, Bellavista P, Zhang K, Gherbi A, Bagchi S, Patiño M, Di Modica G, Gascon-Samson J (2022). Multi-resource fair allocation for consolidated flash-based caching systems. Proceedings of the 23rd ACM/IFIP International Middleware Conference, pp. 202-215. Online publication date: 7-Nov-2022.
https://dl.acm.org/doi/10.1145/3528535.3565245

Zhao J, Pi A, Zhou X, Chang S, Xu C, Bellavista P, Zhang K, Gherbi A, Bagchi S, Patiño M, Di Modica G, Gascon-Samson J (2022). Improving Concurrent GC for Latency Critical Services in Multi-tenant Systems. Proceedings of the 23rd ACM/IFIP International Middleware Conference, pp. 43-55. Online publication date: 7-Nov-2022.
https://dl.acm.org/doi/10.1145/3528535.3531515

Pi A, Zhou X, Xu C, Weissman J, Chandra A, Gavrilovska A, Tiwari D (2022). Holmes. Proceedings of the 31st International Symposium on High-Performance Parallel and Distributed Computing, pp. 110-121. Online publication date: 27-Jun-2022.
https://dl.acm.org/doi/10.1145/3502181.3531464

Yu J, Feng D, Tong W, Lv P, Xiong Y (2021). CERES: Container-Based Elastic Resource Management System for Mixed Workloads. Proceedings of the 50th International Conference on Parallel Processing, pp. 1-10. Online publication date: 9-Aug-2021.
https://dl.acm.org/doi/10.1145/3472456.3472459

Pi A, Zhao J, Wang S, Zhou X, Zhang K, Gherbi A, Venkatasubramanian N, Veiga L (2021). Memory at your service. Proceedings of the 22nd International Middleware Conference, pp. 185-197. Online publication date: 6-Dec-2021.
https://dl.acm.org/doi/10.1145/3464298.3493394

Chen W, Pi A, Wang S, Zhou X (2019). OS-Augmented Oversubscription of Opportunistic Memory with a User-Assisted OOM Killer. Proceedings of the 20th International Middleware Conference, pp. 28-40. Online publication date: 9-Dec-2019.
https://dl.acm.org/doi/10.1145/3361525.3361534

Chen W, Pi A, Wang S, Zhou X (2019). Pufferfish. Proceedings of the ACM Symposium on Cloud Computing, pp. 259-271. Online publication date: 20-Nov-2019.
https://dl.acm.org/doi/10.1145/3357223.3362730

Zhou W, White K, Yu H (2019). Improving Short Job Latency Performance in Hybrid Job Schedulers with Dice. Proceedings of the 48th International Conference on Parallel Processing, pp. 1-10. Online publication date: 5-Aug-2019.
https://dl.acm.org/doi/10.1145/3337821.3337851