DOI: 10.5555/3154690.3154714
Article

Preemptive, low latency datacenter scheduling via lightweight virtualization

Published: 12 July 2017

Abstract

Data centers are evolving to host heterogeneous workloads on shared clusters to reduce the operational cost and achieve higher resource utilization. However, it is challenging to schedule heterogeneous workloads with diverse resource requirements and QoS constraints. On the one hand, latency-critical jobs need to be scheduled as soon as they are submitted to avoid any queuing delays. On the other hand, best-effort long jobs should be allowed to occupy the cluster when there are idle resources to improve cluster utilization. The challenge lies in how to minimize the queuing delays of short jobs while maximizing cluster utilization. Existing solutions either forcibly kill long jobs to guarantee low latency for short jobs or disable preemption to optimize utilization. Hybrid approaches with resource reservations have been proposed but need to be tuned for specific workloads.
In this paper, we propose and develop BIG-C, a container-based resource management framework for Big Data cluster computing. The key design is to leverage lightweight virtualization, a.k.a. containers, to make tasks preemptable in cluster scheduling. We devise two types of preemption strategies, immediate and graceful preemption, and show their effectiveness and tradeoffs with loosely-coupled MapReduce workloads as well as iterative, in-memory Spark workloads. Based on the mechanisms for task preemption, we further develop a preemptive fair share cluster scheduler. We have implemented BIG-C in YARN. Our evaluation with synthetic and production workloads shows that low latency and high utilization can both be attained when scheduling heterogeneous workloads on a contended cluster.
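
The abstract names the two preemption strategies but does not say how they are implemented, so the following is only a toy sketch of how they might differ. It assumes that immediate preemption takes a victim container's whole allocation at once and suspends it, while graceful preemption shrinks victims in small steps and leaves them running. The `Container` class, both function names, and the CPU-only resource model are hypothetical and are not taken from BIG-C or YARN.

```python
from dataclasses import dataclass


@dataclass
class Container:
    """Hypothetical model of a long job's container (CPU cores only)."""
    name: str
    cores: float             # cores currently allocated
    min_cores: float = 0.0   # floor respected by graceful preemption
    suspended: bool = False


def immediate_preempt(victims, demand):
    """Assumed behaviour: reclaim whole containers until `demand` is met.

    Each victim loses its entire allocation and is marked suspended
    (not killed), so more cores than requested may be freed.
    """
    freed = 0.0
    for c in victims:
        if freed >= demand:
            break
        freed += c.cores
        c.cores = 0.0
        c.suspended = True
    return freed


def graceful_preempt(victims, demand, step=1.0):
    """Assumed behaviour: shrink victims round-robin in `step`-sized slices.

    Victims keep running with fewer cores and never drop below
    `min_cores`; the loop stops when demand is met or nothing is left.
    """
    freed = 0.0
    progress = True
    while freed < demand and progress:
        progress = False
        for c in victims:
            if freed >= demand:
                break
            take = min(step, c.cores - c.min_cores, demand - freed)
            if take > 0:
                c.cores -= take
                freed += take
                progress = True
    return freed
```

In this sketch, a short job that needs 3 cores on a node running two 4-core long tasks would suspend one whole task under immediate preemption (freeing 4 cores), whereas graceful preemption would trim the two tasks to 2 and 3 cores and keep both running. The tradeoff the abstract alludes to shows up here as reclaim speed versus the progress that long jobs retain.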


Published In

USENIX ATC '17: Proceedings of the 2017 USENIX Conference on Usenix Annual Technical Conference
July 2017
811 pages
ISBN: 9781931971386

Sponsors

  • VMware
  • NetApp
  • Microsoft
  • Facebook
  • ORACLE

Publisher

USENIX Association

United States

Cited By

  • (2024) DeepCTS: A Deep Reinforcement Learning Approach for AI Container Task Scheduling. Proceedings of the 2024 3rd Asia Conference on Algorithms, Computing and Machine Learning, pp. 342-347. DOI: 10.1145/3654823.3654885. Online publication date: 22-Mar-2024
  • (2023) Let It Go: Relieving Garbage Collection Pain for Latency Critical Applications in Golang. Proceedings of the 32nd International Symposium on High-Performance Parallel and Distributed Computing, pp. 169-180. DOI: 10.1145/3588195.3592998. Online publication date: 7-Aug-2023
  • (2022) Multi-resource fair allocation for consolidated flash-based caching systems. Proceedings of the 23rd ACM/IFIP International Middleware Conference, pp. 202-215. DOI: 10.1145/3528535.3565245. Online publication date: 7-Nov-2022
  • (2022) Improving Concurrent GC for Latency Critical Services in Multi-tenant Systems. Proceedings of the 23rd ACM/IFIP International Middleware Conference, pp. 43-55. DOI: 10.1145/3528535.3531515. Online publication date: 7-Nov-2022
  • (2022) Holmes. Proceedings of the 31st International Symposium on High-Performance Parallel and Distributed Computing, pp. 110-121. DOI: 10.1145/3502181.3531464. Online publication date: 27-Jun-2022
  • (2021) CERES: Container-Based Elastic Resource Management System for Mixed Workloads. Proceedings of the 50th International Conference on Parallel Processing, pp. 1-10. DOI: 10.1145/3472456.3472459. Online publication date: 9-Aug-2021
  • (2021) Memory at your service. Proceedings of the 22nd International Middleware Conference, pp. 185-197. DOI: 10.1145/3464298.3493394. Online publication date: 6-Dec-2021
  • (2019) OS-Augmented Oversubscription of Opportunistic Memory with a User-Assisted OOM Killer. Proceedings of the 20th International Middleware Conference, pp. 28-40. DOI: 10.1145/3361525.3361534. Online publication date: 9-Dec-2019
  • (2019) Pufferfish. Proceedings of the ACM Symposium on Cloud Computing, pp. 259-271. DOI: 10.1145/3357223.3362730. Online publication date: 20-Nov-2019
  • (2019) Improving Short Job Latency Performance in Hybrid Job Schedulers with Dice. Proceedings of the 48th International Conference on Parallel Processing, pp. 1-10. DOI: 10.1145/3337821.3337851. Online publication date: 5-Aug-2019
