research-article

URSA: Precise Capacity Planning and Fair Scheduling based on Low-level Statistics for Public Clouds

Authors:

Minyi Guo

ICPP '20: Proceedings of the 49th International Conference on Parallel Processing

Article No.: 73, Pages 1 - 11

https://doi.org/10.1145/3404397.3404451

Published: 17 August 2020

Abstract

Database platform-as-a-service (dbPaaS) is developing rapidly, and a large number of databases have been migrated to the Cloud for its low cost and flexibility. Emerging Clouds rely on tenants to provide the resource specification for their database workloads. However, tenants tend to over-estimate the resource requirements of their databases, resulting in unnecessarily high cost and low Cloud utilization. A methodology that automatically suggests the “just-enough” resource specification that fulfills the performance requirement of every database workload would therefore be profitable.

To this end, we propose URSA, a capacity planning and fair scheduling system that comprises an online capacity planner, a performance interference estimator, and a contention-aware scheduling engine. The capacity planner identifies, online, the most cost-efficient resource specification with which a database workload achieves its required performance. The interference estimator quantifies, for each workload, the pressure it puts on shared resources and its tolerance to contention on those resources. The scheduling engine places the workloads across Cloud nodes carefully to eliminate unfair performance interference between co-located workloads. Experimental results show that, compared to prior work, URSA reduces CPU usage by up to 25.9% and memory usage by up to 53.4%, and reduces the performance unfairness between co-located workloads by 47.6%, without hurting their performance.
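The abstract does not spell out URSA's algorithms, so the following is only a minimal illustrative sketch of what contention-aware placement based on per-workload pressure and tolerance could look like. The `Workload` and `Node` classes, the additive pressure model, and the greedy first-fit policy are all assumptions made for illustration and are not taken from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class Workload:
    name: str
    pressure: float   # assumed: contention this workload puts on a shared resource
    tolerance: float  # assumed: contention from neighbors it can absorb


@dataclass
class Node:
    name: str
    workloads: list = field(default_factory=list)

    def fits(self, w: Workload) -> bool:
        """True if adding w keeps every co-located workload within its tolerance."""
        candidates = self.workloads + [w]
        total = sum(x.pressure for x in candidates)
        # Assumed additive model: each workload feels the pressure of all the others.
        return all(total - x.pressure <= x.tolerance for x in candidates)


def schedule(workloads, nodes):
    """Greedy first-fit placement, most sensitive (lowest tolerance) workload first."""
    placement = {}
    for w in sorted(workloads, key=lambda x: x.tolerance):
        for node in nodes:
            if node.fits(w):
                node.workloads.append(w)
                placement[w.name] = node.name
                break
        else:
            # No node can host w without exceeding some workload's tolerance.
            placement[w.name] = None
    return placement


if __name__ == "__main__":
    demo = [
        Workload("a", pressure=0.6, tolerance=0.2),
        Workload("b", pressure=0.1, tolerance=0.7),
        Workload("c", pressure=0.5, tolerance=0.3),
    ]
    print(schedule(demo, [Node("n1"), Node("n2")]))
```

In this toy model, two workloads share a node only if neither pushes the other past its tolerance, which is one simple reading of "eliminating unfair interference"; a real system would estimate pressure and tolerance from measured low-level statistics rather than take them as fixed inputs.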

Information & Contributors

Information

Published In

ICPP '20: Proceedings of the 49th International Conference on Parallel Processing

August 2020

844 pages

ISBN:9781450388160

DOI:10.1145/3404397

Copyright © 2020 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 17 August 2020

Qualifiers

Research-article
Research
Refereed limited

Funding Sources

National Natural Science Foundation of China
the National R&D Program of China

Conference

ICPP '20

ICPP '20: 49th International Conference on Parallel Processing

August 17 - 20, 2020

Edmonton, AB, Canada

Acceptance Rates

Overall Acceptance Rate 91 of 313 submissions, 29%

Bibliometrics & Citations

Bibliometrics

Total Citations: 12
Total Downloads: 347
Downloads (Last 12 months): 26
Downloads (Last 6 weeks): 1

Reflects downloads up to 13 Dec 2024

Citations

Cited By

Rosinosky G, Schmitz D, Rivière E. (2024). StreamBed: Capacity Planning for Stream Processing. Proceedings of the 18th ACM International Conference on Distributed and Event-based Systems, 90-102. DOI: 10.1145/3629104.3666034. Online publication date: 24-Jun-2024.
https://dl.acm.org/doi/10.1145/3629104.3666034
Shi T, Yang Y, Cheng Y, Gao X, Fang Z, Yang Y. (2023). Alioth: A Machine Learning Based Interference-Aware Performance Monitor for Multi-Tenancy Applications in Public Cloud. 2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 908-917. DOI: 10.1109/IPDPS54959.2023.00095. Online publication date: May-2023.
https://doi.org/10.1109/IPDPS54959.2023.00095
Fu K, Zhang W, Chen Q, Zeng D, Guo M. (2022). Adaptive Resource Efficient Microservice Deployment in Cloud-Edge Continuum. IEEE Transactions on Parallel and Distributed Systems 33:8, 1825-1840. DOI: 10.1109/TPDS.2021.3128037. Online publication date: 1-Aug-2022.
https://dl.acm.org/doi/10.1109/TPDS.2021.3128037
Zhang W, Chen Q, Zheng N, Cui W, Fu K, Guo M. (2022). Toward QoS-Awareness and Improved Utilization of Spatial Multitasking GPUs. IEEE Transactions on Computers 71:4, 866-879. DOI: 10.1109/TC.2021.3064352. Online publication date: 1-Apr-2022.
https://doi.org/10.1109/TC.2021.3064352
Lan D, Taherkordi A, Eliassen F, Liu L, Delbruel S, Dustdar S, Yang Y. (2022). Task Partitioning and Orchestration on Heterogeneous Edge Platforms: The Case of Vision Applications. IEEE Internet of Things Journal 9:10, 7418-7432. DOI: 10.1109/JIOT.2022.3153970. Online publication date: 15-May-2022.
https://doi.org/10.1109/JIOT.2022.3153970
Duan Y, Wang N, Wu J. (2022). Accelerating DAG-Style Job Execution via Optimizing Resource Pipeline Scheduling. Journal of Computer Science and Technology 37:4, 852-868. DOI: 10.1007/s11390-021-1488-4. Online publication date: 1-Jul-2022.
https://dl.acm.org/doi/10.1007/s11390-021-1488-4
Cui W, Zhao H, Chen Q, Zheng N, Leng J, Zhao J, Song Z, Ma T, Yang Y, Li C, Guo M, de Supinski B, Hall M, Gamblin T. (2021). Enable simultaneous DNN services based on deterministic operator overlap and precise latency prediction. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 1-15. DOI: 10.1145/3458817.3476143. Online publication date: 14-Nov-2021.
https://dl.acm.org/doi/10.1145/3458817.3476143
Pang P, Chen Q, Zeng D, Guo M. (2021). Adaptive Preference-Aware Co-Location for Improving Resource Utilization of Power Constrained Datacenters. IEEE Transactions on Parallel and Distributed Systems 32:2, 441-456. DOI: 10.1109/TPDS.2020.3023997. Online publication date: 1-Feb-2021.
https://doi.org/10.1109/TPDS.2020.3023997
Fu K, Zhang W, Chen Q, Zeng D, Peng X, Zheng W, Guo M. (2021). QoS-Aware and Resource Efficient Microservice Deployment in Cloud-Edge Continuum. 2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 932-941. DOI: 10.1109/IPDPS49936.2021.00102. Online publication date: May-2021.
https://doi.org/10.1109/IPDPS49936.2021.00102
Zhang W, Fu K, Zheng N, Chen Q, Li C, Zheng W, Guo M. (2021). CHARM: Collaborative Host and Accelerator Resource Management for GPU Datacenters. 2021 IEEE 39th International Conference on Computer Design (ICCD), 307-315. DOI: 10.1109/ICCD53106.2021.00056. Online publication date: Oct-2021.
https://doi.org/10.1109/ICCD53106.2021.00056