DOI: 10.1145/3543507.3583298
research-article

Learning Cooperative Oversubscription for Cloud by Chance-Constrained Multi-Agent Reinforcement Learning

Published: 30 April 2023

Abstract

Oversubscription is a common practice for improving cloud resource utilization. It allows the cloud service provider to sell more resources than the physical limit, on the assumption that not all users will fully utilize their resources simultaneously. However, designing an oversubscription policy that improves utilization while satisfying safety constraints remains an open problem. Existing methods and industrial practices are overly conservative because they ignore both the coordination of diverse resource usage patterns and probabilistic constraints. To address these two limitations, this paper formulates cloud oversubscription as a chance-constrained optimization problem and proposes an effective Chance-Constrained Multi-Agent Reinforcement Learning (C2MARL) method to solve it. Specifically, C2MARL reduces the number of constraints by considering their upper bounds and leverages a multi-agent reinforcement learning paradigm to learn a safe and optimal coordination policy. We evaluate C2MARL on an internal cloud platform and public cloud datasets. Experiments show that C2MARL outperforms existing methods in improving utilization under different levels of safety constraints.
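
To make the notion of a chance constraint concrete, the following minimal sketch checks one empirically for a single resource pool. It is an illustration only and not the C2MARL method: the function names, the normalization of physical capacity to 1, the candidate ratios, and the sample values are all assumptions introduced here. It picks the largest oversubscription ratio for which the observed fraction of capacity violations stays at or below a tolerance `delta`.

```python
def violation_rate(utilization_samples, ratio):
    """Fraction of samples in which demand would exceed physical capacity.

    Assumption: capacity is normalized to 1 and `ratio` units are sold per
    physical unit, so demand in a sample is ratio * utilization.
    """
    violations = sum(1 for u in utilization_samples if ratio * u > 1.0)
    return violations / len(utilization_samples)


def largest_safe_ratio(utilization_samples, candidate_ratios, delta):
    """Largest candidate ratio whose empirical violation rate is at most delta.

    Returns None when no candidate satisfies the constraint.
    """
    safe = [r for r in candidate_ratios
            if violation_rate(utilization_samples, r) <= delta]
    return max(safe) if safe else None


if __name__ == "__main__":
    # Hypothetical utilization history: fraction of sold resources in use.
    samples = [0.2, 0.3, 0.25, 0.4, 0.35, 0.5, 0.3, 0.45, 0.6, 0.8]
    ratios = [1.0, 1.5, 2.0, 2.5, 3.0]
    print(largest_safe_ratio(samples, ratios, delta=0.1))  # 1.5
    print(largest_safe_ratio(samples, ratios, delta=0.0))  # 1.0
```

A looser tolerance admits a higher ratio, which is the trade-off between utilization and safety that the abstract describes. The sketch treats one pool in isolation, whereas the paper targets coordination across diverse usage patterns.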


      Information

      Published In

      WWW '23: Proceedings of the ACM Web Conference 2023
      April 2023
      4293 pages
      ISBN: 9781450394161
      DOI: 10.1145/3543507

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. Cloud Computing
      2. Multi-Agent System
      3. Over Subscription
      4. Reinforcement Learning

      Qualifiers

      • Research-article
      • Research
      • Refereed limited

      Conference

      WWW '23
      WWW '23: The ACM Web Conference 2023
      April 30 - May 4, 2023
      Austin, TX, USA

      Acceptance Rates

      Overall Acceptance Rate 1,899 of 8,196 submissions, 23%
