research-article

A Meta Reinforcement Learning Approach for Predictive Autoscaling in the Cloud

Published: 14 August 2022

Abstract

Predictive autoscaling (autoscaling with workload forecasting) is an important mechanism that supports autonomous adjustment of computing resources in accordance with fluctuating workload demands in the Cloud. In recent works, Reinforcement Learning (RL) has been introduced as a promising approach to learn the resource management policies that guide the scaling actions under the dynamic and uncertain cloud environment. However, RL methods face several challenges in steering predictive autoscaling, such as a lack of accuracy in decision-making, inefficient sampling, and significant variability in workload patterns that may cause policies to fail at test time. To this end, we propose an end-to-end predictive meta model-based RL algorithm that aims to optimally allocate resources to maintain a stable CPU utilization level. It incorporates a specially designed deep periodic workload prediction model as the input and embeds the Neural Process [11, 16] to guide the learning of the optimal scaling actions over numerous application services in the Cloud. Our algorithm not only ensures the predictability and accuracy of the scaling strategy, but also enables the scaling decisions to adapt to changing workloads with high sample efficiency. Our method achieves significant performance improvement over existing algorithms and has been deployed online at Alipay, supporting the autoscaling of applications for the world-leading payment platform.
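
To make the goal of holding CPU utilization near a stable level concrete, the following is a minimal sketch of forecast-driven scaling. It is not the algorithm proposed in the paper: it replaces the learned RL policy with a simple proportional rule, and the function names, the fixed per-instance capacity, and the linear relation between load and utilization are all illustrative assumptions.

```python
import math


def instances_for_target(forecast_load, capacity_per_instance, target_util, min_instances=1):
    """Return the instance count that keeps utilization at or below target_util.

    Hypothetical helper: assumes utilization grows linearly with load and that
    every instance can serve capacity_per_instance units of load at 100% CPU.
    """
    if not 0 < target_util <= 1:
        raise ValueError("target_util must be in (0, 1]")
    if capacity_per_instance <= 0:
        raise ValueError("capacity_per_instance must be positive")
    # Load each instance may carry while staying at the target utilization.
    usable_capacity = capacity_per_instance * target_util
    needed = math.ceil(forecast_load / usable_capacity)
    return max(min_instances, needed)


def plan_scaling(forecast, capacity_per_instance, target_util):
    """Map a workload forecast (one value per time step) to instance counts."""
    return [
        instances_for_target(load, capacity_per_instance, target_util)
        for load in forecast
    ]


if __name__ == "__main__":
    # Forecast of 100, 450 and 900 requests/s; 100 requests/s per instance; 50% target.
    print(plan_scaling([100, 450, 900], 100, 0.5))  # [2, 9, 18]
```

A fixed rule like this depends entirely on an accurate forecast and a known capacity per instance. The abstract's motivation for a learned, meta-trained policy is that these quantities vary across application services and over time.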

Supplemental Material

MP4 File
Predictive autoscaling is an important mechanism that supports autonomous adjustment of computing resources in accordance with workload demands in the Cloud. Recently, Reinforcement Learning (RL) has been introduced as a promising approach to guide the scaling actions under the dynamic cloud environment. To tackle the challenges that RL methods face, we propose an end-to-end predictive meta model-based RL algorithm that aims to optimally allocate resources to maintain a stable CPU utilization level. It incorporates a specially designed deep periodic workload prediction model as the input and embeds the Neural Process to guide the learning of the optimal scaling actions over numerous application services in the Cloud. Our algorithm ensures the predictability and accuracy of the scaling strategy and enables the scaling decisions to adapt to changing workloads with high sample efficiency. Our method achieves significant performance improvement over baselines and has been deployed at Alipay.

References

[1] Martin Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, et al. 2016. TensorFlow: A System for Large-Scale Machine Learning. In 12th USENIX symposium on operating systems design and implementation (OSDI). 265--283.
[2] Muhammad Abdullah, Waheed Iqbal, Josep Lluis Berral, Jorda Polo, and David Carrera. 2020. Burst-Aware Predictive Autoscaling for Containerized Microservices. IEEE Transactions on Services Computing (2020), 1--1. https://doi.org/10.1109/TSC.2020.2995937
[3] Muhammad Abdullah, Waheed Iqbal, Abdelkarim Erradi, and Faisal Bukhari. 2019. Learning Predictive Autoscaling Policies for Cloud-Hosted Microservices Using Trace-Driven Modeling. In 2019 IEEE International Conference on Cloud Computing Technology and Science (CloudCom). 119--126. https://doi.org/10.1109/CloudCom.2019.00028
[4] Giovanni Acampora, Mario Luca Bernardi, Marta Cimitile, Genoveffa Tortora, and Autilia Vitiello. 2017. A fuzzy-based autoscaling approach for process centered cloud systems. In 2017 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE). 1--8. https://doi.org/10.1109/FUZZ-IEEE.2017.8015768
[5] Amazon. 2020. AWS auto scaling documentation. https://docs.aws.amazon.com/autoscaling/index.html
[6] Hamid Arabnejad, Pooyan Jamshidi, Giovani Estrada, Nabil El Ioini, and Claus Pahl. 2016. An Auto-Scaling Cloud Controller Using Fuzzy Q-Learning - Implementation in OpenStack. In ESOCC.
[7] J. V. Bibal Benifa and D. Dejey. 2019. RLPAS: Reinforcement Learning-Based Proactive Auto-Scaler for Resource Provisioning in Cloud Environment. Mob. Netw. Appl., Vol. 24, 4 (2019), 1348--1363.
[8] Mingxi Cheng, Ji Li, and Shahin Nazarian. 2018. DRL-Cloud: Deep Reinforcement Learning-Based Resource Provisioning and Task Scheduling for Cloud Service Providers. In Proceedings of the 23rd Asia and South Pacific Design Automation Conference (ASPDAC '18). 129--134.
[9] Kurtland Chua, Roberto Calandra, Rowan McAllister, and Sergey Levine. 2018. Deep reinforcement learning in a handful of trials using probabilistic dynamics models. In Advances in Neural Information Processing Systems (NeurIPS).
[10] Chelsea Finn, Kelvin Xu, and Sergey Levine. 2018. Probabilistic model-agnostic meta-learning. In Advances in neural information processing systems (NeurIPS).
[11] Marta Garnelo, Dan Rosenbaum, Chris J. Maddison, Tiago Ramalho, David Saxton, Murray Shanahan, Yee Whye Teh, Danilo J. Rezende, and S. M. Ali Eslami. 2018. Conditional Neural Processes. In International Conference on Machine Learning (ICML).
[12] Danijar Hafner, Timothy Lillicrap, Jimmy Ba, and Mohammad Norouzi. 2020. Dream to control: Learning behaviors by latent imagination. In International Conference on Learning Representations (ICLR).
[13] Nicolas Heess, Greg Wayne, David Silver, Timothy Lillicrap, Yuval Tassa, and Tom Erez. 2015. Learning continuous control policies by stochastic value gradients. In Advances in Neural Information Processing Systems (NIPS).
[14] Pooyan Jamshidi, Amir Sharifloo, Claus Pahl, Hamid Arabnejad, Andreas Metzger, and Giovani Estrada. 2016. Fuzzy Self-Learning Controllers for Elasticity Management in Dynamic Cloud Architectures. In 12th International ACM SIGSOFT Conference on Quality of Software Architectures (QoSA).
[15] Arijit Khan, Xifeng Yan, Shu Tao, and Nikos Anerousis. 2012. Workload characterization and prediction in the cloud: A multiple time series approach. In IEEE Network Operations and Management Symposium. 1287--1294.
[16] Hyunjik Kim, Andriy Mnih, Jonathan Schwarz, Marta Garnelo, Ali Eslami, Dan Rosenbaum, Oriol Vinyals, and Yee Whye Teh. 2019. Attentive Neural Processes. In International Conference on Learning Representations (ICLR).
[17] Diederik Kingma and Jimmy Ba. 2015. Adam: A Method for Stochastic Optimization. https://arxiv.org/pdf/1412.6980.pdf
[18] Diederik P Kingma and Max Welling. 2014. Auto-encoding variational bayes. In International Conference on Learning Representations (ICLR).
[19] Shiyang Li, Xiaoyong Jin, Yao Xuan, Xiyou Zhou, Wenhu Chen, Yu-Xiang Wang, and Xifeng Yan. 2019. Enhancing the Locality and Breaking the Memory Bottleneck of Transformer on Time Series Forecasting. In Advances in Neural Information Processing Systems (NeurIPS).
[20] Duc-Hung Luong, Huu-Trung Thieu, Abdelkader Outtagarts, and Yacine Ghamri-Doudane. 2018. Predictive Autoscaling Orchestration for Cloud-native Telecom Microservices. In IEEE 5G World Forum (5GWF). 153--158.
[21] Microsoft. 2020. Azure auto scaling documentation. https://azure.microsoft.com/en-us/features/autoscale/
[22] Haoran Qiu, Subho S. Banerjee, Saurabh Jha, Zbigniew T. Kalbarczyk, and Ravishankar K. Iyer. 2020. FIRM: An Intelligent Fine-Grained Resource Management Framework for SLO-Oriented Microservices.
[23] David Salinas, Valentin Flunkert, and Jan Gasthaus. 2019. DeepAR: Probabilistic Forecasting with Autoregressive Recurrent Networks. In Advances in Neural Information Processing Systems.
[24] Ashraf A Shahin. 2016. Automatic Cloud Resource Scaling Algorithm based on Long Short-Term Memory Recurrent Neural Network. International Journal of Advanced Computer Science and Applications, Vol. 7, 12 (2016).
[25] Gautam Singh, Jaesik Yoon, Youngsung Son, and Sungjin Ahn. 2019. Sequential Neural Processes. In Advances in Neural Information Processing Systems (NeurIPS).
[26] Richard S Sutton, Andrew G Barto, et al. 1998. Introduction to reinforcement learning. Vol. 135. MIT press Cambridge.
[27] Hado Van Hasselt, Arthur Guez, and David Silver. 2016. Deep reinforcement learning with double q-learning. In Proceedings of the AAAI conference on artificial intelligence.
[28] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in neural information processing systems (NIPS). 5998--6008.
[29] Shubo Zhang, Tianyang Wu, Maolin Pan, Chaomeng Zhang, and Yang Yu. 2020. A-SARSA: A Predictive Container Auto-Scaling Algorithm Based on Reinforcement Learning. In IEEE International Conference on Web Services (ICWS).
[30] Haoyi Zhou, Shanghang Zhang, Jieqi Peng, Shuai Zhang, Jianxin Li, Hui Xiong, and Wancai Zhang. 2021. Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting. In The Thirty-Fifth AAAI Conference on Artificial Intelligence, Vol. 35. 11106--11115.

      Published In

      KDD '22: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
      August 2022
      5033 pages
      ISBN:9781450393850
      DOI:10.1145/3534678

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. autoscaling
      2. reinforcement learning

      Qualifiers

      • Research-article

      Conference

      KDD '22

      Acceptance Rates

      Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

      Bibliometrics & Citations

      Bibliometrics

      Article Metrics

      • Downloads (Last 12 months): 165
      • Downloads (Last 6 weeks): 18
      Reflects downloads up to 15 Jan 2025

      Citations

      Cited By

      • (2024) Impact of Autoscaling on Application Performance in Cloud Environments. International Journal of Innovative Science and Research Technology (IJISRT), pp. 1-5. DOI: 10.38124/ijisrt/IJISRT24OCT092 (online publication date: 9-Oct-2024)
      • (2024) A Time Series-Based Approach to Elastic Kubernetes Scaling. Electronics 13:2, article 285. DOI: 10.3390/electronics13020285 (online publication date: 8-Jan-2024)
      • (2024) OptScaler: A Collaborative Framework for Robust Autoscaling in the Cloud. Proceedings of the VLDB Endowment 17:12, pp. 4090-4103. DOI: 10.14778/3685800.3685829 (online publication date: 8-Nov-2024)
      • (2024) FaaSConf: QoS-aware Hybrid Resources Configuration for Serverless Workflows. Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering, pp. 957-969. DOI: 10.1145/3691620.3695477 (online publication date: 27-Oct-2024)
      • (2024) Flux: Decoupled Auto-Scaling for Heterogeneous Query Workload in Alibaba AnalyticDB. Companion of the 2024 International Conference on Management of Data, pp. 255-268. DOI: 10.1145/3626246.3653381 (online publication date: 9-Jun-2024)
      • (2024) GeoScale: Microservice Autoscaling With Cost Budget in Geo-Distributed Edge Clouds. IEEE Transactions on Parallel and Distributed Systems 35:4, pp. 646-662. DOI: 10.1109/TPDS.2024.3366533 (online publication date: 19-Feb-2024)
      • (2024) Understanding and Improving Change Risk Detection in Practice. 2024 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER), pp. 717-727. DOI: 10.1109/SANER60148.2024.00079 (online publication date: 12-Mar-2024)
      • (2024) Online Policy Adaptation for Networked Systems using Rollout. NOMS 2024-2024 IEEE Network Operations and Management Symposium, pp. 1-9. DOI: 10.1109/NOMS59830.2024.10575707 (online publication date: 6-May-2024)
      • (2024) DCScaler: Spatiotemporal Prediction Aided Distributed Collaborative Autoscaling of Microservices. 2024 IEEE 10th International Conference on Edge Computing and Scalable Cloud (EdgeCom), pp. 42-47. DOI: 10.1109/EdgeCom62867.2024.00015 (online publication date: 28-Jun-2024)
      • (2024) Future Workload and Cloud Resource Usage: Insights from an Interpretable Forecasting Model. 2024 IEEE International Conference on Big Data (BigData), pp. 2283-2287. DOI: 10.1109/BigData62323.2024.10825137 (online publication date: 15-Dec-2024)