research-article

A Q-learning approach for the autoscaling of scientific workflows in the Cloud

Published: 01 February 2022

Abstract

Autoscaling strategies aim to exploit the elasticity, resource heterogeneity and varied pricing options of a Cloud infrastructure to improve efficiency in the execution of resource-hungry applications such as scientific workflows. Scientific workflows represent a special type of Cloud application with task dependencies, high-performance computational requirements and fluctuating workloads. Hence, the amount and type of resources needed during workflow execution change dynamically over time. The well-known autoscaling problem comprises (i) scaling decisions, which adjust the computing capacity of a virtualized infrastructure to meet the current demand of the application, and (ii) task scheduling decisions, which assign tasks to specific acquired Cloud resources for execution. Both are highly complex sub-problems, all the more so because of the uncertainty inherent to the Cloud. Reinforcement Learning (RL) provides a solid framework for decision-making problems in stochastic environments, and therefore offers a promising perspective for designing Cloud autoscaling strategies based on an online learning process. In this work, we propose a novel formulation of the problem of infrastructure scaling in the Cloud as a Markov Decision Process (MDP), and we use the Q-learning algorithm to learn scaling policies, demonstrating that taking the specific characteristics of workflow applications into account when making autoscaling decisions can lead to more efficient workflow executions. To this end, our RL-based scaling strategy exploits the information available about workflow dependency structures. Simulations performed on four well-known workflows show significant gains (25%–55%) for our proposal in comparison with a similar state-of-the-art proposal.
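The abstract names Q-learning as the learning algorithm but does not spell out its update rule. For reference, the standard one-step tabular Q-learning update (as presented in Sutton and Barto's textbook) is shown below. Here $s_t$, $a_t$ and $r_{t+1}$ are generic states, actions and rewards, since the article's own state, action and reward definitions are not given in this abstract; $\alpha$ is the learning rate and $\gamma$ the discount factor.

```latex
Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right]
```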

Highlights

A new MDP formulation for the problem of Cloud infrastructure scaling for workflows.
Learning scaling policies using Q-learning to reduce makespan and execution cost.
An in-depth evaluation of the proposed scaling strategy using 4 well-known workflows.
The inclusion of a state-of-the-art method as a baseline for comparisons.
Simulation experiments demonstrate significant gains (25%–55%) of our proposal.
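To make the idea of "learning scaling policies using Q-learning to reduce makespan and execution cost" concrete, here is a minimal sketch of tabular Q-learning on a toy scaling problem. It is not the article's MDP formulation: it assumes unit-length tasks, homogeneous VMs, actions that add or remove one VM per step, a state made of (number of ready tasks, number of VMs), and a reward that penalizes each elapsed step plus a per-VM rental cost. The workflow `WORKFLOW`, the constants `MAX_VMS` and `COST_PER_VM`, and the helpers `ready_tasks`, `encode_state`, `step`, `greedy`, `q_learning` and `rollout` are all hypothetical. Computing ready tasks from the DAG is only a loose nod to exploiting workflow dependency structures, and task scheduling is reduced to assigning ready tasks to VMs in order.

```python
import random
from collections import defaultdict

# Hypothetical toy setting (not the article's MDP): unit-length tasks,
# homogeneous VMs, one scaling action per time step.
ACTIONS = (-1, 0, 1)  # remove one VM, keep capacity, add one VM
MAX_VMS = 4
COST_PER_VM = 0.2  # assumed rental cost per VM per step, relative to a unit time penalty

# Hypothetical fan-out workflow: task -> list of parent tasks.
WORKFLOW = {
    "root": [],
    "a": ["root"], "b": ["root"], "c": ["root"], "d": ["root"],
    "sink": ["a", "b", "c", "d"],
}


def ready_tasks(parents, done):
    """Unfinished tasks whose parents have all finished (uses the DAG structure)."""
    return [t for t, ps in parents.items()
            if t not in done and all(p in done for p in ps)]


def encode_state(parents, done, vms):
    """Assumed state: (ready-task count capped at MAX_VMS, current VM count)."""
    return (min(len(ready_tasks(parents, done)), MAX_VMS), vms)


def step(parents, done, vms, action):
    """Apply a scaling action, then run one step: each VM finishes one ready task."""
    vms = max(1, min(MAX_VMS, vms + action))
    ready = ready_tasks(parents, done)
    done = done | set(ready[:vms])
    reward = -(1.0 + COST_PER_VM * vms)  # penalize elapsed time and VM rental
    return done, vms, reward, len(done) == len(parents)


def greedy(q, s):
    """Index of the action with the highest Q-value (ties go to the first one)."""
    return max(range(len(ACTIONS)), key=lambda i: q[s][i])


def q_learning(parents, episodes=2000, alpha=0.1, gamma=0.95, epsilon=0.1, seed=0):
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)
    q = defaultdict(lambda: [0.0] * len(ACTIONS))
    for _ in range(episodes):
        done, vms, finished = frozenset(), 1, False
        while not finished:
            s = encode_state(parents, done, vms)
            a = rng.randrange(len(ACTIONS)) if rng.random() < epsilon else greedy(q, s)
            done, vms, r, finished = step(parents, done, vms, ACTIONS[a])
            s_next = encode_state(parents, done, vms)
            # One-step Q-learning target; no bootstrapping from terminal states.
            target = r if finished else r + gamma * max(q[s_next])
            q[s][a] += alpha * (target - q[s][a])
    return q


def rollout(parents, q, max_steps=100):
    """Follow the greedy policy once; return (makespan, VM cost, finished)."""
    done, vms, makespan, cost, finished = frozenset(), 1, 0, 0.0, False
    while not finished and makespan < max_steps:
        s = encode_state(parents, done, vms)
        done, vms, _, finished = step(parents, done, vms, ACTIONS[greedy(q, s)])
        makespan += 1
        cost += COST_PER_VM * vms
    return makespan, cost, finished


if __name__ == "__main__":
    q_table = q_learning(WORKFLOW)
    print(rollout(WORKFLOW, q_table))
```

Running the module trains on the hypothetical `WORKFLOW` and prints the greedy policy's `(makespan, cost, finished)` tuple; the exact values depend on the seed and hyperparameters, so none are quoted here. Capping the ready-task count keeps the Q-table small, at the price of ignoring task runtimes, heterogeneous VM types and pricing options that a realistic formulation would need.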


Cited By

  • (2023) An event-driven fusion framework with auto-scaling of edge intelligence for resilient smart applications in developing countries. Transactions on Emerging Telecommunications Technologies 34(8). DOI: 10.1002/ett.4804. Online publication date: 3-Aug-2023.
  • (2022) Adaptive parallel applications: from shared memory architectures to fog computing (2002–2022). Cluster Computing 25(6), 4439–4461. DOI: 10.1007/s10586-022-03692-2. Online publication date: 1-Dec-2022.


Published In

Future Generation Computer Systems, Volume 127, Issue C, Feb 2022, 503 pages

Publisher

Elsevier Science Publishers B. V., Netherlands

Author Tags

1. Cloud Computing
2. Autoscaling
3. Workflow
4. Reinforcement Learning
