DOI: 10.1145/3603166.3632139
research-article

Optimal Deployment of Cloud-native Applications with Fault-Tolerance and Time-Critical End-to-End Constraints

Published: 04 April 2024

Abstract

Cloud environments are becoming increasingly attractive for hosting time-critical use cases, such as smart industrial control systems or cloud-enabled autonomous vehicles, whose latency requirements are far more stringent than those of conventional cloud-native applications. In these emerging domains, fault-tolerance mechanisms play a critical role because a fault can have catastrophic real-world consequences. This work presents a formal model for designing and deploying time-critical, cloud-native applications under fault conditions. Our model considers the interactions and interference among service components and the possible occurrence of faults. We present an optimization framework that solves the deployment problem of minimizing the resources needed to achieve fault tolerance under precise end-to-end deadline constraints. The ability of the optimizer to deliver precise temporal and fault-tolerance guarantees is validated through extensive simulations.
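
The abstract does not spell out the formal model, so the following is only a toy illustration of the kind of deployment problem it describes, not the authors' formulation. It assumes a linear chain of components, a latency that scales inversely with the cores assigned to a component, replicas that fail independently with a fixed probability, and an exhaustive search in place of a real optimizer. The function name `plan_deployment` and all of its parameters are hypothetical.

```python
from itertools import product
from math import prod


def plan_deployment(work, deadline, fail_prob, min_reliability,
                    core_options=(1, 2, 4), max_replicas=3):
    """Return (cost, plan) with the fewest total cores, or (None, None).

    Toy assumptions (not taken from the paper):
      - component i needs work[i] / cores time units per request;
      - end-to-end latency is the sum over the chain of components;
      - a component survives if at least one of its replicas is alive,
        and replicas fail independently with probability fail_prob.
    """
    # Every (cores, replicas) pair a single component may be given.
    choices = [(c, r) for c in core_options
               for r in range(1, max_replicas + 1)]
    best_cost, best_plan = None, None
    for plan in product(choices, repeat=len(work)):
        latency = sum(w / c for w, (c, _) in zip(work, plan))
        reliability = prod(1 - fail_prob ** r for _, r in plan)
        cost = sum(c * r for c, r in plan)  # total cores reserved
        if latency <= deadline and reliability >= min_reliability:
            if best_cost is None or cost < best_cost:
                best_cost, best_plan = cost, plan
    return best_cost, best_plan


# Two-component chain, deadline of 4 time units, 10% replica failure rate.
cost, plan = plan_deployment([4.0, 2.0], deadline=4.0,
                             fail_prob=0.1, min_reliability=0.98)
print(cost, plan)  # 6 ((2, 2), (1, 2))
```

In this example the deadline forces the first component onto at least two cores, and the reliability target forces two replicas of each component, which gives six cores in total. Exhaustive search is only workable for very small instances; it is used here purely to keep the sketch short.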

    Published In

    UCC '23: Proceedings of the IEEE/ACM 16th International Conference on Utility and Cloud Computing
    December 2023
    502 pages
    ISBN:9798400702341
    DOI:10.1145/3603166
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. time-critical cloud
    2. fault tolerance
    3. capacity planning
    4. resource management
    5. optimization

    Qualifiers

    • Research-article

    Conference

    UCC '23
    Acceptance Rates

    Overall Acceptance Rate 38 of 125 submissions, 30%
