With the increasing popularity of Cloud computing systems, the demand for highly dependable Cloud applications has grown significantly. Reliability and availability of Cloud applications have therefore become two prominent concerns for both Cloud providers and Cloud users. Ensuring these two properties is often very difficult, however, largely because of the nature of the Cloud computing paradigm, which combines hardware and software components in a dynamic setting. Despite these challenges, ensuring the reliability and availability of such applications is a key objective, because it is what guarantees the expected quality of service (QoS). Many methods, strategies and approaches have been proposed in the literature. To the best of our knowledge, however, none of them offers a comprehensive solution that provides reliability, availability and a high QoS margin at the same time for such systems. In this paper, we propose a novel formal framework for constructing reliable and available Cloud components using the distributed recovery block (DRB) scheme. The aim is to provide a strategy that enhances Cloud dependability by treating software and hardware faults uniformly through the construction of fault-masking nodes. A fault-masking node can handle (i.e., detect and tolerate) software, hardware and response-time faults. It uses both the acceptance test and the try blocks to ensure safety and liveness properties at the same time.
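To make the DRB idea of a fault-masking node more concrete, the Python sketch below shows a minimal, single-process illustration. It is not the paper's formal framework. Two threads stand in for a primary node and a shadow node. The primary node runs the primary try block while the shadow node concurrently runs the alternate try block on the same input. Each node checks its own output with the acceptance test. The primary's output is preferred, and the shadow's output is used only if the primary fails its test, crashes, or misses a deadline (a response-time fault). All names (`drb_pair`, `run_try_block`, `is_sorted_permutation`, the sorting try blocks) and the deadline value are hypothetical. Role switching between nodes and inter-node messaging are omitted.

```python
import time
from collections import Counter
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout
from dataclasses import dataclass
from typing import Any, Callable, Tuple

TryBlock = Callable[[Any], Any]
AcceptanceTest = Callable[[Any, Any], bool]


@dataclass
class NodeResult:
    node: str
    value: Any
    passed: bool  # verdict of the acceptance test on this node's output


def run_try_block(node: str, block: TryBlock, test: AcceptanceTest, inp: Any) -> NodeResult:
    """Run one try block and check its output with the acceptance test.
    An exception (e.g. a crashed component) is treated as a failed test."""
    try:
        value = block(inp)
    except Exception:
        return NodeResult(node, None, False)
    return NodeResult(node, value, bool(test(inp, value)))


def drb_pair(inp: Any, primary: TryBlock, alternate: TryBlock,
             test: AcceptanceTest, deadline_s: float) -> Tuple[str, Any]:
    """Hypothetical DRB node pair: both try blocks run concurrently on the same input."""
    deadline = time.monotonic() + deadline_s
    pool = ThreadPoolExecutor(max_workers=2)
    try:
        futures = [
            pool.submit(run_try_block, "primary", primary, test, inp),
            pool.submit(run_try_block, "shadow", alternate, test, inp),
        ]
        # Prefer the primary node; fall back to the shadow node only on failure.
        for fut in futures:
            remaining = max(0.0, deadline - time.monotonic())
            try:
                res = fut.result(timeout=remaining)
            except FutureTimeout:
                continue  # response-time fault: treat this node as failed
            if res.passed:
                return res.node, res.value
        raise RuntimeError("both try blocks failed the acceptance test or missed the deadline")
    finally:
        pool.shutdown(wait=False)  # do not block on a node that overran the deadline


# --- Illustrative try blocks and acceptance test (all hypothetical) ---
def is_sorted_permutation(inp, out):
    """Acceptance test: output is non-decreasing and a permutation of the input."""
    return Counter(out) == Counter(inp) and all(a <= b for a, b in zip(out, out[1:]))


def faulty_primary(xs):
    return list(xs)  # simulated software fault: returns the input unsorted


def slow_primary(xs):
    time.sleep(0.3)  # simulated response-time fault
    return sorted(xs)


def insertion_sort(xs):
    """Alternate try block: a deliberately different sorting algorithm."""
    out = []
    for x in xs:
        i = len(out)
        while i > 0 and out[i - 1] > x:
            i -= 1
        out.insert(i, x)
    return out


if __name__ == "__main__":
    print(drb_pair([3, 1, 2], sorted, insertion_sort, is_sorted_permutation, 0.1))          # ('primary', [1, 2, 3])
    print(drb_pair([3, 1, 2], faulty_primary, insertion_sort, is_sorted_permutation, 0.1))  # ('shadow', [1, 2, 3])
    print(drb_pair([3, 1, 2], slow_primary, insertion_sort, is_sorted_permutation, 0.1))    # ('shadow', [1, 2, 3])
```

The alternate try block uses a different algorithm from the primary on purpose. Design diversity is what lets a recovery block mask a software fault, whereas re-running the same code would only mask transient hardware faults. Running both blocks concurrently, rather than sequentially as in a classic recovery block, keeps the fallback within the response-time budget.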
Cite this article
Smara, M., Aliouat, M., Harous, S. et al. Robustness improvement of component-based cloud computing systems. J Supercomput 78, 4977–5009 (2022). https://doi.org/10.1007/s11227-021-04054-2