Abstract
Existing models for the analysis of concurrent processes tend to focus on fail-stop failures, where processes are either working or permanently stopped, and their state (working/stopped) is known. In fact, systems are often affected by grey failures: failures that are latent, possibly transient, and may affect the system in subtle ways that later lead to major issues (such as crashes, limited availability, overload). We introduce a model of actor-based systems with grey failures, based on two interlinked layers: an actor model, given as an asynchronous process calculus with discrete time, and a failure model that represents failure patterns to inject in the system. Our failure model captures not only fail-stop node and link failures, but also grey failures (e.g., partial, transient). We give a behavioural equivalence relation based on weak barbed bisimulation to compare systems on the basis of their ability to recover from failures, and on this basis we define some desirable properties of reliable systems. By doing so, we reduce the problem of checking reliability properties of systems to the problem of checking bisimulation.
This work has been partially supported by EPSRC project EP/T014512/1 (STARDUST) and the BehAPI project funded by the EU H2020 RISE under the Marie Sklodowska-Curie action (No: 778233).
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
Notes
- 1.
As Q(u) where \(Q(0) = \mathop {?}\{p_i.P_i\}_{i\in I} {\,\texttt {after}\,P} \) and \(Q(i+1) = \mathop {?}\{p_i.P_i\}_{i\in I} {\,\texttt {after}\,Q(i)} \).
References
Aceto, L., Cimini, M., Ingolfsdottir, A., Reynisson, A.H., Sigurdarson, S.H., Sirjani, M.: Modelling and simulation of asynchronous real-time systems using timed Rebeca. EPTCS 58, 1–19 (2011). https://doi.org/10.4204/eptcs.58.1
Adameit, M., Peters, K., Nestmann, U.: Session types for link failures. In: Bouajjani, A., Silva, A. (eds.) FORTE 2017. LNCS, vol. 10321, pp. 1–16. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-60225-7_1
Amadio, R.M.: An asynchronous model of locality, failure, and process mobility. In: Garlan, D., Le Métayer, D. (eds.) COORDINATION 1997. LNCS, vol. 1282, pp. 374–391. Springer, Heidelberg (1997). https://doi.org/10.1007/3-540-63383-9_92
Amadio, R.M., Castellani, I., Sangiorgi, D.: On bisimulations for the asynchronous pi-calculus. Theor. Comput. Sci. 195(2), 291–324 (1998). https://doi.org/10.1016/S0304-3975(97)00223-5
Basu, S., Bultan, T., Ouederni, M.: Deciding choreography realizability. Proc. ACM Program. Lang. 47(POPL), 191–202 (2012). https://doi.org/10.1145/2103656.2103680
Berger, M., Honda, K.: The two-phase commitment protocol in an extended \(\pi \)-calculus. ENTCS 39(1), 21–46 (2003). https://doi.org/10.1016/S1571-0661(05)82502-2
Berger, M., Yoshida, N.: Timed, distributed, probabilistic, typed processes. In: Shao, Z. (ed.) APLAS 2007. LNCS, vol. 4807, pp. 158–174. Springer, Heidelberg (2007). https://doi.org/10.1007/978-3-540-76637-7_11
Bergstra, J.A., Klop, J.W.: Process algebra for synchronous communication. Inf. Control. 60(1–3), 109–137 (1984). https://doi.org/10.1016/S0019-9958(84)80025-X
de Boer, F.S., Klop, J.W., Palamidessi, C.: Asynchronous communication in process algebra. In: Proceedings LICS, pp. 137–147. IEEE Computer Society (1992). https://doi.org/10.1109/LICS.1992.185528
Bollig, B., Giusto, C.D., Finkel, A., Laversa, L., Lozes, É., Suresh, A.: A unifying framework for deciding synchronizability. In: Proceedings CONCUR. LIPIcs, vol. 203, pp. 14:1–14:18. Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2021). https://doi.org/10.4230/LIPIcs.CONCUR.2021.14
Boreale, M., De Nicola, R., Pugliese, R.: A theory of “May’’ testing for asynchronous languages. In: Thomas, W. (ed.) FoSSaCS 1999. LNCS, vol. 1578, pp. 165–179. Springer, Heidelberg (1999). https://doi.org/10.1007/3-540-49019-1_12
Boreale, M., Nicola, R.D., Pugliese, R.: Trace and testing equivalence on asynchronous processes. Inf. Comput. 172(2), 139–164 (2002). https://doi.org/10.1006/inco.2001.3080
Brand, D., Zafiropulo, P.: On communicating finite-state machines. J. ACM 30(2), 323–342 (1983). https://doi.org/10.1145/322374.322380
Cano, M., Castellani, I., Di Giusto, C., Pérez, J.A.: Multiparty Reactive Sessions. Research Report 9270, INRIA, April 2019. https://hal.archives-ouvertes.fr/hal-02106742
Capecchi, S., Giachino, E., Yoshida, N.: Global escape in multiparty sessions. MSCS 26(2), 156–205 (2016). https://doi.org/10.1017/S0960129514000164
Castellani, I.: Process algebras with localities. In: Handbook of Process Algebra, pp. 945–1045. North-Holland/Elsevier (2001). https://doi.org/10.1016/b978-044482830-9/50033-3
Castellani, I., Hennessy, M.: Testing theories for asynchronous languages. In: Arvind, V., Ramanujam, S. (eds.) FSTTCS 1998. LNCS, vol. 1530, pp. 90–101. Springer, Heidelberg (1998). https://doi.org/10.1007/978-3-540-49382-2_9
Coppo, M., Dezani-Ciancaglini, M., Yoshida, N., Padovani, L.: Global progress for dynamically interleaved multiparty sessions. MSCS 26(2), 238–302 (2016). https://doi.org/10.1017/S0960129514000188
Fournet, C., Gonthier, G., Levy, J.-J., Maranget, L., Rémy, D.: A calculus of mobile agents. In: Montanari, U., Sassone, V. (eds.) CONCUR 1996. LNCS, vol. 1119, pp. 406–421. Springer, Heidelberg (1996). https://doi.org/10.1007/3-540-61604-7_67
Fowler, S., Lindley, S., Morris, J.G., Decova, S.: Exceptional asynchronous session types: session types without tiers. Proc. ACM Program. Lang. 3(POPL), 1–29 (2019). https://doi.org/10.1145/3290341
Francalanza, A., Hennessy, M.: A theory for observational fault tolerance. JLAMP 73(1–2), 22–50 (2007). https://doi.org/10.1007/11690634_2
Francalanza, A., Hennessy, M.: A theory of system behaviour in the presence of node and link failure. Inf. Comput. 206(6), 711–759 (2008). https://doi.org/10.1016/j.ic.2007.12.002
Graf, S., Sifakis, J.: A modal characterization of observational congruence on finite terms of CCS. Inf. Control. 68(1–3), 125–145 (1986). https://doi.org/10.1016/S0019-9958(86)80031-6
Gunawi, H.S., et al.: Fail-slow at scale: evidence of hardware performance faults in large production systems. ACM Trans. Storage 14(3), 23:1–23:26 (2018). https://doi.org/10.1145/3242086
Hennessy, M., Regan, T.: A process algebra for timed systems. Inf. Comput. 117(2), 221–239 (1995). https://doi.org/10.1006/inco.1995.1041
Honda, K., Yoshida, N., Carbone, M.: Multiparty asynchronous session types. J. ACM 63(1), 9:1–9:67 (2016). https://doi.org/10.1145/2827695
Hu, R., Neykova, R., Yoshida, N., Demangeon, R., Honda, K.: Practical interruptible conversations. In: Legay, A., Bensalem, S. (eds.) RV 2013. LNCS, vol. 8174, pp. 130–148. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-40787-1_8
Huang, P., et al.: Gray failure: the Achilles’ heel of cloud-scale systems. In: Proceedings HotOS, pp. 150–155. Association for Computing Machinery, New York (2017). https://doi.org/10.1145/3102980.3103005
Lanese, I., Nishida, N., Palacios, A., Vidal, G.: A theory of reversibility for Erlang. JLAMP 100, 71–97 (2018). https://doi.org/10.1016/j.jlamp.2018.06.004
Lanese, I., Sangiorgi, D., Zavattaro, G.: Playing with bisimulation in Erlang. In: Boreale, M., Corradini, F., Loreti, M., Pugliese, R. (eds.) Models, Languages, and Tools for Concurrent and Distributed Programming. LNCS, vol. 11665, pp. 71–91. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-21485-2_6
Laneve, C., Zavattaro, G.: Foundations of web transactions. In: Sassone, V. (ed.) FoSSaCS 2005. LNCS, vol. 3441, pp. 282–298. Springer, Heidelberg (2005). https://doi.org/10.1007/978-3-540-31982-5_18
López, H.A., Pérez, J.A.: Time and exceptional behavior in multiparty structured interactions. In: Carbone, M., Petit, J.-M. (eds.) WS-FM 2011. LNCS, vol. 7176, pp. 48–63. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-29834-9_5
Lou, C., Huang, P., Smith, S.: Understanding, detecting and localizing partial failures in large system software. In: NDSI, pp. 559–574. USENIX Association (2020). https://www.usenix.org/conference/nsdi20/presentation/lou
Merro, M., Sangiorgi, D.: On asynchrony in name-passing calculi. In: Larsen, K.G., Skyum, S., Winskel, G. (eds.) ICALP 1998. LNCS, vol. 1443, pp. 856–867. Springer, Heidelberg (1998). https://doi.org/10.1007/BFb0055108
Milner, R.: Communication and Concurrency. PHI Series in Computer Science. Prentice Hall, Upper Saddle River (1989)
Milner, R., Parrow, J., Walker, D.: A calculus of mobile processes. I. Inf. Comput. 100(1), 1–40 (1992). https://doi.org/10.1016/0890-5401(92)90008-4
Milner, R., Sangiorgi, D.: Barbed bisimulation. In: Kuich, W. (ed.) ICALP 1992. LNCS, vol. 623, pp. 685–695. Springer, Heidelberg (1992). https://doi.org/10.1007/3-540-55719-9_114
Mostrous, D., Vasconcelos, V.T.: Session typing for a featherweight Erlang. In: De Meuter, W., Roman, G.-C. (eds.) COORDINATION 2011. LNCS, vol. 6721, pp. 95–109. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-21464-6_7
Murgia, M.: Input urgent semantics for asynchronous timed session types. JLAMP 107, 38–53 (2019). https://doi.org/10.1016/j.jlamp.2019.04.001
Nygard, M.T.: Release It!: Design and Deploy Production-Ready Software. Pragmatic Bookshelf (2018)
Riely, J., Hennessy, M.: Distributed processes and location failures. In: Degano, P., Gorrieri, R., Marchetti-Spaccamela, A. (eds.) ICALP 1997. LNCS, vol. 1256, pp. 471–481. Springer, Heidelberg (1997). https://doi.org/10.1007/3-540-63165-8_203
Riely, J., Hennessy, M.: Distributed processes and location failures. Theor. Comput. Sci. 266(1–2), 693–735 (2001). https://doi.org/10.1016/S0304-3975(00)00326-1
Sabahi-Kaviani, Z., Khosravi, R., Ölveczky, P.C., Khamespanah, E., Sirjani, M.: Formal semantics and efficient analysis of timed Rebeca in real-time Maude. Sci. Comput. Program. 113, 85–118 (2015). https://doi.org/10.1016/j.scico.2015.07.003
Sangiorgi, D., Walker, D.: The \(\pi \)-Calculus: A Theory of Mobile Processes. Cambridge University Press, Cambridge (2001)
Sankar, K.: Programming Erlang - Software for a Concurrent World by Joe Armstrong, p. 536. Pragmatic Bookshelf (2007). ISBN-10: 193435600x. J. Funct. Program. 19(2), 259–261 (2009). https://doi.org/10.1017/S0956796809007163
Steffen, B.: Characteristic formulae. In: Ausiello, G., Dezani-Ciancaglini, M., Della Rocca, S.R. (eds.) ICALP 1989. LNCS, vol. 372, pp. 723–732. Springer, Heidelberg (1989). https://doi.org/10.1007/BFb0035794
Svensson, H., Fredlund, L., Earle, C.B.: A unified semantics for future Erlang. In: Proceedings ACM SIGPLAN Workshop on Erlang, pp. 23–32. ACM (2010). https://doi.org/10.1145/1863509.1863514
Wyatt, D.: Akka Concurrency. Artima Incorporation, Sunnyvale (2013)
Author information
Authors and Affiliations
Corresponding authors
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2022 IFIP International Federation for Information Processing
About this paper
Cite this paper
Bocchi, L., Lange, J., Thompson, S., Voinea, A.L. (2022). A Model of Actors and Grey Failures. In: ter Beek, M.H., Sirjani, M. (eds) Coordination Models and Languages. COORDINATION 2022. IFIP Advances in Information and Communication Technology, vol 13271. Springer, Cham. https://doi.org/10.1007/978-3-031-08143-9_9
Download citation
DOI: https://doi.org/10.1007/978-3-031-08143-9_9
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-08145-3
Online ISBN: 978-3-031-08143-9
eBook Packages: Computer ScienceComputer Science (R0)