research-article

The failure detector abstraction

Authors:

Felix C. Freiling,

Rachid Guerraoui,

Petr Kuznetsov

ACM Computing Surveys (CSUR), Volume 43, Issue 2

Article No.: 9, Pages 1 - 40

https://doi.org/10.1145/1883612.1883616

Published: 04 February 2011

Abstract

A failure detector is a fundamental abstraction in distributed computing. This article surveys the abstraction along two dimensions. First, we study failure detectors as building blocks that simplify the design of reliable distributed algorithms. In particular, we illustrate how failure detectors can factor out the timing assumptions needed to detect failures in distributed agreement algorithms. Second, we study failure detectors as computability benchmarks: we survey the weakest failure detector question and illustrate how failure detectors can be used to classify problems. We also highlight some limitations of the failure detector abstraction along each of these dimensions.
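To make the "building block" view concrete, here is a minimal sketch of a timeout-based failure detector. It is not taken from the article: the class name `HeartbeatDetector`, its methods, and the timeout-doubling rule are assumptions chosen for illustration. The point is only that the timing logic lives inside the detector, while an algorithm built on top of it sees nothing but a set of suspected processes.

```python
from dataclasses import dataclass, field


@dataclass
class HeartbeatDetector:
    """Illustrative timeout-based failure detector (all names are hypothetical)."""
    peers: list
    initial_timeout: float = 1.0
    last_heard: dict = field(default_factory=dict)
    timeout: dict = field(default_factory=dict)
    suspected: set = field(default_factory=set)

    def __post_init__(self):
        # Every peer starts unsuspected, with the same initial timeout.
        for p in self.peers:
            self.last_heard[p] = 0.0
            self.timeout[p] = self.initial_timeout

    def on_heartbeat(self, peer, now):
        """Record a heartbeat received from `peer` at logical time `now`."""
        self.last_heard[peer] = now
        if peer in self.suspected:
            # The earlier suspicion was wrong: retract it and back off,
            # so a merely slow peer is eventually no longer suspected.
            self.suspected.discard(peer)
            self.timeout[peer] *= 2

    def check(self, now):
        """Suspect every peer that has been silent longer than its timeout."""
        for p in self.peers:
            if now - self.last_heard[p] > self.timeout[p]:
                self.suspected.add(p)
        return set(self.suspected)


if __name__ == "__main__":
    fd = HeartbeatDetector(peers=["p1", "p2"])
    fd.on_heartbeat("p1", now=0.5)
    fd.on_heartbeat("p2", now=0.5)
    print(fd.check(now=1.0))        # nobody has exceeded the timeout yet
    print(fd.check(now=2.0))        # both peers silent for 1.5 > 1.0
    fd.on_heartbeat("p1", now=2.1)  # p1 was only slow, not crashed
    print(fd.check(now=2.2))        # only p2 remains suspected
```

The caller supplies the time explicitly, which keeps the sketch deterministic. A real deployment would presumably read a clock and receive heartbeats over the network, and whether suspicions eventually become accurate depends on synchrony assumptions that this sketch does not model.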

Information & Contributors

Information

Published In

ACM Computing Surveys Volume 43, Issue 2

January 2011

276 pages

ISSN:0360-0300

EISSN:1557-7341

DOI:10.1145/1883612

Copyright © 2011 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 04 February 2011

Accepted: 01 July 2009

Revised: 01 May 2009

Received: 01 March 2007

Published in CSUR Volume 43, Issue 2

Qualifiers

Research-article
Research
Refereed

Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 30
Total Downloads: 1,596
Downloads (Last 12 months): 27
Downloads (Last 6 weeks): 7

Reflects downloads up to 02 Mar 2025

Citations

Cited By

Chaurasia B, Verma A, Verma P (2024). An in-depth and insightful exploration of failure detection in distributed systems. Computer Networks 247 (110432). Online publication date: Jun-2024. https://doi.org/10.1016/j.comnet.2024.110432

Bravo M, Chockler G, Gotsman A (2024). Liveness and latency of Byzantine state-machine replication. Distributed Computing 37:2 (177-205). Online publication date: 1-Jun-2024. https://dl.acm.org/doi/10.1007/s00446-024-00466-4

Tamir R, Livshits A, Shadmi Y (2022). Simple Majority Consensus in Networks with Unreliable Communication. Entropy 24:3 (333). Online publication date: 25-Feb-2022. https://doi.org/10.3390/e24030333

Sutra P, Shapiro M (2022). Database Consistency Models. Encyclopedia of Big Data Technologies (1-12). Online publication date: 24-May-2022. https://doi.org/10.1007/978-3-319-63962-8_203-2

Verma A, Singh M, Pattanaik K (2021). Failure Detectors of Strong S and Perfect P Classes for Time Synchronous Hierarchical Distributed Systems. Research Anthology on Architectures, Frameworks, and Integration Strategies for Distributed and Cloud Computing (1317-1343). Online publication date: 2021. https://doi.org/10.4018/978-1-7998-5339-8.ch064

Richardson D, Jhumka A, Mottola L, Hung C, Hong J, Bechini A, Song E (2021). Protocol transformation for transiently powered wireless sensor networks. Proceedings of the 36th Annual ACM Symposium on Applied Computing (1112-1121). Online publication date: 22-Mar-2021. https://dl.acm.org/doi/10.1145/3412841.3441985

Talluri S, Overweel L, Versluis L, Trivedi A, Iosup A (2021). Empirical Characterization of User Reports about Cloud Failures. 2021 IEEE International Conference on Autonomic Computing and Self-Organizing Systems (ACSOS) (158-163). Online publication date: Sep-2021. https://doi.org/10.1109/ACSOS52086.2021.00039

Jiménez E, López-Presa J, Patiño-Martínez M (2021). Consensus in anonymous asynchronous systems with crash-recovery and omission failures. Computing 103:12 (2811-2837). Online publication date: 1-Dec-2021. https://dl.acm.org/doi/10.1007/s00607-021-01023-8

Buchnik Y, Friedman R (2020). FireLedger. Proceedings of the VLDB Endowment 13:9 (1525-1539). Online publication date: 26-Jun-2020. https://dl.acm.org/doi/10.14778/3397230.3397246

Huang K, Huang Y, Wei H, Emek Y, Cachin C (2020). Fine-grained Analysis on Fast Implementations of Distributed Multi-writer Atomic Registers. Proceedings of the 39th Symposium on Principles of Distributed Computing (200-209). Online publication date: 31-Jul-2020. https://dl.acm.org/doi/10.1145/3382734.3405698
