Failure detection and consensus in the crash-recovery model

Marcos Kawazoe Aguilera¹,
Wei Chen¹ &
Sam Toueg¹

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1499)

Included in the following conference series:

International Symposium on Distributed Computing

Abstract

We study the problems of failure detection and consensus in asynchronous systems in which processes may crash and recover, and links may lose messages. We first propose new failure detectors that are particularly suitable to the crash-recovery model. We next determine under what conditions stable storage is necessary to solve consensus in this model. Using the new failure detectors, we give two consensus algorithms that match these conditions: one requires stable storage and the other does not. Both algorithms tolerate link failures and are particularly efficient in the runs that are most likely in practice — those with no failures or failure detector mistakes. In such runs, consensus is achieved within 3δ time and with 4n messages, where δ is the maximum message delay and n is the number of processes in the system.
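
As a rough illustration of the figures quoted above, the following sketch computes the stated cost of a run with no failures and no failure detector mistakes: 3δ time and 4n messages. The function name `good_run_cost` and the sample values for n and δ are hypothetical and chosen only for this example; the sketch restates the abstract's arithmetic and does not model the algorithms themselves.

```python
from typing import Tuple


def good_run_cost(n: int, delta: float) -> Tuple[float, int]:
    """Return (time bound, message count) for a run with no failures and
    no failure detector mistakes, using the abstract's figures:
    3 * delta time and 4 * n messages.

    n     -- number of processes in the system
    delta -- maximum message delay (any time unit)
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    if delta < 0:
        raise ValueError("delta must be non-negative")
    return 3 * delta, 4 * n


# Hypothetical example values: 5 processes, 10 ms maximum message delay.
time_bound, messages = good_run_cost(n=5, delta=0.010)
print(f"time <= {time_bound:.3f} s, messages = {messages}")
```

With these assumed values the bound works out to 0.030 s and 20 messages. The time bound does not depend on n, while the message count grows linearly with the number of processes.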

Research partially supported by NSF grants CCR-9402896 and CCR-9711403, by ARPA/ONR grant N00014-96-1-1014, and by an Olin Fellowship.

Author information

Authors and Affiliations

Computer Science Department, Cornell University, 14853-7501, Ithaca, NY, USA
Marcos Kawazoe Aguilera, Wei Chen & Sam Toueg

Editor information

Shay Kutten

About this paper

Cite this paper

Aguilera, M.K., Chen, W., Toueg, S. (1998). Failure detection and consensus in the crash-recovery model. In: Kutten, S. (ed.) Distributed Computing. DISC 1998. Lecture Notes in Computer Science, vol 1499. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0056486

DOI: https://doi.org/10.1007/BFb0056486
Published: 02 June 2006
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-65066-9
Online ISBN: 978-3-540-49693-9
eBook Packages: Springer Book Archive
