On the scalability and mean-time to failure of $k$ resilient protocols

Sampath Rangarajan¹,
Yennun Huang¹ &
Satish K. Tripathi²

30 Accesses
Explore all metrics

Abstract.

Consider a distributed system with $n$ nodes. A protocol running on this system is $k$ resilient if it could tolerate up to $k$ failures and operate correctly. The reliability of such a protocol is defined as the probability that no more than $k$ nodes have failed. In the first part of the paper, we study the scalability of systems running such protocols. We show the existence of a threshold time of operation for these protocols which we call the scalable mission time (SMT). This scalable mission time is the maximum time until which an asymptotic increase in the system size leads to an asymptotic increase in the reliability of the protocol. We show that beyond this scalable mission time, an asymptotic increase in system size leads to an asymptotic decrease in reliability. We also show techniques to compute the scalable mission time. In the second part of the paper, we show that the scalable mission time for a $k$ resilient protocol can be used as a good approximation to the mean-time to failure (MTTF) of the protocol, even when the failure distributions are non-exponential and the nodes fail at different rates (a heterogeneous system). We also show that the MTTF asymptotically approaches the SMT with an increase in system size $n$. Computation of the MTTF is quite difficult when the system is heterogeneous even if the failure distribution of the nodes is exponential. Using experimental results, we show that the SMT approximation to the MTTF gives values very close to the real MTTF. Further, we consider the maintenance interval of systems running $k$ resilient protocols and show that if the maintenance interval is larger than the scalable mission time, then there is a maximum scalability value beyond which it is undesirable to scale up the size of the system.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic

£29.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Price includes VAT (United Kingdom)

Instant access to the full article PDF.

Institutional subscriptions

Author information

Authors and Affiliations

AT & T Bell Laboratories, 600 Mountain Avenue, Murray Hill, NJ 07974, USA , , , , , , US
Sampath Rangarajan & Yennun Huang
Department of Computer Science, University of Maryland at College Park, College Park, MD 20742, USA , , , , , , US
Satish K. Tripathi

Authors

Sampath Rangarajan
View author publications
You can also search for this author in PubMed Google Scholar
Yennun Huang
View author publications
You can also search for this author in PubMed Google Scholar
Satish K. Tripathi
View author publications
You can also search for this author in PubMed Google Scholar

Additional information

Received: 4 October 1994 / 30 May 1996

Rights and permissions

Reprints and permissions

About this article

Cite this article

Rangarajan, S., Huang, Y. & Tripathi, S. On the scalability and mean-time to failure of $k$ resilient protocols. Acta Informatica 34, 543–556 (1997). https://doi.org/10.1007/s002360050096

Download citation

Issue Date: July 1997
DOI: https://doi.org/10.1007/s002360050096

On the scalability and mean-time to failure of \(k\) resilient protocols

Abstract.

Access this article

Subscribe and save

Buy Now

Similar content being viewed by others

A nearly optimal upper bound for the self-stabilization time in Herman’s algorithm

A Nearly Optimal Upper Bound for the Self-Stabilization Time in Herman’s Algorithm

Cutoffs for Symmetric Point-to-Point Distributed Algorithms

Author information

Authors and Affiliations

Additional information

Rights and permissions

About this article

Cite this article

Keywords

Subscribe and save

Buy Now

Navigation

On the scalability and mean-time to failure of \(k\) resilient protocols

Abstract.

Access this article

Subscribe and save

Buy Now

Similar content being viewed by others

A nearly optimal upper bound for the self-stabilization time in Herman’s algorithm

A Nearly Optimal Upper Bound for the Self-Stabilization Time in Herman’s Algorithm

Cutoffs for Symmetric Point-to-Point Distributed Algorithms

Author information

Authors and Affiliations

Additional information

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Subscribe and save

Buy Now

Search

Navigation