[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ Skip to main content
Log in

On the scalability and mean-time to failure of \(k\) resilient protocols

  • Published:
Acta Informatica Aims and scope Submit manuscript

Abstract.

Consider a distributed system with \(n\) nodes. A protocol running on this system is \(k\) resilient if it could tolerate up to \(k\) failures and operate correctly. The reliability of such a protocol is defined as the probability that no more than \(k\) nodes have failed. In the first part of the paper, we study the scalability of systems running such protocols. We show the existence of a threshold time of operation for these protocols which we call the scalable mission time (SMT). This scalable mission time is the maximum time until which an asymptotic increase in the system size leads to an asymptotic increase in the reliability of the protocol. We show that beyond this scalable mission time, an asymptotic increase in system size leads to an asymptotic decrease in reliability. We also show techniques to compute the scalable mission time. In the second part of the paper, we show that the scalable mission time for a \(k\) resilient protocol can be used as a good approximation to the mean-time to failure (MTTF) of the protocol, even when the failure distributions are non-exponential and the nodes fail at different rates (a heterogeneous system). We also show that the MTTF asymptotically approaches the SMT with an increase in system size \(n\). Computation of the MTTF is quite difficult when the system is heterogeneous even if the failure distribution of the nodes is exponential. Using experimental results, we show that the SMT approximation to the MTTF gives values very close to the real MTTF. Further, we consider the maintenance interval of systems running \(k\) resilient protocols and show that if the maintenance interval is larger than the scalable mission time, then there is a maximum scalability value beyond which it is undesirable to scale up the size of the system.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic
£29.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Price includes VAT (United Kingdom)

Instant access to the full article PDF.

Similar content being viewed by others

Author information

Authors and Affiliations

Authors

Additional information

Received: 4 October 1994 / 30 May 1996

Rights and permissions

Reprints and permissions

About this article

Cite this article

Rangarajan, S., Huang, Y. & Tripathi, S. On the scalability and mean-time to failure of \(k\) resilient protocols. Acta Informatica 34, 543–556 (1997). https://doi.org/10.1007/s002360050096

Download citation

  • Issue Date:

  • DOI: https://doi.org/10.1007/s002360050096

Keywords

Navigation