Simulation and analysis of highly reliable systems

January 1990

Author:
Perwez Shahabuddin
Adviser:
Peter W. Glynn

Publisher:

Stanford University
408 Panama Mall, Suite 217
Stanford
CA
United States

Order Number: AAI9108903

Pages: 193

Abstract

In this thesis we investigate simulation algorithms and numerical approximations for estimating performance measures of some large classes of highly reliable systems. We start with the study of analytical approximations for a fundamental problem in reliability theory that is known to be computationally intractable (NP-hard): the standard network reliability problem with highly reliable independent components. We extend these ideas to Markovian systems with highly reliable interacting components. The special structure of the generator matrix is used to derive limit theorems and efficient approximations for some special classes of such systems. For more general systems with complex interdependencies among components, we have to resort to simulation techniques. In particular, we investigate simulation techniques for a class of systems that are considered by the "SAVE" package. The SAVE ("Systems Availability Estimator") package is a state-of-the-art software package being developed at the IBM T. J. Watson Research Center for the performance analysis of highly reliable systems.
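To make the network reliability problem concrete, here is a minimal sketch, not taken from the thesis. It assumes a hypothetical five-edge "bridge" network between terminals `s` and `t`, with every edge failing independently with the same small probability `eps`. It compares the exact unreliability, found by brute-force enumeration (which is exponential in the number of edges), with the textbook leading-term approximation: the number of minimum-cardinality cuts times `eps` raised to the minimum cut size. The approximations developed in the thesis may differ from this one.

```python
from itertools import combinations

# Hypothetical bridge network: two terminals s and t, five edges.
EDGES = [("s", "a"), ("s", "b"), ("a", "b"), ("a", "t"), ("b", "t")]


def connected(up_edges, src="s", dst="t"):
    """Return True if dst is reachable from src using only working edges."""
    adj = {}
    for u, v in up_edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    seen, stack = {src}, [src]
    while stack:
        node = stack.pop()
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return dst in seen


def working_edges(failed):
    """Edges that remain up when the edges with indices in `failed` are down."""
    return [e for i, e in enumerate(EDGES) if i not in failed]


def exact_unreliability(eps):
    """Sum the probability of every failure pattern that disconnects s from t."""
    m = len(EDGES)
    total = 0.0
    for r in range(m + 1):
        for failed in combinations(range(m), r):
            if not connected(working_edges(failed)):
                total += eps ** r * (1.0 - eps) ** (m - r)
    return total


def leading_term(eps):
    """(number of minimum-cardinality cuts) * eps ** (minimum cut size)."""
    m = len(EDGES)
    for r in range(1, m + 1):
        cuts = [f for f in combinations(range(m), r)
                if not connected(working_edges(f))]
        if cuts:
            return len(cuts) * eps ** r
    return 0.0
```

For small `eps` the ratio `exact_unreliability(eps) / leading_term(eps)` approaches 1, which is why leading-term approximations of this kind are attractive when components are highly reliable and exact enumeration is out of reach.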

In the SAVE package, the component failure times and component repair times are assumed to be exponentially distributed, so that the systems may be modelled as continuous-time Markov chains. Despite the Markovian structure, naive simulation is very inefficient due to the rarity of the failure events. An importance sampling technique called failure biasing has been known empirically to reduce variance by orders of magnitude in the simulation of some such systems. We modify this technique to make it both more robust and applicable to a broader class of systems. We develop a mathematical framework within which we can prove that our modified failure biasing technique yields a rate of convergence that is insensitive to the component failure rates.
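The idea behind failure biasing can be sketched on a toy model. The following is an illustration of the basic technique only, not the modified version developed in the thesis and not SAVE code. The model is an assumption: `n` identical components, each failing at rate `lam`, a single repairman working at rate `mu`, and system failure once `k` components are down. The quantity estimated is the probability that, starting just after the first failure, the system fails before all components are repaired. In the embedded jump chain, the total failure probability in each state is raised to a fixed `bias`, and each sample is weighted by the likelihood ratio of the true to the biased transition probabilities.

```python
import random


def transition_probs(i, n, lam, mu):
    """True embedded-chain (failure, repair) probabilities with i components down."""
    fail_rate = (n - i) * lam
    repair_rate = mu if i > 0 else 0.0
    total = fail_rate + repair_rate
    return fail_rate / total, repair_rate / total


def exact_gamma(n, k, lam, mu):
    """Exact P(reach k down before 0 down | start with 1 down), birth-death formula."""
    total, prod = 1.0, 1.0
    for i in range(1, k):
        p, q = transition_probs(i, n, lam, mu)
        prod *= q / p
        total += prod
    return 1.0 / total


def estimate_gamma(n, k, lam, mu, bias=0.5, samples=20000, seed=0):
    """Importance sampling estimate using a fixed failure probability `bias`."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(samples):
        state, weight = 1, 1.0
        while 0 < state < k:
            p, q = transition_probs(state, n, lam, mu)
            if rng.random() < bias:
                # Biased failure transition: correct by true / biased probability.
                weight *= p / bias
                state += 1
            else:
                # Biased repair transition.
                weight *= q / (1.0 - bias)
                state -= 1
        if state == k:
            acc += weight  # Only paths ending in system failure contribute.
    return acc / samples
```

With `n = k = 3`, `lam = 1e-3`, and `mu = 1.0`, the target probability is about 2e-6, so naive simulation with 20,000 cycles would almost never observe a system failure, whereas under the biased dynamics roughly a quarter of the cycles do.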

Another way of making a system highly reliable is to build in a high degree of component redundancy. For such systems, we permit the components to have generally distributed repair times. We derive exact analytical formulas for the performance measures of some systems with components in parallel. We also investigate importance sampling techniques, based on large deviation theory, that can be used to efficiently simulate such systems.

Recommendations

Fast Simulation of Highly Dependable Systems with General Failure and Repair Processes

An approach for simulating models of highly dependable systems with general failure and repair time distribution is described. The approach combines importance sampling with event rescheduling in order to obtain variance reductions in such rare event ...

Highly reliable message-passing mechanism for cluster file system

With the increase in personal computer clusters in popularity and quantity, message passing between nodes has been an important issue for high failure rate in the network. File access in a cluster file system often contains several sub-operations; each ...

Quick Simulation Methods For Estimating The Unreliability Of Regenerative Models Of Large, Highly Reliable Systems

We investigate fast simulation techniques for estimating the unreliability in large Markovian models of highly reliable systems for which analytical/numerical techniques are difficult to apply. We first show mathematically that for “small” time horizons,...
