Abstract
The purpose of this paper is to summarize major issues in providing the capabilities for tolerating both hardware faults and software faults in distributed real-time computer systems (DCS’s). The paper starts with several guidelines considered highly useful in the search for effective system-level fault-tolerance schemes. Several promising schemes are then reviewed.
Copyright information
© 1989 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kim, K.H. (1989). Approaches for System-Level Fault Tolerance in Distributed Real-Time Computer Systems. In: Görke, W., Sörensen, H. (eds) Fehlertolerierende Rechensysteme / Fault-tolerant Computing Systems. Informatik-Fachberichte, vol 214. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-75002-1_22
DOI: https://doi.org/10.1007/978-3-642-75002-1_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-51565-4
Online ISBN: 978-3-642-75002-1
eBook Packages: Springer Book Archive