DOI: 10.1145/197694.197745
Article

Transparent fault tolerance for distributed Ada applications

Published: 11 November 1994

Abstract

The advent of open architectures and initiatives in massively parallel supercomputing, combined with the maturation of distributed processing methods and algorithms, has enabled the implementation of responsive software-based fault tolerance. Expanding capabilities of distributed Ada runtime environments further stimulate the incorporation of hardware fault tolerance into critical, real-time embedded systems. Through the integration of proven Ada program component distribution and virtually synchronous communication protocols, we have established a benchmark fault-tolerant system that layers transparently between an Ada application and the runtime environment. This transparency allows the distribution and fault tolerance characteristics to be reconfigured rapidly without changes to the application source code, thus enhancing portability, scalability, and reuse.
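As a rough illustration of what layering "transparently between an application and the runtime environment" can mean, the sketch below interposes a replicate-and-vote wrapper around an unchanged application routine, with the replication degree read from an external table. It is a minimal sketch in Python rather than Ada, not the project's implementation: every name in it (`FT_CONFIG`, `FAULT_LOG`, `replicated`, `majority_vote`, `compute_track`) is hypothetical, the replicas run sequentially in one process as a stand-in for replicas on separate nodes, and results are assumed to be hashable so they can be counted for an exact majority vote.

```python
from collections import Counter
from functools import wraps
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical external configuration: replication degree per operation.
# Editing this table changes the fault tolerance characteristics without
# touching the application routine below.
FT_CONFIG: Dict[str, int] = {"compute_track": 3}

# Hypothetical log of (operation, dissenting replica indices) per vote.
FAULT_LOG: List[Tuple[str, List[int]]] = []


class NoMajorityError(RuntimeError):
    """Raised when no value is produced by a strict majority of replicas."""


def majority_vote(results: List[Any]) -> Tuple[Any, List[int]]:
    """Exact majority vote over hashable replica results.

    Returns the winning value and the indices of replicas that disagreed.
    """
    if not results:
        raise NoMajorityError("no replica results to vote on")
    value, count = Counter(results).most_common(1)[0]
    if 2 * count <= len(results):
        raise NoMajorityError(f"no strict majority among {len(results)} results")
    dissenters = [i for i, r in enumerate(results) if r != value]
    return value, dissenters


def replicated(func: Callable[..., Any]) -> Callable[..., Any]:
    """Interpose a replicate-and-vote layer between caller and routine."""

    @wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        n = FT_CONFIG.get(func.__name__, 1)
        # Sequential stand-in for n replicas running on separate nodes.
        results = [func(*args, **kwargs) for _ in range(n)]
        value, dissenters = majority_vote(results)
        if dissenters:
            FAULT_LOG.append((func.__name__, dissenters))
        return value

    return wrapper


# The application routine itself contains no fault tolerance logic.
@replicated
def compute_track(x: int, y: int) -> int:
    return x * x + y


if __name__ == "__main__":
    print(compute_track(3, 4))       # 13, agreed by 3 replicas
    print(majority_vote([5, 5, 7]))  # (5, [2]): replica 2 is suspect
```

Raising `FT_CONFIG["compute_track"]` to 5 would increase the replication degree without editing `compute_track`. A real layer would also have to distribute the replicas and ensure they all receive the same inputs, which this sketch leaves out.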
The Ada Fault Tolerance project has implemented software technologies that penetrate the envelope of an Ada program to detect, diagnose, and recover from hardware faults. These real-time facilities interact with the Rational distributed application development and runtime environment systems to service replicated Ada software tasks (i.e., threads of control). The deployed system demonstrates that all replicated threads, including those of independently distributed components, can reach timely consensus during periodic fault detection cycles through transparently embedded voting protocols. Our implementation uses a hybrid redundancy computation strategy and relies on a communication layer that provides virtual synchrony via a causal multicast protocol.
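A common way to obtain the causal ordering mentioned above, in the style of Birman, Schiper, and Stephenson's fast causal multicast, is a vector-clock delivery rule: a message from process j is held back until it is the next message expected from j and every message it causally depends on has already been delivered. The Python sketch below shows only that rule. It is an assumption-laden illustration, not the project's communication layer: group membership and view changes, which virtual synchrony also requires, are omitted, and the class and method names (`Message`, `CausalProcess`, `multicast`, `receive`) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class Message:
    sender: int
    vt: Tuple[int, ...]  # sender's vector timestamp when it multicast
    payload: str


@dataclass
class CausalProcess:
    """One group member enforcing causal delivery with a vector clock."""

    pid: int
    n: int  # group size
    vc: List[int] = field(init=False)  # vc[k]: multicasts from k delivered here
    pending: List[Message] = field(default_factory=list)
    delivered: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        self.vc = [0] * self.n

    def multicast(self, payload: str) -> Message:
        """Stamp an outgoing message; the sender delivers it to itself at once."""
        self.vc[self.pid] += 1
        self.delivered.append(payload)
        return Message(self.pid, tuple(self.vc), payload)

    def _deliverable(self, m: Message) -> bool:
        j = m.sender
        # Next message expected from j, and nothing it depends on is missing.
        return m.vt[j] == self.vc[j] + 1 and all(
            m.vt[k] <= self.vc[k] for k in range(self.n) if k != j
        )

    def receive(self, m: Message) -> None:
        """Buffer m, then deliver every pending message that has become ready."""
        self.pending.append(m)
        progress = True
        while progress:
            progress = False
            for msg in list(self.pending):
                if self._deliverable(msg):
                    self.pending.remove(msg)
                    self.vc[msg.sender] += 1
                    self.delivered.append(msg.payload)
                    progress = True


if __name__ == "__main__":
    p0, p1, p2 = (CausalProcess(i, 3) for i in range(3))
    m1 = p0.multicast("m1")
    p1.receive(m1)
    m2 = p1.multicast("m2")  # causally after m1
    p2.receive(m2)           # arrives first, so it is held back
    print(p2.delivered)      # []
    p2.receive(m1)
    print(p2.delivered)      # ['m1', 'm2']
```

In the usage run, process 2 receives `m2` before the `m1` it depends on, buffers it, and delivers both in causal order once `m1` arrives. Counting only multicasts in the clock and having the sender deliver its own message immediately follow the usual textbook presentation of this rule; a production layer would also need reliable transmission underneath it.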


Published In

TRI-Ada '94: Proceedings of the conference on TRI-Ada '94
November 1994
508 pages
ISBN:0897916662
DOI:10.1145/197694
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

TriAda94: Tri-Ada '94
November 6-11, 1994
Baltimore, Maryland, USA

