[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
research-article

Rigorous Development of an Embedded Fault-Tolerant System Based on Coordinated Atomic Actions

Published: 01 February 2002 Publication History

Abstract

This paper describes our experience using coordinated atomic (CA) actions as a system structuring tool to design and validate a sophisticated and embedded control system for a complex industrial application that has high reliability and safety requirements. Our study is based on an extended production cell model, the specification and simulator for which were defined and developed by FZI (Forschungszentrum Informatik, Germany). This "Fault-Tolerant Production Cell" represents a manufacturing process involving redundant mechanical devices (provided in order to enable continued production in the presence of machine faults). The challenge posed by the model specification is to design a control system that maintains specified safety and liveness properties even in the presence of a large number and variety of device and sensor failures. Based on an analysis of such failures, we provide in this paper details of: 1) a design for a control program that uses CA actions to deal with both safety-related and fault tolerance concerns and 2) the formal verification of this design based on the use of model-checking. We found that CA action structuring facilitated both the design and verification tasks by enabling the various safety problems (involving possible clashes of moving machinery) to be treated independently. Even complex situations involving the concurrent occurrence of any pairs of the many possible mechanical and sensor failures can be handled simply yet appropriately. The formal verification activity was performed in parallel with the design activity and the interaction between them resulted in a combined exercise in "design for validation"; formal verification was very valuable in identifying some very subtle residual bugs in early versions of our design which would have been difficult to detect otherwise.

References

[1]
R.H. Campbell and B. Randell, "Error Recovery in Asynchronous Systems,' IEEE Trans. Software Eng., vol. 12, no. 8, pp. 811-826, Aug. 1986.
[2]
E. Canver, 0. Schwier, A. Romanovsky, and J. Xu, "Formal Verification of CAA-Based Designs: The Fault-Tolerant Production Cell,' third year report, ESPRIT Long Term Research Project 20072 on Design for Validation, pp. 229-258, Nov. 1998
[3]
S.J Caughey, M.C. Little, and S.K. Shrivastava, "Checked Transactions in an Asynchronous Message Passing Environment," Proc. First IEEE Int'l Symp. Object-Oriented Real-Time Distribute,! Computing, pp. 222-229, Apr. 1998.
[4]
R. DeLemos and A. Romanovsky, "Co-Ordinated Atomic Action in Modelling Object CoOperation," Proc. First IEEE Int'l Sipup Object-Oriented Real-Time Distributed Computing, pp. 152-161, Apr 1998.
[5]
E.A. Emerson, 'Temporal and Modal Logic," Handbook of Theoretical Computer Science, J, van Leeuwen, ed., vol. 8, chaptt'i 16, pp. 995-1072, Elsevier Science Publishers, 1990.
[6]
J.N. Gray, "A Census of Tandem System Availability between 1985 and 1990," IEEE Trans. Reliability, vol. 39, no, 4, pp. 409-418, 1990.
[7]
K.H. Kim, "Approaches to Mechanization of the Conversation Scheme Based on Monitors," IEEE Trans. Software Eng., vol. 8, no. 3, pp. 189-197, 1982.
[8]
C. Lewerentz and T. Lindner, Formal Development of Reactive Systems. Case Study "Production Cell". Springer, Jan. 1995.
[9]
P. Liggesmeyer and M. Rothfelder, "Improving System Reliability with Automatic Fault Tree Generation," Proc. 28th Int'l Sgm p. Fault-Tolerant Computing, pp. 90-99, June 1998.
[10]
A. Ltzbeyer, "Task Description of a Fault-Tolerant Production Cell," version 1.6, available from http://www.fzi.de/prost/ projects/korsys/korsys.html, 1996.
[11]
A. Ltzbeyer and R. Mhlfeld, "Task Description of a Flexible Production Cell with Real Time Properties," FZI Technical Report, (ftp: / /ftp.fzi.de /pub/PROST/projects/korsys /task_descr_ flex_2._2.ps), 1996.
[12]
G. Matos and E. White, "Application of Dynamic Reconfiguration in the Design of FaultTolerant Production Cell," Proc. Fourth Int'l Coiif. Configurable Distributed Systems, pp. 2-9, 1998.
[13]
K.L. McMillan, "Symbolic Model Checking" revised version of PhD thesis, Carnegie Mellon Univ., Kluwer Academic Publishers, 1993.
[14]
G.D. Parrington, S.K. Shrivastava, S.M. Wheater, and M.C. Little, "The Design and Implementation of Arena," LISENIX Computing Systems J. vol. 8, no. 3, 1995.
[15]
B. Randell, "System Structure for Software Fault Tolerance," IEEE Trans Software Eng, vol. 1, no, 2, pp. 220-232, 1975.
[16]
A. Romanovsky, J. Xu, and B. Randell, "Coordinated Exception Handling in Real-Time Distributed Object Systems," Int'l J. Computer Systems Science & Eng, vol. 14, no. 4, pp. 197-207, July 1999.
[17]
D. Schwier, F. von Henke, J. Xu, R.J. Stroud, A. Romanovsky, and B. Randell, "Formalization of the CA Action Concept Based on Temporal Logic," second year report, ESPRIT Long Term Research Project 20072 on Design for Validation, pp. 3-15, Dec 1997.
[18]
J. Vachon, D. Buchs, M. Buffo, G.D.M. Serugendo, B. Randell, A. Romanovsky, R.J. Stroud, and J. Xu, "COALA-A Formal Language for Co-Ordinated Atomic Actions," third year report, ESPRIT Long Term Research Project 20072 on Design for Validation, pp. 43-86, Nov. 1998.
[19]
J. Xu, B. Randell, A, Romanovsky, C. Rubira, R.J. Stroud, and Z. Wu, "Fault Tolerance in Concurrent Object-Oriented Software through Co-Ordinated Error Recovery," Proc. 25th Int'l Symp. Fault-Tolerant Computing, pp. 499-508, June 1995.
[20]
J. Xu, A. Romanovsky, and B. Randell, "Co-Ordinated Exception Handling in Distributed Object Systems: From Model to System Implementation," Proc. 18th IEEE Int'l Conf. Distributed Computing Systems, pp. 12-21, May 1998,
[21]
J. Xu, A. Romanovsky, A. Zorzo, B. Randell, R.J. Stroud, and E. Canver, 'peveloping Control Software for Production Cell IF Failure Analysis and System Design Using CA Actions," third year report, ESPRIT Long Term Research Project 2(E72 on Design for Validation, pp. 167-188, Nov. 1998.
[22]
J. Xu, B. Randell, A. Romanovsky, R.J. Stroud, A.F. Zorzo, E Canver, and F. von Henke, "Rigorous Development of a SafetyCritical System Based on Coordinated Atomic Actions," Proc. 29th Int'l Symp. Fault-Tolerant Computing, pp. 68-75, June 1999
[23]
J. Xu, A. Romanovsky, and B. Randell, "Concurrent Exception Handling and Resolution in Distributed Object Systems," IEEE Trans. Parallel and Distributed Systems, vol. 11, no. 10, pp. 1019-1032, Oct. 2000.
[24]
A.F. Zorzo, A. Romanovsky, J. Xu, B. Randell, R.J. Stroud, and I.S. Welch, "Using Co-Ordinated Atomic Actions to Design Complex Safety-C'i dical Systems: The Production Cell Case Study," Sottiiiiri' & E'cperience, vol. 29, no. 2, pp. 667-697, 1999.

Cited By

View all
  • (2016)On the role of latent design conditions in cyber-physical systems securityProceedings of the 2nd International Workshop on Software Engineering for Smart Cyber-Physical Systems10.1145/2897035.2897036(43-46)Online publication date: 14-May-2016
  • (2011)Achieving dependability in service-oriented systemsDependable and Historic Computing10.5555/2184121.2184161(504-522)Online publication date: 1-Jan-2011
  • (2009)Global-to-local approach to rigorously developing distributed system with exception handlingJournal of Computer Science and Technology10.1007/s11390-009-9220-924:2(238-249)Online publication date: 1-Mar-2009
  • Show More Cited By

Index Terms

  1. Rigorous Development of an Embedded Fault-Tolerant System Based on Coordinated Atomic Actions

        Recommendations

        Comments

        Please enable JavaScript to view thecomments powered by Disqus.

        Information & Contributors

        Information

        Published In

        cover image IEEE Transactions on Computers
        IEEE Transactions on Computers  Volume 51, Issue 2
        Special issue on fault-tolerant embedded systems
        February 2002
        143 pages

        Publisher

        IEEE Computer Society

        United States

        Publication History

        Published: 01 February 2002

        Author Tags

        1. Concurrency
        2. coordinated atomic (CA) actions
        3. embedded fault-tolerant systems
        4. exception handling
        5. formal verification
        6. model checking
        7. object orientation
        8. reliability
        9. safety.

        Qualifiers

        • Research-article

        Contributors

        Other Metrics

        Bibliometrics & Citations

        Bibliometrics

        Article Metrics

        • Downloads (Last 12 months)0
        • Downloads (Last 6 weeks)0
        Reflects downloads up to 25 Jan 2025

        Other Metrics

        Citations

        Cited By

        View all
        • (2016)On the role of latent design conditions in cyber-physical systems securityProceedings of the 2nd International Workshop on Software Engineering for Smart Cyber-Physical Systems10.1145/2897035.2897036(43-46)Online publication date: 14-May-2016
        • (2011)Achieving dependability in service-oriented systemsDependable and Historic Computing10.5555/2184121.2184161(504-522)Online publication date: 1-Jan-2011
        • (2009)Global-to-local approach to rigorously developing distributed system with exception handlingJournal of Computer Science and Technology10.1007/s11390-009-9220-924:2(238-249)Online publication date: 1-Mar-2009
        • (2008)On foundation of engineering context-sensitive applicationsJournal of Mobile Multimedia10.5555/2011033.20110364:3(185-199)Online publication date: 1-Oct-2008
        • (2006)Recovery oriented programmingProceedings of the 8th international conference on Stabilization, safety, and security of distributed systems10.5555/1759076.1759089(152-168)Online publication date: 17-Nov-2006
        • (2006)Verification of coordinated exception handlingProceedings of the 2006 ACM symposium on Applied computing10.1145/1141277.1141434(680-685)Online publication date: 23-Apr-2006
        • (2005)An agent model for fault-tolerant systemsProceedings of the 2005 ACM symposium on Applied computing10.1145/1066677.1066696(60-65)Online publication date: 13-Mar-2005
        • (2004)The Guardian Model and Primitives for Exception Handling in Distributed SystemsIEEE Transactions on Software Engineering10.1109/TSE.2004.10630:12(1008-1022)Online publication date: 1-Dec-2004

        View Options

        View options

        Figures

        Tables

        Media

        Share

        Share

        Share this Publication link

        Share on social media