[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
article
Free access

An abstract model of rollback recovery control in distributed systems

Published: 01 October 1992 Publication History

Abstract

This paper develops an abstract model which presents a method of uniform description of different rollback recovery control algorithms for distributed systems. We first developed a general definition of the distributed rollback recovery control problem. The concept of a distributed recovery control system (DRC system), consisting of distributed recovery control units (DRC units), is proposed to model recovery with various control granularities. Then, we developed a graph model, called dependency graph, for distributed rollback recovery control algorithms. An atomic subgraph is defined as a subgraph induced by a set of nodes which has no outgoing arcs to other nodes in the graph. Committing and aborting atomic actions can be modeled as identifying atomic subgraphs. Next, we defined two kinds of dependency graphs: checkpoint graphs and unit graphs, based on the dependency relation defined by rollback propagation. We have shown that various types of distributed recovery control algorithms can be classified based on the identifications of atomic subgraphs in these two graphs. Therefore, using the model may allow us to describe existing algorithms in a uniform way and, more importantly, to find new algorithms.

References

[1]
{Barigazzi83} G. Barigazzi and L. Strigini, "Application-Transparent Setting of Recovery Points", IEEE Proc. Third International Conf. on Distributed Computing Systems, pp. 48-54, 1983.
[2]
{Best81} E. Best and B. Randell, "A Formal Model of Atomicity in Asynchronous Systems", Acta Informatica, 16, pp. 93-124, 1981.
[3]
{Bhargava88} B. Bhargava and L. LiLien, "Feature Analysis of Selected Database Recovery Techniques", Pro. National Computer Conference, pp. 543-554, 1981.
[4]
{Bhargava88} B. Bhargava and S. Lian, "Independent Checkpointing and Concurrent Rollback Recovery for Distributed Systems - An Optimistic Approach", IEEE Proc. 7th Symp. on Reliability in Distributed Systems, pp. 3-12, Oct. 1988.
[5]
{Borg83} A. Borg et al., "A Message System Supporting Fault Tolerance", ACM Proc. 9th Symp. on Operating System Principles, pp. 90-99, Oct. 1983.
[6]
{Briatico84} D. Briatico, A. Giuffoletti and L. Simoncini, "A Distributed Domino-Effect Free Recovery Algorithm", IEEE Proc. 4th Symposium on Reliability in Distributed Software and Database Systems, pp. 207-215, Oct. 1984.
[7]
{Cao91} J. Cao, "On Correctness of Distributed Rollback Recovery", Proc. 14th Australia Computer Science Conference, pp. 39.1-39.10, Feb. 1991.
[8]
{Cao92} J. Cao, "Efficient Synchronous Checkpointing in Distributed Systems", Proco 15th Australia Computer Science Conference, pp. 165-179, Jan. 1992.
[9]
{Cao92b} J. Cao, "Domino-Effect Avoidance in Distributed Rollback Recovery", To be submitted.
[10]
{Ceri84} S. Ceri and G. Pelagatti, "Distributed Databases: Principles and Systems", McGrraw-Hill Inc., 1984.
[11]
{Chandy72} K.M. Chandy and C.V. Ramamoorthy, "Rollback and Recovery Strategies for Computer Programs ", IEEE Trans. on Computers, Vol. 21, No. 6, pp. 546-556, Jun. 1972.
[12]
{Chandy85} K.M. Chandy and L. Lamport, "Distributed Snapshots: Determining Global States of Distributed Systems", ACM Trans. on Computer Systems, Vol. 3, No. 1, Feb. 1985, pp. 63-75.
[13]
{Jalote85} P. Jalote, "Atomic Actions in Concurrent Systems", Ph.D Dissertation, Dept. of Computer Science, Univ. of Illinois at Urbana-Champaign, 1985.
[14]
{Johnson90} D.B. Johnson and W. Zwaenepoel, "Recovery in Distributed Systems Using Optimistic Message Logging and Checkpointing", Journal of Algorithms, 11, pp. 462-491, 1990.
[15]
{Koo87} R. Koo and S. Toueg, "Checkpointing and Rollback-Recovery for Distributed Systems", IEEE Trans. on Software Eng., Vol. SE-13, No. 1, Jan. 1987.
[16]
{Lai87} T.H. Lai and T.H. Yang, "On Distributed Snapshots", Information Processing Letters 25 (1987) pp. 153-158, May, 1987.
[17]
{Lampson81} B.W. Lampson, "Atomic Transactions", Lecture Notes in Computer Sciences, Vol. 105, 1981.
[18]
{Leu89} P.J. Leu and B. Bhargava, "Concurrent Robust Checkpointing and Recovery in Distributed Systems", IEEE 4th Conf. on Data Eng. 1988, pp. 154-163.
[19]
{Leu89} P.J. Leu and B. Bhargava, "A Model for Concurrent Checkpointing and Recovery Using Transactions", IEEE 9th Int'l Conf. on Distributed Computing Systems, 1989, pp. 423-430.
[20]
{Liskov83} B.H. Liskov and R. Scheifler, "Guardians and Actions: Linguistic Support for Robust, Distributed Programs", ACM Trans. of Programming Languages and Systems, Vol. 5, No. 3, pp. 381-404, Jul. 1983.
[21]
{McDermid81} J.A. Mcdermid, "Checkpointing and Error Recovery in Distributed Systems", IEEE Proc. 2nd Intl. Conf. Distributed Computing Systems, 1981, pp. 271-282.
[22]
{Merlin78} P.M. Merlin and B. Randell, "State Restoration in Distributed Systems", IEEE Digest of Papers FTCS-8, pp. 128-134, 1978.
[23]
{Powel183} M.L. Powell and D.L. Presotto, "Publishing: A Reliable Broadcast Communication Mechanism ", ACM Proc. 9th Symp. on Operating Systems Principles, pp. 100-109, Oct. 1983.
[24]
{Randel175} B. Randell, "System Structure for Software Fault Tolerance", IEEE Trans. on Software Eng. Vol. SE-1, No. 2, pp. 220-232, 1975.
[25]
{Russell80} D.L. Russell, "State Restoration in Systems of Communicating Processes", IEEE Trans. on Software Eng. Vol. SE-6, No. 2, pp. 183-194, Mar. 1980.
[26]
{Strom85} R. Strom and S. Temini, "Optimistic Recovery in Distributed Systems", ACM Trans. Computer Systems, pp. 204-226, Aug. 1985.
[27]
{Wood80} W.G. Wood, "Recovery Control of Communicating Processes in a Distributed System", Tech. Report No. 158, 1980, Univ. of Newcastle upon Tyne.

Cited By

View all
  • (2013)Research on High Availability Mechanism in Distributed Data Stream Management SystemProceedings of the 2013 10th Web Information System and Application Conference10.1109/WISA.2013.61(289-293)Online publication date: 10-Nov-2013
  • (2013)A Programming Language Approach to Fault Tolerance for Fork-Join ParallelismProceedings of the 2013 International Symposium on Theoretical Aspects of Software Engineering10.1109/TASE.2013.22(105-112)Online publication date: 1-Jul-2013
  • (2012)Distributed ControlMobile Agents in Networking and Distributed Computing10.1002/9781118135617.ch8(189-218)Online publication date: 24-Aug-2012
  • Show More Cited By

Index Terms

  1. An abstract model of rollback recovery control in distributed systems

                    Recommendations

                    Comments

                    Please enable JavaScript to view thecomments powered by Disqus.

                    Information & Contributors

                    Information

                    Published In

                    cover image ACM SIGOPS Operating Systems Review
                    ACM SIGOPS Operating Systems Review  Volume 26, Issue 4
                    Oct. 1992
                    96 pages
                    ISSN:0163-5980
                    DOI:10.1145/142854
                    Issue’s Table of Contents

                    Publisher

                    Association for Computing Machinery

                    New York, NY, United States

                    Publication History

                    Published: 01 October 1992
                    Published in SIGOPS Volume 26, Issue 4

                    Check for updates

                    Qualifiers

                    • Article

                    Contributors

                    Other Metrics

                    Bibliometrics & Citations

                    Bibliometrics

                    Article Metrics

                    • Downloads (Last 12 months)68
                    • Downloads (Last 6 weeks)7
                    Reflects downloads up to 06 Jan 2025

                    Other Metrics

                    Citations

                    Cited By

                    View all
                    • (2013)Research on High Availability Mechanism in Distributed Data Stream Management SystemProceedings of the 2013 10th Web Information System and Application Conference10.1109/WISA.2013.61(289-293)Online publication date: 10-Nov-2013
                    • (2013)A Programming Language Approach to Fault Tolerance for Fork-Join ParallelismProceedings of the 2013 International Symposium on Theoretical Aspects of Software Engineering10.1109/TASE.2013.22(105-112)Online publication date: 1-Jul-2013
                    • (2012)Distributed ControlMobile Agents in Networking and Distributed Computing10.1002/9781118135617.ch8(189-218)Online publication date: 24-Aug-2012
                    • (2004)Checkpointing in hybrid distributed systems7th International Symposium on Parallel Architectures, Algorithms and Networks, 2004. Proceedings.10.1109/ISPAN.2004.1300471(136-141)Online publication date: 2004
                    • (2002)A survey of rollback-recovery protocols in message-passing systemsACM Computing Surveys10.1145/568522.56852534:3(375-408)Online publication date: 1-Sep-2002
                    • (2001)Hybrid checkpoint protocol for supporting mobile-to-mobile communicationProceedings 15th International Conference on Information Networking10.1109/ICOIN.2001.905503(529-536)Online publication date: 2001
                    • (1997)Design and analysis of an efficient algorithm for coordinated checkpointing in distributed systemsProceedings. Advances in Parallel and Distributed Computing10.1109/APDC.1997.574042(261-268)Online publication date: 1997
                    • (1994)Replica determinism in distributed real-time systemsReal-Time Systems10.1007/BF010886296:3(289-316)Online publication date: 1-May-1994
                    • (1993)A methodology for the use of single level RDBMS software in a multi-level secured systemProceedings of 9th Annual Computer Security Applications Conference10.1109/CSAC.1993.315458(11-20)Online publication date: 1993

                    View Options

                    View options

                    PDF

                    View or Download as a PDF file.

                    PDF

                    eReader

                    View online with eReader.

                    eReader

                    Login options

                    Media

                    Figures

                    Other

                    Tables

                    Share

                    Share

                    Share this Publication link

                    Share on social media