DOI: 10.1145/502034.502037
Article

BASE: using abstraction to improve fault tolerance

Published: 21 October 2001

Abstract

Software errors are a major cause of outages, and they are increasingly exploited in malicious attacks. Byzantine fault tolerance allows replicated systems to mask some software errors, but it is expensive to deploy. This paper describes a replication technique, BASE, which uses abstraction to reduce the cost of Byzantine fault tolerance and to improve its ability to mask software errors. BASE reduces cost because it enables reuse of off-the-shelf service implementations. It improves availability because each replica can be repaired periodically using an abstract view of the state stored by correct replicas, and because each replica can run distinct or non-deterministic service implementations, which reduces the probability of common mode failures. We built an NFS service where each replica can run a different off-the-shelf file system implementation, and an object-oriented database where the replicas ran the same, non-deterministic implementation. These examples suggest that our technique can be used in practice: in both cases, the implementation required only a modest amount of new code, and our performance results indicate that the replicated services perform comparably to the implementations that they reuse.
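The idea of repairing a replica from an abstract view of the state can be illustrated with a minimal Python sketch. It is not the paper's implementation: the class names `DictStore` and `LogStore`, the methods `abstract_state` and `load_abstract_state`, and the `repair` function are all hypothetical, and a simple majority vote stands in for the Byzantine agreement protocol. Two different concrete implementations expose the same abstract state, so their digests can be compared and a divergent replica can be overwritten from the agreed view.

```python
import hashlib
import json
from collections import Counter


class DictStore:
    """Hypothetical implementation A: keeps entries in a dict."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def abstract_state(self):
        # Map the concrete representation to the shared abstract view.
        return sorted(self._data.items())

    def load_abstract_state(self, state):
        # Rebuild the concrete representation from the abstract view.
        self._data = dict(state)


class LogStore:
    """Hypothetical implementation B: append-only log, last write wins."""

    def __init__(self):
        self._log = []

    def put(self, key, value):
        self._log.append((key, value))

    def abstract_state(self):
        latest = {}
        for key, value in self._log:
            latest[key] = value
        return sorted(latest.items())

    def load_abstract_state(self, state):
        self._log = list(state)


def digest(state):
    """Hash an abstract state so replicas can be compared cheaply."""
    return hashlib.sha256(json.dumps(state).encode("utf-8")).hexdigest()


def repair(replicas):
    """Overwrite replicas whose abstract state differs from the majority.

    Assumption: a plain majority is used here only for illustration.
    Returns the number of replicas that were repaired.
    """
    states = [r.abstract_state() for r in replicas]
    digests = [digest(s) for s in states]
    majority, _ = Counter(digests).most_common(1)[0]
    good_state = states[digests.index(majority)]
    repaired = 0
    for replica, d in zip(replicas, digests):
        if d != majority:
            replica.load_abstract_state(good_state)
            repaired += 1
    return repaired
```

In this sketch the replicas never compare concrete representations, only the abstract view, which is what lets a dict-based and a log-based implementation be treated as replicas of the same service.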

Published In

SOSP '01: Proceedings of the eighteenth ACM symposium on Operating systems principles
October 2001
254 pages
ISBN:1581133898
DOI:10.1145/502034

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Article

Conference

SOSP01: 18th Symposium on Operating System Principles
October 21 - 24, 2001
Banff, Alberta, Canada

Acceptance Rates

SOSP '01 Paper Acceptance Rate 17 of 85 submissions, 20%;
Overall Acceptance Rate 174 of 961 submissions, 18%

Article Metrics

  • Downloads (Last 12 months): 22
  • Downloads (Last 6 weeks): 5
Reflects downloads up to 21 Dec 2024

Cited By

  • (2020) The Effects of Computer Science Stereotypes and Interest on Middle School Boys’ Career Intentions. ACM Transactions on Computing Education 20:3 (1-15). DOI: 10.1145/3394964. Online publication date: 16-Jun-2020
  • (2020) n-m-Variant Systems. Proceedings of the Tenth ACM Conference on Data and Application Security and Privacy (235-246). DOI: 10.1145/3374664.3375745. Online publication date: 16-Mar-2020
  • (2019) SBFT: A Scalable and Decentralized Trust Infrastructure. 2019 49th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN) (568-580). DOI: 10.1109/DSN.2019.00063. Online publication date: Jun-2019
  • (2016) Towards service continuity for transactional applications via diverse device drivers. International Journal of Information and Computer Security 8:4 (382-400). DOI: 10.1504/IJICS.2016.080428. Online publication date: 1-Jan-2016
  • (2016) SAREK: Optimistic Parallel Ordering in Byzantine Fault Tolerance. 2016 12th European Dependable Computing Conference (EDCC) (77-88). DOI: 10.1109/EDCC.2016.36. Online publication date: Sep-2016
  • (2015) Understanding issue correlations. Proceedings of the Sixth ACM Symposium on Cloud Computing (2-15). DOI: 10.1145/2806777.2806937. Online publication date: 27-Aug-2015
  • (2015) A Cycle-Time-Analysis Model for Byzantine Fault Tolerance. Proceedings of the ICA3PP International Workshops and Symposiums on Algorithms and Architectures for Parallel Processing - Volume 9532 (659-668). DOI: 10.1007/978-3-319-27161-3_60. Online publication date: 18-Nov-2015
  • (2014) Survivable SCADA Via Intrusion-Tolerant Replication. IEEE Transactions on Smart Grid 5:1 (60-70). DOI: 10.1109/TSG.2013.2269541. Online publication date: Jan-2014
  • (2013) HARDFS. Proceedings of the 11th USENIX conference on File and Storage Technologies (105-118). DOI: 10.5555/2591272.2591284. Online publication date: 12-Feb-2013
  • (2013) Towards transparent hardening of distributed systems. Proceedings of the 9th Workshop on Hot Topics in Dependable Systems (1-6). DOI: 10.1145/2524224.2524230. Online publication date: 3-Nov-2013
