[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.1145/1095810.1095817acmconferencesArticle/Chapter ViewAbstractPublication PagessospConference Proceedingsconference-collections
Article

Fault-scalable Byzantine fault-tolerant services

Published: 20 October 2005 Publication History

Abstract

A fault-scalable service can be configured to tolerate increasing numbers of faults without significant decreases in performance. The Query/Update (Q/U) protocol is a new tool that enables construction of fault-scalable Byzantine fault-tolerant services. The optimistic quorum-based nature of the Q/U protocol allows it to provide better throughput and fault-scalability than replicated state machines using agreement-based protocols. A prototype service built using the Q/U protocol outperforms the same service built using a popular replicated state machine implementation at all system sizes in experiments that permit an optimistic execution. Moreover, the performance of the Q/U protocol decreases by only 36% as the number of Byzantine faults tolerated increases from one to five, whereas the performance of the replicated state machine decreases by 83%.

References

[1]
M. Abd-El-Malek, G. R. Ganger, G. R. Goodson, M. K. Reiter, and J. J. Wylie. Lazy verification in fault-tolerant distributed storage systems. Symposium on Reliable Distributed Systems, 2005.]]
[2]
M. Abd-El-Malek, G. R. Ganger, G. R. Goodson, M. K. Reiter, and J. J. Wylie. The Read/Conditional-Write and Query/Update protocols. Technical report CMU--PDL--05--107. Parallel Data Laboratory, Carnegie Mellon University, Pittsburgh, PA, August 2005.]]
[3]
A. Adya, W. J. Bolosky, M. Castro, G. Cermak, R. Chaiken, J. R. Douceur, J. Howell, J. R. Lorch, M. Theimer, and R. P. Wattenhofer. FARSITE: federated, available, and reliable storage for an incompletely trusted environment. Symposium on Operating Systems Design and Implementation, pages 1--15. USENIX Association, 2002.]]
[4]
M. K. Aguilera, W. Chen, and S. Toueg. Failure detection and consensus in the crash-recovery model. Distributed Computing, 13(2):99--125. Springer-Verlag, 2000.]]
[5]
M. Bellare, R. Canetti, and H. Krawczyk. Keying hash functions for message authentication. Advances in Cryptology - CRYPTO, pages 1--15. Springer-Verlag, 1996.]]
[6]
P. A. Bernstein, V. Hadzilacos, and N. Goodman. Concurrency control and recovery in database systems. Addison-Wesley, Reading, Massachusetts, 1987.]]
[7]
G. Bracha and S. Toueg. Asynchronous consensus and broadcast protocols. Journal of the ACM, 32(4):824--840. ACM, October 1985.]]
[8]
C. Cachin and J. A. Poritz. Secure intrusion-tolerant replication on the Internet. International Conference on Dependable Systems and Networks, pages 167--176. IEEE, 2002.]]
[9]
M. Castro and B. Liskov. Practical Byzantine fault tolerance and proactive recovery. ACM Transactions on Computer Systems, 20(4):398--461, November 2002.]]
[10]
G. Chockler, D. Malkhi, and M. Reiter. Backoff protocols for distributed mutual exclusion and ordering. International Conference on Distributed Computing Systems, pages 11--20. IEEE, 2001.]]
[11]
C. P. Fry and M. K. Reiter. Nested objects in a Byzantine quorum-replicated system. Symposium on Reliable Distributed Systems. IEEE, 2004.]]
[12]
J. Gray, P. Helland, P. O'Neil, and D. Shasha. The dangers of replication and a solution. ACM SIGMOD International Conference on Management of Data. Published as SIGMOD Record, 25(2):173--182. ACM, June 1996.]]
[13]
S. D. Gribble, E. A. Brewer, J. M. Hellerstein, and D. Culler. Scalable, distributed data structures for internet service construction. Symposium on Operating Systems Design and Implementation, 2000.]]
[14]
M. Herlihy, V. Luchangco, and M. Moir. Obstruction-free synchronization: double-ended queues as an example. International Conference on Distributed Computing Systems, pages 522--529. IEEE, 2003.]]
[15]
M. P. Herlihy and J. M. Wing. Linearizability: a correctness condition for concurrent objects. ACM Transactions on Programming Languages and Systems, 12(3):463--492. ACM, July 1990.]]
[16]
R. Jiménez-Peris, M. Patiño-Martínez, G. Alonso, and B. Kemme. Are quorums an alternative for data replication? ACM Transactions on Database Systems (TODS), 28(3):257--294. ACM, September 2003.]]
[17]
J. Katcher. PostMark: a new file system benchmark. Technical report TR3022. Network Appliance, October 1997.]]
[18]
K. P. Kihlstrom, L. E. Moser, and P. M. Melliar-Smith. The SecureRing group communication system. ACM Transactions on Information and Systems Security, 1(4):371--406. IEEE, November 2001.]]
[19]
H. T. Kung and J. T. Robinson. On optimistic methods for concurrency control. ACM Transactions on Database Systems, 6(2):213--226, June 1981.]]
[20]
K. Kursawe. Optimistic Byzantine agreement. Symposium on Reliable Distributed Systems, pages 262--267. IEEE, 2002.]]
[21]
L. Lamport. The part-time parliament. ACM Transactions on Computer Systems, 16(2):133--169. ACM Press, May 1998.]]
[22]
L. Lamport, R. Shostak, and M. Pease. The Byzantine generals problem. ACM Transactions on Programming Languages and Systems, 4(3):382--401. ACM, July 1982.]]
[23]
L. L. Lamport. The implementation of reliable distributed multiprocess systems. Computer Networks, 2:95--114, 1978.]]
[24]
W. Litwin and T. Schwarz. LH*RS: a high-availability scalable distributed data structure using Reed Solomon Codes. ACM SIGMOD International Conference on Management of Data, pages 237--248. ACM, 2000.]]
[25]
J. MacCormick, N. Murphy, M. Najork, C. A. Thekkath, and L. Zhou. Boxwood: abstractions as the foundation for storage infrastructure. Symposium on Operating Systems Design and Implementation, pages 105--120. USENIX Association, 2004.]]
[26]
D. Malkhi and M. Reiter. Byzantine quorum systems. Distributed Computing, 11(4):203--213. Springer-Verlag, 1998.]]
[27]
D. Malkhi, M. Reiter, and A. Wool. The load and availability of Byzantine quorum systems. SIAM Journal of Computing, 29(6):1889--1906. Society for Industrial and Applied Mathematics, April 2000.]]
[28]
D. Malkhi and M. K. Reiter. An architecture for survivable coordination in large distributed systems. IEEE Transactions on Knowledge and Data Engineering, 12(2). IEEE, April 2000.]]
[29]
J.-P. Martin and L. Alvisi. Fast Byzantine consensus. International Conference on Dependable Systems and Networks. IEEE, 2005.]]
[30]
J.-P. Martin, L. Alvisi, and M. Dahlin. Minimal Byzantine storage. International Symposium on Distributed Computing, 2002.]]
[31]
R. Morris. Storage: from atoms to people. Keynote address at Conference on File and Storage Technologies, January 2002.]]
[32]
M. Naor and A. Wool. The load, capacity, and availability of quorum systems. SIAM Journal on Computing, 27(2):423--447. SIAM, April 1998.]]
[33]
M. K. Reiter. The Rampart toolkit for building high-integrity services. Theory and Practice in Distributed Systems(Lecture Notes in Computer Science 938), pages 99--110, 1995.]]
[34]
S. Rhea, P. Eaton, D. Geels, H. Weatherspoon, B. Zhao, and J. Kubiatowicz. Pond: the OceanStore prototype. Conference on File and Storage Technologies. USENIX Association, 2003.]]
[35]
R. L. Rivest. The MD5 message-digest algorithm, RFC--1321. Network Working Group, IETF, April 1992.]]
[36]
F. B. Schneider. Implementing fault-tolerant services using the state machine approach: a tutorial. ACM Computing Surveys, 22(4):299--319, December 1990.]]
[37]
P. Thambidurai and Y.-K. Park. Interactive consistency with multiple failure modes. Symposium on Reliable Distributed Systems, pages 93--100. IEEE, 1988.]]
[38]
R. van Renesse and F. B. Schneider. Chain replication for supporting high throughput and availability. Symposium on Operating Systems Design and Implementation, pages 91--104. USENIX Association, 2004.]]
[39]
X. Wang, D. Feng, X. Lai, and H. Yu. Collisions for Hash Functions MD4, MD5, HAVAL-128 and RIPEMD. Report 2004/199. Cryptology ePrint Archive, August 2004. http://eprint.iacr.org/.]]
[40]
A. Wool. Quorum systems in replicated databases: science or fiction. Bull. IEEE Technical Committee on Data Engineering, 21(4):3--11. IEEE, December 1998.]]
[41]
L. Zhou, F. B. Schneider, and R. V. Renesse. COCA: A secure distributed online certification authority. ACM Transactions on Computer Systems, 20(4):329--368. ACM, November 2002.]]

Cited By

View all
  • (2024)DARE to Agree: Byzantine Agreement With Optimal Resilience and Adaptive CommunicationProceedings of the 43rd ACM Symposium on Principles of Distributed Computing10.1145/3662158.3662792(145-156)Online publication date: 17-Jun-2024
  • (2024)Consensus-Agnostic State-Machine ReplicationProceedings of the 25th International Middleware Conference10.1145/3652892.3700776(341-353)Online publication date: 2-Dec-2024
  • (2024)OsirisBFT: Say No to Task Replication for Scalable Byzantine Fault Tolerant AnalyticsProceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming10.1145/3627535.3638468(94-108)Online publication date: 2-Mar-2024
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
SOSP '05: Proceedings of the twentieth ACM symposium on Operating systems principles
October 2005
259 pages
ISBN:1595930795
DOI:10.1145/1095810
  • cover image ACM SIGOPS Operating Systems Review
    ACM SIGOPS Operating Systems Review  Volume 39, Issue 5
    SOSP '05
    December 2005
    290 pages
    ISSN:0163-5980
    DOI:10.1145/1095809
    Issue’s Table of Contents
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 20 October 2005

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. byzantine fault-tolerance
  2. fault-scalability
  3. quorums
  4. replicated state machines
  5. services

Qualifiers

  • Article

Conference

SOSP05
Sponsor:

Acceptance Rates

Overall Acceptance Rate 174 of 961 submissions, 18%

Upcoming Conference

SOSP '25
ACM SIGOPS 31st Symposium on Operating Systems Principles
October 13 - 16, 2025
Seoul , Republic of Korea

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)75
  • Downloads (Last 6 weeks)8
Reflects downloads up to 19 Dec 2024

Other Metrics

Citations

Cited By

View all
  • (2024)DARE to Agree: Byzantine Agreement With Optimal Resilience and Adaptive CommunicationProceedings of the 43rd ACM Symposium on Principles of Distributed Computing10.1145/3662158.3662792(145-156)Online publication date: 17-Jun-2024
  • (2024)Consensus-Agnostic State-Machine ReplicationProceedings of the 25th International Middleware Conference10.1145/3652892.3700776(341-353)Online publication date: 2-Dec-2024
  • (2024)OsirisBFT: Say No to Task Replication for Scalable Byzantine Fault Tolerant AnalyticsProceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming10.1145/3627535.3638468(94-108)Online publication date: 2-Mar-2024
  • (2024)TELL: Efficient Transaction Execution Protocol Towards Leaderless Consensus2024 IEEE 40th International Conference on Data Engineering (ICDE)10.1109/ICDE60146.2024.00154(1902-1915)Online publication date: 13-May-2024
  • (2023)As easy as ABCJournal of Parallel and Distributed Computing10.1016/j.jpdc.2023.104743181:COnline publication date: 1-Nov-2023
  • (2022)Narwhal and TuskProceedings of the Seventeenth European Conference on Computer Systems10.1145/3492321.3519594(34-50)Online publication date: 28-Mar-2022
  • (2022)A Faster Blockchain Sharding Protocol for Decentralized Ledger2022 IEEE International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom)10.1109/TrustCom56396.2022.00120(864-873)Online publication date: Dec-2022
  • (2022)Quality of Automated Program Repair on Real-World DefectsIEEE Transactions on Software Engineering10.1109/TSE.2020.299878548:2(637-661)Online publication date: 1-Feb-2022
  • (2022)HAMRAZ: Resilient Partitioning and Replication2022 IEEE Symposium on Security and Privacy (SP)10.1109/SP46214.2022.9833661(2267-2284)Online publication date: May-2022
  • (2021)BasilProceedings of the ACM SIGOPS 28th Symposium on Operating Systems Principles10.1145/3477132.3483552(1-17)Online publication date: 26-Oct-2021
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media