[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.1145/2837614.2837650acmconferencesArticle/Chapter ViewAbstractPublication PagespoplConference Proceedingsconference-collections
research-article
Public Access

PSync: a partially synchronous language for fault-tolerant distributed algorithms

Published: 11 January 2016 Publication History

Abstract

Fault-tolerant distributed algorithms play an important role in many critical/high-availability applications. These algorithms are notoriously difficult to implement correctly, due to asynchronous communication and the occurrence of faults, such as the network dropping messages or computers crashing. We introduce PSync, a domain specific language based on the Heard-Of model, which views asynchronous faulty systems as synchronous ones with an adversarial environment that simulates asynchrony and faults by dropping messages. We define a runtime system for PSync that efficiently executes on asynchronous networks. We formalise the relation between the runtime system and PSync in terms of observational refinement. The high-level lockstep abstraction introduced by PSync simplifies the design and implementation of fault-tolerant distributed algorithms and enables automated formal verification. We have implemented an embedding of PSync in the Scala programming language with a runtime system for partially synchronous networks. We show the applicability of PSync by implementing several important fault-tolerant distributed algorithms and we compare the implementation of consensus algorithms in PSync against implementations in other languages in terms of code size, runtime efficiency, and verification.

References

[1]
P. A. Abdulla, A. Collomb-Annichini, A. Bouajjani, and B. Jonsson. Using Forward Reachability Analysis for Verification of Lossy Channel Systems. FMSD, 25(1):39–65, 2004.
[2]
Y. Afek and E. Gafni. Asynchrony from synchrony. In D. Frey, M. Raynal, S. Sarkar, R. K. Shyamasundar, and P. Sinha, editors, Distributed Computing and Networking, 14th International Conference, ICDCN 2013, Mumbai, India, January 3-6, 2013. Proceedings, pages 225–239, 2013.
[3]
S. F. Allen, R. L. Constable, R. Eaton, C. Kreitz, and L. Lorigo. The nuprl open logical environment. In D. A. McAllester, editor, Automated Deduction - CADE-17, 17th International Conference on Automated Deduction, Pittsburgh, PA, USA, June 17-20, 2000, Proceedings, pages 170–176, 2000.
[4]
P. Alvaro, T. Condie, N. Conway, J. M. Hellerstein, and R. Sears. I do declare: consensus in a logic language. Operating Systems Review, 43 (4):25–30, 2009.
[5]
P. Alvaro, N. Conway, J. Hellerstein, and W. R. Marczak. Consistency analysis in bloom: a CALM and collected approach. In CIDR 2011, Fifth Biennial Conference on Innovative Data Systems Research, Asilomar, CA, USA, January 9-12, 2011, Online Proceedings, pages 249–260, 2011.
[6]
Y. Amir and J. Kirsch. Paxos for system builders: An overview. In Workshop on Large-Scale Distributed Systems and Middleware (LADIS 2008), Yorktown, NY, September 2008, 2008.
[7]
K. R. Apt and D. Kozen. Limits for automatic verification of finitestate concurrent systems. Inf. Process. Lett., 22(6):307–309, 1986.
[8]
J. L. Armstrong. The development of erlang. In S. L. P. Jones, M. Tofte, and A. M. Berman, editors, Proceedings of the 1997 ACM SIGPLAN International Conference on Functional Programming (ICFP ’97), Amsterdam, The Netherlands, June 9-11, 1997.
[9]
, pages 196–203. ACM, 1997.
[10]
M. P. Ashley-Rollman, P. Lee, S. C. Goldstein, P. Pillai, and J. Campbell. A language for large ensembles of independently executing nodes. In Logic Programming, 25th International Conference, ICLP 2009, Pasadena, CA, USA, July 14-17, 2009. Proceedings, pages 265– 280, 2009.
[11]
S. Basu, T. Bultan, and M. Ouederni. Synchronizability for verification of asynchronously communicating systems. In V. Kuncak and A. Rybalchenko, editors, Verification, Model Checking, and Abstract Interpretation - 13th International Conference, VMCAI 2012, Philadelphia, PA, USA, January 22-24, 2012. Proceedings, pages 56–71, 2012.
[12]
M. Ben-Or. Another advantage of free choice (extended abstract): Completely asynchronous agreement protocols. In PODC, pages 27– 30. ACM, 1983.
[13]
M. Biely, B. Charron-Bost, A. Gaillard, M. Hutle, A. Schiper, and J. Widder. Tolerating corrupted communication. In PODC, pages 244–253, 2007.
[14]
M. Biely, Z. Milosevic, N. Santos, and A. Schiper. S-paxos: Offloading the leader for high throughput state machine replication. In IEEE 31st Symposium on Reliable Distributed Systems, SRDS 2012, Irvine, CA, USA, October 8-11, 2012, pages 111–120, 2012.
[15]
M. Biely, P. Delgado, Z. Milosevic, and A. Schiper. Distal: A framework for implementing fault-tolerant distributed algorithms. In 2013 43rd Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), Budapest, Hungary, June 24-27, 2013, pages 1–8, 2013.
[16]
D. Brand and P. Zafiropulo. On Communicating Finite-State Machines. J. ACM, 30(2):323–342, 1983.
[17]
M. Burrows. The chubby lock service for loosely-coupled distributed systems. In OSDI, Berkeley, CA, USA, 2006. USENIX Association. ISBN 1-931971-47-1.
[18]
C. Cachin, R. Guerraoui, and L. Rodrigues. Reliable broadcast. In Introduction to Reliable and Secure Distributed Programming, pages 73–135. Springer Berlin Heidelberg, 2011. ISBN 978-3-642-15259-7.
[19]
M. Castro and B. Liskov. Practical byzantine fault tolerance. In M. I. Seltzer and P. J. Leach, editors, Proceedings of the Third USENIX Symposium on Operating Systems Design and Implementation (OSDI), New Orleans, Louisiana, USA, February 22-25, 1999, pages 173–186. USENIX Association, 1999.
[20]
T. D. Chandra and S. Toueg. Unreliable failure detectors for reliable distributed systems. J. ACM, 43(2):225–267, 1996.
[21]
T. D. Chandra, R. Griesemer, and J. Redstone. Paxos made live: An engineering perspective. In Proceedings of the Twenty-sixth Annual ACM Symposium on Principles of Distributed Computing, PODC ’07, pages 398–407, New York, NY, USA, 2007. ACM. ISBN 978-1- 59593-616-5.
[22]
M. Chaouch-Saad, B. Charron-Bost, and S. Merz. A reduction theorem for the verification of round-based distributed algorithms. In RP, volume 5797 of LNCS, pages 93–106, 2009.
[23]
B. Charron-Bost and S. Merz. Formal verification of a consensus algorithm in the heard-of model. Int. J. Software and Informatics, 3(2-3):273–303, 2009.
[24]
B. Charron-Bost and A. Schiper. The heard-of model: computing in distributed systems with benign faults. Distributed Computing, 22(1): 49–71, 2009.
[25]
B. Charron-Bost, H. Debrat, and S. Merz. Formal verification of consensus algorithms tolerating malicious faults. In SSS, pages 120– 134. Springer, 2011. ISBN 978-3-642-24549-7.
[26]
S. Chaudhuri. More choices allow more faults: Set consensus problems in totally asynchronous systems. Information and Computation, 105(1):132 – 158, 1993. ISSN 0890-5401.
[27]
A. Clement, E. L. Wong, L. Alvisi, M. Dahlin, and M. Marchetti. Making byzantine fault tolerant systems tolerate byzantine faults. In J. Rexford and E. G. Sirer, editors, Proceedings of the 6th USENIX Symposium on Networked Systems Design and Implementation, NSDI 2009, April 22-24, 2009, Boston, MA, USA, pages 153–168. USENIX Association, 2009.
[28]
H. Debrat and S. Merz. Verifying fault-tolerant distributed algorithms in the heard-of model. Archive of Formal Proofs, 2012.
[29]
A. Desai, P. Garg, and P. Madhusudan. Natural proofs for asynchronous programs using almost-synchronous reductions. In A. P. Black and T. D. Millstein, editors, Proceedings of the 2014 ACM International Conference on Object Oriented Programming Systems Languages & Applications, OOPSLA 2014, part of SPLASH 2014, Portland, OR, USA, October 20-24, 2014, pages 709––725, 2014.
[30]
Dialyzer. http://www.erlang.org/doc/man/dialyzer.html.
[31]
D. Dolev and E. N. Hoch. Byzantine self-stabilizing pulse in a bounded-delay model. In T. Masuzawa and S. Tixeuil, editors, Stabilization, Safety, and Security of Distributed Systems, 9th International Symposium, SSS 2007, Paris, France, November 14-16, 2007, Proceedings, pages 234–252. Springer, 2007.
[32]
D. Dolev, C. Dwork, and L. Stockmeyer. On the minimal synchronism needed for distributed consensus. In Foundations of Computer Science, 1983, 24th Annual Symposium on, pages 393–402, 1983.
[33]
E. D’Osualdo, J. Kochems, and C. L. Ong. Automatic verification of erlang-style concurrency. In Static Analysis - 20th International Symposium, SAS 2013, Seattle, WA, USA, June 20-22, 2013. Proceedings, pages 454–476, 2013.
[34]
C. Dragoi, T. A. Henzinger, H. Veith, J. Widder, and D. Zufferey. A logic-based framework for verifying consensus algorithms. In K. L. McMillan and X. Rival, editors, VMCAI, pages 161–181. Springer, 2014.
[35]
C. Dwork, N. Lynch, and L. Stockmeyer. Consensus in the presence of partial synchrony. JACM, 35(2):288–323, Apr. 1988.
[36]
T. Elrad and N. Francez. Decomposition of distributed programs into communication-closed layers. Sci. Comput. Program., 2(3):155–173, 1982.
[37]
J. M. Falerio, S. K. Rajamani, K. Rajan, G. Ramalingam, and K. Vaswani. Generalized lattice agreement. In ACM Symposium on Principles of Distributed Computing, PODC ’12, Funchal, Madeira, Portugal, July 16-18, 2012, pages 125–134, 2012.
[38]
I. Filipovic, P. W. O’Hearn, N. Rinetzky, and H. Yang. Abstraction for concurrent objects. Theor. Comput. Sci., 411(51-52):4379–4398, 2010.
[39]
M. J. Fischer, N. A. Lynch, and M. S. Paterson. Impossibility of distributed consensus with one faulty process. J. ACM, 32(2):374– 382, Apr. 1985.
[40]
L. Fredlund and H. Svensson. Mcerlang: a model checker for a distributed functional programming language. In Proceedings of the 12th ACM SIGPLAN International Conference on Functional Programming, ICFP 2007, Freiburg, Germany, October 1-3, 2007, pages 125– 136, 2007.
[41]
E. Gafni. Round-by-round fault detectors: Unifying synchrony and asynchrony (extended abstract). In B. A. Coan and Y. Afek, editors, Proceedings of the Seventeenth Annual ACM Symposium on Principles of Distributed Computing, PODC ’98, Puerto Vallarta, Mexico, June 28 - July 2, 1998, pages 143–152, 1998.
[42]
J. Gray. Notes on data base operating systems. In R. Bayer, R. Graham, and G. Seegmüller, editors, Operating Systems, volume 60 of Lecture Notes in Computer Science, pages 393–481. Springer Berlin Heidelberg, 1978. ISBN 978-3-540-08755-7.
[43]
R. Guerraoui and L. Rodrigues. Introduction to Reliable Distributed Programming. Springer-Verlag New York, Inc., Secaucus, NJ, USA, 2006. ISBN 3540288457.
[44]
P. Haller and M. Odersky. Scala actors: Unifying thread-based and event-based programming. Theor. Comput. Sci., 410(2-3):202–220, 2009.
[45]
C. Hewitt, P. Bishop, and R. Steiger. A universal modular ACTOR formalism for artificial intelligence. In Proceedings of the 3rd International Joint Conference on Artificial Intelligence. Standford, CA, August 1973, pages 235–245, 1973.
[46]
C. A. R. Hoare. Communicating sequential processes. Commun. ACM, 21(8):666–677, 1978.
[47]
M. Hutle and A. Schiper. Communication predicates: A high-level abstraction for coping with transient and dynamic faults. In DSN, pages 92–101, 2007.
[48]
A. John, I. Konnov, U. Schmid, H. Veith, and J. Widder. Parameterized model checking of fault-tolerant distributed algorithms by abstraction. In FMCAD, pages 201–209, 2013.
[49]
F. P. Junqueira, B. C. Reed, and M. Serafini. Zab: High-performance broadcast for primary-backup systems. In Proceedings of the 2011 IEEE/IFIP International Conference on Dependable Systems and Networks, DSN 2011, Hong Kong, China, June 27-30 2011, pages 245– 256. IEEE, 2011.
[50]
C. E. Killian, J. W. Anderson, R. Braud, R. Jhala, and A. Vahdat. Mace: language support for building distributed systems. In SIGPLAN Conference on Programming Language Design and Implementation, pages 179–188, 2007.
[51]
A. D. Kshemkalyani and M. Singhal. Distributed Computing: Principles, Algorithms, and Systems. Cambridge University Press, 2011.
[52]
L. Lamport. The part-time parliament. ACM Trans. Comput. Syst., 16 (2):133–169, May 1998. ISSN 0734-2071.
[53]
L. Lamport. Distributed algorithms in TLA (abstract). In G. Neiger, editor, Proceedings of the Nineteenth Annual ACM Symposium on Principles of Distributed Computing, July 16-19, 2000, Portland, Oregon, USA. ACM, 2000.
[54]
L. Lamport. The pluscal algorithm language. In Theoretical Aspects of Computing - ICTAC 2009, 6th International Colloquium, Kuala Lumpur, Malaysia, August 16-20, 2009. Proceedings, pages 36–60, 2009.
[55]
Y. A. Liu, S. D. Stoller, B. Lin, and M. Gorbovitski. From clarity to efficiency for distributed algorithms. In G. T. Leavens and M. B. Dwyer, editors, Proceedings of the 27th Annual ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications, OOPSLA 2012, part of SPLASH 2012, Tucson, AZ, USA, October 21-25, 2012, pages 395–410, 2012.
[56]
N. Lynch. Distributed Algorithms. Morgan Kaufman, 1996.
[57]
N. A. Lynch and M. R. Tuttle. Hierarchical correctness proofs for distributed algorithms. In F. B. Schneider, editor, Proceedings of the Sixth Annual ACM Symposium on Principles of Distributed Computing, Vancouver, British Columbia, Canada, August 10-12, 1987, pages 137–151, 1987.
[58]
Y. Mao, F. P. Junqueira, and K. Marzullo. Mencius: Building efficient replicated state machine for wans. In R. Draves and R. van Renesse, editors, 8th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2008, December 8-10, 2008, San Diego, California, USA, Proceedings, pages 369–384. USENIX Association, 2008.
[59]
O. Mari´c, C. Sprenger, and D. Basin. Consensus refined. In J. Karlsson and Y. Amir, editors, IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), 2015.
[60]
V. R. Mark Bickford, Robert L. Constable. Logic of events, a framework to reason about distributed systems. In 2012 Languages for Distributed Algorithms Workshop, Philadelphia, PA, 2012.
[61]
H. Miller, P. Haller, E. Burmako, and M. Odersky. Instant pickles: generating object-oriented pickler combinators for fast and extensible serialization. In A. L. Hosking, P. T. Eugster, and C. V. Lopes, editors, OOPSLA, pages 183–202, 2013.
[62]
R. Milner, J. Parrow, and D. Walker. A calculus of mobile processes, I. Inf. Comput., 100(1):1–40, 1992.
[63]
R. Milner, J. Parrow, and D. Walker. A calculus of mobile processes, II. Inf. Comput., 100(1):41–77, 1992.
[64]
I. Moraru, D. G. Andersen, and M. Kaminsky. There is more consensus in egalitarian parliaments. In Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles, SOSP ’13, pages 358–372, New York, NY, USA, 2013. ACM. ISBN 978-1-4503-2388- 8.
[65]
L. Moura and N. Bjorner. Z3: An efficient SMT solver. In C. Ramakrishnan and J. Rehof, editors, TACAS, volume 4963 of Lecture Notes in Computer Science, pages 337–340. Springer Berlin Heidelberg, 2008. ISBN 978-3-540-78799-0.
[66]
P. Parvedy, M. Raynal, and C. Travers. Early-stopping k-set agreement in synchronous systems prone to any number of process crashes. In V. Malyshkin, editor, Parallel Computing Technologies, volume 3606 of Lecture Notes in Computer Science, pages 49–58. Springer Berlin Heidelberg, 2005. ISBN 978-3-540-28126-9.
[67]
N. Santoro and P. Widmayer. Agreement in synchronous networks with ubiquitous faults. Theor. Comput. Sci., 384(2-3):232–249, 2007.
[68]
N. Schiper, V. Rahli, R. van Renesse, M. Bickford, and R. L. Constable. Developing correctly replicated databases using formal tools. In 44th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, DSN 2014, Atlanta, GA, USA, June 23-26, 2014, pages 395–406, 2014.
[69]
P. Schnoebelen. Revisiting Ackermann-Hardness for Lossy Counter Machines and Reset Petri Nets. In Mathematical Foundations of Computer Science 2010, 35th International Symposium, MFCS 2010, Brno, Czech Republic, August 23-27, 2010. Proceedings, pages 616– 628, 2010.
[70]
The Netty project. http://netty.io/.
[71]
S. Torstendahl. Open telecom platform. Ericsson Review, 1, 1997.
[72]
T. Tsuchiya and A. Schiper. Model checking of consensus algorithms. In 26th IEEE Symposium on Reliable Distributed Systems (SRDS 2007), Beijing, China, October 10-12, 2007, pages 137–148. IEEE Computer Society, 2007.
[73]
T. Tsuchiya and A. Schiper. Using bounded model checking to verify consensus algorithms. In G. Taubenfeld, editor, Distributed Computing, 22nd International Symposium, DISC 2008, Arcachon, France, September 22-24, 2008. Proceedings, pages 466–480, 2008.
[74]
T. Tsuchiya and A. Schiper. Verification of consensus algorithms using satisfiability solving. Distributed Computing, 23(5-6):341–358, 2011.
[75]
J. Widder and U. Schmid. The Theta-Model: Achieving synchrony without clocks. Distributed Computing, 22(1):29–47, Apr. 2009.
[76]
J. R. Wilcox, D. Woos, P. Panchekha, Z. Tatlock, X. Wang, M. D. Ernst, and T. E. Anderson. Verdi: a framework for implementing and formally verifying distributed systems. In D. Grove and S. Blackburn, editors, Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation, Portland, OR, USA, June 15-17, 2015, pages 357–368. ACM, 2015.

Cited By

View all
  • (2025)Formal verification of timely knowledge propagation in airborne networksScience of Computer Programming10.1016/j.scico.2024.103184239:COnline publication date: 1-Jan-2025
  • (2024)Compositional Verification of Composite Byzantine ProtocolsProceedings of the 2024 on ACM SIGSAC Conference on Computer and Communications Security10.1145/3658644.3690355(34-48)Online publication date: 2-Dec-2024
  • (2024)LiDO: Linearizable Byzantine Distributed Objects with Refinement-Based Liveness ProofsProceedings of the ACM on Programming Languages10.1145/36564238:PLDI(1140-1164)Online publication date: 20-Jun-2024
  • Show More Cited By

Index Terms

  1. PSync: a partially synchronous language for fault-tolerant distributed algorithms

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    cover image ACM Conferences
    POPL '16: Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages
    January 2016
    815 pages
    ISBN:9781450335492
    DOI:10.1145/2837614
    • cover image ACM SIGPLAN Notices
      ACM SIGPLAN Notices  Volume 51, Issue 1
      POPL '16
      January 2016
      815 pages
      ISSN:0362-1340
      EISSN:1558-1160
      DOI:10.1145/2914770
      • Editor:
      • Andy Gill
      Issue’s Table of Contents
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Sponsors

    In-Cooperation

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 11 January 2016

    Permissions

    Request permissions for this article.

    Check for updates

    Author Tags

    1. Automated verification
    2. Consensus
    3. Fault-tolerant distributed algorithms
    4. Partial synchrony
    5. Round model

    Qualifiers

    • Research-article

    Funding Sources

    Conference

    POPL '16
    Sponsor:

    Acceptance Rates

    Overall Acceptance Rate 824 of 4,130 submissions, 20%

    Upcoming Conference

    POPL '25

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)265
    • Downloads (Last 6 weeks)27
    Reflects downloads up to 17 Dec 2024

    Other Metrics

    Citations

    Cited By

    View all
    • (2025)Formal verification of timely knowledge propagation in airborne networksScience of Computer Programming10.1016/j.scico.2024.103184239:COnline publication date: 1-Jan-2025
    • (2024)Compositional Verification of Composite Byzantine ProtocolsProceedings of the 2024 on ACM SIGSAC Conference on Computer and Communications Security10.1145/3658644.3690355(34-48)Online publication date: 2-Dec-2024
    • (2024)LiDO: Linearizable Byzantine Distributed Objects with Refinement-Based Liveness ProofsProceedings of the ACM on Programming Languages10.1145/36564238:PLDI(1140-1164)Online publication date: 20-Jun-2024
    • (2024)Coarser Equivalences for Causal ConcurrencyProceedings of the ACM on Programming Languages10.1145/36328738:POPL(911-941)Online publication date: 5-Jan-2024
    • (2023)Counterexample Driven Quantifier Instantiations with Applications to Distributed ProtocolsProceedings of the ACM on Programming Languages10.1145/36228647:OOPSLA2(1878-1904)Online publication date: 16-Oct-2023
    • (2023)Compiling Distributed System Models with PGoProceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 210.1145/3575693.3575695(159-175)Online publication date: 27-Jan-2023
    • (2023)Quorum Tree Abstractions of Consensus ProtocolsProgramming Languages and Systems10.1007/978-3-031-30044-8_13(337-362)Online publication date: 22-Apr-2023
    • (2022)All in One: Design, Verification, and Implementation of SNOW-optimal Read Atomic TransactionsACM Transactions on Software Engineering and Methodology10.1145/349451731:3(1-44)Online publication date: 7-Mar-2022
    • (2021)A multiparty session typing discipline for fault-tolerant event-driven distributed programmingProceedings of the ACM on Programming Languages10.1145/34855015:OOPSLA(1-30)Online publication date: 15-Oct-2021
    • (2021)Verifying Safety of Parameterized Heard-Of AlgorithmsNetworked Systems10.1007/978-3-030-67087-0_14(209-226)Online publication date: 14-Jan-2021
    • Show More Cited By

    View Options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Login options

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media