Failure Transparency in Stateful Dataflow Systems

Abstract

Failure transparency enables users to reason about distributed systems at a higher level of abstraction, where complex failure-handling logic is hidden. This is especially true for stateful dataflow systems, which are the backbone of many cloud applications. In particular, this paper focuses on proving failure transparency in Apache Flink, a popular stateful dataflow system. Even though failure transparency is a critical aspect of Apache Flink, to date it has not been formally proven. Showing that the failure transparency mechanism is correct, however, is challenging due to the complexity of the mechanism itself. Nevertheless, this complexity can be effectively hidden behind a failure-transparent programming interface. To show that Apache Flink is failure transparent, we model it using small-step operational semantics. Next, we provide a novel definition of failure transparency based on observational explainability, a concept that relates executions according to their observations. Finally, we provide a formal proof of failure transparency for the implementation model; i.e., we prove that the failure-free model correctly abstracts from the failure-related details of the implementation model. We also show liveness of the implementation model under a fair execution assumption. These results are a first step towards a verified stack for stateful dataflow systems.

Authors: Aleksey Veresov, Jonas Spenger, Paris Carbone, Philipp Haller
