[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
research-article
Open access

Randomized testing of distributed systems with probabilistic guarantees

Published: 24 October 2018 Publication History

Abstract

Several recently proposed randomized testing tools for concurrent and distributed systems come with theoretical guarantees on their success. The key to these guarantees is a notion of bug depth—the minimum length of a sequence of events sufficient to expose the bug—and a characterization of d-hitting families of schedules—a set of schedules guaranteed to cover every bug of given depth d. Previous results show that in certain cases the size of a d-hitting family can be significantly smaller than the total number of possible schedules. However, these results either assume shared-memory multithreading, or that the underlying partial ordering of events is known statically and has special structure. These assumptions are not met by distributed message-passing applications.
In this paper, we present a randomized scheduling algorithm for testing distributed systems. In contrast to previous approaches, our algorithm works for arbitrary partially ordered sets of events revealed online as the program is being executed. We show that for partial orders of width at most w and size at most n (both statically unknown), our algorithm is guaranteed to sample from at most w2 nd−1 schedules, for every fixed bug depth d. Thus, our algorithm discovers a bug of depth d with probability at least 1 / (w2 nd−1). As a special case, our algorithm recovers a previous randomized testing algorithm for multithreaded programs. Our algorithm is simple to implement, but the correctness arguments depend on difficult combinatorial results about online dimension and online chain partitioning of partially ordered sets.
We have implemented our algorithm in a randomized testing tool for distributed message-passing programs. We show that our algorithm can find bugs in distributed systems such as Zookeeper and Cassandra, and empirically outperforms naive random exploration while providing theoretical guarantees.

Supplementary Material

WEBM File (a160-kulahcioglu.webm)

References

[1]
Parosh Abdulla, Stavros Aronis, Bengt Jonsson, and Konstantinos Sagonas. 2014. Optimal Dynamic Partial Order Reduction. In Proceedings of the 41st ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL ’14). ACM, New York, NY, USA, 373–384.
[2]
Anurag Agarwal and Vijay K. Garg. 2007. Efficient dependency tracking for relevant events in concurrent systems. Distributed Computing 19, 3 (2007), 163–183.
[3]
Apache. 2012. Cassandra-2.0.0. Retrieved April 13, 2018 from http://archive.apache.org/dist/cassandra/2.0.0/
[4]
Bartłomiej Bosek, Stefan Felsner, Kamil Kloch, Tomasz Krawczyk, Grzegorz Matecki, and Piotr Micek. 2012. On-Line Chain Partitions of Orders: A Survey. Order 29, 1 (2012), 49–73.
[5]
Bartłomiej Bosek, Hal A. Kierstead, Tomasz Krawczyk, Grzegorz Matecki, and Matthew E. Smith. 2018. An Easy Subexponential Bound for Online Chain Partitioning. Electr. J. Comb. 25, 2 (2018), P2.28. http://www.combinatorics.org/ojs/ index.php/eljc/article/view/v25i2p28
[6]
Bartłomiej Bosek and Tomasz Krawczyk. 2010. The Sub-exponential Upper Bound for On-Line Chain Partitioning. In 51th Annual IEEE Symposium on Foundations of Computer Science, FOCS 2010, October 23-26, 2010, Las Vegas, Nevada, USA. IEEE Computer Society, 347–354.
[7]
Ahmed Bouajjani and Michael Emmi. 2012. Bounded Phase Analysis of Message-Passing Programs. In TACAS ’12: Proc. 18th International Conference on Tools and Algorithms for the Construction and Analysis of Systems (LNCS). Springer.
[8]
Sebastian Burckhardt, Pravesh Kothari, Madanlal Musuvathi, and Santosh Nagarakatte. 2010. A Randomized Scheduler with Probabilistic Guarantees of Finding Bugs. In Proceedings of the Fifteenth Edition of ASPLOS on Architectural Support for Programming Languages and Operating Systems (ASPLOS XV). ACM, New York, NY, USA, 167–178.
[9]
Dmitry Chistikov, Rupak Majumdar, and Filip Niksic. 2016. Hitting Families of Schedules for Asynchronous Programs. In Computer Aided Verification - 28th International Conference, CAV 2016, Toronto, ON, Canada, July 17-23, 2016, Proceedings, Part II (Lecture Notes in Computer Science), Vol. 9780. Springer, 157–176.
[10]
Pantazis Deligiannis, Alastair F. Donaldson, Jeroen Ketema, Akash Lal, and Paul Thomson. 2015. Asynchronous Programming, Analysis and Testing with State Machines. SIGPLAN Not. 50, 6 (June 2015), 154–164.
[11]
Pantazis Deligiannis, Matt McCutchen, Paul Thomson, Shuo Chen, Alastair F. Donaldson, John Erickson, Cheng Huang, Akash Lal, Rashmi Mudduluru, Shaz Qadeer, and Wolfram Schulte. 2016. Uncovering Bugs in Distributed Storage Systems during Testing (Not in Production!). In 14th USENIX Conference on File and Storage Technologies (FAST 16). USENIX Association, Santa Clara, CA, 249–262. https://www.usenix.org/conference/fast16/technical- sessions/presentation/ deligiannis
[12]
Ankush Desai, Shaz Qadeer, and Sanjit A. Seshia. 2015. Systematic Testing of Asynchronous Reactive Systems. In Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering (ESEC/FSE 2015). ACM, New York, NY, USA, 73–83.
[13]
Robert P. Dilworth. 1950. A Decomposition Theorem for Partially Ordered Sets. Annals of Mathematics 51, 1 (1950), 161–166. http://www.jstor.org/stable/1969503
[14]
Dimitar Dimitrov, Martin T. Vechev, and Vivek Sarkar. 2015. Race Detection in Two Dimensions. In Proceedings of the 27th ACM on Symposium on Parallelism in Algorithms and Architectures, SPAA 2015, Portland, OR, USA, June 13-15, 2015. ACM, 101–110.
[15]
Michael Emmi, Shaz Qadeer, and Zvonimir Rakamaric. 2011. Delay-bounded scheduling. In POPL ’11: Proc. 38th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages. ACM, 411–422.
[16]
Stefan Felsner. 1997. On-Line Chain Partitions of Orders. Theor. Comput. Sci. 175, 2 (1997), 283–292.
[17]
Cormac Flanagan and Patrice Godefroid. 2005. Dynamic Partial-order Reduction for Model Checking Software. In Proceedings of the 32Nd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL ’05). ACM, New York, NY, USA, 110–121.
[18]
Gatling Corp. 2011–2018. Gatling. Retrieved September 7, 2018 from https://gatling.io/
[19]
Patrice Godefroid. 1996. Partial-Order Methods for the Verification of Concurrent Systems: An Approach to the State-Explosion Problem. Springer-Verlag New York, Inc., Secaucus, NJ, USA.
[20]
F.P. Junqueira, B.C. Reed, and M. Serafini. 2011. Zab: High-performance Broadcast for Primary-backup Systems. In Proceedings of the 2011 IEEE/IFIP 41st International Conference on Dependable Systems&Networks (DSN ’11). IEEE Computer Society, 245–256.
[21]
Henry A. Kierstead. 1981. An Effective Version of Dilworth’s Theorem. Trans. Amer. Math. Soc. 268, 1 (1981), 63–77. http://www.jstor.org/stable/1998337
[22]
Kyle Kingsbury. 2013. Partitions for Everyone! Retrieved July 31, 2018 from https://www.infoq.com/presentations/ partitioning- comparison
[23]
Kyle Kingsbury. 2013–2018. Jepsen. Retrieved July 31, 2018 from http://jepsen.io/
[24]
Kamil Kloch. 2007. Online dimension of partially ordered sets. Reports on Mathematical Logic 42 (2007), 101–116. http: //www.iphils.uj.edu.pl/rml/rml- 42/a- klo- 42.htm
[25]
Avinash Lakshman and Prashant Malik. 2010. Cassandra: a decentralized structured storage system. ACM SIGOPS Operating Systems Review 44, 2 (2010), 35–40.
[26]
Tanakorn Leesatapornwongsa, Mingzhe Hao, Pallavi Joshi, Jeffrey F. Lukman, and Haryadi S. Gunawi. 2014. SAMC: Semantic-Aware Model Checking for Fast Discovery of Deep Bugs in Cloud Systems. In 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI 14). USENIX Association, Broomfield, CO, 399–414. https: //www.usenix.org/conference/osdi14/technical- sessions/presentation/leesatapornwongsa
[27]
Tanakorn Leesatapornwongsa, Jeffrey F. Lukman, Shan Lu, and Haryadi S. Gunawi. 2016. TaxDC: A Taxonomy of NonDeterministic Concurrency Bugs in Datacenter Distributed Systems. In Proceedings of the Twenty-First International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS ’16). ACM, New York, NY, USA, 517–530.
[28]
Rupak Majumdar and Filip Niksic. 2018. Why is random testing effective for partition tolerance bugs? PACMPL 2, POPL (2018), 46:1–46:24.
[29]
Rashmi Mudduluru, Pantazis Deligiannis, Ankush Desai, Akash Lal, and Shaz Qadeer. 2017. Lasso Detection using PartialState Caching. https://www.microsoft.com/en- us/research/publication/lasso- detection- using- partial- state- caching- 2/
[30]
Madanlal Musuvathi and Shaz Qadeer. 2007. Iterative Context Bounding for Systematic Testing of Multithreaded Programs. In Proceedings of the 2007 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI ’07). ACM, New York, NY, USA, 446–455.
[31]
Shaz Qadeer and Jakob Rehof. 2005. Context-Bounded Model Checking of Concurrent Software. In TACAS ’05: Proc. 11th International Conference on Tools and Algorithms for the Construction and Analysis of Systems (LNCS), Vol. 3440. Springer, 93–107.
[32]
Veselin Raychev, Martin Vechev, and Manu Sridharan. 2013. Effective Race Detection for Event-driven Programs. In Proceedings of the 2013 ACM SIGPLAN International Conference on Object Oriented Programming Systems Languages & Applications (OOPSLA ’13). ACM, New York, NY, USA, 151–166.
[33]
Samira Tasharofi, Michael Pradel, Yu Lin, and Ralph E. Johnson. 2013. Bita: Coverage-guided, automatic testing of actor programs. In 2013 28th IEEE/ACM International Conference on Automated Software Engineering, ASE 2013, Silicon Valley, CA, USA, November 11-15, 2013. IEEE, 114–124.
[34]
Xinhao Yuan, Junfeng Yang, and Ronghui Gu. 2018. Partial Order Aware Concurrency Sampling. In Computer Aided Verification - 30th International Conference, CAV 2018, Held as Part of the Federated Logic Conference, FloC 2018, Oxford, UK, July 14-17, 2018, Proceedings, Part II (Lecture Notes in Computer Science), Vol. 10982. Springer, 317–335.

Cited By

View all
  • (2024)Reward Augmentation in Reinforcement Learning for Testing Distributed SystemsProceedings of the ACM on Programming Languages10.1145/36897798:OOPSLA2(1928-1954)Online publication date: 8-Oct-2024
  • (2024)Generalized Concurrency Testing Tool for Distributed SystemsProceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis10.1145/3650212.3685309(1861-1865)Online publication date: 11-Sep-2024
  • (2024)A Domain Specific Language for Testing Distributed Protocol ImplementationsNetworked Systems10.1007/978-3-031-67321-4_6(100-117)Online publication date: 29-May-2024
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image Proceedings of the ACM on Programming Languages
Proceedings of the ACM on Programming Languages  Volume 2, Issue OOPSLA
November 2018
1656 pages
EISSN:2475-1421
DOI:10.1145/3288538
Issue’s Table of Contents
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 24 October 2018
Published in PACMPL Volume 2, Issue OOPSLA

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. distributed systems
  2. hitting families
  3. online chain partitioning
  4. partially ordered sets
  5. random testing

Qualifiers

  • Research-article

Funding Sources

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)238
  • Downloads (Last 6 weeks)33
Reflects downloads up to 04 Jan 2025

Other Metrics

Citations

Cited By

View all
  • (2024)Reward Augmentation in Reinforcement Learning for Testing Distributed SystemsProceedings of the ACM on Programming Languages10.1145/36897798:OOPSLA2(1928-1954)Online publication date: 8-Oct-2024
  • (2024)Generalized Concurrency Testing Tool for Distributed SystemsProceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis10.1145/3650212.3685309(1861-1865)Online publication date: 11-Sep-2024
  • (2024)A Domain Specific Language for Testing Distributed Protocol ImplementationsNetworked Systems10.1007/978-3-031-67321-4_6(100-117)Online publication date: 29-May-2024
  • (2023)Psym: Efficient Symbolic Exploration of Distributed SystemsProceedings of the ACM on Programming Languages10.1145/35912477:PLDI(660-685)Online publication date: 6-Jun-2023
  • (2023)Randomized Testing of Byzantine Fault Tolerant AlgorithmsProceedings of the ACM on Programming Languages10.1145/35860537:OOPSLA1(757-788)Online publication date: 6-Apr-2023
  • (2023)Greybox Fuzzing of Distributed SystemsProceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security10.1145/3576915.3623097(1615-1629)Online publication date: 15-Nov-2023
  • (2023)Probabilistic Concurrency Testing for Weak Memory ProgramsProceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 210.1145/3575693.3575729(603-616)Online publication date: 27-Jan-2023
  • (2023)Model Checking Guided Testing for Distributed SystemsProceedings of the Eighteenth European Conference on Computer Systems10.1145/3552326.3587442(127-143)Online publication date: 8-May-2023
  • (2023)VPP-ART: An Efficient Implementation of Fixed-Size-Candidate-Set Adaptive Random Testing Using Vantage Point PartitioningIEEE Transactions on Reliability10.1109/TR.2022.321860272:4(1632-1647)Online publication date: Dec-2023
  • (2023)Evolutionary Approach for Concurrency Testing of Ripple Blockchain Consensus Algorithm2023 IEEE/ACM 45th International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP)10.1109/ICSE-SEIP58684.2023.00009(36-47)Online publication date: May-2023
  • Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Login options

Full Access

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media