[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
research-article
Open access

Validating SMT Solvers for Correctness and Performance via Grammar-Based Enumeration

Published: 08 October 2024 Publication History

Abstract

We introduce ET, a grammar-based enumerator for validating SMT solver correctness and performance. By compiling grammars of the SMT theories to algebraic datatypes, ET leverages the functional enumerator FEAT. ET is highly effective at bug finding and has many complimentary benefits. Despite the extensive and continuous testing of the state-of-the-art SMT solvers Z3 and cvc5, ET found 102 bugs, out of which 76 were confirmed and 32 were fixed. Moreover, ET can be used to understand the evolution of solvers. We derive eight grammars realizing all major SMT theories including the booleans, integers, reals, realints, bit-vectors, arrays, floating points, and strings. Using ET, we test all consecutive releases of the SMT solvers Z3 and CVC4/cvc5 from the last six years (61 versions) on 8 million formulas, and 488 million solver calls. Our results suggest improved correctness in recent versions of both solvers but decreased performance in newer releases of Z3 on small timeouts (since z3-4.8.11) and regressions in early cvc5 releases on larger timeouts. Due to its systematic testing and efficiency, we further advocate ET's use for continuous integration.

References

[1]
AdaCore. 2021. SPARK. https://github.com/AdaCore/spark2014
[2]
Alexandr Andoni, Dumitru Daniliuc, Sarfraz Khurshid, and Darko Marinov. 2003. Evaluating the “Small Scope Hypothesis”. In POPL ’03. 1–12.
[3]
John Backes, Pauline Bolignano, Byron Cook, Catherine Dodge, Andrew Gacek, Kasper Sœ Luckow, Neha Rungta, Oksana Tkachuk, and Carsten Varming. 2018. Semantic-based Automated Reasoning for AWS Access Policies using SMT. In FMCAD ’18. 1–9.
[4]
Clark Barrett, Christopher L. Conway, Morgan Deters, Liana Hadarean, Dejan Jovanović, Tim King, Andrew Reynolds, and Cesare Tinelli. 2011. CVC4. In CAV ’11. 171–177. https://doi.org/10.1007/978-3-642-22110-1_14
[5]
Clark Barrett, Leonardo de Moura, and Aaron Stump. 2005. SMT-COMP: Satisfiability Modulo Theories Competition. In CAV ’05. 20–23.
[6]
Clark Barrett, Pascal Fontaine, and Cesare Tinelli. 2019. The Satisfiability Modulo Theories Library (SMT-LIB). www.SMT-LIB.org
[7]
Clark Barrett, Aaron Stump, and Cesare Tinelli. 2010. The SMT-LIB Standard: Version 2.0. In SMT ’10.
[8]
S. M. Blackburn, R. Garner, C. Hoffman, A. M. Khan, K. S. McKinley, R. Bentzur, A. Diwan, D. Feinberg, D. Frampton, S. Z. Guyer, M. Hirzel, A. Hosking, M. Jump, H. Lee, J. E. B. Moss, A. Phansalkar, D. Stefanović, T. VanDrunen, D. von Dincklage, and B. Wiedermann. 2006. The DaCapo Benchmarks: Java Benchmarking Development and Analysis. In OOPSLA ’06. 169–190. https://doi.org/10.1145/1167473.1167488
[9]
Dmitry Blotsky, Federico Mora, Murphy Berzish, Yunhui Zheng, Ifaz Kabir, and Vijay Ganesh. 2018. StringFuzz: A Fuzzer for String Solvers. In CAV ’18. 45–51. https://doi.org/10.1007/978-3-642-00768-2_16
[10]
Robert Brummayer and Armin Biere. 2009. Boolector: An Efficient SMT Solver for Bit-Vectors and Arrays. In TACAS ’09. 174–177.
[11]
Robert Brummayer and Armin Biere. 2009. Fuzzing and delta-debugging SMT solvers. In SMT ’09. 1–5. https://doi.org/10.1145/1670412.1670413
[12]
Alexandra Bugariu and Peter Müller. 2020. Automatically Testing String Solvers. In ICSE ’20. 1459–1470. https://doi.org/10.1145/3377811.3380398
[13]
W.H. Burkhardt. 1967. Generating test programs from syntax. In Computing. 53–73.
[14]
Cristian Cadar, Daniel Dunbar, and Dawson R. Engler. 2008. KLEE: Unassisted and Automatic Generation of High-Coverage Tests for Complex Systems Programs. In OSDI ’08. 209–224.
[15]
Koen Claessen and John Hughes. 2000. QuickCheck: A Lightweight Tool for Random Testing of Haskell Programs. In ICFP ’00. 268–279. https://doi.org/10.1145/351240.351266
[16]
SAT Competition. 2023. The International SAT Competition Web Page. https://github.com/AdaCore/spark2014
[17]
Leonardo de Moura and Nikolaj Bjørner. 2008. Z3: An Efficient SMT Solver. In TACAS ’08. 337–340. https://doi.org/10.1007/978-3-540-78800-3_24
[18]
David Detlefs, Greg Nelson, and James B. Saxe. 2005. Simplify: A Theorem Prover for Program Checking. JACM, 365–473. https://doi.org/10.1145/1066100.1066102
[19]
Jonas Duregard, Patrick Jansson, and Meng Wang. 2012. FEAT: Functional Enumeration of Algebraic Types. In Haskell ’12. 61–72. https://doi.org/10.1145/2430532.2364515
[20]
Ambros Gleixner, Gregor Hendel, Gerald Gamrath, Tobias Achterberg, Michael Bastubbe, Timo Berthold, Philipp M. Christophel, Kati Jarck, Thorsten Koch, Jeff Linderoth, Marco Lübbecke, Hans D. Mittelmann, Derya Ozyurt, Ted K. Ralphs, Domenico Salvagnin, and Yuji Shinano. 2021. MIPLIB 2017: Data-Driven Compilation of the 6th Mixed-Integer Programming Library. Mathematical Programming Computation, https://doi.org/10.1007/s12532-020-00194-3
[21]
Patrice Godefroid, Nils Klarlund, and Koushik Sen. 2005. DART: directed automated random testing. In PLDI ’05. 213–223. https://doi.org/10.1145/1064978.1065036
[22]
Patrice Godefroid, Michael Y. Levin, and David Molnar. 2012. SAGE: Whitebox Fuzzing for Security Testing: SAGE Has Had a Remarkable Impact at Microsoft. Queue, 20–27.
[23]
K. V. Hanford. 1970. Automatic generation of test cases. IBM Systems Journal, 9, 4 (1970), 242–257.
[24]
Christian Holler, Kim Herzig, and Andreas Zeller. 2012. Fuzzing with Code Fragments. In USENIX Security Symposium. https://doi.org/10.5555/2362793.2362831
[25]
Daniel Jackson. 2002. Alloy: a lightweight object modelling notation. ACM Trans. Softw. Eng. Methodol., 11, 2 (2002), apr, 256–290. issn:1049-331X https://doi.org/10.1145/505145.505149
[26]
D. Jackson and C.A. Damon. 1996. Elements of style: analyzing a software design feature with a counterexample detector. TSE ’96, 484–495. https://doi.org/10.1109/32.538605
[27]
Muhammad Numair Mansur, Maria Christakis, Valentin Wüstholz, and Fuyuan Zhang. 2020. Detecting Critical Bugs in SMT Solvers Using Blackbox Mutational Fuzzing. In FSE ’20. 701–712. https://doi.org/10.1145/3368089.3409763
[28]
Rudy Matela Braquehais. 2017. Tools for Discovery, Refinement and Generalization of Functional Properties by Enumerative Testing. Ph. D. Dissertation. University of York.
[29]
Aina Niemetz, Mathias Preiner, and Clark Barrett. 2022. Murxla: A Modular and Highly Extensible API Fuzzer for SMT Solvers. In CAV ’22. 92–106. https://doi.org/10.1007/978-3-031-13188-2_5
[30]
Aina Niemetz, Mathias Preiner, and Armin Biere. 2017. Model-based API testing for SMT solvers. In SMT ’17. 10.
[31]
Jiwon Park, Dominik Winterer, Chengyu Zhang, and Zhendong Su. 2021. Generative Type-Aware Mutation for Testing SMT Solvers. In OOPSLA ’21. 1–19. https://doi.org/10.1145/3485529
[32]
David J. Pearce. 2019. On declarative rewriting for sound and complete union, intersection and negation types. Journal of Computer Languages, 50 (2019), 84–101. https://doi.org/10.1016/j.jvlc.2018.10.004
[33]
David J. Pearce. 2021. A Lightweight Formalism for Reference Lifetimes and Borrowing in Rust. ACM Trans. Program. Lang. Syst., 43, 1 (2021), https://doi.org/10.1145/3443420
[34]
Colin Runciman, Matthew Naylor, and Fredrik Lindblad. 2008. Smallcheck and Lazy Smallcheck: Automatic Exhaustive Testing for Small Values. In Haskell ’08. 37–48. https://doi.org/10.1145/1411286.1411292
[35]
Neha Rungta. 2022. A Billion SMT Queries a Day (Invited Paper). In CAV ’22. 3–18. https://doi.org/10.1007/978-3-031-13185-1_1
[36]
Joseph Scott, Federico Mora, and Vijay Ganesh. 2020. BanditFuzz: Fuzzing SMT Solvers with Reinforcement Learning. In CAV ’20. 68–86. https://doi.org/10.1007/978-3-030-63618-0_5
[37]
Armando Solar-Lezama. 2008. Program Synthesis by Sketching. Ph. D. Dissertation. University of California at Berkeley. isbn:9781109097450 https://doi.org/10.1007/978-3-642-10672-9_3
[38]
SPEC. 2023. SPEC’s Benchmarks and Tools. https://www.spec.org/benchmarks.html
[39]
Kevin Sullivan, Jinlin Yang, David Coppit, Sarfraz Khurshid, and Daniel Jackson. 2004. Software Assurance by Bounded Exhaustive Testing. In ISSTA ’04. 133–142. https://doi.org/10.1145/1013886.1007531
[40]
Geoff Sutcliffe. 2016. The CADE ATP System Competition — CASC. AI Magazine, Jul., 99–101. https://ojs.aaai.org/index.php/aimagazine/article/view/2620
[41]
Emina Torlak and Rastislav Bodik. 2014. A lightweight symbolic virtual machine for solver-aided host languages. In PLDI ’14. 530–541. https://doi.org/10.1145/2594291.2594340
[42]
Dominik Winterer, Chengyu Zhang, and Zhendong Su. 2020. On the Unusal Effectiveness of Type-Aware Operator Mutation. OOPSLA ’20, 1–25. https://doi.org/10.1145/3428261
[43]
Dominik Winterer, Chengyu Zhang, and Zhendong Su. 2020. Validating SMT Solvers via Semantic Fusion. PLDI ’20, 718––730. https://doi.org/10.1145/3385412.3385985
[44]
Xuejun Yang, Yang Chen, Eric Eide, and John Regehr. 2011. Finding and Understanding Bugs in C Compilers. In PLDI ’11. 283–294. https://doi.org/10.1145/1993316.1993532
[45]
Peisen Yao, Heqing Huang, Wensheng Tang, Qingkai Shi, Rongxin Wu, and Charles Zhang. 2021. Fuzzing SMT Solvers via Two-Dimensional Input Space Exploration. In ISSTA ’21. 322–335. https://doi.org/10.1145/3460319.3464803
[46]
Peisen Yao, Heqing Huang, Wensheng Tang, Qingkai Shi, Rongxin Wu, and Charles Zhang. 2021. Skeletal Approximation Enumeration for SMT Solver Testing. In FSE ’21. 1141–1153. https://doi.org/10.1145/3468264.3468540
[47]
Qirun Zhang, Chengnian Sun, and Zhendong Su. 2017. Skeletal program enumeration for rigorous compiler testing. In PLDI ’17. 347–361. https://doi.org/10.1145/3140587.3062379

Index Terms

  1. Validating SMT Solvers for Correctness and Performance via Grammar-Based Enumeration

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    cover image Proceedings of the ACM on Programming Languages
    Proceedings of the ACM on Programming Languages  Volume 8, Issue OOPSLA2
    October 2024
    2691 pages
    EISSN:2475-1421
    DOI:10.1145/3554319
    Issue’s Table of Contents
    This work is licensed under a Creative Commons Attribution International 4.0 License.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 08 October 2024
    Published in PACMPL Volume 8, Issue OOPSLA2

    Permissions

    Request permissions for this article.

    Check for updates

    Author Tags

    1. Fuzz testing
    2. Grammar-based enumeration
    3. SMT solvers

    Qualifiers

    • Research-article

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • 0
      Total Citations
    • 150
      Total Downloads
    • Downloads (Last 12 months)150
    • Downloads (Last 6 weeks)60
    Reflects downloads up to 16 Dec 2024

    Other Metrics

    Citations

    View Options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Login options

    Full Access

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media