More Web Proxy on the site http://driver.im/

research-article

Open access

Validating SMT Solvers for Correctness and Performance via Grammar-Based Enumeration

Authors:

Dominik Winterer,

Zhendong SuAuthors Info & Claims

Proceedings of the ACM on Programming Languages, Volume 8, Issue OOPSLA2

Article No.: 355, Pages 2378 - 2401

https://doi.org/10.1145/3689795

Published: 08 October 2024 Publication History

Abstract

We introduce ET, a grammar-based enumerator for validating SMT solver correctness and performance. By compiling grammars of the SMT theories to algebraic datatypes, ET leverages the functional enumerator FEAT. ET is highly effective at bug finding and has many complimentary benefits. Despite the extensive and continuous testing of the state-of-the-art SMT solvers Z3 and cvc5, ET found 102 bugs, out of which 76 were confirmed and 32 were fixed. Moreover, ET can be used to understand the evolution of solvers. We derive eight grammars realizing all major SMT theories including the booleans, integers, reals, realints, bit-vectors, arrays, floating points, and strings. Using ET, we test all consecutive releases of the SMT solvers Z3 and CVC4/cvc5 from the last six years (61 versions) on 8 million formulas, and 488 million solver calls. Our results suggest improved correctness in recent versions of both solvers but decreased performance in newer releases of Z3 on small timeouts (since z3-4.8.11) and regressions in early cvc5 releases on larger timeouts. Due to its systematic testing and efficiency, we further advocate ET's use for continuous integration.

References

[1]

AdaCore. 2021. SPARK. https://github.com/AdaCore/spark2014

[2]

Alexandr Andoni, Dumitru Daniliuc, Sarfraz Khurshid, and Darko Marinov. 2003. Evaluating the “Small Scope Hypothesis”. In POPL ’03. 1–12.

[3]

John Backes, Pauline Bolignano, Byron Cook, Catherine Dodge, Andrew Gacek, Kasper Sœ Luckow, Neha Rungta, Oksana Tkachuk, and Carsten Varming. 2018. Semantic-based Automated Reasoning for AWS Access Policies using SMT. In FMCAD ’18. 1–9.

[4]

Clark Barrett, Christopher L. Conway, Morgan Deters, Liana Hadarean, Dejan Jovanović, Tim King, Andrew Reynolds, and Cesare Tinelli. 2011. CVC4. In CAV ’11. 171–177. https://doi.org/10.1007/978-3-642-22110-1_14

[5]

Clark Barrett, Leonardo de Moura, and Aaron Stump. 2005. SMT-COMP: Satisfiability Modulo Theories Competition. In CAV ’05. 20–23.

[6]

Clark Barrett, Pascal Fontaine, and Cesare Tinelli. 2019. The Satisfiability Modulo Theories Library (SMT-LIB). www.SMT-LIB.org

[7]

Clark Barrett, Aaron Stump, and Cesare Tinelli. 2010. The SMT-LIB Standard: Version 2.0. In SMT ’10.

[8]

S. M. Blackburn, R. Garner, C. Hoffman, A. M. Khan, K. S. McKinley, R. Bentzur, A. Diwan, D. Feinberg, D. Frampton, S. Z. Guyer, M. Hirzel, A. Hosking, M. Jump, H. Lee, J. E. B. Moss, A. Phansalkar, D. Stefanović, T. VanDrunen, D. von Dincklage, and B. Wiedermann. 2006. The DaCapo Benchmarks: Java Benchmarking Development and Analysis. In OOPSLA ’06. 169–190. https://doi.org/10.1145/1167473.1167488

Digital Library

[9]

Dmitry Blotsky, Federico Mora, Murphy Berzish, Yunhui Zheng, Ifaz Kabir, and Vijay Ganesh. 2018. StringFuzz: A Fuzzer for String Solvers. In CAV ’18. 45–51. https://doi.org/10.1007/978-3-642-00768-2_16

Digital Library

[10]

Robert Brummayer and Armin Biere. 2009. Boolector: An Efficient SMT Solver for Bit-Vectors and Arrays. In TACAS ’09. 174–177.

[11]

Robert Brummayer and Armin Biere. 2009. Fuzzing and delta-debugging SMT solvers. In SMT ’09. 1–5. https://doi.org/10.1145/1670412.1670413

Digital Library

[12]

Alexandra Bugariu and Peter Müller. 2020. Automatically Testing String Solvers. In ICSE ’20. 1459–1470. https://doi.org/10.1145/3377811.3380398

Digital Library

[13]

W.H. Burkhardt. 1967. Generating test programs from syntax. In Computing. 53–73.

[14]

Cristian Cadar, Daniel Dunbar, and Dawson R. Engler. 2008. KLEE: Unassisted and Automatic Generation of High-Coverage Tests for Complex Systems Programs. In OSDI ’08. 209–224.

Digital Library

[15]

Koen Claessen and John Hughes. 2000. QuickCheck: A Lightweight Tool for Random Testing of Haskell Programs. In ICFP ’00. 268–279. https://doi.org/10.1145/351240.351266

Digital Library

[16]

SAT Competition. 2023. The International SAT Competition Web Page. https://github.com/AdaCore/spark2014

[17]

Leonardo de Moura and Nikolaj Bjørner. 2008. Z3: An Efficient SMT Solver. In TACAS ’08. 337–340. https://doi.org/10.1007/978-3-540-78800-3_24

[18]

David Detlefs, Greg Nelson, and James B. Saxe. 2005. Simplify: A Theorem Prover for Program Checking. JACM, 365–473. https://doi.org/10.1145/1066100.1066102

Digital Library

[19]

Jonas Duregard, Patrick Jansson, and Meng Wang. 2012. FEAT: Functional Enumeration of Algebraic Types. In Haskell ’12. 61–72. https://doi.org/10.1145/2430532.2364515

Digital Library

[20]

Ambros Gleixner, Gregor Hendel, Gerald Gamrath, Tobias Achterberg, Michael Bastubbe, Timo Berthold, Philipp M. Christophel, Kati Jarck, Thorsten Koch, Jeff Linderoth, Marco Lübbecke, Hans D. Mittelmann, Derya Ozyurt, Ted K. Ralphs, Domenico Salvagnin, and Yuji Shinano. 2021. MIPLIB 2017: Data-Driven Compilation of the 6th Mixed-Integer Programming Library. Mathematical Programming Computation, https://doi.org/10.1007/s12532-020-00194-3

[21]

Patrice Godefroid, Nils Klarlund, and Koushik Sen. 2005. DART: directed automated random testing. In PLDI ’05. 213–223. https://doi.org/10.1145/1064978.1065036

Digital Library

[22]

Patrice Godefroid, Michael Y. Levin, and David Molnar. 2012. SAGE: Whitebox Fuzzing for Security Testing: SAGE Has Had a Remarkable Impact at Microsoft. Queue, 20–27.

[23]

K. V. Hanford. 1970. Automatic generation of test cases. IBM Systems Journal, 9, 4 (1970), 242–257.

Digital Library

[24]

Christian Holler, Kim Herzig, and Andreas Zeller. 2012. Fuzzing with Code Fragments. In USENIX Security Symposium. https://doi.org/10.5555/2362793.2362831

Digital Library

[25]

Daniel Jackson. 2002. Alloy: a lightweight object modelling notation. ACM Trans. Softw. Eng. Methodol., 11, 2 (2002), apr, 256–290. issn:1049-331X https://doi.org/10.1145/505145.505149

Digital Library

[26]

D. Jackson and C.A. Damon. 1996. Elements of style: analyzing a software design feature with a counterexample detector. TSE ’96, 484–495. https://doi.org/10.1109/32.538605

Digital Library

[27]

Muhammad Numair Mansur, Maria Christakis, Valentin Wüstholz, and Fuyuan Zhang. 2020. Detecting Critical Bugs in SMT Solvers Using Blackbox Mutational Fuzzing. In FSE ’20. 701–712. https://doi.org/10.1145/3368089.3409763

Digital Library

[28]

Rudy Matela Braquehais. 2017. Tools for Discovery, Refinement and Generalization of Functional Properties by Enumerative Testing. Ph. D. Dissertation. University of York.

[29]

Aina Niemetz, Mathias Preiner, and Clark Barrett. 2022. Murxla: A Modular and Highly Extensible API Fuzzer for SMT Solvers. In CAV ’22. 92–106. https://doi.org/10.1007/978-3-031-13188-2_5

Digital Library

[30]

Aina Niemetz, Mathias Preiner, and Armin Biere. 2017. Model-based API testing for SMT solvers. In SMT ’17. 10.

[31]

Jiwon Park, Dominik Winterer, Chengyu Zhang, and Zhendong Su. 2021. Generative Type-Aware Mutation for Testing SMT Solvers. In OOPSLA ’21. 1–19. https://doi.org/10.1145/3485529

Digital Library

[32]

David J. Pearce. 2019. On declarative rewriting for sound and complete union, intersection and negation types. Journal of Computer Languages, 50 (2019), 84–101. https://doi.org/10.1016/j.jvlc.2018.10.004

[33]

David J. Pearce. 2021. A Lightweight Formalism for Reference Lifetimes and Borrowing in Rust. ACM Trans. Program. Lang. Syst., 43, 1 (2021), https://doi.org/10.1145/3443420

Digital Library

[34]

Colin Runciman, Matthew Naylor, and Fredrik Lindblad. 2008. Smallcheck and Lazy Smallcheck: Automatic Exhaustive Testing for Small Values. In Haskell ’08. 37–48. https://doi.org/10.1145/1411286.1411292

Digital Library

[35]

Neha Rungta. 2022. A Billion SMT Queries a Day (Invited Paper). In CAV ’22. 3–18. https://doi.org/10.1007/978-3-031-13185-1_1

Digital Library

[36]

Joseph Scott, Federico Mora, and Vijay Ganesh. 2020. BanditFuzz: Fuzzing SMT Solvers with Reinforcement Learning. In CAV ’20. 68–86. https://doi.org/10.1007/978-3-030-63618-0_5

Digital Library

[37]

Armando Solar-Lezama. 2008. Program Synthesis by Sketching. Ph. D. Dissertation. University of California at Berkeley. isbn:9781109097450 https://doi.org/10.1007/978-3-642-10672-9_3

Digital Library

[38]

SPEC. 2023. SPEC’s Benchmarks and Tools. https://www.spec.org/benchmarks.html

[39]

Kevin Sullivan, Jinlin Yang, David Coppit, Sarfraz Khurshid, and Daniel Jackson. 2004. Software Assurance by Bounded Exhaustive Testing. In ISSTA ’04. 133–142. https://doi.org/10.1145/1013886.1007531

Digital Library

[40]

Geoff Sutcliffe. 2016. The CADE ATP System Competition — CASC. AI Magazine, Jul., 99–101. https://ojs.aaai.org/index.php/aimagazine/article/view/2620

[41]

Emina Torlak and Rastislav Bodik. 2014. A lightweight symbolic virtual machine for solver-aided host languages. In PLDI ’14. 530–541. https://doi.org/10.1145/2594291.2594340

Digital Library

[42]

Dominik Winterer, Chengyu Zhang, and Zhendong Su. 2020. On the Unusal Effectiveness of Type-Aware Operator Mutation. OOPSLA ’20, 1–25. https://doi.org/10.1145/3428261

Digital Library

[43]

Dominik Winterer, Chengyu Zhang, and Zhendong Su. 2020. Validating SMT Solvers via Semantic Fusion. PLDI ’20, 718––730. https://doi.org/10.1145/3385412.3385985

Digital Library

[44]

Xuejun Yang, Yang Chen, Eric Eide, and John Regehr. 2011. Finding and Understanding Bugs in C Compilers. In PLDI ’11. 283–294. https://doi.org/10.1145/1993316.1993532

Digital Library

[45]

Peisen Yao, Heqing Huang, Wensheng Tang, Qingkai Shi, Rongxin Wu, and Charles Zhang. 2021. Fuzzing SMT Solvers via Two-Dimensional Input Space Exploration. In ISSTA ’21. 322–335. https://doi.org/10.1145/3460319.3464803

Digital Library

[46]

Peisen Yao, Heqing Huang, Wensheng Tang, Qingkai Shi, Rongxin Wu, and Charles Zhang. 2021. Skeletal Approximation Enumeration for SMT Solver Testing. In FSE ’21. 1141–1153. https://doi.org/10.1145/3468264.3468540

Digital Library

[47]

Qirun Zhang, Chengnian Sun, and Zhendong Su. 2017. Skeletal program enumeration for rigorous compiler testing. In PLDI ’17. 347–361. https://doi.org/10.1145/3140587.3062379

Digital Library

Index Terms

Validating SMT Solvers for Correctness and Performance via Grammar-Based Enumeration
1. Software and its engineering
  1. Software organization and properties
    1. Software functional properties
      1. Formal methods

Recommendations

Generative type-aware mutation for testing SMT solvers

We propose Generative Type-Aware Mutation, an effective approach for testing SMT solvers. The key idea is to realize generation through the mutation of expressions rooted with parametric operators from the SMT-LIB specification. Generative Type-Aware ...
Validating SMT solvers via semantic fusion
PLDI 2020: Proceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation

We introduce Semantic Fusion, a general, effective methodology for validating Satisfiability Modulo Theory (SMT) solvers. Our key idea is to fuse two existing equisatisfiable (i.e., both satisfiable or unsatisfiable) formulas into a new formula that ...
NNSMT: Deep Neural Networks for SMT Solvers Fuzzing
ICCAI '22: Proceedings of the 8th International Conference on Computing and Artificial Intelligence

SMT solvers are important tools in the field of software engineering, which is often used to determine the satisfiability of formulas in formal methods, such as software verification, program synthesis, program verification, etc. However, due to their ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image Proceedings of the ACM on Programming Languages

Proceedings of the ACM on Programming Languages Volume 8, Issue OOPSLA2

October 2024

2691 pages

EISSN:2475-1421

DOI:10.1145/3554319

Editor:
Michael Hicks
Amazon, USA

Issue’s Table of Contents

Copyright © 2024 Owner/Author.

This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 08 October 2024

Published in PACMPL Volume 8, Issue OOPSLA2

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Research-article

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

0
Total Citations
150
Total Downloads

Downloads (Last 12 months)150
Downloads (Last 6 weeks)60

Reflects downloads up to 16 Dec 2024

Other Metrics

View Author Metrics

Citations

View Options

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Media

Figures

Other

Tables

View Issue’s Table of Contents