[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
research-article
Open access

SMT2Test: From SMT Formulas to Effective Test Cases

Published: 08 October 2024 Publication History

Abstract

One of the primary challenges in software testing is generating high-quality test inputs and obtaining corresponding test oracles. This paper introduces a novel methodology to mitigate this challenge in testing program verifiers by employing SMT (Satisfiability Modulo Theories) formulas as a universal test case generator. The key idea is to transform SMT formulas into programs and link the satisfiability of the formulas with the safety property of the programs, allowing the satisfiability of the formulas to act as a test oracle for program verifiers. This method was implemented as a framework named SMT2Test, which enables the transformation of SMT formulas into Dafny and C programs. An intermediate representation was designed to augment the flexibility of this framework, streamlining the transformation for other programming languages and fostering modular transformation strategies. We evaluated the effectiveness of SMT2Test by finding defects in two program verifiers: the Dafny verifier and CPAchecker. Utilizing the SMT2Test framework with the SMT formulas from the SMT competition and SMT solver fuzzers, we discovered and reported a total of 14 previously unknown defects in these program verifiers that were not found by previous methods. After reporting, all of them have been confirmed, and 6 defects have been fixed. These findings show the effectiveness of our method and imply its potential application in testing other programming language infrastructures.

References

[1]
Haniel Barbosa, Clark Barrett, Martin Brain, Gereon Kremer, Hanna Lachnitt, Makai Mann, Abdalrhman Mohamed, Mudathir Mohamed, Aina Niemetz, and Andres Nötzli. 2022. cvc5: A versatile and industrial-strength SMT solver. In TACAS. 415–442. https://doi.org/10.1007/978-3-030-99524-9_24
[2]
Mike Barnett, Bor-Yuh Evan Chang, Robert DeLine, Bart Jacobs, and K Rustan M Leino. 2006. Boogie: A modular reusable verifier for object-oriented programs. In FMCO. 364–387. https://doi.org/10.1007/11804192_17
[3]
Earl T Barr, Mark Harman, Phil McMinn, Muzammil Shahbaz, and Shin Yoo. 2014. The oracle problem in software testing: A survey. IEEE transactions on software engineering, 41, 5 (2014), 507–525. https://doi.org/10.1109/tse.2014.2372785
[4]
Clark Barrett, Leonardo De Moura, and Aaron Stump. 2005. SMT-COMP: Satisfiability modulo theories competition. In CAV. 20–23. https://doi.org/10.1007/11513988_4
[5]
Dirk Beyer. 2023. Competition on software verification and witness validation: SV-COMP 2023. In TACAS. 495–522. https://doi.org/10.1007/978-3-031-30820-8_29
[6]
Dirk Beyer and M Erkan Keremoglu. 2011. CPAchecker: A tool for configurable software verification. In CAV. 184–190. https://doi.org/10.1007/978-3-642-22110-1_16
[7]
Dmitry Blotsky, Federico Mora, Murphy Berzish, Yunhui Zheng, Ifaz Kabir, and Vijay Ganesh. 2018. Stringfuzz: A fuzzer for string solvers. In CAV. 45–51. https://doi.org/10.1007/978-3-319-96142-2_6
[8]
Marco Bozzano, Roberto Bruttomesso, Alessandro Cimatti, Tommi Junttila, Peter Van Rossum, Stephan Schulz, and Roberto Sebastiani. 2005. MathSAT: Tight integration of SAT and mathematical decision procedures. Journal of Automated Reasoning, 35, 1-3 (2005), 265–293. https://doi.org/10.1007/978-1-4020-5571-3_12
[9]
Alexandra Bugariu and Peter Müller. 2020. Automatically testing string solvers. In ICSE. 1459–1470. https://doi.org/10.1145/3377811.3380398
[10]
Cristian Cadar, Daniel Dunbar, and Dawson R Engler. 2008. KLEE: Unassisted and automatic generation of high-coverage tests for complex systems programs. In OSDI. 8, 209–224.
[11]
Alessandro Cimatti, Alberto Griggio, Bastiaan Joost Schaafsma, and Roberto Sebastiani. 2013. The mathsat5 SMT solver. In TACAS. 93–107. https://doi.org/10.1007/978-3-642-36742-7_7
[12]
Edmund Clarke, Daniel Kroening, and Flavio Lerda. 2004. A tool for checking ANSI-C programs. In TACAS. 168–176. https://doi.org/10.1007/978-3-540-24730-2_15
[13]
Edmund M Clarke, E Allen Emerson, and Joseph Sifakis. 2009. Model checking: algorithmic verification and debugging. Commun. ACM, 52, 11 (2009), 74–84. https://doi.org/10.1145/1592761.1592781
[14]
Patrick Cousot and Radhia Cousot. 1977. Abstract interpretation: a unified lattice model for static analysis of programs by construction or approximation of fixpoints. In POPL. 238–252. https://doi.org/10.1145/512950.512973
[15]
Chris Cummins, Pavlos Petoumenos, Alastair Murray, and Hugh Leather. 2018. Compiler fuzzing through deep learning. In ISSTA. 95–105. https://doi.org/10.1145/3213846.3213848
[16]
Pascal Cuoq, Benjamin Monate, Anne Pacalet, Virgile Prevosto, John Regehr, Boris Yakobowski, and Xuejun Yang. 2012. Testing static analyzers with randomly generated programs. In NASA FM. 120–125. https://doi.org/10.1007/978-3-642-28891-3_12
[17]
Leonardo De Moura and Nikolaj Bjørner. 2008. Z3: An efficient SMT solver. In TACAS. 337–340. https://doi.org/10.1007/978-3-540-78800-3_24
[18]
Bruno Dutertre. 2014. Yices 2.2. In CAV. 737–744. https://doi.org/10.1007/978-3-319-08867-9_49
[19]
Andrew Ferraiuolo, Andrew Baumann, Chris Hawblitzel, and Bryan Parno. 2017. Komodo: Using verification to disentangle secure-enclave hardware from software. In SOSP. 287–305. https://doi.org/10.1145/3132747.3132782
[20]
Robert W Floyd. 1993. Assigning meanings to programs. In Program Verification: Fundamental Issues in Computer Science. 65–81. https://doi.org/10.1007/978-94-011-1793-7_4
[21]
Alex Groce, Iftekhar Ahmed, Josselin Feist, Gustavo Grieco, Jiri Gesi, Mehran Meidani, and Qihong Chen. 2021. Evaluating and improving static analysis tools via differential mutation analysis. In QRS. 207–218. https://doi.org/10.1109/qrs54544.2021.00032
[22]
Arie Gurfinkel, Temesghen Kahsai, Anvesh Komuravelli, and Jorge A Navas. 2015. The SeaHorn verification framework. In CAV. 343–361. https://doi.org/10.1007/978-3-319-21690-4_20
[23]
Travis Hance, Andrea Lattuada, Chris Hawblitzel, Jon Howell, Rob Johnson, and Bryan Parno. 2020. Storage systems are distributed systems (so verify them that way!). In OSDI. 99–115.
[24]
William Gallard Hatch, Pierce Darragh, Sorawee Porncharoenwase, Guy Watson, and Eric Eide. 2023. Generating conforming programs with Xsmith. In GPCE. https://doi.org/10.1145/3624007.3624056
[25]
Chris Hawblitzel, Jon Howell, Manos Kapritsos, Jacob R Lorch, Bryan Parno, Michael L Roberts, Srinath Setty, and Brian Zill. 2015. IronFleet: proving practical distributed systems correct. In SOSP. 1–17. https://doi.org/10.1145/2815400.2815428
[26]
Charles Antony Richard Hoare. 1969. An axiomatic basis for computer programming. Commun. ACM, 12, 10 (1969), 576–580. https://doi.org/10.1145/363235.363259
[27]
Chiao Hsieh and Sayan Mitra. 2019. Dione: A protocol verification system built with dafny for i/o automata. In IFM. 227–245. https://doi.org/10.1007/978-3-030-34968-4_13
[28]
Ahmed Irfan, Sorawee Porncharoenwase, Zvonimir Rakamarić, Neha Rungta, and Emina Torlak. 2022. Testing dafny (experience paper). In ISSTA. 556–567. https://doi.org/10.1145/3533767.3534382
[29]
Timotej Kapus and Cristian Cadar. 2017. Automatic testing of symbolic execution engines via program generation and differential testing. In ASE. 590–600. https://doi.org/10.1109/ase.2017.8115669
[30]
James C King. 1976. Symbolic execution and program testing. Commun. ACM, 19, 7 (1976), 385–394. https://doi.org/10.1109/c-m.1978.218139
[31]
Florent Kirchner, Nikolai Kosmatov, Virgile Prevosto, Julien Signoles, and Boris Yakobowski. 2015. Frama-C: A software analysis perspective. Formal aspects of computing, 27 (2015), 573–609. https://doi.org/10.1007/s00165-014-0326-7
[32]
Christian Klinger, Maria Christakis, and Valentin Wüstholz. 2019. Differentially testing soundness and precision of program analyzers. In ISSTA. 239–250. https://doi.org/10.1145/3293882.3330553
[33]
Daniel Kroening and Michael Tautschnig. 2014. CBMC–C bounded model checker. In TACAS. 389–391. https://doi.org/10.1007/978-3-642-54862-8_26
[34]
Vu Le, Mehrdad Afshari, and Zhendong Su. 2014. Compiler validation via equivalence modulo inputs. PLDI, 49, 6 (2014), 216–226. https://doi.org/10.1145/2594291.2594334
[35]
Vu Le, Chengnian Sun, and Zhendong Su. 2015. Finding deep compiler bugs via guided stochastic program mutation. OOPSLA, 50, 10 (2015), 386–399. https://doi.org/10.1145/2858965.2814319
[36]
K Rustan M Leino. 2010. Dafny: An automatic program Verifier for functional correctness. In LPAR. 348–370. https://doi.org/10.1007/978-3-642-17511-4_20
[37]
K Rustan M Leino and Peter Müller. 2009. A basis for verifying multi-threaded programs. In ESOP. 378–393. https://doi.org/10.1007/978-3-642-00590-9_27
[38]
K Rustan M Leino and Wolfram Schulte. 2004. Exception Safety for C#. In SEFM. 4, 218–227.
[39]
Guodong Li and Konrad Slind. 2008. Trusted source translation of a total function language. In TACAS. 471–485. https://doi.org/10.1007/978-3-540-78800-3_37
[40]
Daniel Liew, Cristian Cadar, Alastair F Donaldson, and J Ryan Stinnett. 2019. Just fuzz it: solving floating-point constraints using coverage-guided fuzzing. In ESEC/FSE. 521–532. https://doi.org/10.1145/3338906.3338921
[41]
Xiao Liu, Xiaoting Li, Rupesh Prajapati, and Dinghao Wu. 2019. DeepFuzz: Automatic generation of syntax valid c programs for fuzz testing. In AAAI. 33, 1044–1051. https://doi.org/10.1609/aaai.v33i01.33011044
[42]
Vsevolod Livinskii, Dmitry Babokin, and John Regehr. 2020. Random testing for C and C++ compilers with YARPGen. In OOPSLA. 4, 1–25. https://doi.org/10.1145/3428264
[43]
Benjamin Mikek and Qirun Zhang. 2023. Speeding up SMT solving via compiler pptimization. In ESEC/FSE. https://doi.org/10.1145/3611643.3616357
[44]
Derek Gordon Murray, Michael Isard, and Yuan Yu. 2011. Steno: Automatic optimization of declarative queries. In PLDI. 121–131. https://doi.org/10.1145/1993316.1993513
[45]
Glenford J Myers, Corey Sandler, and Tom Badgett. 2011. The art of software testing. John Wiley & Sons.
[46]
Jiwon Park, Dominik Winterer, Chengyu Zhang, and Zhendong Su. 2021. Generative type-aware mutation for testing SMT solvers. In OOPSLA. 5, 1–19. https://doi.org/10.1145/3485529
[47]
Terence J. Parr and Russell W. Quong. 1995. ANTLR: A predicated-LL (k) parser generator. Software: Practice and Experience, 25, 7 (1995), 789–810. https://doi.org/10.1002/spe.4380250705
[48]
John Regehr, Yang Chen, Pascal Cuoq, Eric Eide, Chucky Ellison, and Xuejun Yang. 2012. Test-case reduction for C compiler bugs. In PLDI. 335–346. https://doi.org/10.1145/2254064.2254104
[49]
Jubi Taneja, Zhengyang Liu, and John Regehr. 2020. Testing static analyses for precision and soundness. In CGO. 81–93. https://doi.org/10.1145/3368826.3377927
[50]
David A Terei and Manuel MT Chakravarty. 2010. An LLVM backend for GHC. In Proceedings of the third ACM Haskell symposium on Haskell. 109–120. https://doi.org/10.1145/1863523.1863538
[51]
Dominik Winterer, Chengyu Zhang, and Zhendong Su. 2020. On the unusual effectiveness of type-aware operator mutations for testing SMT solvers. In OOPSLA. https://doi.org/10.1145/3428261
[52]
Dominik Winterer, Chengyu Zhang, and Zhendong Su. 2020. Validating SMT solvers via semantic fusion. In PLDI. 718–730. https://doi.org/10.1145/3385412.3385985
[53]
Xuejun Yang, Yang Chen, Eric Eide, and John Regehr. 2011. Finding and understanding bugs in C compilers. In PLDI. 283–294. https://doi.org/10.1145/2345156.1993532
[54]
Peisen Yao, Heqing Huang, Wensheng Tang, Qingkai Shi, Rongxin Wu, and Charles Zhang. 2021. Fuzzing SMT solvers via two-dimensional input space exploration. In ISSTA. 322–335. https://doi.org/10.1145/3460319.3464803
[55]
Peisen Yao, Heqing Huang, Wensheng Tang, Qingkai Shi, Rongxin Wu, and Charles Zhang. 2021. Skeletal approximation enumeration for SMT solver testing. In ESEC/FSE. 1141–1153. https://doi.org/10.1145/3468264.3468540
[56]
Chengyu Zhang, Ting Su, Yichen Yan, Fuyuan Zhang, Geguang Pu, and Zhendong Su. 2019. Finding and understanding bugs in software model checkers. In ESEC/FSE. 763–773. https://doi.org/10.1145/3338906.3338932
[57]
Chengyu Zhang and Zhendong Su. 2024. Artifact: SMT2Test: From SMT Formulas to Effective Test Cases. https://doi.org/10.5281/zenodo.12669638

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image Proceedings of the ACM on Programming Languages
Proceedings of the ACM on Programming Languages  Volume 8, Issue OOPSLA2
October 2024
2691 pages
EISSN:2475-1421
DOI:10.1145/3554319
Issue’s Table of Contents
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 08 October 2024
Published in PACMPL Volume 8, Issue OOPSLA2

Permissions

Request permissions for this article.

Check for updates

Badges

Author Tags

  1. Program Verification
  2. SMT Solving
  3. Software Testing

Qualifiers

  • Research-article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • 0
    Total Citations
  • 330
    Total Downloads
  • Downloads (Last 12 months)330
  • Downloads (Last 6 weeks)138
Reflects downloads up to 16 Dec 2024

Other Metrics

Citations

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Login options

Full Access

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media