DOI: 10.1145/3650105.3652292
Research article
Open access

Planning to Guide LLM for Code Coverage Prediction

Published: 12 June 2024

Abstract

Code coverage is a crucial metric for assessing testing effectiveness: it measures the degree to which a test suite exercises different facets of the code, such as statements, branches, or paths. Despite its significance, coverage profilers require access to the entire codebase. This limits their usefulness when the code is incomplete, or when execution is infeasible or cost-prohibitive. In this paper, we present CodePilot, a plan-based prompting approach grounded in program semantics that works with a Large Language Model (LLM) to improve code coverage prediction. To address the intricacies of predicting code coverage, CodePilot plans by identifying the various types of statements in an execution flow. Planning lets the GPT model autonomously generate plans from guided examples. CodePilot then prompts the model to predict code coverage (Action) based on the plan it generated (Reasoning). Our experiments show that CodePilot achieves high accuracy: up to 55% in exact-match and 89% in statement-match. It outperforms the baselines, with relative improvements of up to 33% and 19% on those two metrics. We also show that the GPT model predicts code coverage better because its plans are highly accurate (90%). Moreover, we show CodePilot's utility in correctly predicting the least-covered statements.
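The sketch below shows one way the two headline metrics, exact-match and statement-match, could be computed. It is a minimal illustration with several assumptions. Each prediction is taken to be a per-statement list of covered/not-covered labels for one method and test input. Exact-match counts a sample as correct only if every label agrees with the ground truth. Statement-match pools correctly labelled statements over all samples. The function names and the pooling choice are hypothetical, and the paper's exact definitions may differ (for example, it may average statement accuracy per method instead of pooling).

```python
from typing import Sequence

# Hypothetical representation: one boolean per statement of the method under
# test, True if the statement executes for the given test input.
Coverage = Sequence[bool]


def exact_match(predicted: Sequence[Coverage], actual: Sequence[Coverage]) -> float:
    """Fraction of samples whose predicted coverage matches the ground truth on every statement."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must contain the same number of samples")
    if not actual:
        return 0.0
    hits = sum(1 for p, a in zip(predicted, actual) if list(p) == list(a))
    return hits / len(actual)


def statement_match(predicted: Sequence[Coverage], actual: Sequence[Coverage]) -> float:
    """Fraction of individual statements, pooled over all samples, whose label is predicted correctly."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must contain the same number of samples")
    correct = 0
    total = 0
    for p, a in zip(predicted, actual):
        if len(p) != len(a):
            raise ValueError("each prediction must label the same statements as its ground truth")
        correct += sum(1 for x, y in zip(p, a) if x == y)
        total += len(a)
    return correct / total if total else 0.0


if __name__ == "__main__":
    # Two toy methods with three statements each (made-up data, not from the paper).
    actual = [[True, True, False], [True, False, True]]
    predicted = [[True, True, False], [True, True, True]]
    print(exact_match(predicted, actual))               # 0.5: only the first sample matches fully
    print(round(statement_match(predicted, actual), 4))  # 0.8333: 5 of 6 statements are correct
```

The toy run shows why the two numbers in the abstract diverge so much. A single mislabelled statement makes a whole sample fail exact-match, but it costs only one statement under statement-match.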

Published In

FORGE '24: Proceedings of the 2024 IEEE/ACM First International Conference on AI Foundation Models and Software Engineering
April 2024
140 pages
ISBN:9798400706097
DOI:10.1145/3650105
This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. AI4SE
  2. large language models
  3. planning
  4. code coverage analysis

Conference

FORGE '24
Article Metrics

  • Total citations: 0
  • Total downloads: 324
  • Downloads (last 12 months): 324
  • Downloads (last 6 weeks): 80
Reflects downloads up to 13 Dec 2024.