
research-article

Open access

Planning to Guide LLM for Code Coverage Prediction

Authors:

Hridya Dhulipala,

Aashish Yadavally,

Tien N. Nguyen

FORGE '24: Proceedings of the 2024 IEEE/ACM First International Conference on AI Foundation Models and Software Engineering

Pages 24 - 34

https://doi.org/10.1145/3650105.3652292

Published: 12 June 2024

Abstract

Code coverage is a crucial metric for assessing testing effectiveness: it measures the degree to which a test suite exercises different facets of the code, such as statements, branches, or paths. Despite its significance, coverage profilers require access to the entire codebase, which limits their usefulness when the code is incomplete or when execution is infeasible or cost-prohibitive. In this paper, we present CodePilot, a plan-based prompting approach grounded in program semantics that works with a Large Language Model (LLM) to improve code coverage prediction. To handle the intricacies of predicting code coverage, CodePilot plans by identifying the different types of statements in an execution flow. In the planning step, GPT autonomously generates a plan from guided examples; CodePilot then prompts the GPT model to predict code coverage (Action) based on the plan it generated (Reasoning). Our experiments show that CodePilot achieves up to 55% exact-match and 89% statement-match accuracy, relative improvements of up to 33% and 19%, respectively, over the baselines. We also show that, because its plans are highly accurate (90%), the GPT model predicts code coverage better. Moreover, we show CodePilot's utility in correctly predicting the least-covered statements.
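The two accuracy metrics reported in the abstract can be made concrete with a short sketch. It assumes that each method's coverage is encoded as one label per statement (">" for executed, "!" for not executed), that exact-match counts a method as correct only when every statement label is right, and that statement-match is the fraction of individual statements labeled correctly, pooled over all methods. The label encoding, the pooling choice, and the function names `exact_match` and `statement_match` are hypothetical illustrations, not CodePilot's evaluation code.

```python
from typing import Sequence

# Hypothetical encoding: one label per statement, e.g. ">" = executed, "!" = not executed.
Coverage = Sequence[str]


def exact_match(preds: Sequence[Coverage], golds: Sequence[Coverage]) -> float:
    """Fraction of methods whose predicted labels equal the ground truth on every statement."""
    if len(preds) != len(golds):
        raise ValueError("expected one prediction per ground-truth method")
    if not golds:
        return 0.0
    hits = sum(1 for p, g in zip(preds, golds) if list(p) == list(g))
    return hits / len(golds)


def statement_match(preds: Sequence[Coverage], golds: Sequence[Coverage]) -> float:
    """Fraction of all statements (pooled over methods) whose predicted label is correct."""
    if len(preds) != len(golds):
        raise ValueError("expected one prediction per ground-truth method")
    correct = total = 0
    for p, g in zip(preds, golds):
        if len(p) != len(g):
            raise ValueError("prediction and ground truth must label the same statements")
        correct += sum(1 for a, b in zip(p, g) if a == b)
        total += len(g)
    return correct / total if total else 0.0


if __name__ == "__main__":
    golds = [[">", ">", "!", ">"], [">", "!", "!"]]
    preds = [[">", ">", "!", ">"], [">", ">", "!"]]
    print(exact_match(preds, golds))      # 0.5: only the first method is fully correct
    print(statement_match(preds, golds))  # 6 of 7 statements correct, about 0.857
```

The example shows why statement-match is typically much higher than exact-match: a single mislabeled statement makes the whole second method wrong under exact-match, while costing only one of seven statements under statement-match. Averaging statement accuracy per method instead of pooling would weight short methods more heavily; which aggregation the paper uses is not stated here.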

Index Terms

Planning to Guide LLM for Code Coverage Prediction
1. Computing methodologies
  1. Machine learning
    1. Machine learning approaches
      1. Neural networks
2. Software and its engineering
  1. Software organization and properties
    1. Extra-functional properties
      1. Software reliability


Published In

FORGE '24: Proceedings of the 2024 IEEE/ACM First International Conference on AI Foundation Models and Software Engineering

April 2024

140 pages

ISBN:9798400706097

DOI:10.1145/3650105

Chair: David Lo
Co-chair: Xin Xia
Program Chairs: Massimiliano Di Penta, Xing Hu (Zhejiang University, China)

This work is licensed under a Creative Commons Attribution 4.0 International License.

Sponsors

SIGSOFT: ACM Special Interest Group on Software Engineering

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

FORGE '24

Sponsor:

SIGSOFT

FORGE '24: 2024 IEEE/ACM First International Conference on AI Foundation Models and Software Engineering

April 14, 2024

Lisbon, Portugal
