The Effects of Semantic Information on LLM-Based Program Repair

Shota Hori¹,
Shinsuke Matsumoto¹,
Yoshiki Higo¹,
Shinji Kusumoto¹,
Kazuya Yasuda²,
Shinji Ito² &
Phan Thi Thanh Huyen²

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15452)

Included in the following conference series:

International Conference on Product-Focused Software Process Improvement


Abstract

Large Language Model-based Automated Program Repair (LLM-APR) has recently received significant attention as a debugging aid. Our objective is to improve the performance of LLM-APR. In this study, we focus on the semantic information contained in source code. Semantic information refers to elements that programmers use to understand source code but that do not contribute to compilation or execution. We selected specifications, method names, and variable names as the semantic information to study. In the investigation, we prepared eight prompts covering all combinations of the three types of semantic information (2³ = 8). The experimental results showed that every type of semantic information improves the performance of LLM-APR, and that variable names are particularly significant.
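The eight-prompt design is a full factorial over three on/off factors. The sketch below shows one way such variants could be generated. It is an illustration under stated assumptions, not the authors' tooling: the abstract does not say how names were removed or how prompts were worded, so the helper `build_prompt`, the prompt text, and the placeholder scheme (`m1`, `v1`, ...) are all hypothetical. Renaming uses a crude whole-word regex rather than a Java parser, so it would also rewrite matches inside string literals or comments and could collide with existing identifiers named like the placeholders.

```python
import re
from itertools import product

# The three types of semantic information examined in the study.
SEMANTIC_INFO = ("specification", "method_names", "variable_names")


def prompt_variants():
    """Return every keep/drop combination of the three types (2**3 = 8 variants)."""
    return [dict(zip(SEMANTIC_INFO, flags))
            for flags in product((True, False), repeat=len(SEMANTIC_INFO))]


def build_prompt(code, spec, identifiers, variant):
    """Build one repair prompt for a single variant (hypothetical helper).

    identifiers maps each identifier in `code` to "method" or "variable".
    Dropped names are replaced by neutral placeholders (m1, v1, v2, ...)
    using a crude whole-word regex substitution, not a real Java parser.
    """
    counters = {"method": 0, "variable": 0}
    for name, kind in identifiers.items():
        if not variant[f"{kind}_names"]:
            counters[kind] += 1
            placeholder = f"{kind[0]}{counters[kind]}"  # e.g. m1, v1, v2
            code = re.sub(rf"\b{re.escape(name)}\b", placeholder, code)
    parts = []
    if variant["specification"]:
        parts.append(f"Specification: {spec}")
    parts.append("The following Java method contains a bug. Fix it.")
    parts.append(code)
    return "\n".join(parts)


if __name__ == "__main__":
    # Hypothetical buggy method: the loop skips the first element.
    buggy = ("int sum(int[] xs) { int total = 0; "
             "for (int i = 1; i < xs.length; i++) total += xs[i]; return total; }")
    names = {"sum": "method", "xs": "variable", "total": "variable", "i": "variable"}
    spec = "Returns the sum of all elements of the array."
    for variant in prompt_variants():
        print(variant)
        print(build_prompt(buggy, spec, names, variant))
        print()
```

With all three types dropped, the method body above becomes `int m1(int[] v1) { int v2 = 0; ... }` and no specification line is emitted, which is the "no semantic information" end of the factorial.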


Notes

1. https://github.com/kusumotolab/Mutanerator (accessed April 7, 2024).
2. https://github.com/jhy/jsoup (accessed April 20, 2024).
3. https://github.com/google/gson (accessed April 26, 2024).



Acknowledgements

This research was partially supported by JSPS KAKENHI Japan (JP24H00692, JP21H04877, JP21K18302, JP23K24823, JP22K11985, 21K11829).

Author information

Authors and Affiliations

Graduate School of Information Science and Technology, Osaka University, Suita, Japan
Shota Hori, Shinsuke Matsumoto, Yoshiki Higo & Shinji Kusumoto
Hitachi, Ltd., Tokyo, Japan
Kazuya Yasuda, Shinji Ito & Phan Thi Thanh Huyen


Corresponding author

Correspondence to Shota Hori.

Editor information

Editors and Affiliations

University of Tartu, Tartu, Estonia
Dietmar Pfahl
Blekinge Institute of Technology, Karlskrona, Sweden
Javier Gonzalez Huerta
Leibniz Universität Hannover, Hannover, Germany
Jil Klünder
University of Tartu, Tartu, Estonia
Hina Anwar



About this paper

Cite this paper

Hori, S. et al. (2025). The Effects of Semantic Information on LLM-Based Program Repair. In: Pfahl, D., Gonzalez Huerta, J., Klünder, J., Anwar, H. (eds) Product-Focused Software Process Improvement. PROFES 2024. Lecture Notes in Computer Science, vol 15452. Springer, Cham. https://doi.org/10.1007/978-3-031-78386-9_28


DOI: https://doi.org/10.1007/978-3-031-78386-9_28
Published: 27 November 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-78385-2
Online ISBN: 978-3-031-78386-9
eBook Packages: Computer Science, Computer Science (R0)
