
LenANet: A Length-Controllable Attention Network for Source Code Summarization

  • Conference paper
  • Neural Information Processing (ICONIP 2023)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1965)


Abstract

Source code summarization aims to generate a brief natural language description of a code snippet. Existing approaches have made great progress with encoder-decoder models, but they focus on learning the common features shared by translations from source code to natural language summaries. As a result, they tend to generate generic summaries that are independent of context and lacking in detail. However, specific summaries, which characterize the distinctive features of a code snippet, are widespread in real-world scenarios. Such summaries are rarely studied because capturing the specific features of source code is difficult; moreover, learning only common features yields only generic, short summaries. In this paper, we present LenANet, which generates specific summaries by taking the desired length into account and extracting the relevant code sentences. First, we introduce a length offset vector that steers the model toward summaries containing a specified amount of information, laying the groundwork for generating specific summaries. Since forcing the model to generate summaries of a certain length can introduce invalid or generic descriptions, we further propose a context-aware code sentence extractor that selects the specific features corresponding to the desired information. In addition, we present a novel sentence-level code tree to capture structural semantics, and learn code sentence representations with a graph attention network, which is crucial for extracting specific features. Experiments on the CodeXGLUE datasets covering six programming languages demonstrate that LenANet significantly outperforms the baselines and can generate specific summaries. In particular, the overall BLEU-4 score improves by 0.53 over CodeT5 with length control.
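
The abstract's central mechanism is a length offset vector that conditions the decoder on a desired summary length. The full text is not included on this page, so the sketch below is only a hypothetical illustration of one common way such length conditioning is implemented: a learned embedding of the number of tokens still to be generated is added to each decoder input embedding. The class name LengthOffsetEmbedding, the remaining-length scheme, and all dimensions are assumptions for illustration, not LenANet's actual implementation.

    # Hypothetical sketch of length-offset conditioning (not the authors' code).
    # A learned vector for "tokens remaining" is added to every decoder input
    # embedding, biasing generation toward a summary of roughly target_len tokens.
    import torch
    import torch.nn as nn

    class LengthOffsetEmbedding(nn.Module):
        def __init__(self, max_len: int, d_model: int):
            super().__init__()
            # One learned offset vector per remaining-length value 0..max_len.
            self.offset = nn.Embedding(max_len + 1, d_model)
            self.max_len = max_len

        def forward(self, token_emb: torch.Tensor, target_len: int) -> torch.Tensor:
            # token_emb: (batch, seq_len, d_model) decoder input embeddings.
            batch, seq_len, _ = token_emb.shape
            steps = torch.arange(seq_len, device=token_emb.device)
            # Remaining length at each decoding step, clamped into the embedding table.
            remaining = (target_len - steps).clamp(min=0, max=self.max_len)
            return token_emb + self.offset(remaining).unsqueeze(0).expand(batch, -1, -1)

    # Usage: ask for a summary of roughly 16 tokens.
    token_embedding = nn.Embedding(32000, 512)
    length_offset = LengthOffsetEmbedding(max_len=64, d_model=512)
    tokens = torch.randint(0, 32000, (2, 20))            # (batch=2, seq_len=20)
    decoder_inputs = length_offset(token_embedding(tokens), target_len=16)
    print(decoder_inputs.shape)                          # torch.Size([2, 20, 512])

In the paper this length signal works together with the context-aware code sentence extractor and the graph-attention encoding of the sentence-level code tree; the sketch covers only the length-conditioning half.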


Notes

  1. https://github.com/salesforce/CodeT5.

References

  1. Ahmad, W.U., Chakraborty, S., Ray, B., Chang, K.: A transformer-based approach for source code summarization. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, ACL 2020, Online, July 5–10, 2020, pp. 4998–5007. Association for Computational Linguistics (2020)

  2. Ahmad, W.U., Chakraborty, S., Ray, B., Chang, K.: Unified pre-training for program understanding and generation. CoRR abs/2103.06333 (2021)

  3. Allamanis, M., Peng, H., Sutton, C.: A convolutional attention network for extreme summarization of source code. In: Proceedings of the 33rd International Conference on Machine Learning, ICML 2016, New York City, NY, USA, June 19–24, 2016. JMLR Workshop and Conference Proceedings, vol. 48, pp. 2091–2100. JMLR.org (2016)

  4. Dathathri, S., et al.: Plug and play language models: a simple approach to controlled text generation. In: 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, April 26–30, 2020. OpenReview.net (2020)

  5. Feng, Z., et al.: CodeBERT: a pre-trained model for programming and natural languages. In: Findings of the Association for Computational Linguistics: EMNLP 2020, Online Event, 16–20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1536–1547. Association for Computational Linguistics (2020)

  6. Feng, Z., et al.: CodeBERT: a pre-trained model for programming and natural languages. CoRR abs/2002.08155 (2020)

  7. Holtzman, A., Buys, J., Forbes, M., Choi, Y.: The curious case of neural text degeneration. CoRR abs/1904.09751 (2019)

  8. Iyer, S., Konstas, I., Cheung, A., Zettlemoyer, L.: Summarizing source code using a neural attention model. In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, ACL 2016, August 7–12, 2016, Berlin, Germany, Volume 1: Long Papers. Association for Computational Linguistics (2016)

  9. Keskar, N.S., McCann, B., Varshney, L.R., Xiong, C., Socher, R.: CTRL: a conditional transformer language model for controllable generation. CoRR abs/1909.05858 (2019)

  10. LeClair, A., Haque, S., Wu, L., McMillan, C.: Improved code summarization via a graph neural network. In: ICPC ’20: 28th International Conference on Program Comprehension, Seoul, Republic of Korea, July 13–15, 2020, pp. 184–195. ACM (2020)

  11. Lester, B., Al-Rfou, R., Constant, N.: The power of scale for parameter-efficient prompt tuning. CoRR abs/2104.08691 (2021)

  12. Liu, S., Chen, Y., Xie, X., Siow, J.K., Liu, Y.: Retrieval-augmented generation for code summarization via hybrid GNN. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3–7, 2021. OpenReview.net (2021)

  13. Liu, Y., et al.: RoBERTa: a robustly optimized BERT pretraining approach. CoRR abs/1907.11692 (2019)

  14. Lu, S., et al.: CodeXGLUE: a machine learning benchmark dataset for code understanding and generation. CoRR abs/2102.04664 (2021)

  15. McBurney, P.W., McMillan, C.: Automatic source code summarization of context for Java methods. IEEE Trans. Softw. Eng. 42(2), 103–119 (2016)

  16. Moreno, L., Aponte, J., Sridhara, G., Marcus, A., Pollock, L.L., Vijay-Shanker, K.: Automatic generation of natural language summaries for Java classes. In: IEEE 21st International Conference on Program Comprehension, ICPC 2013, San Francisco, CA, USA, 20–21 May, 2013, pp. 23–32. IEEE Computer Society (2013)

  17. Rozière, B., Lachaux, M.A., Szafraniec, M., Lample, G.: DOBF: a deobfuscation pre-training objective for programming languages. In: Advances in Neural Information Processing Systems, NeurIPS 2021 (2021)

  18. Shido, Y., Kobayashi, Y., Yamamoto, A., Miyamoto, A., Matsumura, T.: Automatic source code summarization with extended Tree-LSTM. In: International Joint Conference on Neural Networks, IJCNN 2019, Budapest, Hungary, July 14–19, 2019, pp. 1–8. IEEE (2019)

  19. Sridhara, G., Hill, E., Muppaneni, D., Pollock, L.L., Vijay-Shanker, K.: Towards automatically generating summary comments for Java methods. In: ASE 2010, 25th IEEE/ACM International Conference on Automated Software Engineering, Antwerp, Belgium, September 20–24, 2010, pp. 43–52. ACM (2010)

  20. Sun, W., et al.: An extractive-and-abstractive framework for source code summarization. CoRR abs/2206.07245 (2022)

  21. Sutskever, I., Vinyals, O., Le, Q.V.: Sequence to sequence learning with neural networks. CoRR abs/1409.3215 (2014)

  22. Tang, Z., Li, C., Ge, J., Shen, X., Zhu, Z., Luo, B.: AST-Transformer: encoding abstract syntax trees efficiently for code summarization. In: 36th IEEE/ACM International Conference on Automated Software Engineering, ASE 2021, Melbourne, Australia, November 15–19, 2021, pp. 1193–1195. IEEE (2021)

  23. Veličković, P., Cucurull, G., Casanova, A., Romero, A., Liò, P., Bengio, Y.: Graph attention networks. In: International Conference on Learning Representations, ICLR 2018 (2018)

  24. Wang, Y., Wang, W., Joty, S., Hoi, S.C.H.: CodeT5: identifier-aware unified pre-trained encoder-decoder models for code understanding and generation. In: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, EMNLP 2021 (2021)

  25. Zhang, J., Wang, X., Zhang, H., Sun, H., Liu, X.: Retrieval-based neural source code summarization. In: ICSE ’20: 42nd International Conference on Software Engineering, Seoul, South Korea, 27 June – 19 July, 2020, pp. 1385–1397. ACM (2020)

  26. Zhao, Y., Shen, X., Bi, W., Aizawa, A.: Unsupervised rewriter for multi-sentence compression. In: Proceedings of the 57th Conference of the Association for Computational Linguistics, ACL 2019, Florence, Italy, July 28 – August 2, 2019, Volume 1: Long Papers, pp. 2235–2240. Association for Computational Linguistics (2019)


Author information


Corresponding author

Correspondence to Xiaowang Zhang.


Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Chen, P., Wu, S., Chen, Z., Zhang, J., Zhang, X., Feng, Z. (2024). LenANet: A Length-Controllable Attention Network for Source Code Summarization. In: Luo, B., Cheng, L., Wu, ZG., Li, H., Li, C. (eds) Neural Information Processing. ICONIP 2023. Communications in Computer and Information Science, vol 1965. Springer, Singapore. https://doi.org/10.1007/978-981-99-8145-8_43


  • DOI: https://doi.org/10.1007/978-981-99-8145-8_43


  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-8144-1

  • Online ISBN: 978-981-99-8145-8

  • eBook Packages: Computer Science, Computer Science (R0)
