Abstract
Source code summarization aims to generate a brief natural language description of a code snippet. Existing approaches have achieved strong results with encoder-decoder models, but they focus on learning the common features shared across translations from source code to natural language summaries. As a result, they tend to generate generic, context-independent summaries that lack detail. However, specific summaries that characterize the distinctive features of a code snippet are common in real-world scenarios. Such summaries are rarely studied because capturing the specific features of source code is difficult, and a model that learns only common features can produce only short, generic summaries. In this paper, we present LenANet, which generates specific summaries by taking the desired length into account and extracting the relevant code sentences. First, we introduce a length offset vector that steers generation toward summaries carrying a given amount of information, laying the groundwork for specific summaries. Second, because merely forcing the model to generate summaries of a certain length can introduce invalid or generic descriptions, we propose a context-aware code sentence extractor that selects the specific features corresponding to the desired information. In addition, we present a novel sentence-level code tree to capture structural semantics and learn code sentence representations with a graph attention network, which is crucial for extracting specific features. Experiments on the CodeXGLUE datasets covering six programming languages demonstrate that LenANet significantly outperforms the baselines and can generate specific summaries. In particular, the overall BLEU-4 improves by 0.53 over CodeT5 with length control.
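The abstract does not give the exact formulation of the length offset vector. As a rough illustration of the idea of conditioning the decoder on how much of the summary remains to be generated, the following minimal PyTorch sketch adds a remaining-length embedding to the decoder input embeddings; the module name, the additive formulation, and the parameter `max_offset` are assumptions for illustration, not the authors' implementation.

```python
# Minimal sketch (assumed formulation): embed the number of tokens still to be
# generated and add it to the decoder input embeddings at each step, so the
# model can learn to wrap up the summary as the remaining budget reaches zero.
import torch
import torch.nn as nn


class LengthOffsetEmbedding(nn.Module):
    """Embeds the remaining-length offset (clamped to max_offset). Hypothetical module."""

    def __init__(self, d_model: int, max_offset: int = 128):
        super().__init__()
        self.max_offset = max_offset
        self.embed = nn.Embedding(max_offset + 1, d_model)

    def forward(self, token_embeds: torch.Tensor, target_len: int) -> torch.Tensor:
        # token_embeds: (batch, seq_len, d_model) decoder input embeddings
        batch, seq_len, _ = token_embeds.shape
        steps = torch.arange(seq_len, device=token_embeds.device)
        remaining = (target_len - steps).clamp(min=0, max=self.max_offset)
        offset = self.embed(remaining)                 # (seq_len, d_model)
        return token_embeds + offset.unsqueeze(0)      # broadcast over the batch


if __name__ == "__main__":
    # Toy usage: condition a batch of decoder embeddings on a desired length of 12 tokens.
    emb = LengthOffsetEmbedding(d_model=512)
    x = torch.randn(2, 20, 512)
    print(emb(x, target_len=12).shape)  # torch.Size([2, 20, 512])
```

In a CodeT5-style encoder-decoder, such an offset signal would be injected on the decoder side at every step, which is one common way to realize length control; the paper's actual mechanism may differ.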
References
Ahmad, W.U., Chakraborty, S., Ray, B., Chang, K.: A transformer-based approach for source code summarization. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, ACL 2020, Online, July 5–10, 2020, pp. 4998–5007. Association for Computational Linguistics (2020)
Ahmad, W.U., Chakraborty, S., Ray, B., Chang, K.: Unified pre-training for program understanding and generation. CoRR abs/2103.06333 (2021)
Allamanis, M., Peng, H., Sutton, C.: A convolutional attention network for extreme summarization of source code. In: Proceedings of the 33rd International Conference on Machine Learning, ICML 2016, New York City, NY, USA, June 19–24, 2016. JMLR Workshop and Conference Proceedings, vol. 48, pp. 2091–2100. JMLR.org (2016)
Dathathri, S., et al.: Plug and play language models: a simple approach to controlled text generation. In: 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, April 26–30, 2020. OpenReview.net (2020)
Feng, Z., et al.: CodeBERT: a pre-trained model for programming and natural languages. In: Findings of the Association for Computational Linguistics: EMNLP 2020, Online Event, 16–20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1536–1547. Association for Computational Linguistics (2020)
Feng, Z., et al.: CodeBERT: a pre-trained model for programming and natural languages. CoRR abs/2002.08155 (2020)
Holtzman, A., Buys, J., Forbes, M., Choi, Y.: The curious case of neural text degeneration. CoRR abs/1904.09751 (2019)
Iyer, S., Konstas, I., Cheung, A., Zettlemoyer, L.: Summarizing source code using a neural attention model. In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, ACL 2016, August 7–12, 2016, Berlin, Germany, Volume 1: Long Papers. The Association for Computer Linguistics (2016)
Keskar, N.S., McCann, B., Varshney, L.R., Xiong, C., Socher, R.: CTRL: a conditional transformer language model for controllable generation. CoRR abs/1909.05858 (2019)
LeClair, A., Haque, S., Wu, L., McMillan, C.: Improved code summarization via a graph neural network. In: ICPC ’20: 28th International Conference on Program Comprehension, Seoul, Republic of Korea, July 13–15, 2020, pp. 184–195. ACM (2020)
Lester, B., Al-Rfou, R., Constant, N.: The power of scale for parameter-efficient prompt tuning. CoRR abs/2104.08691 (2021)
Liu, S., Chen, Y., Xie, X., Siow, J.K., Liu, Y.: Retrieval-augmented generation for code summarization via hybrid GNN. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3–7, 2021. OpenReview.net (2021)
Liu, Y., et al.: RoBERTa: a robustly optimized BERT pretraining approach. CoRR abs/1907.11692 (2019)
Lu, S., et al.: CodeXGLUE: a machine learning benchmark dataset for code understanding and generation. CoRR abs/2102.04664 (2021)
McBurney, P.W., McMillan, C.: Automatic source code summarization of context for Java methods. IEEE Trans. Softw. Eng. 42(2), 103–119 (2016)
Moreno, L., Aponte, J., Sridhara, G., Marcus, A., Pollock, L.L., Vijay-Shanker, K.: Automatic generation of natural language summaries for java classes. In: IEEE 21st International Conference on Program Comprehension, ICPC 2013, San Francisco, CA, USA, 20–21 May, 2013, pp. 23–32. IEEE Computer Society (2013)
Rozière, B., Lachaux, M.A., Szafraniec, M., Lample, G.: DOBF: a deobfuscation pre-training objective for programming languages. In: Advances in Neural Information Processing Systems 34, NeurIPS 2021 (2021)
Shido, Y., Kobayashi, Y., Yamamoto, A., Miyamoto, A., Matsumura, T.: Automatic source code summarization with extended tree-lstm. In: International Joint Conference on Neural Networks, IJCNN 2019 Budapest, Hungary, July 14–19, 2019, pp. 1–8. IEEE (2019)
Sridhara, G., Hill, E., Muppaneni, D., Pollock, L.L., Vijay-Shanker, K.: Towards automatically generating summary comments for java methods. In: ASE 2010, 25th IEEE/ACM International Conference on Automated Software Engineering, Antwerp, Belgium, September 20–24, 2010, pp. 43–52. ACM (2010)
Sun, W., et al.: An extractive-and-abstractive framework for source code summarization. CoRR abs/2206.07245 (2022)
Sutskever, I., Vinyals, O., Le, Q.V.: Sequence to sequence learning with neural networks. CoRR abs/1409.3215 (2014)
Tang, Z., Li, C., Ge, J., Shen, X., Zhu, Z., Luo, B.: Ast-transformer: encoding abstract syntax trees efficiently for code summarization. In: 36th IEEE/ACM International Conference on Automated Software Engineering, ASE 2021, Melbourne, Australia, November 15–19, 2021, pp. 1193–1195. IEEE (2021)
Veličković, P., Cucurull, G., Casanova, A., Romero, A., Liò, P., Bengio, Y.: Graph attention networks. In: 6th International Conference on Learning Representations, ICLR 2018 (2018)
Wang, Y., Wang, W., Joty, S., Hoi, S.C.H.: CodeT5: identifier-aware unified pre-trained encoder-decoder models for code understanding and generation. In: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, EMNLP 2021. Association for Computational Linguistics (2021)
Zhang, J., Wang, X., Zhang, H., Sun, H., Liu, X.: Retrieval-based neural source code summarization. In: ICSE ’20: 42nd International Conference on Software Engineering, Seoul, South Korea, 27 June - 19 July, 2020, pp. 1385–1397. ACM (2020)
Zhao, Y., Shen, X., Bi, W., Aizawa, A.: Unsupervised rewriter for multi-sentence compression. In: Proceedings of the 57th Conference of the Association for Computational Linguistics, ACL 2019, Florence, Italy, July 28- August 2, 2019, Volume 1: Long Papers, pp. 2235–2240. Association for Computational Linguistics (2019)
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Chen, P., Wu, S., Chen, Z., Zhang, J., Zhang, X., Feng, Z. (2024). LenANet: A Length-Controllable Attention Network for Source Code Summarization. In: Luo, B., Cheng, L., Wu, ZG., Li, H., Li, C. (eds) Neural Information Processing. ICONIP 2023. Communications in Computer and Information Science, vol 1965. Springer, Singapore. https://doi.org/10.1007/978-981-99-8145-8_43
DOI: https://doi.org/10.1007/978-981-99-8145-8_43
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8144-1
Online ISBN: 978-981-99-8145-8
eBook Packages: Computer Science, Computer Science (R0)