
LenANet: A Length-Controllable Attention Network for Source Code Summarization

  • Conference paper
  • Neural Information Processing (ICONIP 2023)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1965)


Abstract

Source code summarization aims to generate a brief natural language description of a code snippet. Existing approaches have made great progress with encoder-decoder models, but they focus on learning the common features shared by translations from source code to natural language summaries. As a result, they tend to generate generic summaries that are independent of context and lacking in detail. However, specific summaries, which characterize the distinctive features of a code snippet, are widespread in real-world scenarios. Such summaries are rarely studied because capturing the specific features of source code is difficult; moreover, learning only common features yields only generic, short summaries. In this paper, we present LenANet, which generates specific summaries by taking the desired length into account and extracting the relevant code sentences. First, we introduce a length offset vector that steers the model toward summaries containing a specified amount of information, laying the groundwork for generating specific summaries. Since forcing the model to generate summaries of a certain length can introduce invalid or generic descriptions, we further propose a context-aware code sentence extractor that selects the specific features corresponding to the desired information. In addition, we present a novel sentence-level code tree to capture structural semantics, and learn code sentence representations with a graph attention network, which is crucial for extracting specific features. Experiments on the CodeXGLUE datasets covering six programming languages demonstrate that LenANet significantly outperforms the baselines and can generate specific summaries. In particular, the overall BLEU-4 score improves by 0.53 over CodeT5 with length control.
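
The abstract's central mechanism is a length offset vector that conditions the decoder on a desired summary length. The full text is not included on this page, so the sketch below is only a hypothetical illustration of one common way such length conditioning is implemented: a learned embedding of the number of tokens still to be generated is added to each decoder input embedding. The class name LengthOffsetEmbedding, the remaining-length scheme, and all dimensions are assumptions for illustration, not LenANet's actual implementation.

    # Hypothetical sketch of length-offset conditioning (not the authors' code).
    # A learned vector for "tokens remaining" is added to every decoder input
    # embedding, biasing generation toward a summary of roughly target_len tokens.
    import torch
    import torch.nn as nn

    class LengthOffsetEmbedding(nn.Module):
        def __init__(self, max_len: int, d_model: int):
            super().__init__()
            # One learned offset vector per remaining-length value 0..max_len.
            self.offset = nn.Embedding(max_len + 1, d_model)
            self.max_len = max_len

        def forward(self, token_emb: torch.Tensor, target_len: int) -> torch.Tensor:
            # token_emb: (batch, seq_len, d_model) decoder input embeddings.
            batch, seq_len, _ = token_emb.shape
            steps = torch.arange(seq_len, device=token_emb.device)
            # Remaining length at each decoding step, clamped into the embedding table.
            remaining = (target_len - steps).clamp(min=0, max=self.max_len)
            return token_emb + self.offset(remaining).unsqueeze(0).expand(batch, -1, -1)

    # Usage: ask for a summary of roughly 16 tokens.
    token_embedding = nn.Embedding(32000, 512)
    length_offset = LengthOffsetEmbedding(max_len=64, d_model=512)
    tokens = torch.randint(0, 32000, (2, 20))            # (batch=2, seq_len=20)
    decoder_inputs = length_offset(token_embedding(tokens), target_len=16)
    print(decoder_inputs.shape)                          # torch.Size([2, 20, 512])

In the paper this length signal works together with the context-aware code sentence extractor and the graph-attention encoding of the sentence-level code tree; the sketch covers only the length-conditioning half.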


Notes

  1. https://github.com/salesforce/CodeT5.

References

  1. Ahmad, W.U., Chakraborty, S., Ray, B., Chang, K.: A transformer-based approach for source code summarization. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, ACL 2020, Online, July 5–10, 2020, pp. 4998–5007. Association for Computational Linguistics (2020)

  2. Ahmad, W.U., Chakraborty, S., Ray, B., Chang, K.: Unified pre-training for program understanding and generation. CoRR abs/2103.06333 (2021)

  3. Allamanis, M., Peng, H., Sutton, C.: A convolutional attention network for extreme summarization of source code. In: Proceedings of the 33rd International Conference on Machine Learning, ICML 2016, New York City, NY, USA, June 19–24, 2016. JMLR Workshop and Conference Proceedings, vol. 48, pp. 2091–2100. JMLR.org (2016)

  4. Dathathri, S., et al.: Plug and play language models: a simple approach to controlled text generation. In: 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, April 26–30, 2020. OpenReview.net (2020)

  5. Feng, Z., et al.: CodeBERT: a pre-trained model for programming and natural languages. In: Findings of the Association for Computational Linguistics: EMNLP 2020, Online Event, 16–20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1536–1547. Association for Computational Linguistics (2020)

  6. Feng, Z., et al.: CodeBERT: a pre-trained model for programming and natural languages. CoRR abs/2002.08155 (2020)

  7. Holtzman, A., Buys, J., Forbes, M., Choi, Y.: The curious case of neural text degeneration. CoRR abs/1904.09751 (2019)

  8. Iyer, S., Konstas, I., Cheung, A., Zettlemoyer, L.: Summarizing source code using a neural attention model. In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, ACL 2016, August 7–12, 2016, Berlin, Germany, Volume 1: Long Papers. Association for Computational Linguistics (2016)

  9. Keskar, N.S., McCann, B., Varshney, L.R., Xiong, C., Socher, R.: CTRL: a conditional transformer language model for controllable generation. CoRR abs/1909.05858 (2019)

  10. LeClair, A., Haque, S., Wu, L., McMillan, C.: Improved code summarization via a graph neural network. In: ICPC ’20: 28th International Conference on Program Comprehension, Seoul, Republic of Korea, July 13–15, 2020, pp. 184–195. ACM (2020)

  11. Lester, B., Al-Rfou, R., Constant, N.: The power of scale for parameter-efficient prompt tuning. CoRR abs/2104.08691 (2021)

  12. Liu, S., Chen, Y., Xie, X., Siow, J.K., Liu, Y.: Retrieval-augmented generation for code summarization via hybrid GNN. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3–7, 2021. OpenReview.net (2021)

  13. Liu, Y., et al.: RoBERTa: a robustly optimized BERT pretraining approach. CoRR abs/1907.11692 (2019)

  14. Lu, S., et al.: CodeXGLUE: a machine learning benchmark dataset for code understanding and generation. CoRR abs/2102.04664 (2021)

  15. McBurney, P.W., McMillan, C.: Automatic source code summarization of context for Java methods. IEEE Trans. Softw. Eng. 42(2), 103–119 (2016)

  16. Moreno, L., Aponte, J., Sridhara, G., Marcus, A., Pollock, L.L., Vijay-Shanker, K.: Automatic generation of natural language summaries for Java classes. In: IEEE 21st International Conference on Program Comprehension, ICPC 2013, San Francisco, CA, USA, 20–21 May, 2013, pp. 23–32. IEEE Computer Society (2013)

  17. Rozière, B., Lachaux, M.A., Szafraniec, M., Lample, G.: DOBF: a deobfuscation pre-training objective for programming languages. In: Advances in Neural Information Processing Systems, NeurIPS 2021 (2021)

  18. Shido, Y., Kobayashi, Y., Yamamoto, A., Miyamoto, A., Matsumura, T.: Automatic source code summarization with extended Tree-LSTM. In: International Joint Conference on Neural Networks, IJCNN 2019, Budapest, Hungary, July 14–19, 2019, pp. 1–8. IEEE (2019)

  19. Sridhara, G., Hill, E., Muppaneni, D., Pollock, L.L., Vijay-Shanker, K.: Towards automatically generating summary comments for Java methods. In: ASE 2010, 25th IEEE/ACM International Conference on Automated Software Engineering, Antwerp, Belgium, September 20–24, 2010, pp. 43–52. ACM (2010)

  20. Sun, W., et al.: An extractive-and-abstractive framework for source code summarization. CoRR abs/2206.07245 (2022)

  21. Sutskever, I., Vinyals, O., Le, Q.V.: Sequence to sequence learning with neural networks. CoRR abs/1409.3215 (2014)

  22. Tang, Z., Li, C., Ge, J., Shen, X., Zhu, Z., Luo, B.: AST-Transformer: encoding abstract syntax trees efficiently for code summarization. In: 36th IEEE/ACM International Conference on Automated Software Engineering, ASE 2021, Melbourne, Australia, November 15–19, 2021, pp. 1193–1195. IEEE (2021)

  23. Veličković, P., Cucurull, G., Casanova, A., Romero, A., Liò, P., Bengio, Y.: Graph attention networks. In: International Conference on Learning Representations, ICLR 2018 (2018)

  24. Wang, Y., Wang, W., Joty, S., Hoi, S.C.H.: CodeT5: identifier-aware unified pre-trained encoder-decoder models for code understanding and generation. In: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, EMNLP 2021 (2021)

  25. Zhang, J., Wang, X., Zhang, H., Sun, H., Liu, X.: Retrieval-based neural source code summarization. In: ICSE ’20: 42nd International Conference on Software Engineering, Seoul, South Korea, 27 June – 19 July, 2020, pp. 1385–1397. ACM (2020)

  26. Zhao, Y., Shen, X., Bi, W., Aizawa, A.: Unsupervised rewriter for multi-sentence compression. In: Proceedings of the 57th Conference of the Association for Computational Linguistics, ACL 2019, Florence, Italy, July 28 – August 2, 2019, Volume 1: Long Papers, pp. 2235–2240. Association for Computational Linguistics (2019)


Author information


Corresponding author

Correspondence to Xiaowang Zhang.


Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Chen, P., Wu, S., Chen, Z., Zhang, J., Zhang, X., Feng, Z. (2024). LenANet: A Length-Controllable Attention Network for Source Code Summarization. In: Luo, B., Cheng, L., Wu, ZG., Li, H., Li, C. (eds) Neural Information Processing. ICONIP 2023. Communications in Computer and Information Science, vol 1965. Springer, Singapore. https://doi.org/10.1007/978-981-99-8145-8_43


  • DOI: https://doi.org/10.1007/978-981-99-8145-8_43


  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-8144-1

  • Online ISBN: 978-981-99-8145-8

  • eBook Packages: Computer Science, Computer Science (R0)
