DOI: 10.1145/3544902.3546251

MMF3: Neural Code Summarization Based on Multi-Modal Fine-Grained Feature Fusion

Published: 19 September 2022

Abstract

Background: Code summarization automatically generates natural language descriptions of input code that characterize the functionality implemented by the source code. A comprehensive code representation is critical to the code summarization task. However, most existing approaches integrate multi-modal features with coarse-grained fusion methods: they represent different modalities of a piece of code, such as an Abstract Syntax Tree (AST) and a token sequence, as two embeddings and then fuse them at the AST/code level. Such coarse integration makes it difficult to effectively learn the correlations between fine-grained code elements across modalities. Aims: This study aims to improve prediction performance for high-quality code summarization by accurately aligning and fully fusing the semantic and syntactic structure information of source code at the node/token level. Method: This paper proposes a Multi-Modal Fine-grained Feature Fusion approach (MMF3) for neural code summarization. The method builds on the Transformer architecture. In particular, we introduce a novel fine-grained fusion method that fuses multiple code modalities at the token and node levels. Specifically, we use this method to fuse information from the token and AST modalities and apply the fused features to code summarization. Results: We conduct experiments on one Java dataset and one Python dataset, and evaluate the generated summaries with four metrics. The results show that: 1) our model outperforms the current state-of-the-art models, and 2) ablation experiments confirm that the proposed fine-grained fusion method effectively improves the accuracy of the generated summaries. Conclusion: MMF3 can mine the relationships between cross-modal elements and accordingly perform accurate, fine-grained, element-level alignment and fusion. As a result, it provides more clues for improving the accuracy of the generated code summaries.
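
To make the fine-grained fusion idea concrete, the sketch below shows one way element-level fusion of the token and AST-node modalities could be realized with bidirectional cross-attention in PyTorch. This is an illustrative sketch only: the class name FineGrainedFusion, the dimensions, and the use of nn.MultiheadAttention are assumptions made for exposition, not the authors' released implementation of MMF3.

import torch
import torch.nn as nn


class FineGrainedFusion(nn.Module):
    """Illustrative element-level fusion of token and AST-node embeddings.

    Each code token attends over individual AST nodes (and vice versa), so
    the modalities are fused per token/node rather than as pooled,
    code-level vectors. Hyperparameters are placeholders, not values from
    the paper.
    """

    def __init__(self, d_model: int = 512, num_heads: int = 8):
        super().__init__()
        self.token_to_node = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
        self.node_to_token = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
        self.norm_tok = nn.LayerNorm(d_model)
        self.norm_node = nn.LayerNorm(d_model)

    def forward(self, tok_emb, node_emb):
        # tok_emb:  (batch, num_tokens, d_model) -- token-sequence modality
        # node_emb: (batch, num_nodes,  d_model) -- AST-node modality
        fused_tok, _ = self.token_to_node(query=tok_emb, key=node_emb, value=node_emb)
        fused_node, _ = self.node_to_token(query=node_emb, key=tok_emb, value=tok_emb)
        # Residual connections preserve each modality's original information.
        return self.norm_tok(tok_emb + fused_tok), self.norm_node(node_emb + fused_node)


if __name__ == "__main__":
    fusion = FineGrainedFusion()
    tokens = torch.randn(2, 60, 512)  # e.g. 60 code tokens per snippet
    nodes = torch.randn(2, 90, 512)   # e.g. 90 AST nodes per snippet
    tok_out, node_out = fusion(tokens, nodes)
    print(tok_out.shape, node_out.shape)  # (2, 60, 512) and (2, 90, 512)

The fused token and node sequences could then feed a standard Transformer decoder that generates the summary; how MMF3 itself combines the fused streams is detailed in the paper.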

Published In

ESEM '22: Proceedings of the 16th ACM / IEEE International Symposium on Empirical Software Engineering and Measurement
September 2022
318 pages
ISBN: 9781450394277
DOI: 10.1145/3544902

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Code Summarization
  2. Fine-grained Fusion
  3. Multi-modal Features
  4. Transformer

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ESEM '22

Acceptance Rates

Overall Acceptance Rate 130 of 594 submissions, 22%

Cited By

  • (2024) AI-Assisted Programming Tasks Using Code Embeddings and Transformers. Electronics 13(4), 767. DOI: 10.3390/electronics13040767. Online publication date: 15-Feb-2024
  • (2024) Do Code Summarization Models Process Too Much Information? Function Signature May Be All That Is Needed. ACM Transactions on Software Engineering and Methodology 33(6), 1-35. DOI: 10.1145/3652156. Online publication date: 27-Jun-2024
  • (2024) Enhancing code summarization with action word prediction. Neurocomputing 563(C). DOI: 10.1016/j.neucom.2023.126777. Online publication date: 1-Jan-2024
  • (2024) A review of automatic source code summarization. Empirical Software Engineering 29(6). DOI: 10.1007/s10664-024-10553-6. Online publication date: 7-Oct-2024
  • (2023) Multimodal Learning for Automatic Summarization: A Survey. Advanced Data Mining and Applications, 362-376. DOI: 10.1007/978-3-031-46664-9_25. Online publication date: 27-Aug-2023
