
Do Code Summarization Models Process Too Much Information? Function Signature May Be All That Is Needed

Published: 27 June 2024

Abstract

With the rapid growth of large software projects, automatic code summarization techniques, which summarize the main functionality of a piece of code as natural-language comments, play an essential role in helping developers understand and maintain those projects. Many research efforts have been devoted to building automatic code summarization approaches. Typical approaches are based on deep learning models: they cast the task as a sequence-to-sequence problem that takes source code as input and produces natural-language summaries as output. Code summarization models impose different input size limits, ranging from 50 to 10,000, on the input source code. However, how the input size limit affects the performance of code summarization models remains under-explored. In this article, we first conduct an empirical study to investigate the impact of different input size limits on the quality of generated code comments. To our surprise, experiments on multiple models and datasets reveal that setting a low input size limit, such as 20, does not necessarily reduce the quality of generated comments.
Based on this finding, we further propose to input function signatures, which capture the main functionality of the code, into code summarization models instead of the full source code. Experiments and statistical analyses show that inputs with signatures are, on average, more than 2 percentage points better than inputs without signatures, demonstrating the effectiveness of involving function signatures in code summarization. We also invited programmers to complete a questionnaire evaluating the quality of code summaries generated from the two inputs at different truncation levels. The results show that function signatures produce, on average, 9.2% more high-quality comments than full code.
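To make the two input-preparation strategies concrete, the sketch below is a minimal, hypothetical illustration rather than the paper's implementation: it truncates source code to a fixed token budget and, alternatively, keeps only the function signature before the text would be fed to a summarization model. The regex-based signature extraction, the whitespace tokenization, and the example Java method are all simplifying assumptions.

# Illustrative sketch (not the authors' code): preparing model input either by
# truncating the source to a token limit or by keeping only the signature.
import re


def truncate_tokens(code: str, limit: int = 20) -> str:
    """Keep only the first `limit` whitespace-separated tokens of the code."""
    return " ".join(code.split()[:limit])


def extract_signature(code: str) -> str:
    """Rough approximation of a Java/C-style signature: the text before the
    first '{' in the method body."""
    match = re.search(r"^(.*?)\{", code, flags=re.DOTALL)
    return match.group(1).strip() if match else code.strip()


if __name__ == "__main__":
    java_method = """
    public static int sum(int[] values) {
        int total = 0;
        for (int v : values) { total += v; }
        return total;
    }
    """
    # Candidate model inputs: a low truncation limit vs. the signature only.
    print(truncate_tokens(java_method, limit=20))
    print(extract_signature(java_method))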


Cited By

  • (2025) An alternative to code comment generation? Generating comment from bytecode. Information and Software Technology 179 (2025), 107623. DOI: 10.1016/j.infsof.2024.107623. Online publication date: Mar-2025



    Published In

    ACM Transactions on Software Engineering and Methodology, Volume 33, Issue 6
    July 2024, 951 pages
    EISSN: 1557-7392
    DOI: 10.1145/3613693
    • Editor: Mauro Pezzé

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 27 June 2024
    Online AM: 14 March 2024
    Accepted: 17 February 2024
    Revised: 15 February 2024
    Received: 25 July 2023
    Published in TOSEM Volume 33, Issue 6


    Author Tags

    1. Code summarization
    2. function signature
    3. empirical study

    Qualifiers

    • Research-article

    Funding Sources

    • National Key R&D Program of China
    • National Natural Science Foundation of China
    • Natural Science Foundation of Guangdong Province

