research-article

Deep code comment generation

Authors:

Zhi Jin

ICPC '18: Proceedings of the 26th Conference on Program Comprehension

Pages 200 - 210

https://doi.org/10.1145/3196321.3196334

Published: 28 May 2018

Abstract

During software maintenance, code comments help developers comprehend programs and reduce the time spent reading and navigating source code. Unfortunately, these comments are often mismatched, missing, or outdated in software projects, so developers have to infer the functionality from the source code. This paper proposes a new approach named DeepCom to automatically generate code comments for Java methods. The generated comments aim to help developers understand the functionality of Java methods. DeepCom applies Natural Language Processing (NLP) techniques to learn from a large code corpus and generates comments from learned features. We use a deep neural network that analyzes structural information of Java methods for better comment generation. We conduct experiments on a large-scale Java corpus built from 9,714 open source projects from GitHub. We evaluate the experimental results on a machine translation metric. Experimental results demonstrate that our method DeepCom outperforms the state-of-the-art by a substantial margin.
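
The abstract says the network analyzes structural information of Java methods but does not spell out how that structure reaches the model. As a rough illustration only, the sketch below shows one way a syntax tree could be flattened into a bracketed token sequence that a sequence model can consume without losing the nesting. The `Node` class, the `linearize` function, and the node labels are hypothetical names chosen for this example, and the scheme is an assumption rather than a description of the paper's actual procedure.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical, simplified AST node: a label plus ordered children."""
    label: str
    children: List["Node"] = field(default_factory=list)


def linearize(node: Node) -> List[str]:
    """Flatten a tree into a bracketed token sequence.

    Each subtree is wrapped as "( label ... ) label", so the nesting can be
    recovered from the flat sequence. This is an illustrative scheme only.
    """
    tokens = ["(", node.label]
    for child in node.children:
        tokens.extend(linearize(child))
    tokens.extend([")", node.label])
    return tokens


# Hand-built tree standing in for a method like `int getId() { return id; }`.
# A real pipeline would obtain the tree from a Java parser (not shown here).
method = Node("MethodDeclaration", [
    Node("PrimitiveType_int"),
    Node("SimpleName_getId"),
    Node("ReturnStatement", [Node("SimpleName_id")]),
])

print(" ".join(linearize(method)))
```

Because every subtree is closed with its own label, two methods with the same tokens but different nesting produce different sequences, which is the property a structure-aware model would rely on.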

Index Terms

Deep code comment generation
1. Computing methodologies
  1. Machine learning
    1. Machine learning approaches
      1. Neural networks
2. Software and its engineering
  1. Software creation and management
    1. Software post-development issues
      1. Documentation

Published In

ICPC '18: Proceedings of the 26th Conference on Program Comprehension

May 2018

423 pages

ISBN: 9781450357142

DOI: 10.1145/3196321

General Chair: Foutse Khomh, École Polytechnique de Montréal, Canada

Program Chairs: Chanchal K. Roy, University of Saskatchewan, Canada; Janet Siegmund, University of Passau, Germany

Copyright © 2018 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGSOFT: ACM Special Interest Group on Software Engineering
IEEE-CS: Computer Society

Publisher

Association for Computing Machinery

New York, NY, United States

Funding Sources

National Natural Science Foundation of China
National Basic Research Program of China (the 973 Program)

Conference

ICSE '18

Sponsor:

SIGSOFT
IEEE-CS

ICSE '18: 40th International Conference on Software Engineering

May 28 - 29, 2018

Gothenburg, Sweden
