[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.1145/3196321.3196334acmconferencesArticle/Chapter ViewAbstractPublication PagesicseConference Proceedingsconference-collections
research-article

Deep code comment generation

Published: 28 May 2018 Publication History

Abstract

During software maintenance, code comments help developers comprehend programs and reduce additional time spent on reading and navigating source code. Unfortunately, these comments are often mismatched, missing or outdated in the software projects. Developers have to infer the functionality from the source code. This paper proposes a new approach named DeepCom to automatically generate code comments for Java methods. The generated comments aim to help developers understand the functionality of Java methods. DeepCom applies Natural Language Processing (NLP) techniques to learn from a large code corpus and generates comments from learned features. We use a deep neural network that analyzes structural information of Java methods for better comments generation. We conduct experiments on a large-scale Java corpus built from 9,714 open source projects from GitHub. We evaluate the experimental results on a machine translation metric. Experimental results demonstrate that our method DeepCom outperforms the state-of-the-art by a substantial margin.

References

[1]
Miltiadis Allamanis, Earl T Barr, Christian Bird, and Charles Sutton. 2014. Learning natural coding conventions. In Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering. ACM, 281--293.
[2]
Miltiadis Allamanis, Earl T Barr, Christian Bird, and Charles Sutton. 2015. Suggesting accurate method and class names. In Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering. ACM, 38--49.
[3]
Miltiadis Allamanis, Earl T Barr, Premkumar Devanbu, and Charles Sutton. 2017. A Survey of Machine Learning for Big Code and Naturalness. arXiv preprint arXiv:1709.06182 (2017).
[4]
Miltiadis Allamanis, Hao Peng, and Charles Sutton. 2016. A convolutional attention network for extreme summarization of source code. In International Conference on Machine Learning. 2091--2100.
[5]
Miltos Allamanis, Daniel Tarlow, Andrew Gordon, and Yi Wei. 2015. Bimodal modelling of source code and natural language. In International Conference on Machine Learning. 2123--2132.
[6]
Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2014. Neural Machine Translation by Jointly Learning to Align and Translate. Computer Science (2014).
[7]
Manfred Broy, Florian Deißenböck, and Markus Pizka. 2005. A holistic approach to software quality at work. In Proc. 3rd World Congress for Software Quality (3WCSQ).
[8]
Raymond PL Buse and Westley R Weimer. 2010. Automatically documenting program changes. In Proceedings of the IEEE/ACM international conference on Automated software engineering. ACM, 33--42.
[9]
Ciprian Chelba, Dan Bikel, Maria Shugrina, Patrick Nguyen, and Shankar Kumar. 2012. Large scale language modeling in automatic speech recognition. arXiv preprint arXiv:1210.8440 (2012).
[10]
Junyoung Chung, Caglar Gulcehre, KyungHyun Cho, and Yoshua Bengio. 2014. Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555 (2014).
[11]
Brian P Eddy, Jeffrey A Robinson, Nicholas A Kraft, and Jeffrey C Carver. 2013. Evaluating source code summarization techniques: Replication and expansion. In Program Comprehension (ICPC), 2013 IEEE 21st International Conference on. IEEE, 13--22.
[12]
Xiaodong Gu, Hongyu Zhang, Dongmei Zhang, and Sunghun Kim. 2016. Deep API learning. In Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering. ACM, 631--642.
[13]
Xiaodong Gu, Hongyu Zhang, DongmeiZhang, and SunghunKim. 2017. DeepAM: Migrate APIs with Multi-modal Sequence to Sequence Learning. arXiv preprint arXiv:1704.07734 (2017).
[14]
Sonia Haiduc, Jairo Aponte, and Andrian Marcus. 2010. Supporting program comprehension with source code summarization. In Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering-Volume 2. ACM, 223--226.
[15]
Sonia Haiduc, Jairo Aponte, Laura Moreno, and Andrian Marcus. 2010. On the use of automated text summarization techniques for summarizing source code. In Reverse Engineering (WCRE), 2010 17th Working Conference on. IEEE, 35--44.
[16]
Vincent J Hellendoorn and Premkumar Devanbu. 2017. Are deep neural networks the best choice for modeling source code?. In Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering. ACM, 763--773.
[17]
Abram Hindle, Earl T Barr, Zhendong Su, Mark Gabel, and Premkumar Devanbu. 2012. On the naturalness of software. In Software Engineering (ICSE), 2012 34th International Conference on. IEEE, 837--847.
[18]
Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural computation 9, 8 (1997), 1735--1780.
[19]
Srinivasan Iyer, Ioannis Konstas, Alvin Cheung, and Luke Zettlemoyer. 2016. Summarizing Source Code using a Neural Attention Model. In ACL (1).
[20]
Siyuan Jiang, Ameer Armaly, and Collin McMillan. 2017. Automatically generating commit messages from diffs using neural machine translation. In Proceedings of the 32nd IEEE/ACM International Conference on Automated Software Engineering. IEEE Press, 135--146.
[21]
Melvin Johnson, Mike Schuster, Quoc V Le, Maxim Krikun, Yonghui Wu, Zhifeng Chen, Nikhil Thorat, Fernanda Viégas, Martin Wattenberg, Greg Corrado, et al. 2016. Google's multilingual neural machine translation system: enabling zero-shot translation. arXiv preprint arXiv:1611.04558 (2016).
[22]
Guillaume Klein, Yoon Kim, Yuntian Deng, Jean Senellart, and Alexander M Rush. 2017. Opennmt: Open-source toolkit for neural machine translation. arXiv preprint arXiv:1701.02810 (2017).
[23]
Pablo Loyola, Edison Marrese-Taylor, and Yutaka Matsuo. 2017. A Neural Architecture for Generating Natural Language Descriptions from Source Code Changes. arXiv preprint arXiv:1704.04856 (2017).
[24]
Paul W McBurney and Collin McMillan. 2014. Automatic documentation generation via source code summarization of method context. In Proceedings of the 22nd International Conference on Program Comprehension. ACM, 279--290.
[25]
Laura Moreno, Jairo Aponte, Giriprasad Sridhara, Andrian Marcus, Lori Pollock, and K Vijay-Shanker. 2013. Automatic generation of natural language summaries for java classes. In Program Comprehension (ICPC), 2013 IEEE 21st International Conference on. IEEE, 23--32.
[26]
Lili Mou, Ge Li, Lu Zhang, Tao Wang, and Zhi Jin. 2016. Convolutional Neural Networks over Tree Structures for Programming Language Processing. In AAAI, Vol. 2. 4.
[27]
Lili Mou, Rui Men, Ge Li, Lu Zhang, and Zhi Jin. 2015. On end-to-end program generation from user intention by deep neural networks. arXiv preprint arXiv:1510.07211 (2015).
[28]
Dana Movshovitz-Attias and William W Cohen. 2013. Natural language models for predicting programming comments. (2013).
[29]
Tung Thanh Nguyen, Anh Tuan Nguyen, Hoan Anh Nguyen, and Tien N Nguyen. 2013. A statistical semantic language model for source code. In Proceedings of the 2013 9th Joint Meeting on Foundations of Software Engineering. ACM, 532--542.
[30]
Yusuke Oda, Hiroyuki Fudaba, Graham Neubig, Hideaki Hata, Sakriani Sakti, Tomoki Toda, and Satoshi Nakamura. 2015. Learning to generate pseudo-code from source code using statistical machine translation (t). In Automated Software Engineering (ASE), 2015 30th IEEE/ACM International Conference on. IEEE, 574--584.
[31]
Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. BLEU: a method for automatic evaluation of machine translation. In Proceedings of the 40th annual meeting on association for computational linguistics. Association for Computational Linguistics, 311--318.
[32]
Baishakhi Ray, Vincent Hellendoorn, Saheel Godhane, Zhaopeng Tu, Alberto Bacchelli, and Premkumar Devanbu. 2016. On the naturalness of buggy code. In Proceedings of the 38th International Conference on Software Engineering. ACM, 428--439.
[33]
Veselin Raychev, Martin Vechev, and Andreas Krause. 2015. Predicting program properties from big code. In ACM SIGPLAN Notices, Vol. 50. ACM, 111--124.
[34]
Alexander M Rush, Sumit Chopra, and Jason Weston. 2015. A neural attention model for abstractive sentence summarization. arXiv preprint arXiv:1509.00685 (2015).
[35]
Giriprasad Sridhara, Emily Hill, Divya Muppaneni, Lori Pollock, and K Vijay-Shanker. 2010. Towards automatically generating summary comments for java methods. In Proceedings of the IEEE/ACM international conference on Automated software engineering. ACM, 43--52.
[36]
Giriprasad Sridhara, Lori Pollock, and K Vijay-Shanker. 2011. Automatically detecting and describing high level actions within methods. In Proceedings of the 33rd International Conference on Software Engineering. ACM, 101--110.
[37]
Ilya Sutskever, Oriol Vinyals, and Quoc V Le. 2014. Sequence to sequence learning with neural networks. In Advances in neural information processing systems. 3104--3112.
[38]
Jeffrey Svajlenko and Chanchal K Roy. 2016. A Machine Learning Based Approach for Evaluating Clone Detection Tools for a Generalized and Accurate Precision. International Journal of Software Engineering and Knowledge Engineering 26, 09n10 (2016), 1399--1429.
[39]
Oriol Vinyals and Quoc Le. 2015. A neural conversational model. arXiv preprint arXiv:1506.05869 (2015).
[40]
Song Wang, Taiyue Liu, and Lin Tan. 2016. Automatically learning semantic features for defect prediction. In Proceedings of the 38th International Conference on Software Engineering. ACM, 297--308.
[41]
Martin White, Michele Tufano, Christopher Vendome, and Denys Poshyvanyk. 2016. Deep learning code fragments for code clone detection. In Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering. ACM, 87--98.
[42]
Edmund Wong, Taiyue Liu, and Lin Tan. 2015. Clocom: Mining existing source code for automatic comment generation. In Software Analysis, Evolution and Reengineering (SANER), 2015 IEEE 22nd International Conference on. IEEE, 380--389.
[43]
Edmund Wong, Jinqiu Yang, and Lin Tan. 2013. Autocomment: Mining question and answer sites for automatic comment generation. In Proceedings of the 28th IEEE/ACM International Conference on Automated Software Engineering. IEEE Press, 562--567.
[44]
Yonghui Wu, Mike Schuster, Zhifeng Chen, Quoc V Le, Mohammad Norouzi, Wolfgang Macherey, Maxim Krikun, Yuan Cao, Qin Gao, Klaus Macherey, et al. 2016. Google's neural machine translation system: Bridging the gap between human and machine translation. arXiv preprint arXiv:1609.08144 (2016).
[45]
Xin Xia, Lingfeng Bao, David Lo, Zhenchang Xing, Ahmed E Hassan, and Shanping Li. 2017. Measuring program comprehension: A large-scale field study with professionals. IEEE Transactions on Software Engineering (2017).
[46]
Jun Yin, Xin Jiang, Zhengdong Lu, Lifeng Shang, Hang Li, and Xiaoming Li. 2015. Neural generative question answering. arXiv preprint arXiv:1512.01337 (2015).
[47]
Pengcheng Yin and Graham Neubig. 2017. A Syntactic Neural Model for General-Purpose Code Generation. arXiv preprint arXiv:1704.01696 (2017).
[48]
Sai Zhang, Cheng Zhang, and Michael D Ernst. 2011. Automated documentation inference to explain failed tests. In Proceedings of the 2011 26th IEEE/ACM International Conference on Automated Software Engineering. IEEE Computer Society, 63--72.

Cited By

View all
  • (2025)An alternative to code comment generation? Generating comment from bytecodeInformation and Software Technology10.1016/j.infsof.2024.107623179(107623)Online publication date: Mar-2025
  • (2025)Automated description generation for software patchesInformation and Software Technology10.1016/j.infsof.2024.107543177(107543)Online publication date: Jan-2025
  • (2024)NEXTProceedings of the 41st International Conference on Machine Learning10.5555/3692070.3693610(37929-37956)Online publication date: 21-Jul-2024
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
ICPC '18: Proceedings of the 26th Conference on Program Comprehension
May 2018
423 pages
ISBN:9781450357142
DOI:10.1145/3196321
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 28 May 2018

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. comment generation
  2. deep learning
  3. program comprehension

Qualifiers

  • Research-article

Funding Sources

  • National Natural Science Foundation of China
  • National Basic Research Program of China (the 973 Program)

Conference

ICSE '18
Sponsor:

Upcoming Conference

ICSE 2025

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)388
  • Downloads (Last 6 weeks)36
Reflects downloads up to 04 Jan 2025

Other Metrics

Citations

Cited By

View all
  • (2025)An alternative to code comment generation? Generating comment from bytecodeInformation and Software Technology10.1016/j.infsof.2024.107623179(107623)Online publication date: Mar-2025
  • (2025)Automated description generation for software patchesInformation and Software Technology10.1016/j.infsof.2024.107543177(107543)Online publication date: Jan-2025
  • (2024)NEXTProceedings of the 41st International Conference on Machine Learning10.5555/3692070.3693610(37929-37956)Online publication date: 21-Jul-2024
  • (2024)PinNetProceedings of the 41st International Conference on Machine Learning10.5555/3692070.3692636(14157-14174)Online publication date: 21-Jul-2024
  • (2024)CogCol: Code Graph-Based Contrastive Learning Model for Code SummarizationElectronics10.3390/electronics1310181613:10(1816)Online publication date: 8-May-2024
  • (2024)A 5 Year Bibliometric Review of Programming Language Research Dynamics in Southeast Asia (2018-2023)SSRN Electronic Journal10.2139/ssrn.4706647Online publication date: 2024
  • (2024)Large Language Models for Software Engineering: A Systematic Literature ReviewACM Transactions on Software Engineering and Methodology10.1145/369598833:8(1-79)Online publication date: 20-Sep-2024
  • (2024)VulAdvisor: Natural Language Suggestion Generation for Software Vulnerability RepairProceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering10.1145/3691620.3695555(1932-1944)Online publication date: 27-Oct-2024
  • (2024)How Effectively Do Code Language Models Understand Poor-Readability Code?Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering10.1145/3691620.3695072(795-806)Online publication date: 27-Oct-2024
  • (2024)Code Optimization Chain-of-Thought: Structured Understanding and Self-CheckingProceedings of the 2024 4th International Conference on Artificial Intelligence, Big Data and Algorithms10.1145/3690407.3690479(425-430)Online publication date: 21-Jun-2024
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media