
A Comparative Study on Method Comment and Inline Comment

Published: 22 July 2023

Abstract

Code comments are one of the important documents that help developers review and comprehend source code. In recent studies, researchers have proposed many deep learning models to generate method header comments (i.e., method comments), and these models have achieved encouraging results. The comments inside a method, called inline comments, are also important for program comprehension. Unfortunately, they have not received enough attention in automatic generation compared with method comments. In this paper, we compare and analyze the similarities and differences between method comments and inline comments. By applying existing models for generating method comments to inline comment generation, we find that these models perform worse on the task of inline comment generation. We then further explore the possible reasons and obtain a number of new observations. For example, we find that there are a lot of templates (i.e., comments with the same or similar structures) in the method comment dataset, which makes the models perform better. Some terms that were thought to be important for comment generation in previous studies (e.g., API calls) do not significantly affect the quality of the generated comments, which seems counter-intuitive. Our findings may provide implications for building approaches to method comment or inline comment generation in the future.

1 Introduction

Code comments in programs are very important because they record the thoughts and intentions of the developers [1, 2, 3, 4]. They play a vital role in program comprehension, software maintenance, and other software-related work [5, 6, 7, 8]. However, writing comments in source code is time-consuming and tedious [9, 10]. Previous research shows that developers often neglect to write comments due to tight development schedules [11, 12, 14]. To automatically complete code documentation and improve programming efficiency, researchers have proposed many approaches to automatically generate comments [16, 19, 20, 21, 22, 23, 24].
Existing comment generation approaches mainly target method comments, which refer to the comments located before a method [13, 25] that provide a summary description of the entire method [15, 27]. Another kind of comment is located inside the method; these are called inline comments [14, 26] (also called block comments). They usually explain the next few lines of code, such as the implementation of a more specific functionality, and form a complementary relationship with method comments. As shown in Figure 1, Mark 1 shows a method comment, and Mark 2 shows an inline comment.
Fig. 1. Example of method comment and inline comment.
Since the majority of existing studies focus on the automatic generation of method comments [16, 19, 20, 21, 22, 23, 24, 28] and rarely pay attention to inline comments, we are curious about the proportions of method comments and inline comments used by developers. We collect method comments and inline comments from a large number of open source projects to see how prevalent they are in program documentation. The distribution of the dataset is shown in Table 1. The dataset, covering 998 projects, contains 975,765 method comments and 973,525 inline comments, distributed over 167,466 and 143,763 classes, respectively. Each class has 5.83 method comments and 6.77 inline comments on average, which indicates that developers use inline comments as often as method comments. Besides, the average lengths of the code covered by method comments and inline comments are 94.86 and 85.48 tokens, respectively, which indicates that method-level code is usually longer than the code covered by an inline comment.
Table 1. Dataset Distribution for Method and Inline Comments
Type | Number | Classes | Comments per class (avg.) | Average code length (tokens)
Method comment | 975,765 | 167,466 | 5.83 | 94.86
Inline comment | 973,525 | 143,763 | 6.77 | 85.48
In this article, we first build a method comment dataset and an inline comment dataset, and then conduct a comparative study of method comments and inline comments. In detail, we compare template usage, word usage, the relationship between comments and code, and language style in method comments and inline comments. We further apply existing comment generation models to both method comments and inline comments and analyze their performance.
To have a meaningful comparative study, we explore the following Research Questions (RQs):
RQ1. What is the number of comments generated based on templates in the method comment dataset and the inline comment dataset, respectively? Template-based comment generation technology is widely used in code and comment generation [23, 29, 30]. In order to find out the distribution of the template-based comments in different kinds of comments, we employ an automatic approach to find the templates.
RQ2. Are there different writing styles for method comments and inline comments? In this study, we investigate writing styles including word usage in comments, tokens in comments (where token specifically refers to an API, variable, basic data type, or reference data type in the code), and part of speech (hereinafter referred to as POS).
RQ3. Can method comment generation models be well applied to generating inline comment and why? We apply method comment generation models to generating inline comment, and use the comment generation evaluation criteria to assess the performance of the models.
After studying the three RQs, we come to the following conclusions:
(1) There are clearly more method comments generated based on templates than inline comments.
(2) In terms of writing styles, the words used in method comments are more concentrated, while the words used in inline comments are more diverse.
(3) Method comments tend to mention tokens in the code more than inline comments do.
(4) The existing comment generation models perform better on method comments than on inline comments. The wording style of the comments is one reason, but there is no clear evidence that comments mentioning tokens in the code are easier to generate. At the same time, the existence of template comments makes the models appear to perform better on method comments.
To facilitate research and application, our source code1 and datasets2 are released, including the experimental scripts, a method comment dataset, and an inline comment dataset. We describe the basic requirements and steps for running the proposed method. To the best of our knowledge, the inline comment dataset is the first pure dataset that includes only inline comments, and it can be used by researchers for the inline comment generation task. We also release the method comment dataset, from which we remove the template comments, keeping only one comment per template. A model trained and tested on our released dataset therefore better reflects its true performance.
The rest of the article is organized as follows. First, we introduce related work in Section 2, including empirical research on code comments and code comment generation. Then we introduce the methodologies used in this article in Section 3, including data collection and analysis, detecting the comments automatically generated from templates, and comment generation. The major findings are presented in Section 4. Section 5 gives a discussion, and Section 6 describes the threats to validity. Finally, the conclusion and future work are given in Section 7.

2 Related Work

2.1 Empirical Study on Code Comment

In recent years, more and more researchers have carried out empirical research on code comments. Abdulkadir Şeker et al. [38] were curious about whether comments are as crucial as code contributions on open-source software platforms. They proposed novel developer metrics for their empirical research and concluded that writing comments to describe any feature of the code is as valuable as code. Vishal Misra et al. [39] were interested in the correlation between code comments and issues in GitHub. They first classified comments into two categories, Relevant or Auxiliary, and then performed various experiments to explore the correlation between code comments and issues on 625 Python repositories from GitHub. They pointed out that there is a relation between code comments and issues: the higher the relevant comment percentage, the fewer days it took to solve the issues. The novelty of this research is to explore the relationship between comments and issues, which can guide developers to write more standardized comments. In addition, Chen et al. [62] classified code comments into six categories, including what, why, how-to-use, how-it-is-done, property, and others. They conducted an experiment to investigate the performance of different state-of-the-art code summarization approaches on these categories, and found that the performance of different code summarization approaches varies substantially across the categories. In the procedure of classifying comments, three programmers manually labeled the data, which consists of 20,000 code-comment pairs. They showed that with a simple and basic classifier, the performance of code summarization can be improved. With a view to what types of comments researchers focus on when assessing comment quality, Pooja Rani et al. [63] presented a systematic literature review of the last decade of research in software engineering and investigated the comment types researchers target. The scope of comments under assessment includes class, API, method (function), package, license, and inline comments. They observed that 50% of the studies analyze all types of code comments and the rest focus on a specific type of comment, indicating research interest in leveraging a particular type of comment for specific development tasks.
Fengcai Wen et al. [40] launched a large-scale empirical study on code-comment inconsistencies. They analyzed different types of commits to find out which commit types were more likely to trigger comment updates. They believed that their findings could guide the development of tools for fixing code-comment inconsistencies. Sean Stapleton et al. [41] noticed the deficiencies of the metrics currently used to evaluate the quality of model-generated comments, such as BLEU and ROUGE. They conducted a human study in which students and professional developers were asked to do a series of tasks around code comments. Some of the participants were given human-written comments, while others were given model-generated comments. They found that participants completed the tasks significantly better with the help of human-written comments, although the participants did not perceive a difference in quality between human-written and model-generated comments. Moreover, whether the model-generated comments helped developers complete tasks better was not related to the evaluation metrics. They concluded that new evaluation metrics are needed to measure the quality of model-generated comments. Gros et al. [42] analyzed the differences between code-comment translation and natural language translation based on data features and evaluation metrics. They found that the outputs of code-comment translation are more repetitive, and that natural language translation has more dependency between input and output. They also analyzed the effectiveness of different versions of the BLEU score, showing that different BLEU versions can cause huge differences in the BLEU value, which may have a great impact on experimental results. Therefore, it is necessary to establish a uniform comment evaluation standard. Moreover, in order to ensure the readability and naturalness of code comments as natural language, some studies also use manual evaluation as a qualitative evaluation. For example, Wang et al. [13] proposed a Python comment generation approach based on reinforcement learning. They not only utilized BLEU to evaluate the approach but also adopted human evaluation, inviting people to evaluate the generated comments for naturalness and informativeness. Shi et al. [43] also proposed a human evaluation based on naturalness, informativeness, and similarity. In summary, the manual evaluation of code comments is mainly based on whether the comments conform to natural language grammar (i.e., naturalness) and whether they can accurately reflect the function of the code (i.e., informativeness).
Other empirical studies mainly focus on comment density. Oman et al. [31] and Barranco et al. [32] assessed the proportion of code comments in a software system to evaluate comment quality. However, this evaluation metric is crude, as some redundant comments (e.g., copyright comments) were also counted. Arafat et al. [33, 34] conducted an empirical study whose results showed that open source projects are consistently well documented, with an average comment density of 18.67%. In another two studies, Siy et al. [35] found a consistent comment density of around 50%, while Elish et al. [36] found an average comment density of 15.2% with a standard deviation of 12.2% in 100 Java open source classes. However, the small size of these two studies makes it hard to compare them with our work. Jiang et al. [37] studied the evolution of code comments in the PostgreSQL project by utilizing data recovered from CVS. Their study reveals that the percentage of functions with header and non-header comments remains consistent throughout the development history.
These empirical studies can help developers understand the characteristics of comments from different perspectives. Although there are many empirical studies on code comments, few researchers regard the inline comment as the main research object. In this article, we focus on the similarities and differences between method comments and inline comments, and explore some practical issues about inline comments, such as comment generation.

2.2 Code Comment Generation

A variety of methods for automatic code comment generation have been proposed [16, 19, 20, 21, 22, 23, 24, 44, 47, 48, 57, 58, 59]. These methods aim to generate brief natural language summaries for source code. This is a critical task in software engineering, and programmers can benefit a lot from it whenever they are reading or writing code. According to the object to be commented, code comment generation can be divided into three types: class comment generation [44], method comment generation [19], and inline comment generation [1]. Since a single class often covers a lot of content, it is difficult to generate comments describing all the functions of a class at once. Therefore, only a limited number of approaches generate code comments directly at the class level. The most representative research came from Moreno L et al. [48]. They presented a technique to automatically generate readable comments for Java classes: they determined the class and method stereotypes and used them, in conjunction with heuristics, to select the information to be included in the generated comment.
Currently, most of the approaches focus on method comment generation and inline comment generation. As for method comment generation, the earliest approaches were mainly based on manually crafted templates. For example, Giriprasad Sridhara et al. [44] utilized the Software Word Usage Model (SWUM) and predefined heuristic rules to identify keywords from code text and generate templated comments for Java methods. This kind of approach could generate well-formed comments and sometimes accurately summarize the code functions. However, creating such a model required considerable manual effort to design the rules and templates, which was the main factor limiting the performance of the model. After that, some research proposed to mine external source libraries (e.g., technical Q&A websites, code corpora, bug tracking systems, mailing lists) to generate method comments. Stack Overflow and GitHub are the main mining sources for these studies [16]. For instance, Vassallo et al. [45] proposed an approach that mines large-scale Q&A data from the technical Q&A website Stack Overflow to automatically generate method comments. Specifically, they mined discussions on Stack Overflow based on heuristics with the aim of identifying method descriptions. Recently, learning-based techniques from natural language processing have been used to generate method comments. This kind of approach treats the code and the comment as two different languages and translates one into the other [49]. Besides, because a method can be parsed into an intermediate representation like an AST, some research also utilized the AST as another kind of input. DeepCom (Xing Hu et al.) [19] was one of them, exploiting the structural property of source code by means of ASTs and using the SBT (structure-based traversal) method to flatten ASTs into input sequences. Moreover, Code2Seq (Uri Alon et al.) [20] randomly picked K pairs of leaf nodes in the AST, forming K paths of the tree, and then used them to represent the source code. Yusuke Shido et al. [21] developed the multi-way Tree-LSTM, using an LSTM-based model to encode the nodes of ASTs bottom-up. The approaches above extracted feature information either from text or from the AST. Hybrid-DeepCom (Xing Hu et al.) [22] and ast-attendgru (LeClair A et al.) [23] were models fusing both semantic and structural features by taking both code text and AST as inputs. Experimental results showed that the AST could well represent the structural property and improve the quality of the produced comments, but it was language-specific with a fixed-size dictionary. To solve this problem, Moore J et al. [24] proposed a CNN-based model, splitting all the tokens into characters and some frequent subtokens. In this case, all code was treated as character sequences and the size of the dictionary was limited and numerically small.
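To make the SBT idea concrete, the following minimal sketch (in Python, over a hypothetical toy AST rather than a real Java parse tree) flattens a tree into the bracketed token sequence that an SBT-style encoder consumes:

```python
# A minimal SBT (structure-based traversal) sketch over a toy AST;
# the node types below are hypothetical stand-ins for a Java parser's output.
def sbt(node):
    """node: (type, [children]); returns the bracketed SBT token sequence."""
    kind, children = node
    if not children:                 # leaf: ( type ) type
        return ['(', kind, ')', kind]
    seq = ['(', kind]
    for child in children:
        seq.extend(sbt(child))       # recurse into each subtree in order
    seq.extend([')', kind])
    return seq

ast = ('MethodDeclaration', [('Parameter', []),
                             ('Block', [('ReturnStatement', [])])])
print(' '.join(sbt(ast)))
# ( MethodDeclaration ( Parameter ) Parameter
#   ( Block ( ReturnStatement ) ReturnStatement ) Block ) MethodDeclaration
```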
Similar approaches were also utilized in inline comment generation. As for template-based comment generation, Sridhara G et al. [47] presented an approach for identifying code fragments of statement sequences, conditionals, and loops that can be abstracted as a high-level action; they then automatically synthesized natural language descriptions for these fragments based on predefined templates. Some inline comment generation research also proposed to mine external source libraries. Wong et al. [1] proposed to mine code-description mappings from a large programming Q&A site, and then leveraged these mappings to automatically generate comments for similar code segments matched in open-source projects. Based on this research, Wong et al. also used code clone detection technology to search for reusable code comments in open source software code libraries [2]. This approach could only generate usable code comments for 85 code fragments in 21 large open source projects. Therefore, the approaches based on mining external resource libraries had much room for improvement in the success rate of generating inline comments. Learning-based techniques were also utilized in inline comment generation. Some approaches treated the code text as a sequence, while others treated the Abstract Syntax Tree (AST) as a sequence. Srinivasan Iyer et al. [16] presented CODE-NN, an LSTM-based neural network with attention, whose input is the code token sequence and whose output is the comment token sequence. Learning-based methods no longer require templates and rules; they can learn the patterns by themselves. CODE-NN splits the code text into tokens and treats them as a sequence. The model can extract semantic information from the names of tokens, but the structural information of the code is not used. Huang et al. [17] proposed to utilize heuristic rules and a learning-based approach to collect inline code-comment pairs, and constructed a reinforcement learning-based approach to generate inline comments. They utilized code snippets and AST sequences obtained with a statement-based traversal. The result outperformed the baselines and the state-of-the-art in comment generation.
In this article, we will use several classic method comment generation and inline comment generation approaches (i.e., Seq2Seq, DeepCom, Code2Seq) to generate the method and inline comments, and then make a comparative study between the generated method comments and inline comments.

3 Methodology

3.1 Overview

The process of our research can be divided into four steps, which are shown in Figure 2. First, we collect the method code-comment data and inline code-comment data. Then we analyze the feature differences between the two kinds of comments: we count the distribution of the two types of comments over word usage, tokens in code and comments, POS, and so on. After that, we utilize the AEL algorithm [53] to identify and extract the template comments from the method comments and inline comments. The AEL algorithm includes four steps: anonymize, tokenize, categorize, and reconcile. At last, we utilize several comment generation models (Seq2Seq, Code2Seq, and DeepCom) to evaluate the performance of generating these two kinds of comments.
Fig. 2. Overview of our approach.

3.2 Data Collection and Analysis

In order to investigate the comments in the source code, we collect a dataset from GitHub. According to the score provided by GitHub, we download the top 1,000 Java projects. All these projects have comments. There are 998 projects after we filter out the ones with non-English comments. The dataset contains 1,949,290 comments from the 998 projects: 975,765 method comments and 973,525 inline comments.
For a method comment, its scope is the whole method, which can be easily determined [46]. For inline comments, we employ our previous inline comment scope detection method [50] to identify the code snippets they cover. The detection method utilizes features of code snippets and comments to detect the scope of an inline comment in a Java program. The accuracy of the detection method is about 81.45%. The inline comment scope detection method is described as follows:
Features Extraction: To automatically identify the comment scope, the detection method extracts features both from the code lines and from the comment. The features are divided into three dimensions: code features, comment features, and code-comment relationship features. Code features determine the comment scope from the code line types, the nesting level of code lines, method calls and variable usage, and so on, while comment features capture the comment scope from the word choice in a comment, i.e., counting the verbs and nouns in the comment. Meanwhile, the detection method also extracts features from the correlation between the comment and the code line, such as their textual and semantic similarity, to determine the scope of a comment. A more detailed introduction to the feature extraction can be found in [50].
Comment Scope Detection: In order to classify statements into two categories, within and outside the scope of the inline comment, a comment scope detection model is built utilizing supervised machine learning algorithms. In our previous study, we manually validated the scope of the inline comments to collect an inline comment dataset. Specifically, for the code lines in a comment-code pair, the first out-of-scope code line is regarded as the demarcation point of the scope of the comment. A concrete example is shown in Figure 3. As we can see, line 5 is classified as the first out-of-scope statement, so the scope of the comment is lines 2 to 3. The code lines in the scope of the comment are labeled as “1”, and the ones out of scope are labeled as “0”.
Fig. 3. An example of comment scope.
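To illustrate the idea, the sketch below turns (comment, code line) pairs into simple feature vectors and trains a supervised classifier; the three features shown (lexical overlap, nesting level, line length) are a hypothetical stand-in for the richer feature set of [50]:

```python
# A simplified scope-detection sketch; the features and data are illustrative.
from sklearn.ensemble import RandomForestClassifier

def features(comment, code_line, nesting_level):
    comment_words = set(comment.lower().split())
    code_words = set(code_line.lower().replace('(', ' ').replace(';', ' ').split())
    overlap = len(comment_words & code_words) / max(len(code_words), 1)
    return [overlap, nesting_level, len(code_line)]

# Label 1 = code line within the comment's scope, 0 = a line outside it.
X = [features("read the header bytes", "byte[] header = in.read()", 1),
     features("read the header bytes", "return checksum", 1)]
y = [1, 0]
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print(clf.predict([features("read the header bytes", "header = in.read()", 1)]))
```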
After training, we apply the inline comment scope detection method to identify the scope of each inline comment, allowing us to collect an inline comment dataset from the selected projects. Table 2 shows the numbers, distributions, and sizes of method and inline comments collected from the 998 projects. Here we use Prop(%) to represent the average proportion of methods with a method comment across the three kinds of methods (i.e., public, protected, and private methods). For example, if the proportions of methods with a method comment in three projects are 10%, 20%, and 15%, respectively, then Prop(%) is 15% (i.e., the average). It is worth noting that inline comments do not have an obvious “total amount” like methods, so we use the proportion of code lines covered by inline comments among all code lines to calculate the proportion of inline comments in each project. Project(%) refers to the proportion of projects that contain such comments among all projects. # Sent refers to the average number of sentences per comment. # Word refers to the average number of words per sentence in a comment.
Table 2. Statistical Results for the Comments
Type | Number | Prop(%) | Project(%) | # Sent | # Word
Method | 975,765 | 15.98 | 93.09 | 2.72 | 15.07
Method (public) | 823,185 | 15.51 | 88.94 | 2.20 | 15.17
Method (protected) | 58,640 | 27.69 | 56.06 | 2.18 | 14.31
Method (private) | 93,940 | 15.86 | 73.42 | 1.93 | 14.59
Inline | 973,525 | 10.90 | 98.70 | 1.13 | 9.16
We can observe that 15.98% of methods are commented. We further study the method comments according to the visibility of methods, i.e., public, protected, and private. The number of protected methods with a method comment is the smallest, but protected methods have the highest proportion of comments. The Prop(%) of public methods is close to that of private methods.
There are 2.72 and 1.13 sentences on average in method comments and inline comments, respectively, and they contain an average of 15.07 and 9.16 words per sentence, respectively. This indicates that developers use shorter sentences in inline comments. It should be noted that not all projects contain both types of comments: among the 998 projects, 6.91% have no method comments, and only 1.30% have no inline comments.

3.3 Detecting the Comments Automatically Generated Based on Templates

In practice, most auto-generated comments in open source software are generated by IDEs with predefined templates [52]. Since template definitions can be very flexible, these auto-generated comments have different documentation styles and cannot be filtered using simple rules, e.g., filtering the comments that contain the keyword “auto-generated”. In order to find the comments generated from the same template, we utilize an abstraction technique to recognize and recover the internal structure of each comment. Using the recovered structure, comments can be easily categorized. The recovery of comment text structure is similar to the recovery of log file structure, so we apply the AEL approach [53], which was originally used to abstract execution logs, to detect the comments automatically generated from templates. In [53], the precision and recall of this approach were not less than 84.2% and 82.4%, respectively, and our task is similar to the task in [53]. Figure 4 is a flow chart of AEL. There are four steps: Anonymize, Tokenize, Categorize, and Reconcile.
Fig. 4. The AEL approach.
(1) Anonymize: In this step, AEL uses heuristics to recognize dynamic tokens in comments. The heuristic rules are defined based on domain knowledge. The following are two heuristics to recognize dynamic parts in a comment: 1. phrases like “@author value”; 2. phrases like “Date: value”. If AEL recognizes a dynamic token, it replaces it with a generic token (we use <*> in this article).
(2) Tokenize: The tokenize step clusters the comments at a coarse-grained level. After the anonymize step, a comment consists of two parts: a word part and a generic token part. AEL uses the number of words and the number of generic tokens to do the clustering. Comment messages with the same number of words and the same number of generic tokens are divided into the same cluster.
(3) Categorize: Based on the clustering results from the tokenize step, the categorize step further clusters the comments at a fine-grained level. In each cluster, AEL first selects a comment message to form a sub-cluster and extracts its template. Then, it compares the template with the other comments in the cluster. If a comment conforms to the template, it is added to the sub-cluster. If not all the comments are added to existing sub-clusters, AEL continues this process on the comments that are not yet in any sub-cluster, randomly selecting a comment in the cluster to form a new sub-cluster and comparing its template with the remaining comments. After the categorize step, all the comments in the cluster are divided into sub-clusters and each sub-cluster has a template.
(4) Reconcile: The incomplete definition of heuristic rules in the anonymize step results in some similar comment messages being assigned to different fine-grained clusters in the categorize step. The reconcile step deals with this problem. For each coarse-grained cluster, AEL re-examines all the existing templates. Two sub-clusters are merged if the similarity between their representative templates is larger than a user-defined threshold (50% in the experiments). The reconcile step reduces the number of clusters, making the result more reasonable.
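A compact sketch of the first three AEL steps on comment strings is shown below; the two regex heuristics stand in for the full rule set of [53], and the reconcile merge is omitted for brevity:

```python
# Anonymize, tokenize, and categorize comments in the style of AEL.
import re
from collections import defaultdict

def anonymize(comment):
    comment = re.sub(r'@author\s+\S+', '@author <*>', comment)  # heuristic 1
    comment = re.sub(r'Date:\s*\S+', 'Date: <*>', comment)      # heuristic 2
    return comment

def tokenize_key(comment):
    tokens = comment.split()
    n_generic = sum(t == '<*>' for t in tokens)
    return (len(tokens) - n_generic, n_generic)   # (#words, #generic tokens)

def categorize(cluster):
    """Greedily form sub-clusters whose members match a template
    token-for-token except at <*> positions."""
    subs = []
    for comment in cluster:
        toks = comment.split()
        for template, members in subs:
            if all(a == b or '<*>' in (a, b) for a, b in zip(template, toks)):
                members.append(comment)
                break
        else:
            subs.append((toks, [comment]))
    return subs

coarse = defaultdict(list)
for c in ["@author alice util method", "@author bob util method"]:
    coarse[tokenize_key(anonymize(c))].append(anonymize(c))
for key, cluster in coarse.items():
    print(key, categorize(cluster))   # both comments share one template
```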
By applying AEL, we identify many kinds of template-generated comments and some similar noise comments. One kind of noise comment is commented-out code. The details of the results will be shown in RQ1. We eliminate these automatically generated comments and noise comments before performing the other analyses.

3.4 Comment Generation Models

To evaluate whether existing method comment generation models can be applied to generating inline comments, we select some representative models and evaluate their comment generation performance in our experiments. They are:
Seq2Seq [18]: this is a well-known model in the field of natural language processing (NLP). It was originally proposed for automatic translation, that is, translating from one language (e.g., English) to another (e.g., German). In our task, we treat code as one language and comments as another to apply this model, so that it can achieve comment generation. Seq2Seq can be utilized in both method comment generation and inline comment generation. It consists of an encoder and a decoder. In this experiment, we feed the code text tokens to the encoder as features and then use the decoder to translate them into a comment.
DeepCom [19]: this model is often used as the baseline in the code comment generation task. It turns ASTs into sequences by applying SBT and then uses a seq2seq model to translate every sequence into a brief description.
Code2Seq3 [20]: this model takes the AST's leaf nodes as terminals and non-leaf nodes as nonterminals. It then extracts all pairwise paths between terminals and represents them as sequences of terminal and nonterminal nodes. At last, the approach randomly selects K paths and uses the decoder to translate them into a brief description.
The characteristic differences of the three models above are shown in Table 3. These tools are open source. Because they exploit the characteristics of source code or source code syntax structure and establish associations with natural language, these tools can be applied not only to code comment generation in local development with an IDE, but also to commit message generation in code review. The input of Seq2Seq is the code text of the code snippet, while the others take the AST. The AST of a code snippet has a tree structure when the code snippet is a complete unit such as a function or class [54]. However, the code snippet of an inline comment is not a complete unit and can have several sub-tree structures. To have the same process, we first obtain all the sub-trees in the AST of the function containing the code snippet, and then find the LCA (Least Common Ancestor) of these sub-trees. Eventually, we use the LCA to connect all the sub-trees, and this forms the tree-structure feature of the code snippet.
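The sketch below (over a hypothetical child-to-parent map rather than a real Java AST) illustrates how the LCA of the sub-tree roots can be located before connecting them:

```python
# Locate the least common ancestor (LCA) of an inline snippet's sub-tree roots.
def path_to_root(node, parent):
    """parent: child -> parent map; returns the node's path up to the root."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path

def lca(nodes, parent):
    common = set(path_to_root(nodes[0], parent))
    for n in nodes[1:]:
        common &= set(path_to_root(n, parent))
    # the first common node on any upward path is the deepest shared ancestor
    return next(a for a in path_to_root(nodes[0], parent) if a in common)

# toy method AST: Block -> {If -> {Assign}, Return}
parent = {'Assign': 'If', 'If': 'Block', 'Return': 'Block'}
print(lca(['Assign', 'Return'], parent))   # Block
```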
Table 3. The Differences among Comment Generation Models
Model | Method Comment Generation Model | Inline Comment Generation Model | AST Input | Source Code Input
Seq2Seq | ✓ | ✓ | – | ✓
DeepCom | ✓ | – | ✓ | –
Code2Seq | ✓ | – | ✓ | –
Considering training efficiency for these three models, we do not use the entire dataset (in which the numbers of method comments and inline comments are 975,765 and 973,525, respectively) to conduct experiments. Instead, we randomly select datasets of the same order of magnitude as those used in the original studies [18, 19, 20]. The sizes of the training set and test set for method comments are 483,410 and 48,374, respectively. The sizes of the training set and test set for inline comments are 439,068 and 43,969, respectively.
We use BLEU [56], a widely used metric for machine translation problems, to measure the quality of the comments generated by the models. BLEU (Bilingual Evaluation Understudy) measures the similarity between a generated comment and an original comment. The higher the BLEU score, the more similar the generated comment is to the original comment, and the better the model performs. BLEU uses n-gram matching and calculates the ratio of matched n-grams between the generated and original comments. The formula of BLEU is as follows:
\begin{equation} \mathit{BLEU} = \mathit{BP} \cdot \exp \left(\sum_{n=1}^{m} w_{n} \log p_{n}\right), \end{equation}
(1)
where \(p_{n}\) is the modified n-gram precision, i.e., the ratio of length-n subsequences of the generated comment that also appear in the reference comment, and \(w_{n}\) is the (typically uniform) weight. The maximum n-gram length m is 4 in this study (i.e., BLEU-4), and longer n-grams are exponentially harder to match. BP is the brevity penalty factor, and its formula is as follows:
\begin{equation} \mathit{BP} = \left\lbrace \begin{array}{ll} 1, & \text{if } c \gt r\\ e^{(1-r/c)}, & \text{if } c \le r, \end{array}\right. \end{equation}
(2)
where c represents the length of the generated comment, and r represents the length of the reference comment.
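For illustration, a direct transcription of Equations (1) and (2) into Python is sketched below, assuming whitespace tokenization and no smoothing (the tokenization and smoothing used when evaluating the models may differ):

```python
# A minimal BLEU-4 sketch following Equations (1) and (2).
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu4(candidate, reference, m=4):
    """candidate, reference: token lists of a generated and a reference comment."""
    log_sum = 0.0
    for n in range(1, m + 1):
        cand = Counter(ngrams(candidate, n))
        ref = Counter(ngrams(reference, n))
        # p_n: clipped count of candidate n-grams that appear in the reference
        overlap = sum(min(c, ref[g]) for g, c in cand.items())
        p_n = overlap / max(sum(cand.values()), 1)
        if p_n == 0:            # avoid log(0); real evaluations smooth here
            return 0.0
        log_sum += (1.0 / m) * math.log(p_n)     # uniform weights w_n = 1/m
    c, r = len(candidate), len(reference)
    bp = 1.0 if c > r else math.exp(1 - r / c)   # brevity penalty, Eq. (2)
    return bp * math.exp(log_sum)

print(bleu4("returns the id of the field".split(),
            "returns the id of this field".split()))
```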

4 Major Findings

In this section, we present our results and discuss the main findings regarding the research questions.
RQ1. What is the number of comments generated based on templates in the method comment dataset and the inline comment dataset?
A previous study [61] found that duplication in a dataset of code-comment pairs directly affects the results of comment generation models (i.e., it produces results better than the real value). For example, if some samples in the training set and the test set are exactly the same, the model will perform better on these samples. However, the model is actually not as good as it appears; it likely overfits these samples and has poor generalization ability. In order to analyze how much influence templates have on training comment generation models later, we first propose this RQ. Specifically, we analyze the use of templates in real comments.
We notice that some comments in the dataset are almost the same; they differ only in a few words. For example, consider the comments “Find the _Fields constant that matches name, or null if its not found.” and “Find the _Fields constant that matches fieldId, or null if its not found.”. We have marked the differing words in italics. After investigation, we find that most of these similar comments were generated by IDEs or other code language conversion tools with pre-defined comment templates, which add a few specific words to the templates to produce comments. The comments generated from predefined comment templates will be referred to as “template comments”. Template comments are almost the same when they share the same template.
We use the AEL algorithm introduced in Section 3 to find the “template comments” in the comment datasets. Three people work on applying the AEL algorithm: two check the results of the AEL algorithm, and a third makes the final decision if the first two disagree. It takes about 3 days to execute the AEL process. Table 4 summarizes the results of the AEL algorithm on the method comment dataset and the inline comment dataset. We can see that in the method comment dataset, the proportion of “template comments” is very high: it is more than 10 times that of the inline comment dataset. Some comments are exactly the same, and we call them “duplicate comments”. A “duplicate comment” is also a “template comment”, which can be categorized by the AEL algorithm as well. Therefore, the numbers in Table 4 include both “template comments” and “duplicate comments”.
Table 4. Template Comments in Two Comment Datasets
 | Method Comment | Inline Comment
Total | 975,765 | 973,525
Template Comment | 131,470 | 10,217
Proportion | 13.47% | 1.05%
In the 975,765 method comments, the AEL algorithm produces a total of 457,672 clusters, of which 140 clusters have more than 100 items. In the 973,525 inline comments, the AEL algorithm produces a total of 433,169 clusters, of which 168 clusters have more than 100 items. We manually check a few clusters with more than 100 items to study the cause of the templates. Tables 5 and 6 summarize the templates detected by AEL in the method and inline comment datasets, and Table 7 summarizes the noisy data in both datasets. The summaries include the explanation of each comment cluster, an example of the template, and the corresponding number of comments.
Table 5. Template Comments Detected by AEL in Method Comment Dataset
No. | Explanation of Comment Cluster | Example of Template | Most Recurrent Comment | Less Recurrent Comment | Number
1 | Comments are generated by the predefined comment template in the IDE comment plugin | “Returns true if field <*> is set (has been <*> a value) and false otherwise” | “Returns true if field corresponding to is set (has been assigned a value) and false otherwise” | “Returns true if field locations is set (has been assigned a value) and false otherwise” | 2,659
 | | “Find the _Fields constant that matches <*> or null if its not found.” | “Find the _Fields constant that matches fieldId or null if its not found.” | “Find the _Fields constant that matches name or null if its not found.” | 2,152
 | | “Util method to write an attribute <*> the ns prefix” | “Util method to write an attribute without the ns prefix” | “Util method to write an attribute with the ns prefix” | 2,001
 | | “@return <*> the <*>” | “@return Returns the id” | “@return IntegrationType corresponding to the value” | 1,456
 | | “@param <*> The <*> to <*>” | “@param id The id to set” | “@param Balance The Balance” | 1,257
2 | Comments generated when Java methods are automatically generated | “Auto generated getter method @return <*>” | “Auto generated getter method @return java.lang.String” | “Auto generated getter method @return com.amazon.s3.GetObjectResult” | 1,361
 | | “Auto generated setter method @param param <*>” | “Auto generated setter method @param param RequestId” | “Auto generated setter method @param param Topic” | 1,361
 | | “auto generated Axis2 call back method for <*> method” | “auto generated Axis2 call back method for putObject method” | “auto generated Axis2 call back method for createBucket method” | 111
 | | “Auto generated add method for the array for convenience @param param <*>” | “Auto generated add method for the array for convenience @param param com.amazon.s3.MetadataEntry” | “Auto generated add method for the array for convenience @param param com.amazon.ec2.VpcType” | 102
Table 6. Template Comments Detected by AEL in Inline Comment Dataset
No. | Explanation of Comment Cluster | Example of Template | Most Recurrent Comment | Less Recurrent Comment | Number
1 | Comments generated by the open source tool Thrift | “check for required fields check for sub-struct validity” | “check for required fields check for sub-struct validity” | “check for required fields check for sub-struct validity” | 1,108
2 | Generated by the Android framework | “Inflate the menu; this adds items to the action bar if it is present.” | “Inflate the menu; this adds items to the action bar if it is present.” | “Inflate the menu; this adds items to the action bar if it is present.” | 1,090
3 | Comment of AWS SDK | “Bail out if this isn’t the right error code that this marshaller understands” | “Bail out if this isn’t the right error code that this marshaller understands” | “Bail out if this isn’t the right error code that this marshaller understands” | 548
4 | Comments generated when generating Java code from WSDL using Apache Axis2 | “We can safely assume an element has only one type associated with it” | “We can safely assume an element has only one type associated with it” | “We can safely assume an element has only one type associated with it” | 392
5 | A large number of other template comments appear in the same project | “Since the test is generated for protocol version <*> which is earlier than latest change in the message (version <*> only the bytes after frame length fields are compared)” | “Since the test is generated for protocol version (1.0) which is earlier than latest change in the message (version (1.2) only the bytes after frame length fields are compared)” | “Since the test is generated for protocol version (1.0) which is earlier than latest change in the message (version (1.4) only the bytes after frame length fields are compared)” | 240
Table 7. Noisy Data Detected by AEL in Method and Inline Comment Dataset
No. | Comment type | Explanation of Comment Cluster | Example of Template | Number
1 | Method | Indicates inheritance document | “{@inheritDoc}” “@inheritDoc” | 22,989
 | | Commented out code | “private <name> <*> </name> (long n) this.n = n;” | 7,771
 | | Symbolic noise comments | “——————————————————” | 4,783
 | | URL information | “https: <*>” | 317
2 | Inline | Commented out code | “if <*> {” “System.out.println(<*>);” | 4,029
 | | Symbolic noise comments | “========================” | 2,137
We can observe from Table 5 that the first source of comment templates is: “Comments are generated by the predefined comment template in the IDE comment plugin”. That is, the comments are generated from a template predefined in an IDE comment plugin. For example, the comment “Returns true if field <*> is set (has been <*> a value) and false otherwise” describes that the return value is determined by the field <*>, and the field <*> is replaced by a variable when the real comment is generated. Another source of comment templates is: “Comments generated when java methods are automatically generated”. This template shows that the comments are generated along with automatically generated source code. A motivating example is the comments of the methods getter() and setter(), “Auto generated getter method @return <*>” and “Auto generated setter method @param param <*>”, as shown in Table 5. These two comments are generated along with the automatically generated getter() and setter() methods.
Table 6 shows the template comments detected by AEL in the inline comment dataset. We summarize five of the most common template types. We can see that most of the “template comments” are “duplicate comments”: there are no variable tokens in the template comments. For example, there are 1,108 identical comments “check for required fields check for sub-struct validity” in the first template “Comments generated by the open source tool Thrift”. Therefore, the most recurrent comment and the less recurrent comment for each template are the same. We also observe from Table 6 that most of the template comments in the inline comment dataset come from programming frameworks, such as Android (i.e., template 2), AWS SDK (i.e., Amazon Web Services,4 template 3), and WSDL (i.e., Web Services Description Language,5 template 4). In addition, a large number of template comments appear within the same project, such as template 5 in Table 6.
Table 7 shows the noisy comments detected by AEL in the method and inline comment datasets. Noisy data can be meaningless symbols or commented-out code. The most common noisy comment in the method comment dataset is the “inheritance document”. These comments use the marks “{@inheritDoc}” or “@inheritDoc” as placeholders in the source code, but they are meaningless for explaining the source code, so we classify them as noisy comments. We can also observe from Table 7 that another two common categories, “Symbolic noise comments” and “Commented out code”, can be found in both the method and inline comment datasets. Obviously, these two kinds of comments do nothing to explain the source code.
Summary: By using the AEL algorithm, we find a large number of “template comments” in these comment datasets. Most of the “template comments” in the method comment dataset are generated by the predefined comment templates in IDE comment plugins or generated along with automatically generated source code. Most of the “template comments” in the inline comment dataset are “duplicate comments”. Besides, the number of “template comments” for method comments is larger than that for inline comments. This is because current IDEs mainly provide the function of generating method comments based on templates, but rarely provide the function of generating inline comments based on templates. Moreover, the noisy comments “Symbolic noise comments” and “Commented out code” can be found in both method and inline comment datasets.
RQ2. Are there different writing styles for method comment and inline comment?
In this RQ, we want to explore the writing style difference between method comment and inline comment. The difference in writing styles might explain how the same model behaves differently on different comment generation tasks in the next RQ.
Word usage. We focus on the overall situation of the words used in the two types of comments. After a series of preprocessing steps on the comment datasets, we analyze the word dictionaries composed of the method and inline comments. The preprocessing is as follows. First, we take the first sentence of the comment as the subject of study. According to the statistical results, the average numbers of words in the first sentence of method and inline comments are 14.71 and 9.84, respectively. We then conduct CamelCase splitting and snake_case splitting, and apply lemmatization to each word in the comment. Finally, we lowercase every word. The first sentence of a method comment is usually considered a summary sentence [51], so we take the first sentence of the method comment for study. Similarly, we also take the first sentence of the inline comment for study, although inline comments in many cases have only one sentence. In addition, we think it is necessary to split the words in comments by CamelCase and snake_case, because some variables or API names from the corresponding code may be mixed into the comments [55]. Word lemmatization restores a word to its original form according to its POS. For example, the verb form “broken” is restored to “break”, and the comparative adjective “bigger” is restored to “big”.
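A minimal sketch of this preprocessing is shown below, assuming NLTK's WordNet data is installed (nltk.download('wordnet')); the exact splitting and lemmatization rules of our pipeline may differ slightly:

```python
# CamelCase/snake_case splitting, lowercasing, and lemmatization of a comment.
import re
from nltk.stem import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()

def preprocess(sentence):
    words = []
    for raw in sentence.split():
        raw = raw.replace('_', ' ')                        # snake_case split
        raw = re.sub(r'(?<=[a-z0-9])(?=[A-Z])', ' ', raw)  # CamelCase split
        for w in raw.split():
            w = w.lower()
            # lemmatize as a verb first, then with the default noun POS
            words.append(lemmatizer.lemmatize(lemmatizer.lemmatize(w, pos='v')))
    return words

print(preprocess("Returns the fieldId of the broken getValue call"))
# ['return', 'the', 'field', 'id', 'of', 'the', 'break', 'get', 'value', 'call']
```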
After applying the preprocessing, we count the frequency of the words used in the two types of comments to form the corresponding dictionaries. The details of the dictionaries are shown in Table 8; we find that the words used in method comments are more concentrated, while the words used in inline comments are more dispersed. The dictionary of the method comments is smaller than that of the inline comments: the method comment dictionary contains 57,553 words, while the inline comment dictionary contains 87,665 words. We sort the words in each dictionary by frequency. In method comments, only the first 54 words are needed for the cumulative frequency to reach 50%; for inline comments, 84 words are needed. Similarly, when the cumulative frequency reaches 60%, 70%, 80%, 90%, 95%, 96%, 97%, 98%, and 99%, the number of words needed in the method comment dictionary is clearly smaller than that in the inline comment dictionary, as shown in Table 8. In particular, reaching a cumulative frequency of 90% in inline comments requires 1,431 words, but the same 1,431 words cover 92.75% of method comments. This shows that developers use a richer vocabulary when writing inline comments. This may be because inline comments need to describe source code in more specific situations and therefore require more varied expression, while method comments may describe the features of methods at a higher level, which may lead the words used in method comments to be more concentrated, especially since we only take the first sentence of the method comment into consideration.
Table 8. Word Usage in Comments
Dictionary | Size | 99% | 98% | 97% | 96% | 95% | 90% | 80% | 70% | 60% | 50%
Method | 57,553 | 9,194 | 4,719 | 3,236 | 2,489 | 2,029 | 1,128 | 457 | 221 | 107 | 54
Inline | 87,665 | 22,263 | 9,535 | 5,672 | 3,952 | 3,020 | 1,431 | 583 | 297 | 157 | 84
Tokens in comments. We would like to discuss which kind of comment is more likely to reference tokens in the code. Intuitively, we think that inline comments are more likely to reference tokens in the code, because inline comments directly explain the next few code lines, while method comments generally explain the function of the entire method. We study this issue by counting the proportion of comments that mention a code token among all comments.
As shown in Table 9, we explore the proportion of comments mentioning a certain type of token in the code among all comments. The types of tokens include variables, APIs, basic data types, and reference data types. Basic data types refer to the eight primitive types provided by the Java language, such as int, float, and boolean. Reference data types refer to user-defined data types, usually Java classes, such as Student or Employee. An inner API refers to an API that belongs to the same project, and an outer API refers to an external API coming from another project. To distinguish between inner and outer APIs, we employ JavaParser6 to identify the APIs of the current project and generate an API list for the project. JavaParser converts Java code into a corresponding Abstract Syntax Tree, and we identify the inner APIs of the current project by traversing the Abstract Syntax Tree. Then, if an API is found in the API list of the current project, it is identified as an inner API; otherwise, it is an outer API.
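The inner/outer distinction can be sketched as follows; we substitute the Python library javalang for JavaParser purely for illustration, and the toy source is hypothetical:

```python
# Build the project's inner-API list from method declarations, then classify
# each method invocation as inner or outer.
import javalang

SOURCES = ["""
class Counter {
    int value;
    int increment() { return value + 1; }
    void reset() { increment(); System.out.println("reset"); }
}
"""]

inner_apis = set()
for src in SOURCES:                       # step 1: collect declared methods
    tree = javalang.parse.parse(src)
    for _, m in tree.filter(javalang.tree.MethodDeclaration):
        inner_apis.add(m.name)

for src in SOURCES:                       # step 2: classify invocations
    tree = javalang.parse.parse(src)
    for _, call in tree.filter(javalang.tree.MethodInvocation):
        kind = "inner" if call.member in inner_apis else "outer"
        print(call.member, "->", kind)    # increment -> inner, println -> outer
```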
Table 9. The Proportion of Comments Mentioning a Certain Type of Token in the Code
Comment type | Variable | API | API(inner) | API(outer) | Basic data type | Reference data type
Method | 58.58% | 26.71% | 2.33% | 24.38% | 9.28% | 30.46%
Inline | 26.37% | 24.21% | 0.25% | 23.96% | 1.51% | 5.96%
The data in Table 9 refer to the proportion of comments that mention the corresponding token type among all comments. For example, if there are 10 method comment samples and 3 of them mention an API name in the corresponding code, the proportion of comments mentioning APIs is 30%.
We are surprised to find that, no matter what type of token it is, the statistical results of method comments are higher than those of inline comments. This is contrary to the intuitive conclusion we mentioned earlier. In addition, we can see that in the two types of comments, the proportions of comments mentioning APIs are similar. The number of comments mentioning outer APIs is significantly higher than the number mentioning inner APIs in both types of comments. This may be because an outer API is harder to understand, since its source code is more difficult to access, and because there are more outer APIs than inner APIs. For the other types of tokens, the proportion of method comments is significantly higher than that of inline comments.
Tokens in commented code. Based on the analysis of tokens in comments, it can be found that some special tokens exist in comments. In order to analyze where and how comments capture these special tokens from the source code, we also count the proportion of these special tokens appearing in the commented source code, which is shown in Table 10. It can be found that method code contains more information about variables, APIs, basic data types, and reference data types. Besides, since inline code is located within method code, the method code is the context of the inline code. We therefore also count the proportion of these special tokens in inline code together with its context, that is, in the method code containing the commented inline code. The proportion of all four kinds of special tokens increases when the context is added. Therefore, the context includes more special token information.
Table 10. The Proportion of Codes Mentioning a Certain Type of Token in the Comment
Comment type | Variable | API | Basic data type | Reference data type
Method | 6.09% | 0.78% | 1.72% | 11.08%
Inline | 4.18% | 0.64% | 0.29% | 1.52%
Inline with context | 14.95% | 1.28% | 0.65% | 3.42%
POS. We are also interested in the POS of words and phrases in the two types of comments. We use the well-known natural language processing toolkit NLTK7 to parse the comment sentences and obtain the POS of each word in a sentence. We count the various POS structures in method comments and inline comments, and select the 10 most important and useful POS structures, which are shown in Tables 11 and 12.
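The counting itself is straightforward with NLTK, as sketched below (assuming the punkt and averaged_perceptron_tagger resources have been downloaded):

```python
# Count POS tags of single words and of two-word phrases in comments.
from collections import Counter
import nltk

def pos_stats(comments):
    unigram, bigram = Counter(), Counter()
    for sentence in comments:
        tags = [t for _, t in nltk.pos_tag(nltk.word_tokenize(sentence))]
        unigram.update(tags)
        bigram.update(zip(tags, tags[1:]))   # POS pairs of adjacent words
    return unigram, bigram

uni, bi = pos_stats(["Read data from underlying stream",
                     "Get a ByteBuffer containing file data"])
print(uni.most_common(3))
print(bi.most_common(3))
```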
Table 11. Top 10 Most Frequently Occurring POS
Top | POS (Method) | Ratio | POS (Inline) | Ratio
1 | noun | 37.70% | noun | 31.44%
2 | verb | 15.50% | verb | 16.05%
3 | determiner | 15.08% | prep or conj | 12.60%
4 | prep or conj | 13.92% | determiner | 9.72%
5 | adjective | 8.04% | adjective | 8.79%
6 | adverb | 2.00% | adverb | 5.14%
7 | pronoun | 0.77% | pronoun | 2.56%
8 | modal auxiliary | 0.65% | modal auxiliary | 1.51%
9 | genitive marker | 0.25% | particle | 0.30%
10 | particle | 0.12% | genitive marker | 0.25%
Table 12. Top 10 Most Frequently Occurring POS of Two-word Phrases
Top | POS (Method) | Ratio | Example (Method) | POS (Inline) | Ratio | Example (Inline)
1 | noun+noun | 12.08% | “Constructs exception with the specified detail message” | noun+noun | 8.00% | “Assume a drive letter for a mount point”
2 | determiner+noun | 10.50% | “The IV is produced by adding the initial IV to the counter” | determiner+noun | 6.92% | “(sum >>> Byte.SIZE) is the carry for addition”
3 | noun+“prep or conj” | 9.63% | “@param path for the exception” | noun+“prep or conj” | 6.71% | “There is no real data in inBuffer”
4 | “prep or conj”+determiner | 7.04% | “Configure whether the stream should drop the cache” | adjective+noun | 5.78% | “iterate over old configuration”
5 | adjective+noun | 6.15% | “Release a ByteBuffer which was created by the enhanced ByteBuffer read function” | “prep or conj”+determiner | 4.14% | “This operation does not change the current offset of the file”
6 | verb+determiner | 5.93% | “Get a ByteBuffer containing file data” | verb+determiner | 3.98% | “Have some decrypted data unread, need to reset”
7 | noun+verb | 4.37% | “Constructor deprecated by ContentSummary.Builder” | noun+verb | 3.53% | “The fallback behavior accomplishes the rename by a full copy”
8 | determiner+adjective | 3.37% | “Returns the names of the fields from the summary header” | “prep or conj”+noun | 3.36% | “Else try POSIX style rename on Windows only”
9 | “prep or conj”+noun | 3.23% | “Reads up to buf.remaining() bytes into buf” | verb+“prep or conj” | 2.98% | “Then read in the class names and add them to our tables”
10 | verb+noun | 3.02% | “Read data from underlying stream” | “prep or conj”+verb | 2.74% | “Special case: must come before writing out the declaredClass”
Tables 11 and 12 show the statistics of the POS of words, and the statistics and examples of the POS of two-word phrases, in the two comment datasets. Here “prep or conj” is the abbreviation of “preposition or conjunction”. From Table 11, we can see that the top 10 most frequently used POS of the two types of comments are almost the same. In terms of phrases, the top 3 POS of the two-word phrases are the same, and the top 10 lists are also the same except for slight differences in order. We can conclude that the two types of comments are similar in their use of POS: there is no phenomenon that method comments tend to use words of a certain POS while inline comments tend to use phrases with specific POS combinations. We can also see from the percentages in Table 12 that inline comments are indeed more diverse than method comments, which is consistent with the conclusion we reached when discussing word usage earlier. To illustrate the phrases in more detail, we also present a representative example for each kind of phrase in Table 12.
As shown in Table 8, the dictionary size of inline comments is 52.3% larger than that of method comments. However, Table 12 shows that the differences in two-word phrases are not obvious. We therefore analyze the sentence-level diversity of inline comments and run a clustering experiment based on sentence similarity to reduce the diversity of inline comments (the experiment is presented under RQ3, Table 18).
Summary: There are many differences between method comments and inline comments in writing style. Compared with method comments, inline comments have a more diverse dictionary that includes more tokens, which suggests adjusting the dictionary size when designing an inline comment generation model. Method comments also make heavier use of certain types of tokens, such as variables, APIs, basic data types, and reference data types, which suggests exploiting these kinds of tokens when designing a method comment generation model. Besides, the POS distributions of method comments and inline comments are similar, so a shared POS mapping table could be used to capture generation rules when designing a comment generation model. To corroborate these findings, we also conduct a questionnaire survey on the habits of real developers when writing method comments and inline comments, as shown in Section 5. The results confirm that developers adopt different writing styles for the two kinds of comments.
RQ3. Can method comment generation models be applied to inline comment generation, and why?
From RQ1 and RQ2, we know the characteristics of method comments and inline comments and the differences between them. For example, method comments include more templates and have a more concentrated dictionary. These characteristics may influence neural machine translation (i.e., NMT) based models, so we analyze whether the differences between method comments and inline comments affect the comment generation models.
We use three existing NMT-style models for the comparison experiments: Seq2Seq, DeepCom, and Code2Seq. The experimental results are shown in Table 13 (the second and fourth columns). For all models, the BLEU-4 score for generating method comments is higher than that for generating inline comments, by roughly 9 to 12 BLEU-4 points. Seq2Seq performs best on inline comment generation. Besides, Code2Seq performs reasonably on method comment generation but poorly on inline comments (about 17 points lower than the other two models). This may be attributable to the feature extraction approach of Code2Seq: its main idea is to sample several paths from the AST, but for the AST of an inline code snippet this approach may select the same lowest common ancestor (LCA) nodes many times. These LCA nodes are only introduced to complete the structure of the inline AST; they are unrelated to the inline comment and carry no semantic information for generating it. Compared with other traversal approaches, this approach therefore introduces more noise and does not yield good results.
Table 13.

           Method comment   Method comment without template comments   Inline comment
Seq2Seq    38.98            34.13                                      29.95
DeepCom    39.13            33.96                                      29.07
Code2Seq   24.52            17.84                                      12.52
Table 13. The BLEU-4 Results of Applying Generation Models to Method Comment and Inline Comment
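To make the concern about Code2Seq's path sampling concrete, the following toy sketch (our illustration, not the original Code2Seq implementation) extracts leaf-to-leaf AST paths. Every path passes through the lowest common ancestor (LCA) of its two leaves, so for a shallow inline-snippet AST the same few structural LCA nodes recur in many sampled paths:

    # Toy sketch of leaf-to-leaf AST path extraction via the LCA.
    from itertools import combinations

    class Node:
        def __init__(self, label, children=()):
            self.label = label
            self.children = list(children)

    def root_paths(node, prefix=()):
        """Yield (leaf_label, labels_from_root) for every leaf."""
        if not node.children:
            yield node.label, prefix
        for child in node.children:
            yield from root_paths(child, prefix + (node.label,))

    def leaf_to_leaf_paths(root):
        for (l1, p1), (l2, p2) in combinations(root_paths(root), 2):
            i = 0  # drop the shared prefix; its last label is the LCA
            while i < min(len(p1), len(p2)) and p1[i] == p2[i]:
                i += 1
            yield l1, list(reversed(p1[i:])) + [p1[i - 1]] + list(p2[i:]), l2

    # Tiny example: an if-statement fragment.
    ast = Node("If", [Node("Cond", [Node("x")]), Node("Then", [Node("y")])])
    for left, path, right in leaf_to_leaf_paths(ast):
        print(left, path, right)  # x ['Cond', 'If', 'Then'] y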
Because the same models are used for method and inline comment generation, it seems that features of the comments themselves affect the performance of the generation models. We further explore the possible reasons.
Template comment. Note that we have detected many template comments in the method comment dataset. These comments are highly consistent and may affect the measured effectiveness of the models. As we describe in RQ1, when multiple identical comments exist in both the training set and the test set, the two sets overlap, and the models appear to perform better on the repetitive comments than their real performance warrants. To eliminate the influence of template comments, we reconstruct a method comment dataset of the same size: when randomly selecting samples, we skip any sample detected as a template comment by the AEL method. We then test the models on the new method comment dataset. The experimental results are shown in Table 13 (the third column). Comparing the results, the performance of all models drops (by roughly 5 to 7 BLEU-4 points) after removing the template comment samples, but it is still better than on inline comments. It shows that the existence of template comments does make the models appear to perform better.
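As a simplified illustration of how such template detection works (our sketch, not the exact AEL implementation from [53]): dynamic parts of each comment are abstracted into placeholders, and comments sharing the same abstracted form are treated as instances of one template:

    # AEL-style abstraction sketch: group comments by abstracted form.
    import re
    from collections import defaultdict

    def abstract_comment(comment):
        s = re.sub(r"\d+", "<NUM>", comment)          # numbers -> placeholder
        s = re.sub(r"\b[a-z]+[A-Z]\w*", "<ID>", s)    # camelCase ids -> placeholder
        return s

    def group_templates(comments):
        groups = defaultdict(list)
        for c in comments:
            groups[abstract_comment(c)].append(c)
        # Groups with more than one member behave like templates.
        return {k: v for k, v in groups.items() if len(v) > 1}

    comments = [
        "Returns the value of field1",
        "Returns the value of field2",
        "Reads up to 4096 bytes",
    ]
    print(group_templates(comments))
    # {'Returns the value of field<NUM>': [...two comments...]}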
OOV issues. The out-of-vocabulary (OOV) issue refers to words appearing in a sample sequence that are not registered in the model vocabulary; it is a common issue in NLP models. To mitigate it, we apply CamelCase splitting and snake_case splitting to reduce token granularity for all three models. Because the input of the models is a code token sequence, we also count the distribution of code tokens (analogous to Table 8), as shown in Table 14. The word sizes of method codes and inline codes are similar (212,578 and 214,696). To analyze the influence of OOV tokens, we change the vocabulary size of the method comment generation models and observe how it affects performance, as shown in Table 15. When the vocabulary size decreases from 50,000 tokens to 30,000 tokens, the performance of the comment generation models decreases by 3.44 to 8.76 BLEU-4 points. The result shows that with fewer OOV tokens, that is, a larger vocabulary, the models perform better. Mitigating the OOV issue is therefore an effective way to improve the performance of comment generation models.
Table 14.

         Size      99%      98%     97%     96%     95%     90%     80%   70%   60%   50%
Method   212,578   15,292   6,908   4,447   3,253   2,534   1,050   330   137   59    27
Inline   214,696   19,175   8,110   4,992   3,555   2,731   1,104   335   131   52    22
Table 14. Word Usage in Codes
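We read the percentage columns of Tables 8 and 14 as the smallest number of distinct words whose occurrences cover the given share of all tokens; under that reading, a sketch of the computation is:

    # Smallest vocabulary covering a given share of all token occurrences.
    from collections import Counter

    def coverage_vocab_size(tokens, share):
        counts = sorted(Counter(tokens).values(), reverse=True)
        need, running = share * sum(counts), 0
        for i, c in enumerate(counts, 1):
            running += c
            if running >= need:
                return i

    tokens = "read buf read data read buf write".split()
    print(coverage_vocab_size(tokens, 0.90))  # 4 distinct words needed here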
Table 15.

Vocabulary size   Seq2Seq   DeepCom   Code2Seq
50,000 tokens     38.98     39.13     24.52
30,000 tokens     32.22     30.37     21.08
Table 15. The Influence of OOV Tokens (BLEU-4)
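The CamelCase and snake_case splitting mentioned above can be realized with a short regular expression; the sketch below is one possible implementation:

    # Split snake_case and CamelCase tokens into lowercase subtokens.
    import re

    def split_subtokens(token):
        parts = []
        for piece in token.split("_"):
            # Insert a space before each capital run, then split.
            piece = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", piece)
            parts.extend(p.lower() for p in piece.split())
        return [p for p in parts if p]

    print(split_subtokens("readFileToByteBuffer"))  # ['read', 'file', 'to', 'byte', 'buffer']
    print(split_subtokens("MAX_BUFFER_SIZE"))       # ['max', 'buffer', 'size']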
Tokens in comments. As shown above, even after removing the template comments, the quality of the method comments generated by the models is still better than that of the inline comments, so other factors must make inline comments harder to generate. In RQ2, when we studied which kind of comments is more likely to use tokens from the code, we found that the statistics of method comments differ considerably from those of inline comments. This may be one of the reasons why the generated method comments are of higher quality.
We continue to use the test set without template comments to study the "tokens" factor. We divide the test set into two parts based on whether the tokens in the code are mentioned in the original comments, and compare the quality of the generated comments for the two parts. From Table 16, whether the API and reference data type are mentioned in the comments has an impact on the quality of the generated method comments. This finding differs from the previous study [27]. For the other cases, we are surprised to find that whether tokens of the code appear in the inline comments or not, the quality of the generated comments is similar. In particular, when variable and reference data type tokens appear in the inline comment, the generation performance improves slightly.
Table 16.

                  Variable   API     Basic data type   Reference data type
Method
  Mentioned       33.99      30.45   33.42             32.80
  Not mentioned   33.94      35.45   34.08             34.82
Inline
  Mentioned       29.57      29.04   28.75             29.09
  Not mentioned   28.84      29.04   29.04             29.04
Table 16. The Effect of the Comment Mentioning Special Tokens (BLEU-4)
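The split of the test set reduces to a membership check between code tokens and comment tokens; a minimal sketch (the sample values are hypothetical):

    def mentions_any(comment_tokens, code_tokens):
        """True if the comment mentions at least one of the given code tokens."""
        comment_set = {t.lower() for t in comment_tokens}
        return any(t.lower() in comment_set for t in code_tokens)

    # Hypothetical sample: variables extracted from a snippet vs. its comment.
    variables = ["inBuffer", "offset"]
    comment = "there is no real data in inbuffer".split()
    print(mentions_any(comment, variables))  # True -> the "Mentioned" split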
We further analyze this phenomenon. Intuitively, if a token appears in both the code and the comment, it should be easier for models to generate this token when using the code as a feature. However, the experimental result is quite the contrary. Two hypotheses may explain this result. On the one hand, the popular models we use are encoder-decoder architectures. If a token appears in both code and comment, it has two different embedding vectors, one in the encoder's vector space and one in the decoder's vector space; unless some restriction aligns the two vectors, the shared token is effectively treated as two unrelated tokens. On the other hand, comment generation is a complex process: even if a token shared between code and comment helps the model generate that token, this does not guarantee the quality of the whole generated sentence.
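The first hypothesis is visible in any standard encoder-decoder implementation: source and target vocabularies use two independent embedding tables, so a surface token shared by code and comment receives two unrelated vectors unless the tables are tied. A minimal PyTorch sketch (the token ids are hypothetical):

    import torch
    import torch.nn as nn

    src_vocab, tgt_vocab, dim = 50_000, 30_000, 256

    # Two independent embedding tables: the same surface token (e.g., "buffer")
    # maps to unrelated vectors on the encoder and decoder sides.
    encoder_embed = nn.Embedding(src_vocab, dim)
    decoder_embed = nn.Embedding(tgt_vocab, dim)

    buffer_id_in_code, buffer_id_in_comment = 1234, 567  # hypothetical ids
    v_enc = encoder_embed(torch.tensor([buffer_id_in_code]))
    v_dec = decoder_embed(torch.tensor([buffer_id_in_comment]))
    print(torch.cosine_similarity(v_enc, v_dec))  # ~0 in expectation at init

    # Weight tying (sharing one table) is one way to align the two spaces,
    # but it requires a vocabulary shared between code and comments.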
We also notice that the models are more effective at generating method comments whose reference comments do not mention APIs or reference data types, and we investigate this further. The results are shown in Table 17: comments that do not mention APIs or reference data types have a shorter average length and correspond to fewer AST nodes. These samples are relatively simple, so the models can learn and memorize them more easily.
Table 17.

                          Mentioned API   Not mentioned API   Mentioned reference data type   Not mentioned reference data type
Original comment length   13.99           12.41               14.05                           12.03
Num of AST nodes          53.42           20.28               36.52                           25.42
Table 17. The Average Original Comment Length and Average Number of AST Nodes in Different Cases
Word usage. In RQ2, we also learned that the two types of comments differ in word diversity: the vocabulary of method comments is more concentrated, while the vocabulary of inline comments is more diverse. Specifically, as shown in Table 8, the dictionary size of method comments is 34.35% smaller than that of inline comments, and only 1,128 words cover 90% of method comment tokens, while 1,431 words are needed to cover 90% of inline comment tokens. This is probably one of the most important reasons why generated method comments have higher quality than generated inline comments. On the one hand, with a more diverse vocabulary (i.e., the inline comment dictionary), the model faces more choices at every generation step; with a relatively concentrated vocabulary (i.e., the method comment dictionary), the probability of picking the correct word is higher. On the other hand, models learn patterns from the dataset, and it is easier for a concentrated vocabulary to yield learnable patterns. To verify the influence of word diversity, we try to reduce the diversity of inline comments: we cluster the inline comment dataset by word usage similarity into 20 clusters, as shown in Table 18. We extract the cluster with the largest amount of data and evaluate the Seq2Seq model on it. The BLEU-4 result is 31.93; that is, the clustered subset performs 6.61% better than the whole dataset (31.93 vs. 29.95).
Table 18.

Cluster No.   1         2        3        4        5        6        7        8        9        10
Data size     259,328   64,874   55,856   45,026   44,570   41,670   41,456   41,238   40,559   38,196
Cluster No.   11        12       13       14       15       16       17       18       19       20
Data size     37,839    34,633   33,420   26,255   25,739   23,947   21,126   21,102   19,396   17,074
Table 18. The Cluster Result of Inline Comments
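The clustering setup is not spelled out in code in the paper; the sketch below shows one plausible realization with scikit-learn (TF-IDF vectors as a proxy for word usage similarity, k-means with the 20 clusters of Table 18):

    from sklearn.cluster import KMeans
    from sklearn.feature_extraction.text import TfidfVectorizer

    comments = [
        "have some decrypted data unread, need to reset",
        "iterate over old configuration",
        "there is no real data in inBuffer",
        # ... the full inline comment dataset goes here
    ]

    k = min(20, len(comments))  # 20 clusters in the paper's experiment
    vectors = TfidfVectorizer().fit_transform(comments)
    labels = KMeans(n_clusters=k, random_state=0, n_init=10).fit_predict(vectors)

    # Keep the largest cluster for the follow-up generation experiment.
    largest = max(set(labels), key=list(labels).count)
    subset = [c for c, lab in zip(comments, labels) if lab == largest]
    print(len(subset), "comments in the largest cluster")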
Summary: The existence of template comments is one of the main reasons why generated method comments are of higher quality than generated inline comments; even after eliminating this factor, the method comments generated by the models are still better. The distribution of vocabulary is another main reason, which mirrors findings in the NMT task: for a fixed vocabulary size, a larger word set means more unknown words, which decreases the accuracy of the translation (i.e., comment generation) [60]. We are surprised to find that there is no clear correlation between the quality of generated comments and whether the same tokens appear in both code and comment.
From the findings of the three RQs, we see that method comments and inline comments differ in many characteristics. These findings motivate more specialized approaches for method comment or inline comment generation. First, method comments contain more template comments, so template comments should be removed when training a method comment generation model to avoid overestimating its performance. Second, the dictionary of inline comments is more diverse than that of method comments, so it is reasonable to adjust the dictionary size: we can increase the inline comment dictionary or reduce the method comment dictionary to improve generation. Besides, from Table 2, we find that method codes are longer than inline codes and thus carry more complete semantic and syntactic information; we can consider adding context information of inline codes to complete their semantics and syntax, although directly adding all of the context would make the code sequence too long and introduce redundant information. From Table 10, adding the context of inline codes brings more information about variables, APIs, basic data types, and reference data types, and from Table 16, comments that mention variable and reference data type tokens have better generation performance. It is therefore possible to improve inline comment generation by extracting variable and reference data type tokens from the context as one of the model inputs. Finally, these findings can guide the hyperparameter tuning of NLP models: as shown in Tables 1 and 17, the code length, comment length, and number of AST nodes can guide the maximum input size of a comment generation model, and the dictionary size can guide the embedding size of the model.

5 Discussion

To further investigate the importance of and the differences between the two types of comments, a questionnaire survey is conducted to understand developers' concerns in the process of writing code comments. We invite programmers, researchers, and students from universities and industry to participate through an online questionnaire. Over 40% of the volunteers have more than three years of programming experience, and the rest have at least one year of programming experience. The questionnaire consists of 15 questions: 5 concern the participants' background and are used to assess their experience in writing code comments, and the other 10 investigate the importance and understanding of method comments and inline comments. In total, 102 questionnaires are collected using the Questionnaire Star online survey system; 27.5% of the respondents are from industry (development engineers, algorithm engineers, test engineers, and so on) and the rest are from universities.
The results show that 38% of the participants have the habit of writing method comments frequently, while the proportion who write inline comments frequently reaches 59% (where "frequently" means that comments are written for over 60% of methods). In addition, 75% and 77% of the participants consider themselves to read method comments and inline comments frequently, respectively. Two scale questions measure how helpful the two kinds of comments are perceived to be in understanding other people's code, by asking participants to rate the importance of method comments and inline comments to program interpretation on a scale of 0–10. The average score for method comments is 8.46 and for inline comments 8.47, indicating that inline comments are as important as method comments and should be given sufficient attention.
Furthermore, we collect feedback from participants through the question "What is your customary comment length when writing comments?". For method comments, almost half of the participants are not too concerned about the length, and 38% habitually use a length of between 10 and 20 words. For inline comments, over 42% prefer a length of 10 words or less. This finding corresponds to the earlier dataset statistics, suggesting that people tend to use more words when writing method comments, which leads to the difference in length between the two types of comments.
Finally, we investigate the participants' focus on code tokens during the commenting process, by asking them which tokens in the code are involved in method and inline comments. Besides the previously mentioned variable, API, basic data type, and reference data type, we add further options such as exception related keyword and control logic related keyword. The detailed results are shown in Table 19. The standard deviation of the proportion of method comments involving the various types of code tokens is 0.2551, while that of inline comments is 0.1869. The distribution of code token types in method comments is thus more concentrated, focusing mainly on method parameter, method return value, and words in variable, with the proportion for method return value being 27.45 percentage points higher than that of inline comments. The distribution of tokens in inline comments is more balanced, with relatively smaller differences between the proportions of the various keywords. This suggests that, while the proportion of code tokens referenced in method comments is higher, the variety of tokens involved in inline comments is likely to be richer.
Table 19.

Token type                      Method    Inline
Method parameter                88.24%    66.67%
Method return value             80.39%    52.94%
Words in variable               57.84%    59.80%
API from third library          27.45%    32.35%
API from this project           30.39%    33.33%
API from this class             20.59%    21.57%
Basic data type                 29.41%    19.61%
Reference data type             41.18%    30.39%
Exception related keyword       20.59%    29.41%
Visibility related keyword      11.76%    7.84%
Control logic related keyword   18.63%    22.55%
Thread related keyword          13.73%    13.73%
Test related keyword            12.75%    11.76%
Other                           6.86%     8.82%
Table 19. The Proportion of Participants Mentioning a Certain Type of Token when Writing Comments
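The two standard deviations can be reproduced directly from the proportions in Table 19 (sample standard deviation over the 14 token types, with proportions taken as fractions):

    import statistics

    method = [88.24, 80.39, 57.84, 27.45, 30.39, 20.59, 29.41,
              41.18, 20.59, 11.76, 18.63, 13.73, 12.75, 6.86]
    inline = [66.67, 52.94, 59.80, 32.35, 33.33, 21.57, 19.61,
              30.39, 29.41, 7.84, 22.55, 13.73, 11.76, 8.82]

    # statistics.stdev is the sample standard deviation (n - 1 denominator).
    print(round(statistics.stdev(p / 100 for p in method), 4))  # 0.2551
    print(round(statistics.stdev(p / 100 for p in inline), 4))  # 0.1869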

6 Threats to Validity

In this section, we focus on the threats that could affect the results of our study.
Threats to internal validity relate to the scale of the dataset used for the empirical study of code comments. Since we study the general features of method comments and inline comments in source code, we need a large number of code comment instances. We therefore collected 998 projects from GitHub, which contain 975,765 method comments and 973,525 inline comments. In the future, we plan to collect more code comment instances to extend our dataset.
Threats to external validity relate to the generalizability of our results. We collect a number of comment-code snippet pairs for a comparison experiment (i.e., method comment generation vs. inline comment generation) in RQ3. All of the code snippets are written in Java. When migrating the comparison experiment to datasets written in other programming languages, such as C, C++, and Python, some particular code syntax (e.g., pointer operations in C++) should be handled carefully when extracting syntax features from the abstract syntax tree. In the future, further investigation analyzing more projects written in other programming languages is needed to mitigate this threat.
Threats to construct validity refer to the suitability of our evaluation measure. We use a conventional measure to evaluate the effectiveness of the models when generating method comments and inline comments in RQ3. Because comment generation can be modeled as a natural language generation problem, we adopt the BLEU score to evaluate the performance of the comment generation models. Since BLEU is widely used for such generation tasks, we believe there is little threat to the suitability of our evaluation measure.
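For illustration, a sentence-level BLEU-4 score between a generated comment and its reference can be computed with NLTK as below; the paper reports corpus-level scores over the whole test set, and the token lists here are hypothetical:

    from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

    reference = ["reads", "up", "to", "buf", "remaining", "bytes", "into", "buf"]
    candidate = ["reads", "bytes", "into", "buf"]

    # BLEU-4: geometric mean of 1- to 4-gram precisions, with smoothing for
    # short sentences and a brevity penalty for short candidates.
    score = sentence_bleu([reference], candidate,
                          weights=(0.25, 0.25, 0.25, 0.25),
                          smoothing_function=SmoothingFunction().method1)
    print(round(score * 100, 2))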

7 Conclusion and Future Work

In this article, we first compare the occurrence of method comments and inline comments in open source projects and argue that inline comments deserve investigation and analysis. We then explore the similarities and differences between method comments and inline comments in many aspects, including the number of template comments, writing styles, tokens in comments, and so on. Finally, we compare the performance of existing comment generation models on the method comment dataset and the inline comment dataset, and find that the models perform worse on inline comments than on method comments. Through further analysis, we conclude that the existence of template comments and the writing styles are the reasons why method comment generation more easily achieves good results, while whether the original comment mentions tokens from the code does not have much impact on the results. There may be other reasons for the poor performance on inline comments, and we will continue to explore more possible factors in future work.

References

[1]
E. Wong, J. Yang, and L. Tan. 2013. Autocomment: Mining question and answer sites for automatic comment generation. In Proceedings of the 28th IEEE/ACM International Conference on Automated Software Engineering. IEEE, 562–567.
[2]
E. Wong, T. Liu, and L. Tan. 2015. Clocom: Mining existing source code for automatic comment generation. In Proceedings of the IEEE 22nd International Conference on Software Analysis, Evolution, and Reengineering. IEEE, 380–389.
[3]
Jian Zhang, Xu Wang, Hongyu Zhang, Hailong Sun, and Xudong Liu. 2020. Retrieval-based neural source code summarization. In Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering. Association for Computing Machinery, 1385–1397.
[4]
S. Panichella, V. Arnaoudova, and M. Di Penta. 2015. Would static analysis tools help developers with code reviews? In Proceedings of the IEEE 22nd International Conference on Software Analysis, Evolution, and Reengineering. IEEE, 161–170.
[5]
B. L. Vinz and L. H. Etzkorn. 2008. Improving program comprehension by combining code understanding with comment understanding. Knowledge-Based Systems (2008), 813–825.
[6]
M. A. Storey, L. T. Cheng, and J. Singer. 2007. How programmers can turn comments into waypoints for code. In Proceedings of the IEEE International Conference on Software Maintenance. IEEE, 265–274.
[7]
I. Kádár, P. Hegedus, R. Ferenc, and T. Gyimóthy. 2016. A code refactoring dataset and its assessment regarding software maintainability. In Proceedings of the IEEE 23rd International Conference on Software Analysis, Evolution, and Reengineering (SANER'16). Osaka, 599–603.
[8]
Gang Huang, Hong Mei, and Fuqing Yang. 2006. Runtime recovery and manipulation of software architecture of component-based systems. Automated Software Engineering 13, 2 (2006), 257–281.
[9]
Qingying Chen and Minghui Zhou. 2018. A neural framework for retrieval and summarization of source code. In Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering. ACM, 826–831.
[10]
Yuding Liang and Kenny Qili Zhu. 2018. Automatic generation of text descriptive comments for code blocks. In Proceedings of the 32nd AAAI Conference on Artificial Intelligence.
[11]
Gang Huang, Yun Ma, Xuanzhe Liu, Yuchong Luo, Xuan Lu, and M. Brian Blake. 2015. Model-based automated navigation and composition of complex service mashups. IEEE Transactions on Services Computing 8, 3 (2015), 494–506.
[12]
Y. Huang, N. Jia, and J. Shu. 2019. Does your code need comment? Software: Practice and Experience (2019).
[13]
Wenhua Wang, Yuqun Zhang, Yulei Sui, Yao Wan, Zhou Zhao, Jian Wu, Philip Yu, and Guandong Xu. 2022. Reinforcement-learning-guided source code summarization using hierarchical attention. IEEE Transactions on Software Engineering 48, 1 (2022), 102–119.
[14]
Y. Huang, X. Hu, and N. Jia. 2019. Learning code context information to predict comment locations. IEEE Transactions on Reliability 69, 1 (2019), 88–105.
[15]
S. Gao, C. Chen, and Z. Xing. 2019. A neural model for method name generation from functional description. In Proceedings of the IEEE 26th International Conference on Software Analysis, Evolution and Reengineering. IEEE, 414–421.
[16]
Srinivasan Iyer, Ioannis Konstas, Alvin Cheung, and Luke Zettlemoyer. 2016. Summarizing source code using a neural attention model. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Association for Computational Linguistics, Berlin, 2073–2083.
[17]
Yuan Huang, Shaohao Huang, Huanchao Chen, Xiangping Chen, Zibin Zheng, Xiapu Luo, Nan Jia, Xinyu Hu, and Xiaocong Zhou. 2020. Towards automatically generating block comments for code snippets. Information and Software Technology 127 (2020), 106373.
[18]
Ilya Sutskever, Oriol Vinyals, and Quoc V Le. 2014. Sequence to sequence learning with neural networks. In Advances in Neural Information Processing Systems, Vol. 27, Curran Associates, Inc.
[19]
X. Hu, G. Li, and X. Xia. 2018. Deep code comment generation. In Proceedings of the IEEE/ACM 26th International Conference on Program Comprehension. IEEE.
[20]
Uri Alon, Omer Levy, and Eran Yahav. 2019. code2seq: Generating sequences from structured representations of code. In International Conference on Learning Representations. https://openreview.net/forum?id=H1gKYo09tX.
[21]
Y. Shido, Y. Kobayashi, and A. Yamamoto. 2019. Automatic source code summarization with extended tree-LSTM. In Proceedings of the 2019 International Joint Conference on Neural Networks. IEEE, 1–8.
[22]
X. Hu, G. Li, and X. Xia. 2020. Deep code comment generation with hybrid lexical and syntactical information. Empirical Software Engineering 25, 3 (2020), 2179–2217.
[23]
A. LeClair, S. Jiang, and C. McMillan. 2019. A neural model for generating natural language summaries of program subroutines. In Proceedings of the 2019 IEEE/ACM 41st International Conference on Software Engineering. IEEE, 795–806.
[24]
J. Moore, B. Gelman, and D. Slater. 2019. A convolutional neural network for language-agnostic source code summarization. In Proceedings of the 14th International Conference on Evaluation of Novel Approaches to Software Engineering. ACM, 15–26.
[25]
Luca Pascarella, Magiel Bruntink, and Alberto Bacchelli. 2019. Classifying code comments in java software systems. Empirical Software Engineering 24, 3 (2019), 1499–1537.
[26]
Yuan Huang, Xinyu Hu, Nan Jia, Xiangping Chen, Zibin Zheng, and Xiapu Luo. 2020. CommtPst: Deep learning source code for commenting positions prediction. Journal of Systems and Software 170 (2020), 110754.
[27]
Xing Hu, Ge Li, Xin Xia, David Lo, Shuai Lu, and Zhi Jin. 2018. Summarizing source code with transferred API knowledge. In Proceedings of the 27th International Joint Conference on Artificial Intelligence. 2269–2275.
[28]
Miltiadis Allamanis, Hao Peng, and Charles Sutton. 2016. A convolutional attention network for extreme summarization of source code. In Proceedings of the International Conference on Machine Learning. 2091–2100.
[29]
N. J. Abid, N. Dragan, and M. L. Collard. 2015. Using stereotypes in the automatic generation of natural language summaries. In Proceedings of the 2015 IEEE International Conference on Software Maintenance and Evolution. IEEE, 561–565.
[30]
B. Wei. 2019. Retrieve and refine: Exemplar-based neural comment generation. In Proceedings of the 34th IEEE/ACM International Conference on Automated Software Engineering. IEEE, 1250–1252.
[31]
P. Oman and J. Hagemeister. 1992. Metrics for assessing a software system's maintainability. In Proceedings of the Conference on Software Maintenance. 337–344.
[32]
Juan Carlos Granja-Alvarez and Manuel José Barranco-García. 1996. Maintainability as a key factor in maintenance productivity: A case study. In Proceedings of the 1996 International Conference on Software Maintenance. IEEE Computer Society, 87.
[33]
O. Arafat and D. Riehle. 2009. The comment density of open source software. In Proceedings of the 31st International Conference on Software Engineering-Companion. 195–198.
[34]
Oliver Arafat and Dirk Riehle. 2009. The commenting practice of open source. In Proceedings of the 24th ACM SIGPLAN Conference Companion on Object Oriented Programming Systems Languages and Applications. ACM, 857–864.
[35]
H. Siy and L. Votta. 2001. Does the modern code inspection have value? In Proceedings of the IEEE International Conference on Software Maintenance. 281–289.
[36]
Mahmoud O. Elish and Jeff Offutt. 2002. The adherence of open source java programmers to standard coding practices. In Proceedings of the 6th IASTED International Conference on Software Engineering and Applications. Cambridge, MA, 193–198.
[37]
Zhen Ming Jiang and Ahmed E. Hassan. 2006. Examining the evolution of code comments in PostgreSQL. In Proceedings of the 2006 International Workshop on Mining Software Repositories. ACM, 179–180.
[38]
Abdulkadir Seker, Banu Diri, and Halil Arslan. 2020. New developer metrics: Are comments as crucial as code contributions? CoRR abs/2006.16349 (2020). arXiv:2006.16349 https://arxiv.org/abs/2006.16349.
[39]
Vishal Misra, Jakku Sai Krupa Reddy, and Sridhar Chimalakonda. 2020. Is there a correlation between code comments and issues? An exploratory study. In Proceedings of the 35th Annual ACM Symposium on Applied Computing (SAC’20), Association for Computing Machinery, New York, NY, 110–117.
[40]
F. Wen, C. Nagy, and G. Bavota. 2019. A large-scale empirical study on code-comment inconsistencies. In Proceedings of the 2019 IEEE/ACM 27th International Conference on Program Comprehension. IEEE, 53–64.
[41]
Sean Stapleton, Yashmeet Gambhir, Alexander LeClair, Zachary Eberhart, Westley Weimer, Kevin Leach, and Yu Huang. 2020. A human study of comprehension and code summarization. In Proceedings of the 28th International Conference on Program Comprehension (ICPC’20), Association for Computing Machinery, New York, NY, 2–13.
[42]
David Gros, Hariharan Sezhiyan, Prem Devanbu, and Zhou Yu. 2021. Code to comment “translation”: Data, metrics, baselining & evaluation. In Proceedings of the 35th IEEE/ACM International Conference on Automated Software Engineering (ASE’20), Association for Computing Machinery, New York, NY, 746–757.
[43]
Ensheng Shi, Yanlin Wang, Lun Du, Hongyu Zhang, Shi Han, Dongmei Zhang, and Hongbin Sun. 2021. CAST: Enhancing code summarization with hierarchical splitting and reconstruction of abstract syntax trees. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing.
[44]
G. Sridhara, E. Hill, and D. Muppaneni. 2010. Towards automatically generating summary comments for Java methods. In Proceedings of the IEEE/ACM International Conference on Automated Software Engineering. 43–52.
[45]
Carmine Vassallo, Sebastiano Panichella, Massimiliano Di Penta, and Gerardo Canfora. 2014. CODES: mining source code descriptions from developers discussions. In Proceedings of the 22nd International Conference on Program Comprehension (ICPC’14), Association for Computing Machinery, New York, NY, 106–109.
[46]
Luca Pascarella and Alberto Bacchelli. 2017. Classifying code comments in java open-source software systems. In Proceedings of the 14th International Conference on Mining Software Repositories. 227–237.
[47]
G. Sridhara, L. Pollock, and K. Vijay-Shanker. 2011. Automatically detecting and describing high level actions within methods. In Proceedings of the 2011 33rd International Conference on Software Engineering. IEEE, 101–110.
[48]
L. Moreno, J. Aponte, and G. Sridhara. 2013. Automatic generation of natural language summaries for Java classes. In Proceedings of the 2013 21st International Conference on Program Comprehension. IEEE, 23–32.
[49]
X. Song, H. Sun, X. Wang, and J. Yan. 2019. A survey of automatic generation of source code comments: Algorithms and techniques. IEEE Access 7 (2019), 111411–111428.
[50]
Huanchao Chen, Yuan Huang, Zhiyong Liu, Xiangping Chen, Fan Zhou, and Xiaonan Luo. 2019. Automatically detecting the scopes of source code comments. Journal of Systems and Software 153 (2019), 45–63.
[51]
Paul W. McBurney and Collin McMillan. 2014. Automatic documentation generation via source code summarization of method context. In Proceedings of the 22nd International Conference on Program Comprehension (ICPC’14), Association for Computing Machinery, New York, NY, 279–290.
[52]
M. A. Possatto and D. Lucrédio. 2015. Automatically propagating changes from reference implementations to code generation templates. Information and Software Technology 67 (2015), 65–78.
[53]
Z. M. Jiang, A. E. Hassan, G. Hamann, and P. Flora. 2008. An automated approach for abstracting execution logs to execution events. Journal of Software Maintenance and Evolution: Research and Practice 20 (2008), 249–267.
[54]
J. Zhang, X. Wang, and H. Zhang. 2019. A novel neural source code representation based on abstract syntax tree. In Proceedings of the 2019 IEEE/ACM 41st International Conference on Software Engineering. IEEE, 783–794.
[55]
B. Dit, L. Guerrouj, and D. Poshyvanyk. 2011. Can better identifier splitting techniques help feature location? In Proceedings of the IEEE 19th International Conference on Program Comprehension. IEEE, 11–20.
[56]
K. Papineni, S. Roukos, and T. Ward. 2002. BLEU: A method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics. 311–318.
[57]
Le Yu, Tao Zhang, Xiapu Luo, and Lei Xue. 2015. AutoPPG: Towards automatic generation of privacy policy for Android applications. In Proceedings of the 5th Annual ACM CCS Workshop on Security and Privacy in Smartphones and Mobile Devices. 39–50.
[58]
Le Yu, Tao Zhang, Xiapu Luo, Lei Xue, and Henry Chang. 2017. Toward automatically generating privacy policy for Android apps. IEEE Transactions on Information Forensics and Security 12, 4 (2017), 865–880.
[59]
X. Li, T. Chen, X. Luo, T. Zhang, L. Yu, and Z. Xu. 2020. STAN: Towards describing bytecodes of smart contract. In Proceedings of the IEEE 20th International Conference on Software Quality, Reliability and Security (QRS'20). Macau, 273–284.
[60]
Taku Kudo. 2018. Subword regularization: Improving neural network translation models with multiple subword candidates. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics. 66–75.
[61]
Z. Liu, X. Xia, and A. E. Hassan. 2018. Neural-machine-translation-based commit message generation: How far are we? In Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering. 373–384.
[62]
Qiuyuan Chen, Xin Xia, Han Hu, David Lo, and Shanping Li. 2021. Why my code summarization model does not work: Code comment improvement with category prediction. ACM Transactions on Software Engineering and Methodology 30, 2 (April 2021), 29 pages.
[63]
Pooja Rani, Arianna Blasi, Nataliia Stulova, Sebastiano Panichella, Alessandra Gorla, and Oscar Nierstrasz. 2023. A decade of code comment quality assessment: A systematic literature review. Journal of Systems and Software 195 (Jan 2023).
