DOI: 10.1145/3468264.3468567
research-article

Lightweight global and local contexts guided method name recommendation with prior knowledge

Published: 18 August 2021

Abstract

The quality of method names is critical for the readability and maintainability of source code. However, constructing concise yet informative method names is often challenging, and a number of approaches have been proposed to automatically recommend high-quality names for methods. Despite being effective, existing approaches face two main bottlenecks: (1) the information they leverage is restricted to the target method itself; and (2) they do not distinguish the contributions of tokens extracted from different program contexts. Through a large-scale empirical analysis of over 12M methods from more than 14K real-world projects, we found that (1) the tokens composing a method’s name are frequently observed in its callers/callees; and (2) tokens extracted from different contexts have different probabilities of appearing in the target method’s name. Motivated by these findings, we propose, in this paper, a context-guided method name recommender that embodies two key ideas: (1) besides the local context, which is extracted from the target method itself, we also consider the global context, which is extracted from other methods in the project that have call relations with the target method, to include more useful information; and (2) we use our empirical results as prior knowledge to guide the generation of method names and to restrict the number of tokens drawn from the global contexts. We implemented the idea as Cognac and performed extensive experiments to assess its effectiveness. Results reveal that Cognac (1) performs better than existing approaches on the method name recommendation task (e.g., it achieves F-scores of 63.2%, 60.8%, 66.3%, and 68.5% on four widely-used datasets, all outperforming existing techniques); and (2) achieves higher performance than existing techniques on the method name consistency checking task (e.g., its overall accuracy reaches 76.6%, outperforming the state-of-the-art MNire by 11.2%). Further results reveal that the caller/callee information and the prior knowledge all contribute significantly to the overall performance of Cognac.
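To make the two ideas concrete, the sketch below illustrates how candidate name tokens drawn from the local context and from callers/callees could be scored with context-specific priors, with a cap on the tokens contributed by the global contexts. This is not the authors' implementation; the context names, prior values, and token limits are illustrative assumptions only.

```python
from collections import Counter

# Hypothetical prior probabilities that a token drawn from each context
# also appears in the target method's name (illustrative values only;
# the paper derives such priors empirically from a large corpus of methods).
CONTEXT_PRIORS = {
    "local": 0.50,    # tokens from the target method's own body/signature
    "caller": 0.35,   # tokens from methods that call the target method
    "callee": 0.30,   # tokens from methods the target method calls
}

def rank_name_tokens(context_tokens, max_global_tokens=10, top_k=5):
    """Score candidate name tokens by frequency weighted with context priors.

    context_tokens maps a context name ("local", "caller", "callee")
    to the list of sub-tokens extracted from that context.
    """
    scores = Counter()
    for context, tokens in context_tokens.items():
        prior = CONTEXT_PRIORS.get(context, 0.0)
        # Bound how many tokens the global contexts may contribute,
        # mirroring the idea of restricting global-context tokens.
        if context != "local":
            tokens = tokens[:max_global_tokens]
        for token, freq in Counter(tokens).items():
            scores[token] += prior * freq
    return [token for token, _ in scores.most_common(top_k)]

# Example: suggest name tokens for a method that reads a configuration file.
print(rank_name_tokens({
    "local": ["read", "file", "buffer", "config"],
    "caller": ["load", "config", "init"],
    "callee": ["open", "file", "parse"],
}))
```

In Cognac itself these signals feed a learned name generator rather than a simple frequency ranking; the sketch only shows how context-specific priors can bias which tokens are considered for the recommended name.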

References

[1]
2021. Eclipse Statement. https://help.eclipse.org/2020-12/index.jsp?topic=org.eclipse.jdt.doc.isvreferenceapiorgeclipsejdtcoredomStatement.html
[2]
2021. Find Bugs in Java programs. http://findbugs.sourceforge.net/
[3]
Surafel Lemma Abebe, Venera Arnaoudova, Paolo Tonella, Giuliano Antoniol, and Yann-Gaël Guéhéneuc. 2012. Can lexicon bad smells improve fault prediction? In 2012 19th Working Conference on Reverse Engineering. 235–244.
[4]
Surafel Lemma Abebe, Sonia Haiduc, Paolo Tonella, and Andrian Marcus. 2011. The effect of lexicon bad smells on concept location in source code. In 2011 IEEE 11th International Working Conference on Source Code Analysis and Manipulation. 125–134.
[5]
Miltiadis Allamanis, Earl T Barr, Christian Bird, and Charles Sutton. 2015. Suggesting accurate method and class names. In Proceedings of the 10th Joint Meeting on Foundations of Software Engineering. ACM, 38–49. https://doi.org/10.1145/2786805.2786849
[6]
Miltiadis Allamanis, Earl T. Barr, Christian Bird, and Charles A. Sutton. 2014. Learning natural coding conventions. In Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering. ACM, 281–293. https://doi.org/10.1145/2635868.2635883
[7]
Miltiadis Allamanis, Marc Brockschmidt, and Mahmoud Khademi. 2018. Learning to Represent Programs with Graphs. In Proceedings of the 6th International Conference on Learning Representations. OpenReview.net.
[8]
Miltiadis Allamanis, Hao Peng, and Charles Sutton. 2016. A convolutional attention network for extreme summarization of source code. In Proceedings of the 33rd International Conference on Machine Learning. JMLR.org, 2091–2100.
[9]
Miltiadis Allamanis and Charles Sutton. 2013. Mining source code repositories at massive scale using language modeling. In 2013 10th Working Conference on Mining Software Repositories (MSR). 207–216. https://doi.org/10.1109/MSR.2013.6624029
[10]
Uri Alon, Shaked Brody, Omer Levy, and Eran Yahav. 2019. code2seq: Generating Sequences from Structured Representations of Code. In Proceedings of the 7th International Conference on Learning Representations. OpenReview.net.
[11]
Uri Alon, Meital Zilberstein, Omer Levy, and Eran Yahav. 2018. A general path-based representation for predicting program properties. In Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation. ACM, 404–419. https://doi.org/10.1145/3192366.3192412
[12]
Uri Alon, Meital Zilberstein, Omer Levy, and Eran Yahav. 2019. code2vec: learning distributed representations of code. Proceedings of the ACM on Programming Languages, 3, POPL (2019), 40:1–40:29. https://doi.org/10.1145/3290353
[13]
Sven Amann, Hoan Anh Nguyen, Sarah Nadi, Tien N Nguyen, and Mira Mezini. 2018. A systematic evaluation of static api-misuse detectors. IEEE Transactions on Software Engineering, 45, 12 (2018), 1170–1188.
[14]
Venera Arnaoudova, Massimiliano Di Penta, and Giuliano Antoniol. 2016. Linguistic antipatterns: What they are and how developers perceive them. Empirical Software Engineering, 21, 1 (2016), 104–158.
[15]
Venera Arnaoudova, Laleh M Eshkevari, Massimiliano Di Penta, Rocco Oliveto, Giuliano Antoniol, and Yann-Gaël Guéhéneuc. 2014. Repent: Analyzing the nature of identifier renamings. IEEE Transactions on Software Engineering, 40, 5 (2014), 502–532.
[16]
Venera Arnaoudova, Massimiliano Di Penta, Giuliano Antoniol, and Yann-Gaël Guéhéneuc. 2013. A New Family of Software Anti-patterns: Linguistic Anti-patterns. 2013 17th European Conference on Software Maintenance and Reengineering, 187–196.
[17]
Gabriele Bavota, Rocco Oliveto, Malcom Gethers, Denys Poshyvanyk, and Andrea De Lucia. 2013. Methodbook: Recommending move method refactorings via relational topic models. IEEE Transactions on Software Engineering, 40, 7 (2013), 671–694.
[18]
Tom B Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, and Amanda Askell. 2020. Language models are few-shot learners. arXiv preprint arXiv:2005.14165.
[19]
Nghi D. Q. Bui, Yijun Yu, and Lingxiao Jiang. 2021. TreeCaps: Tree-Based Capsule Networks for Source Code Processing. In Proceedings of the 35th AAAI Conference on Artificial Intelligence.
[20]
Simon Butler, Michel Wermelinger, Yijun Yu, and Helen Sharp. 2009. Relating identifier naming flaws and code quality: An empirical study. In 2009 16th Working Conference on Reverse Engineering. 31–35.
[21]
Simon Butler, Michel Wermelinger, Yijun Yu, and Helen Sharp. 2010. Exploring the influence of identifier names on code quality: An empirical study. In 2010 14th European Conference on Software Maintenance and Reengineering. 156–165.
[22]
Simon Butler, Michel Wermelinger, Yijun Yu, and Helen Sharp. 2011. Improving the tokenisation of identifier names. In Proceedings of the 25th European Conference on Object-Oriented Programming (ECOOP). 130–154.
[23]
Simon Butler, Michel Wermelinger, Yijun Yu, and Helen Sharp. 2011. Mining java class naming conventions. In 2011 27th IEEE International Conference on Software Maintenance (ICSM). 93–102. https://doi.org/10.1109/ICSM.2011.6080776
[24]
Simon Butler, Michel Wermelinger, Yijun Yu, and Helen Sharp. 2013. INVocD: Identifier name vocabulary dataset. In 2013 10th Working Conference on Mining Software Repositories (MSR). 405–408.
[25]
Florian Deissenboeck and Markus Pizka. 2006. Concise and consistent naming. Software Quality Journal, 14, 3 (2006), 261–282.
[26]
Aryaz Eghbali and Michael Pradel. 2020. No Strings Attached: An Empirical Study of String-related Software Bugs. 2020 35th IEEE/ACM International Conference on Automated Software Engineering (ASE), 956–967.
[27]
Sarah Fakhoury, Yuzhan Ma, Venera Arnaoudova, and Olusola Adesope. 2018. The effect of poor source code lexicon and readability on developers’ cognitive load. In 2018 IEEE/ACM 26th International Conference on Program Comprehension (ICPC).
[28]
Jean-Rémy Falleri, Floréal Morandat, Xavier Blanc, Matias Martinez, and Martin Monperrus. 2014. Fine-grained and accurate source code differencing. In Proceedings of the 29th ACM/IEEE International Conference on Automated Software Engineering. 313–324. https://doi.org/10.1145/2642937.2642982
[29]
Chunrong Fang, Zixi Liu, Yangyang Shi, Jingfang Huang, and Qingkai Shi. 2020. Functional code clone detection with syntax and semantics fusion learning. In Proceedings of the 29th ACM SIGSOFT International Symposium on Software Testing and Analysis.
[30]
Patrick Fernandes, Miltiadis Allamanis, and Marc Brockschmidt. 2019. Structured Neural Summarization. In Proceedings of the 7th International Conference on Learning Representations. OpenReview.net.
[31]
Malcom Gethers, Trevor Savage, Massimiliano Di Penta, Rocco Oliveto, Denys Poshyvanyk, and Andrea De Lucia. 2011. CodeTopics: which topic am I coding now? In Proceedings of the 33rd International Conference on Software Engineering. 1034–1036.
[32]
Latifa Guerrouj, Zeinab Kermansaravi, Venera Arnaoudova, Benjamin CM Fung, Foutse Khomh, Giuliano Antoniol, and Yann-Gaël Guéhéneuc. 2017. Investigating the relation between lexical smells and change-and fault-proneness: an empirical study. Software Quality Journal, 25, 3 (2017), 641–670.
[33]
Sonia Haiduc, Jairo Aponte, Laura Moreno, and Andrian Marcus. 2010. On the use of automated text summarization techniques for summarizing source code. In 2010 17th Working Conference on Reverse Engineering. 35–44.
[34]
Sakib Haque, Alexander LeClair, Lingfei Wu, and Collin McMillan. 2020. Improved automatic summarization of subroutines via attention to file context. In Proceedings of the 17th International Conference on Mining Software Repositories. 300–310.
[35]
Vincent J. Hellendoorn, Charles Sutton, Rishabh Singh, Petros Maniatis, and David Bieber. 2020. Global Relational Models of Source Code. In Proceedings of the 8th International Conference on Learning Representations (ICLR). OpenReview.net.
[36]
Yoshiki Higo and Shinji Kusumoto. 2012. How often do unintended inconsistencies happen? Deriving modification patterns and detecting overlooked code fragments. In 2012 28th IEEE International Conference on Software Maintenance (ICSM). 222–231.
[37]
Johannes C. Hofmeister, J. Siegmund, and Daniel V. Holt. 2017. Shorter identifier names take longer to comprehend. Empirical Software Engineering, 24 (2017), 417–443.
[38]
Einar W. Høst and Bjarte M. Østvold. 2009. Debugging Method Names. In Proceedings of the 23rd European Conference on Object-Oriented Programming (ECOOP). 294–317.
[39]
Xing Hu, Ge Li, Xin Xia, David Lo, and Zhi Jin. 2018. Deep Code Comment Generation. In 2018 IEEE/ACM 26th International Conference on Program Comprehension (ICPC).
[40]
Lin Jiang, Hui Liu, and He Jiang. 2019. Machine Learning Based Recommendation of Method Names: How Far are We. In 2019 34th IEEE/ACM International Conference on Automated Software Engineering (ASE). 602–614. https://doi.org/10.1109/ASE.2019.00062
[41]
Suntae Kim and Dongsun Kim. 2016. Automatic identifier inconsistency detection using code dictionary. Empirical Software Engineering, 21, 2 (2016), 565–604.
[42]
Yi Li, Shaohua Wang, and Tien N. Nguyen. 2021. A Context-based Automated Approach for Method Name Consistency Checking and Suggestion. In Proceedings of the 43rd International Conference on Software Engineering.
[43]
Yi Li, Shaohua Wang, Tien N Nguyen, and Son Van Nguyen. 2019. Improving bug detection via context-based code representation learning and attention-based neural networks. Proceedings of the ACM on Programming Languages, 3, OOPSLA (2019), 162:1–162:30. https://doi.org/10.1145/3360588
[44]
Bo Lin, Shangwen Wang, Kui Liu, Xiaoguang Mao, and Tegawendé F. Bissyandé. 2021. Automated Comment Update: How Far are We? In The 29th IEEE/ACM International Conference on Program Comprehension (ICPC). 36–46.
[45]
Hui Liu, Qiurong Liu, Cristian-Alexandru Staicu, Michael Pradel, and Yue Luo. 2016. Nomen est Omen: Exploring and Exploiting Similarities between Argument and Parameter Names. In 2016 IEEE/ACM 38th International Conference on Software Engineering (ICSE). 1063–1073.
[46]
Kui Liu, Dongsun Kim, Tegawendé F. Bissyandé, Tae-young Kim, Kisub Kim, Anil Koyuncu, Suntae Kim, and Yves Le Traon. 2019. Learning to spot and refactor inconsistent method names. In Proceedings of the 41st International Conference on Software Engineering. 1–12. https://doi.org/10.1109/ICSE.2019.00019
[47]
Kui Liu, Dongsun Kim, Anil Koyuncu, Li Li, Tegawendé F Bissyandé, and Yves Le Traon. 2018. A closer look at real-world patches. In Proceedings of the 34th International Conference on Software Maintenance and Evolution. 275–286. https://doi.org/10.1109/ICSME.2018.00037
[48]
Antonio Mastropaolo, Simone Scalabrino, Nathan Cooper, David Nader Palacio, Denys Poshyvanyk, Rocco Oliveto, and Gabriele Bavota. 2021. Studying the Usage of Text-To-Text Transfer Transformer to Support Code-Related Tasks. In Proceedings of the 43rd International Conference on Software Engineering.
[49]
Paul W McBurney and Collin McMillan. 2014. Automatic documentation generation via source code summarization of method context. In Proceedings of the 22nd International Conference on Program Comprehension. 279–290.
[50]
Lili Mou, Ge Li, Lu Zhang, Tao Wang, and Zhi Jin. 2016. Convolutional neural networks over tree structures for programming language processing. In Proceedings of the AAAI Conference on Artificial Intelligence. 30.
[51]
Son Nguyen, Hung Phan, Trinh Le, and Tien N. Nguyen. 2020. Suggesting Natural Method Names to Check Name Consistencies. In Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering. 1372–1384.
[52]
Sebastiano Panichella, Annibale Panichella, Moritz Beller, Andy Zaidman, and Harald C Gall. 2016. The impact of test case summaries on bug fixing performance: An empirical investigation. In Proceedings of the 38th International Conference on Software Engineering. 547–558.
[53]
Michael Pradel and Koushik Sen. 2018. DeepBugs: A Learning Approach to Name-Based Bug Detection. Proc. ACM Program. Lang., 2, OOPSLA (2018), Article 147, 25 pages. https://doi.org/10.1145/3276517
[54]
Simone Scalabrino, Mario Linares-Vásquez, Rocco Oliveto, and Denys Poshyvanyk. 2018. A comprehensive model for code readability. Journal of Software: Evolution and Process, 30, 6 (2018).
[55]
Simone Scalabrino, Christopher Vendome, and Denys Poshyvanyk. 2019. Automatically assessing code understandability. IEEE Transactions on Software Engineering.
[56]
Andrea Schankin, A. Berger, Daniel V. Holt, Johannes C. Hofmeister, T. Riedel, and M. Beigl. 2018. Descriptive Compound Identifier Names Improve Source Code Comprehension. In 2018 IEEE/ACM 26th International Conference on Program Comprehension (ICPC).
[57]
Abigail See, Peter J. Liu, and Christopher D. Manning. 2017. Get To The Point: Summarization with Pointer-Generator Networks. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics. 1073–1083.
[58]
Giriprasad Sridhara, Emily Hill, Divya Muppaneni, Lori Pollock, and K Vijay-Shanker. 2010. Towards automatically generating summary comments for java methods. In Proceedings of the IEEE/ACM international conference on Automated software engineering. 43–52.
[59]
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems. 6000–6010.
[60]
Yaza Wainakh, Moiz Rauf, and Michael Pradel. 2021. IdBench: Evaluating Semantic Representations of Identifier Names in Source Code. In Proceedings of the 43rd International Conference on Software Engineering.
[61]
Ke Wang and Zhendong Su. 2020. Blended, Precise Semantic Program Embeddings. In Proceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation. Association for Computing Machinery, 121–134. https://doi.org/10.1145/3385412.3385999
[62]
Shangwen Wang, Ming Wen, Bo Lin, Hongjun Wu, Yihao Qin, Deqing Zou, Xiaoguang Mao, and Hai Jin. 2020. Automated Patch Correctness Assessment: How Far are We? In Proceedings of the 35th IEEE/ACM International Conference on Automated Software Engineering. ACM, 968–980. https://doi.org/10.1145/3324884.3416590
[63]
Yu Wang, Ke Wang, Fengjuan Gao, and Linzhang Wang. 2020. Learning Semantic Program Embeddings with Graph Interval Neural Network. Proc. ACM Program. Lang., 4, OOPSLA (2020), Article 137, 27 pages. https://doi.org/10.1145/3428205
[64]
Ming Wen, Junjie Chen, Rongxin Wu, Dan Hao, and Shing-Chi Cheung. 2018. Context-aware patch generation for better automated program repair. In Proceedings of the 40th International Conference on Software Engineering. 1–11. https://doi.org/10.1145/3180155.3180233
[65]
Ming Wen, Yepang Liu, Rongxin Wu, Xuan Xie, Shing-Chi Cheung, and Zhendong Su. 2019. Exposing Library API Misuses Via Mutation Analysis. In 2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE). 866–877. https://doi.org/10.1109/ICSE.2019.00093
[66]
Ming Wen, Rongxin Wu, and Shing-Chi Cheung. 2016. Locus: Locating bugs from software changes. In 2016 31st IEEE/ACM International Conference on Automated Software Engineering (ASE). 262–273.
[67]
Martin White, Michele Tufano, Christopher Vendome, and Denys Poshyvanyk. 2016. Deep learning code fragments for code clone detection. In 2016 31st IEEE/ACM International Conference on Automated Software Engineering (ASE). 87–98.
[68]
Yinxing Xue, Mingliang Ma, Yun Lin, Yulei Sui, Jiaming Ye, and Tianyong Peng. 2020. Cross-Contract Static Analysis for Detecting Practical Reentrancy Vulnerabilities in Smart Contracts. In 2020 35th IEEE/ACM International Conference on Automated Software Engineering (ASE). 1029–1040.
[69]
Zhaoxu Zhang, Hengcheng Zhu, Ming Wen, Yida Tao, Yepang Liu, and Yingfei Xiong. 2020. How Do Python Framework APIs Evolve? An Exploratory Study. In 2020 IEEE 27th International Conference on Software Analysis, Evolution and Reengineering (SANER). 81–92. https://doi.org/10.1109/SANER48275.2020.9054800
[70]
Gang Zhao and Jeff Huang. 2018. DeepSim: deep learning code functional similarity. In Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering.

Published In

ESEC/FSE 2021: Proceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering
August 2021
1690 pages
ISBN:9781450385626
DOI:10.1145/3468264
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 18 August 2021

Author Tags

  1. Code embedding
  2. Deep learning
  3. Method name recommendation

Qualifiers

  • Research-article

Conference

ESEC/FSE '21

Acceptance Rates

Overall Acceptance Rate 112 of 543 submissions, 21%

