
research-article

Open access

Aroma: code recommendation via structural code search

Authors:

Celeste Barnaby,

Satish Chandra

Proceedings of the ACM on Programming Languages, Volume 3, Issue OOPSLA

Article No.: 152, Pages 1 - 28

https://doi.org/10.1145/3360578

Published: 10 October 2019

Abstract

Programmers often write code that is similar to existing code written elsewhere. A tool that helps programmers search for such similar code would be immensely useful. Such a tool could help programmers extend partially written code snippets into complete implementations of the necessary functionality, discover extensions to the partial code that other programmers commonly include, cross-check against similar code written by others, or add extra code that fixes common mistakes and errors. We propose Aroma, a tool and technique for code recommendation via structural code search. Aroma indexes a huge code corpus comprising thousands of open-source projects, takes a partial code snippet as input, searches the corpus for method bodies containing the partial code snippet, and clusters and intersects the search results to recommend a small set of succinct code snippets that both contain the query snippet and appear as part of several methods in the corpus. We evaluated Aroma on 2000 randomly selected queries created from the corpus, as well as on 64 queries derived from code snippets obtained from Stack Overflow, a popular website for discussing code. We implemented Aroma for 4 different languages and developed an IDE plugin for it. Furthermore, we conducted a study in which we asked 12 programmers to complete programming tasks using Aroma and collected their feedback. Our results indicate that Aroma is capable of retrieving and recommending relevant code snippets efficiently.
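The pipeline summarized in the abstract (search for method bodies that contain the query, then cluster and intersect the hits) can be illustrated with a deliberately simplified Python sketch. It is an assumption-laden toy, not Aroma's implementation: a method body is modeled as a list of statement strings, the "features" are just whitespace-normalized statements rather than Aroma's structural features, clustering is a greedy Jaccard grouping, and every name and threshold here (`search`, `cluster`, `intersect`, `recommend`, `min_overlap`, `threshold`, `min_cluster_size`) is hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical toy model: a method body is a list of statement strings.
Corpus = Dict[str, List[str]]


def normalize(stmt: str) -> str:
    """Collapse runs of whitespace so formatting differences are ignored."""
    return " ".join(stmt.split())


def to_features(body: List[str]) -> Set[str]:
    """Toy stand-in for structural features: the set of normalized statements."""
    return {normalize(s) for s in body if s.strip()}


def search(query: List[str], corpus: Corpus, min_overlap: float = 1.0) -> List[str]:
    """Rank methods by the fraction of query features they contain and keep
    those at or above `min_overlap` (1.0 means 'contains the whole query')."""
    q = to_features(query)
    if not q:
        return []
    scored = []
    for name, body in corpus.items():
        overlap = len(q & to_features(body)) / len(q)
        if overlap >= min_overlap:
            scored.append((overlap, name))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # best first, ties by name
    return [name for _, name in scored]


def jaccard(a: Set[str], b: Set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def cluster(names: List[str], corpus: Corpus, threshold: float = 0.5) -> List[List[str]]:
    """Greedy clustering: a method joins the first cluster whose first member
    is at least `threshold`-similar (Jaccard); otherwise it starts a new one."""
    clusters: List[List[str]] = []
    for name in names:
        feats = to_features(corpus[name])
        for group in clusters:
            if jaccard(feats, to_features(corpus[group[0]])) >= threshold:
                group.append(name)
                break
        else:  # no sufficiently similar cluster found
            clusters.append([name])
    return clusters


def intersect(group: List[str], corpus: Corpus) -> List[str]:
    """Keep only statements shared by every method in the cluster, in the
    order they appear in the cluster's first member."""
    common = set.intersection(*(to_features(corpus[n]) for n in group))
    result: List[str] = []
    for stmt in corpus[group[0]]:
        s = normalize(stmt)
        if s in common and s not in result:
            result.append(s)
    return result


def recommend(query: List[str], corpus: Corpus, min_cluster_size: int = 2) -> List[List[str]]:
    """Search, cluster, then intersect each sufficiently large cluster."""
    hits = search(query, corpus)
    return [intersect(g, corpus) for g in cluster(hits, corpus) if len(g) >= min_cluster_size]


if __name__ == "__main__":
    corpus: Corpus = {
        "readFirst": [
            "BufferedReader br = new BufferedReader(new FileReader(path));",
            "String line = br.readLine();",
            "br.close();",
        ],
        "printFirst": [
            "BufferedReader br = new BufferedReader(new FileReader(path));",
            "String line = br.readLine();",
            "System.out.println(line);",
            "br.close();",
        ],
        "countLines": [
            "BufferedReader br = new BufferedReader(new FileReader(path));",
            "long n = br.lines().count();",
        ],
    }
    query = [
        "BufferedReader br = new BufferedReader(new FileReader(path));",
        "String line = br.readLine();",
    ]
    for snippet in recommend(query, corpus):
        print("\n".join(snippet))
```

Run as a script, the demo prints one recommendation: the two query statements followed by `br.close();`, the extra statement that both matching methods share. `countLines` is dropped at the search stage because it contains only half of the query. Requiring full containment (`min_overlap=1.0`) mirrors the abstract's "method bodies containing the partial code snippet"; a practical system would need a more tolerant, structure-aware notion of containment than exact statement matching.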

Supplementary Material

a152-luan (a152-luan.webm)

Presentation at OOPSLA '19

Index Terms

Aroma: code recommendation via structural code search
1. Information systems
  1. Information retrieval
    1. Retrieval tasks and goals
      1. Near-duplicate and plagiarism detection
2. Software and its engineering
  1. Software creation and management
    1. Software post-development issues
  2. Software notations and tools
    1. Development frameworks and environments

Recommendations

Learning to rank code examples for code search engines

Source code examples are used by developers to implement unfamiliar tasks by learning from existing solutions. To better support developers in finding existing solutions, code search engines are designed to locate and rank code examples relevant to user'...

Bug localization via searching crowd-contributed code
Internetware '14: Proceedings of the 6th Asia-Pacific Symposium on Internetware

Bug localization, i.e., locating bugs in code snippets, is a frequent task in software development. Although static bug-finding tools are available to reduce manual effort in bug localization, these tools typically detect bugs with known project-...

Big Code Search: A Bibliography

Code search is an essential task in software development. Developers often search the internet and other code databases for necessary source code snippets to ease the development efforts. Code search techniques also help learn programming as novice ...

Information

Published In

Proceedings of the ACM on Programming Languages Volume 3, Issue OOPSLA

October 2019

2077 pages

EISSN:2475-1421

DOI:10.1145/3366395

Copyright © 2019 Owner/Author.

This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 10 October 2019

Published in PACMPL Volume 3, Issue OOPSLA

Qualifiers

Research-article

Bibliometrics & Citations

Bibliometrics

Article Metrics

Total citations: 92
Total downloads: 3,212
Downloads (last 12 months): 463
Downloads (last 6 weeks): 52

Reflects downloads up to 22 Jan 2025.

Citations

Cited By

Zhou X, Liang P, Zhang B, Li Z, Ahmad A, Shahin M, Waseem M (2025). Exploring the problems, their causes and solutions of AI pair programming: A study on GitHub and Stack Overflow. Journal of Systems and Software 219, 112204. Online publication date: Jan-2025. https://doi.org/10.1016/j.jss.2024.112204
Zhang K, Li J, Li Z, Jin Z, Li G (2025). Transformer-based code model with compressed hierarchy representation. Empirical Software Engineering 30:2. Online publication date: 23-Jan-2025. https://doi.org/10.1007/s10664-025-10612-6
Abu Doush I (2024). The Current State of Generative Artificial Intelligence Tools for Accessibility in Product Development. Nafath 9:26. Online publication date: 30-Jul-2024. https://doi.org/10.54455/MCN2605
Olewicki D, Habchi S, Nayrolles M, Faramarzi M, Chandar S, Adams B, Roychoudhury A, Paiva A, Abreu R, Storey M, Aniche M, Nagappan N (2024). On the Costs and Benefits of Adopting Lifelong Learning for Software Analytics - Empirical Study on Brown Build and Risk Prediction. Proceedings of the 46th International Conference on Software Engineering: Software Engineering in Practice, 275-286. Online publication date: 14-Apr-2024. https://dl.acm.org/doi/10.1145/3639477.3639717
Bhowmick A, Mishra M, Singhal R (2024). TASCA: Tool for Automatic SCalable Acceleration of ML pipelines. Proceedings of the 7th Joint International Conference on Data Science & Management of Data (11th ACM IKDD CODS and 29th COMAD), 514-518. Online publication date: 4-Jan-2024. https://dl.acm.org/doi/10.1145/3632410.3632504
Kang H, Wang K, Kim M, Roychoudhury A, Paiva A, Abreu R, Storey M (2024). Scaling Code Pattern Inference with Interactive What-If Analysis. Proceedings of the IEEE/ACM 46th International Conference on Software Engineering, 1-12. Online publication date: 20-May-2024. https://dl.acm.org/doi/10.1145/3597503.3639193
Gao S, Zhang L, Liu H, Wang Y (2024). Which Animation API Should I Use Next? A Multimodal Real-Time Animation API Recommendation Model for Android Apps. IEEE Transactions on Software Engineering 50:1, 106-122. Online publication date: Jan-2024. https://doi.org/10.1109/TSE.2023.3338728
Rotchford D, Evans S, Filgueira R (2024). Laminar 2.0: Serverless Stream Processing with Enhanced Code Search and Recommendations. SC24-W: Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis, 2088-2095. Online publication date: 17-Nov-2024. https://doi.org/10.1109/SCW63240.2024.00261
Jha A, Nadi S (2024). Migrating Unit Tests Across Java Applications. 2024 IEEE International Conference on Source Code Analysis and Manipulation (SCAM), 131-142. Online publication date: 7-Oct-2024. https://doi.org/10.1109/SCAM63643.2024.00022
Mlinarić D, Dončević J, Brčić M, Botički I (2024). Revolutionizing Software Development: Autonomous Software Evolution. 2024 47th MIPRO ICT and Electronics Convention (MIPRO), 224-228. Online publication date: 20-May-2024. https://doi.org/10.1109/MIPRO60963.2024.10569871
