survey

Opportunities and Challenges in Code Search Tools

Published: 08 October 2021

Abstract

Code search is a core software engineering task. Effective code search tools can help developers substantially improve their software development efficiency and effectiveness. In recent years, many code search studies have leveraged different techniques, such as deep learning and information retrieval approaches, to retrieve the code a developer is looking for from a large-scale codebase. However, a comprehensive comparative summary of existing code search approaches is still lacking. To understand the research trends in code search, we systematically reviewed 81 relevant studies. We investigated the publication trends of code search studies; analyzed the key components used to build code search tools, such as the codebase, the query, and the modeling technique; and classified existing tools according to the seven different search tasks they support. Based on our findings, we identified a set of outstanding challenges in existing studies and outlined a research roadmap for future code search research.
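
To make the idea of an information retrieval approach to code search concrete, here is a minimal sketch of TF-IDF, vector-space ranking of code snippets against a natural-language query. It is not taken from any surveyed tool: the snippet corpus and the helper names (`tokenize`, `weigh`, `build_index`, `search`) are hypothetical, and the identifier-splitting rule is an assumption chosen for illustration. A production tool would more likely rely on a search engine such as Lucene or on learned embeddings.

```python
import math
import re
from collections import Counter

# Hypothetical toy codebase: snippet name -> source text.
SNIPPETS = {
    "read_file": "def readFile(path):\n    with open(path) as f:\n        return f.read()",
    "parse_json": "def parseJSON(text):\n    return json.loads(text)",
    "http_get": "def httpGet(url):\n    return urllib.request.urlopen(url).read()",
}

# Splits camelCase, ACRONYMCase, and snake_case identifiers into words (assumed rule).
TOKEN_RE = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+")


def tokenize(text: str) -> list[str]:
    """Return lowercase terms so that "read file" can match readFile or read_file."""
    return [t.lower() for t in TOKEN_RE.findall(text)]


def weigh(tf: Counter, idf: dict) -> dict:
    """Turn raw term counts into TF-IDF weights; unknown terms get weight 0."""
    return {t: c * idf.get(t, 0.0) for t, c in tf.items()}


def build_index(snippets: dict) -> tuple[dict, dict]:
    """Compute IDF over the corpus and a TF-IDF vector for every snippet."""
    docs = {name: Counter(tokenize(code)) for name, code in snippets.items()}
    n = len(docs)
    df = Counter(term for tf in docs.values() for term in tf)
    # The +1 keeps terms that occur in every snippet from being zeroed out.
    idf = {term: math.log(n / count) + 1.0 for term, count in df.items()}
    vectors = {name: weigh(tf, idf) for name, tf in docs.items()}
    return idf, vectors


def cosine(a: dict, b: dict) -> float:
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def search(query: str, idf: dict, vectors: dict, k: int = 3) -> list[str]:
    """Rank snippets by similarity to the query, dropping zero-score matches."""
    q = weigh(Counter(tokenize(query)), idf)
    scored = [(cosine(q, vec), name) for name, vec in vectors.items()]
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [name for score, name in scored[:k] if score > 0]


if __name__ == "__main__":
    idf, vectors = build_index(SNIPPETS)
    print(search("read a file", idf, vectors))
```

Splitting identifiers is what lets plain query words such as "read" and "file" match tokens like `readFile`. In broad terms, deep-learning approaches replace these sparse TF-IDF vectors with learned embeddings of queries and code but still rank candidates by vector similarity.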

Cited By

  • (2025) FSECAM: A contextual thematic approach for linking feature to multi-level software architectural components. Journal of Systems and Software, Vol. 219, Article 112245. DOI: 10.1016/j.jss.2024.112245. Online publication date: Jan-2025
  • (2025) An intent-enhanced feedback extension model for code search. Information and Software Technology, Vol. 177, Article 107589. DOI: 10.1016/j.infsof.2024.107589. Online publication date: Jan-2025
  • (2024) Intelligent code search aids edge software development. Journal of Cloud Computing: Advances, Systems and Applications, Vol. 13, Issue 1. DOI: 10.1186/s13677-024-00629-5. Online publication date: 1-Apr-2024

    Information

    Published In

    ACM Computing Surveys, Volume 54, Issue 9
    December 2022
    800 pages
    ISSN:0360-0300
    EISSN:1557-7341
    DOI:10.1145/3485140
    Issue’s Table of Contents
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 08 October 2021
    Accepted: 01 July 2021
    Revised: 01 June 2021
    Received: 01 November 2020
    Published in CSUR Volume 54, Issue 9


    Author Tags

    1. Code search
    2. code retrieval
    3. modeling

    Qualifiers

    • Survey
    • Refereed

    Funding Sources

    • National Science Foundation of China
    • Key Research and Development Program of Zhejiang Province
    • National Research Foundation, Singapore
    • ARC Laureate Fellowship


    Article Metrics

    • Downloads (last 12 months): 226
    • Downloads (last 6 weeks): 20
    Reflects downloads up to 11 Jan 2025.

    Cited By

    • (2025) FSECAM: A contextual thematic approach for linking feature to multi-level software architectural components. Journal of Systems and Software 219 (112245). DOI: 10.1016/j.jss.2024.112245. Online publication date: Jan-2025.
    • (2025) An intent-enhanced feedback extension model for code search. Information and Software Technology 177 (107589). DOI: 10.1016/j.infsof.2024.107589. Online publication date: Jan-2025.
    • (2024) Intelligent code search aids edge software development. Journal of Cloud Computing: Advances, Systems and Applications 13:1. DOI: 10.1186/s13677-024-00629-5. Online publication date: 1-Apr-2024.
    • (2024) Syntactic Code Search with Sequence-to-Tree Matching: Supporting Syntactic Search with Incomplete Code Fragments. Proceedings of the ACM on Programming Languages 8:PLDI (2051–2072). DOI: 10.1145/3656460. Online publication date: 20-Jun-2024.
    • (2024) A Survey of Source Code Search: A 3-Dimensional Perspective. ACM Transactions on Software Engineering and Methodology 33:6 (1–51). DOI: 10.1145/3656341. Online publication date: 28-Jun-2024.
    • (2024) Rocks Coding, Not Development: A Human-Centric, Experimental Evaluation of LLM-Supported SE Tasks. Proceedings of the ACM on Software Engineering 1:FSE (699–721). DOI: 10.1145/3643758. Online publication date: 12-Jul-2024.
    • (2024) RAPID: Zero-Shot Domain Adaptation for Code Search with Pre-Trained Models. ACM Transactions on Software Engineering and Methodology 33:5 (1–35). DOI: 10.1145/3641542. Online publication date: 3-Jun-2024.
    • (2024) Fusing Code Searchers. IEEE Transactions on Software Engineering 50:7 (1852–1866). DOI: 10.1109/TSE.2024.3403042. Online publication date: 1-Jul-2024.
    • (2024) Dsn2Code: An automated approach for similarity-based Software Architecture selection for Code reuse. 2024 International Research Conference on Smart Computing and Systems Engineering (SCSE), 1–6. DOI: 10.1109/SCSE61872.2024.10550890. Online publication date: 4-Apr-2024.
    • (2024) Code Search Oriented Node-Enhanced Control Flow Graph Embedding. 2024 IEEE International Conference on Source Code Analysis and Manipulation (SCAM), 59–70. DOI: 10.1109/SCAM63643.2024.00016. Online publication date: 7-Oct-2024.
