DOI: 10.1145/3196398.3196425
research-article
Public Access

Evaluating how developers use general-purpose web-search for code retrieval

Published: 28 May 2018

Abstract

Search is an integral part of the software development process. Developers often use search engines to look for information during development, including reusable code snippets, material for understanding APIs, and reference examples. Developers tend to prefer general-purpose search engines such as Google, which are often not optimized for code-related documents: their search strategies and ranking techniques are tuned for generic, non-code-related information.
In this paper, we explore whether a general-purpose search engine like Google is an optimal choice for code-related searches. In particular, we investigate whether the performance of searching with Google differs between code-related and non-code-related searches. To analyze this, we collect search logs from 310 developers that contain nearly 150,000 Google search queries and the associated result clicks. To differentiate between code-related and non-code-related searches, we build a model that identifies the code intent of queries. Leveraging this model, we build an automatic classifier that labels each query as code-related or non-code-related. We confirm the effectiveness of the classifier on manually annotated queries, on which it achieves a precision of 87%, a recall of 86%, and an F1-score of 87%. We then apply this classifier to automatically annotate all the queries in the dataset. Analyzing the annotated dataset, we observe that code-related searching often requires more effort (e.g., time, result clicks, and query modifications) than general non-code search, which indicates that code search with a general-purpose search engine is less effective.
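The abstract does not describe the features of the code-intent model, so the following is only a stand-in to make the classification and evaluation steps concrete: a hypothetical keyword-and-pattern heuristic (CODE_TERMS, CODE_PATTERN, and is_code_query are invented for illustration and are not the authors' model), scored with the metrics the abstract reports. With code-related queries as the positive class, precision is TP / (TP + FP), recall is TP / (TP + FN), and F1 is their harmonic mean.

```python
import re

# Hypothetical code-intent signals chosen for illustration; NOT the paper's features.
CODE_TERMS = {
    "python", "java", "javascript", "c++", "c#", "api", "function",
    "method", "class", "exception", "error", "snippet", "regex",
}
# Identifier-like or syntax-like tokens: snake_case, camelCase, "::", "()", ".call(".
CODE_PATTERN = re.compile(r"[A-Za-z]+_[A-Za-z_]+|[a-z]+[A-Z]\w*|::|\(\)|\.\w+\(")


def is_code_query(query: str) -> bool:
    """Heuristically decide whether a search query has code intent."""
    tokens = query.lower().split()
    if any(tok in CODE_TERMS for tok in tokens):
        return True
    # Fall back to surface patterns that rarely occur in non-code queries.
    return CODE_PATTERN.search(query) is not None


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def evaluate(predicted: list[bool], actual: list[bool]) -> tuple[float, float, float]:
    """Precision, recall, and F1 with 'code-related' as the positive class."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)
    fp = sum(1 for p, a in pairs if p and not a)
    fn = sum(1 for p, a in pairs if a and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1_score(precision, recall)


# Toy hand-labeled queries (True = code-related); illustrative only, not the study's data.
LABELED = [
    ("python sort dict by value", True),
    ("std::vector push_back complexity", True),
    ("getElementById returns null", True),
    ("how to center a div", True),              # code intent but no signal: false negative
    ("weather in seattle", False),
    ("class schedule spring semester", False),  # "class" fires: false positive
]

predictions = [is_code_query(q) for q, _ in LABELED]
precision, recall, f1 = evaluate(predictions, [label for _, label in LABELED])
```

On these toy queries the heuristic scores 0.75 on all three metrics, missing "how to center a div" and misfiring on "class schedule spring semester", which shows how brittle a fixed word list can be. As a consistency check on the reported figures, f1_score(0.87, 0.86) is about 0.865, which agrees with the reported 87% F1 once rounding of the published precision and recall is taken into account.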
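The effort comparison in the abstract (time, result clicks, and query modifications for code versus non-code searches) can be pictured as a per-class aggregation over session logs. This is a minimal sketch assuming a hypothetical SearchSession record and a simple reformulation rule in which any change in normalized query text counts as one modification; the abstract does not state how the study defines sessions or modifications, and the session values are made up.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class SearchSession:
    """One search session in a hypothetical log schema (field names are illustrative)."""
    is_code: bool         # label assigned by a code-intent classifier
    duration_s: float     # seconds from the first query to the last result click
    clicks: int           # number of result clicks in the session
    modifications: int    # number of query reformulations after the first query


def count_modifications(queries: list[str]) -> int:
    """Count consecutive query pairs whose normalized text differs (assumed rule)."""
    normalized = [q.strip().lower() for q in queries]
    return sum(1 for prev, cur in zip(normalized, normalized[1:]) if prev != cur)


def mean_effort(sessions: list[SearchSession]) -> dict[str, dict[str, float]]:
    """Average each effort measure separately for code and non-code sessions."""
    groups = {
        "code": [s for s in sessions if s.is_code],
        "non-code": [s for s in sessions if not s.is_code],
    }
    summary: dict[str, dict[str, float]] = {}
    for name, group in groups.items():
        if not group:  # skip empty classes instead of failing on an empty mean
            continue
        summary[name] = {
            "duration_s": mean(s.duration_s for s in group),
            "clicks": mean(s.clicks for s in group),
            "modifications": mean(s.modifications for s in group),
        }
    return summary


# Toy sessions with made-up numbers; they are NOT measurements from the study.
sessions = [
    SearchSession(True, 180.0, 3, count_modifications(
        ["java read file", "java read file line by line", "java bufferedreader example"])),
    SearchSession(True, 120.0, 2, 1),
    SearchSession(False, 40.0, 1, 0),
    SearchSession(False, 60.0, 1, 1),
]
summary = mean_effort(sessions)
```

Because the toy values were written by hand, the resulting averages say nothing about the study itself; the sketch only shows the shape of a code versus non-code effort comparison once every session carries a classifier label.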

Cited By

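  • (2024) An Empirical Study on Code Search Pre-trained Models: Academic Progresses vs. Industry Requirements. Proceedings of the 15th Asia-Pacific Symposium on Internetware, 41-50. DOI: 10.1145/3671016.3672580. Online publication date: 24-Jul-2024.
  • (2024) Is Stack Overflow Obsolete? An Empirical Study of the Characteristics of ChatGPT Answers to Stack Overflow Questions. Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems, 1-17. DOI: 10.1145/3613904.3642596. Online publication date: 11-May-2024.
  • (2024) Using an LLM to Help With Code Understanding. Proceedings of the IEEE/ACM 46th International Conference on Software Engineering, 1-13. DOI: 10.1145/3597503.3639187. Online publication date: 20-May-2024.
  • (2024) Supporting Web-Based API Searches in the IDE Using Signatures. Proceedings of the IEEE/ACM 46th International Conference on Software Engineering, 1-12. DOI: 10.1145/3597503.3639089. Online publication date: 20-May-2024.
  • (2024) Automated Code Editing With Search-Generate-Modify. IEEE Transactions on Software Engineering 50:7, 1675-1686. DOI: 10.1109/TSE.2024.3376387. Online publication date: 1-Jul-2024.
  • (2024) Developers’ information seeking in Question & Answer websites through a gender lens. Journal of Computer Languages 79, 101267. DOI: 10.1016/j.cola.2024.101267. Online publication date: Jun-2024.
  • (2024) A Code Search Method Incorporating Code Annotations. Collaborative Computing: Networking, Applications and Worksharing, 323-342. DOI: 10.1007/978-3-031-54521-4_18. Online publication date: 23-Feb-2024.
  • (2023) A Systematic Review of Automated Query Reformulations in Source Code Search. ACM Transactions on Software Engineering and Methodology 32:6, 1-79. DOI: 10.1145/3607179. Online publication date: 28-Sep-2023.
  • (2023) Code Search: A Survey of Techniques for Finding Code. ACM Computing Surveys 55:11, 1-31. DOI: 10.1145/3565971. Online publication date: 9-Feb-2023.
  • (2023) Reusable Component Retrieval: A Semantic Search Approach for Low-Resource Languages. ACM Transactions on Asian and Low-Resource Language Information Processing 22:5, 1-31. DOI: 10.1145/3564604. Online publication date: 10-May-2023.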

Information

Published In

MSR '18: Proceedings of the 15th International Conference on Mining Software Repositories
May 2018
627 pages
ISBN: 9781450357166
DOI: 10.1145/3196398
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

ICSE '18

Article Metrics

  • Downloads (last 12 months): 255
  • Downloads (last 6 weeks): 35
Reflects downloads up to 20 Jan 2025.