Abstract
To improve code search, many query expansion (QE) approaches expand a query using APIs or crowd knowledge. However, these approaches can sometimes harm retrieval performance, because they cannot distinguish relevant terms from irrelevant ones among a large set of candidate expansion terms and may therefore expand a query with irrelevant terms. In this paper, we propose QREC, a query reformulation approach based on evolving contexts, i.e., the new or deleted terms and their dependent terms that arise during code evolution. By treating the new terms as relevant and the deleted terms as irrelevant, QREC can reformulate a query with appropriate expansion terms. The experimental results show that QREC outperforms state-of-the-art QE approaches (e.g., CodeHow and QECK) by 9–11% and improves the precision of the code search algorithms IR, Portfolio and VF by up to 37–45%.
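The core idea stated above (terms added during code evolution count as relevant, deleted terms as irrelevant) can be illustrated with a minimal sketch. This is not QREC's actual algorithm, which the abstract does not specify; the function name `reformulate`, the frequency-based ranking, and the `max_expansions` parameter are all assumptions made for illustration only.

```python
from collections import Counter


def reformulate(query, added_terms, deleted_terms, max_expansions=3):
    """Toy query reformulation (hypothetical, not the paper's method).

    Candidate expansion terms come from terms added during code evolution;
    terms that were deleted are treated as irrelevant and never used.
    """
    query_terms = query.lower().split()
    # Block deleted terms and terms already present in the query.
    blocked = set(deleted_terms) | set(query_terms)
    counts = Counter(t for t in added_terms if t not in blocked)
    # Assumption: rank candidates by how often they were added,
    # breaking ties alphabetically so the result is deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    expansions = [term for term, _ in ranked][:max_expansions]
    return query_terms + expansions


expanded = reformulate(
    "read file",
    added_terms=["buffer", "stream", "buffer", "path"],
    deleted_terms=["scanner", "stream"],
    max_expansions=2,
)
print(expanded)
```

In this toy run, "stream" is excluded because it also appears among the deleted terms, so only "buffer" and "path" remain as expansion candidates.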
Notes
The fully qualified name (FQN), declaration, instantiation, and the signatures of methods invoked and fields accessed, etc.
References
Carpineto, C., Romano, G.: A survey of automatic query expansion in information retrieval. ACM Comput. Surv. 44(1), 1 (2012)
Chaparro, O., Florez, J.M., Marcus, A.: Using observed behavior to reformulate queries during text retrieval-based bug localization. In: IEEE International Conference on Software Maintenance and Evolution (ICSME). IEEE (2017)
Fischer, G., Henninger, S., Redmiles, D.: Cognitive tools for locating and comprehending software objects for reuse. In: Proceedings of the 13th International Conference on Software Engineering, pp. 318–328 (1991)
Fluri, B., Würsch, M., Pinzger, M., Gall, H.C.: Change distilling: tree differencing for fine-grained source code change extraction. IEEE Trans. Softw. Eng. 33(11), 725–743 (2007)
Haiduc, S., Bavota, G., Marcus, A., Oliveto, R., De Andrea, L., Menzies, T.: Automatic query reformulations for text retrieval in software engineering. In: Proceedings of the 35th International Conference on Software Engineering (ICSE), pp. 842–851 (2013)
Howard, M.J., Gupta, S., Pollock, L., Vijay-Shanker, K.: Automatically mining software-based, semantically-similar words from comment code mappings. In: Proceedings of the 10th Working Conference on Mining Software Repositories, pp. 377–386 (2013)
Keivanloo, I., Rilling, J., Zou, Y.: Spotting working code examples. In: Proceedings of the 36th International Conference on Software Engineering, pp. 664–675 (2014)
Lemos, O., Bajracharya, S., Ossher, J., Morla, R., Masiero, P., Baldi, P., Lopes, C.: CodeGenie: using test-cases to search and reuse source code. In: Proceedings of the Twenty-Second IEEE/ACM International Conference on Automated Software Engineering, pp. 525–526 (2007)
Lv, F., Zhang, H., Lou, J.-G., Wang, S., Zhang, D., Zhao, J.: CodeHow: effective code search based on API understanding and extended boolean model (E). In: Proceedings of the 30th IEEE/ACM International Conference on Automated Software Engineering (ASE), pp. 260–270 (2015)
Manning, C.D., Raghavan, P., Schütze, H.: Introduction to Information Retrieval. Cambridge University Press, Cambridge (2008)
McMillan, C., Grechanik, M., Poshyvanyk, D., Fu, C., Xie, Q.: Exemplar: a source code search engine for finding highly relevant applications. IEEE Trans. Softw. Eng. 38(5), 1069–1087 (2012)
McMillan, C., Poshyvanyk, D., Grechanik, M., Xie, Q., Fu, C.: Portfolio: searching for relevant functions and their usages in millions of lines of code. ACM Trans. Softw. Eng. Methodol. 22(4), 1–30 (2013)
Nguyen, A.T., Hilton, M., Codoban, M., Nguyen, H.A., Mast, L., Rademacher, E., Nguyen, T.N., Dig, D.: API code recommendation using statistical learning from fine-grained changes. In: Proceedings of the 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering, pp. 511–522 (2016)
Nie, L., Jiang, H., Ren, Z., Sun, Z., Li, X.: Query expansion based on crowd knowledge for code search. IEEE Trans. Serv. Comput. 9(5), 771–783 (2016)
Proksch, S., Amann, S., Nadi, S., Mezini, M.: Evaluating the evaluations of code recommender systems: a reality check. In: Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering, Singapore, pp. 111–121 (2016)
Sadowski, C., Stolee, K.T., Elbaum, S.: How users search for code: a case study. In: Proceedings of the 10th Joint Meeting on Foundations of Software Engineering (2015)
Salton, G., Fox, E.A., Wu, H.: Extended boolean information retrieval. Commun. ACM 26, 1022–1036 (1983)
Sim, S.E., Clarke, C.L.A., Holt, R.C.: Archetypal source code searches: a survey of software users and maintainers. In: Proceedings of the International Workshop on Program Comprehension (IWPC '98), pp. 180–187. IEEE (1998)
Sridhara, G., Hill, E., Pollock, L.L., Vijay-Shanker, K.: Identifying word relations in software: a comparative study of semantic similarity tools. In: Proceedings 16th IEEE International Conference on Program Comprehension (ICPC 08), pp. 123–132 (2008)
Stolee, K.T., Elbaum, S., Dobos, D.: Solving the search for source code. ACM Trans. Softw. Eng. Methodol. (TOSEM) 23(3), 26 (2014)
Sun, X., Liu, X., Hu, J., Zhu, J.: Empirical studies on the NLP techniques for source code data preprocessing. In: Proceedings of the 3rd International Workshop on Evidential Assessment of Software Technologies, pp. 32–39 (2014)
Tian, Y., Lo, D., Lawall, J.: SEWordSim: software-specific word similarity database. In: Companion Proceedings of the 36th International Conference on Software Engineering. ACM (2014)
Xu, B., Lin, H., Lin, Y.: Assessment of learning to rank methods for query expansion. J. Assoc. Inf. Sci. Technol. (JASIST) 67(6), 1345–1357 (2016)
Ye, X., Shen, H., Ma, X., Bunescu, R.C., Liu, C.: From word embeddings to document similarities for improved information retrieval in software engineering. In: Proceedings of the 38th International Conference on Software Engineering, pp. 404–415 (2016)
Youm, K.C., Ahn, J., Lee, E.: Improved bug localization based on code change histories and bug reports. Inf. Softw. Technol. 82, 177–192 (2017)
Acknowledgements
This work was supported in part by the National Natural Science Foundation of China (Grant Nos. 61902162, 61762049, 61872272, 61877031, 61802350, 61862033, 61772246, 61562042 and 61672470).
Cite this article
Huang, Q., Wu, G. Enhance code search via reformulating queries with evolving contexts. Autom Softw Eng 26, 705–732 (2019). https://doi.org/10.1007/s10515-019-00263-5
DOI: https://doi.org/10.1007/s10515-019-00263-5