DOI: 10.1109/ASE.2013.6693093

Improving bug localization using structured information retrieval

Published: 11 November 2013

Abstract

Locating bugs is important, difficult, and expensive, particularly for large-scale systems. To address this, natural language information retrieval techniques are increasingly being used to suggest potential faulty source files given bug reports. While these techniques are very scalable, in practice their effectiveness remains low in accurately localizing bugs to a small number of files. Our key insight is that structured information retrieval based on code constructs, such as class and method names, enables more accurate bug localization. We present BLUiR, which embodies this insight, requires only the source code and bug reports, and takes advantage of bug similarity data if available. We build BLUiR on a proven, open source IR toolkit that anyone can use. Our work provides a thorough grounding of IR-based bug localization research in fundamental IR theoretical and empirical knowledge and practice. We evaluate BLUiR on four open source projects with approximately 3,400 bugs. Results show that BLUiR matches or outperforms a current state-of-the-art tool across applications considered, even when BLUiR does not use bug similarity data used by the other tool.
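
The idea of structured retrieval over code constructs can be illustrated with a small sketch. This is not BLUiR's implementation, which the abstract does not describe in detail. It assumes each source file has been split into named fields (here hypothetically "class", "method", and "comment"), that the bug report has "summary" and "description" fields, and that a simple log-scaled TF-IDF weight is scored separately for every pair of report field and file field and then summed. All function names, field names, and sample data below are illustrative assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens, splitting camelCase identifiers first."""
    # "parseQuery" -> "parse Query" so identifier parts match report words.
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", text)
    return [t.lower() for t in re.findall(r"[A-Za-z]+", spaced)]


def build_idf(files):
    """Smoothed inverse document frequency, counting each file once per term."""
    n = len(files)
    df = Counter()
    for fields in files.values():
        vocab = set()
        for text in fields.values():
            vocab.update(tokenize(text))
        df.update(vocab)
    return {term: math.log((n + 1) / (count + 0.5)) for term, count in df.items()}


def field_score(query_tokens, field_tokens, idf):
    """Log-scaled term frequency times IDF, summed over distinct query terms."""
    tf = Counter(field_tokens)
    return sum(math.log(1 + tf[t]) * idf.get(t, 0.0) for t in set(query_tokens))


def rank_files(bug_report, files):
    """Rank files by the sum of scores over all (report field, file field) pairs."""
    idf = build_idf(files)
    query_fields = [tokenize(text) for text in bug_report.values()]
    scores = {}
    for name, fields in files.items():
        scores[name] = sum(
            field_score(query, tokenize(text), idf)
            for query in query_fields
            for text in fields.values()
        )
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical sample data, for illustration only.
source_files = {
    "QueryParser.java": {
        "class": "QueryParser",
        "method": "parseQuery tokenize",
        "comment": "Parses user queries",
    },
    "IndexWriter.java": {
        "class": "IndexWriter",
        "method": "addDocument flush",
        "comment": "Writes documents and handles parse errors in config",
    },
}
bug_report = {
    "summary": "QueryParser crashes on empty query",
    "description": "Calling parseQuery with an empty string throws an exception",
}

if __name__ == "__main__":
    print(rank_files(bug_report, source_files))
```

In this toy data the report terms "query" and "parser" match the class-name and method-name fields of QueryParser.java several times over, so that file is ranked first, while IndexWriter.java matches only on "parse" in a comment. Scoring each field separately is what lets a short class or method name carry weight that it would lose if the whole file were treated as one bag of words.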

Published In

ASE '13: Proceedings of the 28th IEEE/ACM International Conference on Automated Software Engineering
November 2013
765 pages
ISBN: 9781479902156
  • General Chair: Ewen Denney
  • Program Chairs: Tevfik Bultan, Andreas Zeller

Publisher

IEEE Press

Author Tags

  1. bug localization
  2. information retrieval
  3. search

Qualifiers

  • Article

Conference

ASE '13: Automated Software Engineering
November 11-15, 2013
Silicon Valley, CA, USA

Acceptance Rates

Overall Acceptance Rate 82 of 337 submissions, 24%
