Article

Improving bug localization using structured information retrieval

Authors:

Sarfraz Khurshid,

Dewayne E. Perry

ASE '13: Proceedings of the 28th IEEE/ACM International Conference on Automated Software Engineering

Pages 345 - 355

https://doi.org/10.1109/ASE.2013.6693093

Published: 11 November 2013

Abstract

Locating bugs is important, difficult, and expensive, particularly for large-scale systems. To address this, natural language information retrieval techniques are increasingly being used to suggest potential faulty source files given bug reports. While these techniques are very scalable, in practice their effectiveness remains low in accurately localizing bugs to a small number of files. Our key insight is that structured information retrieval based on code constructs, such as class and method names, enables more accurate bug localization. We present BLUiR, which embodies this insight, requires only the source code and bug reports, and takes advantage of bug similarity data if available. We build BLUiR on a proven, open source IR toolkit that anyone can use. Our work provides a thorough grounding of IR-based bug localization research in fundamental IR theoretical and empirical knowledge and practice. We evaluate BLUiR on four open source projects with approximately 3,400 bugs. Results show that BLUiR matches or outperforms a current state-of-the-art tool across applications considered, even when BLUiR does not use bug similarity data used by the other tool.
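
The key insight in the abstract (matching bug report text against separate code constructs rather than against a file as one bag of words) can be illustrated with a minimal sketch. This is not BLUiR's implementation, which the abstract says is built on an existing open source IR toolkit. The field names (`class`, `method`, `comment`, `summary`, `description`), the TF-IDF weighting, the unweighted sum over field pairs, and the sample files are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens, splitting camelCase identifiers first."""
    # "parseFile" -> "parse File" so identifier parts match report words.
    text = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", text)
    return re.findall(r"[a-z]+", text.lower())


def field_score(query_tokens, doc_tokens, doc_freq, n_docs):
    """TF-IDF style score of one query field against one document field."""
    tf = Counter(doc_tokens)
    score = 0.0
    for term in set(query_tokens):
        if tf[term]:
            # Smoothed IDF (an assumed variant); always positive here.
            idf = math.log((n_docs + 1) / (doc_freq[term] + 0.5))
            score += (1 + math.log(tf[term])) * idf
    return score


def rank_files(files, report,
               doc_fields=("class", "method", "comment"),
               query_fields=("summary", "description")):
    """Rank files by the sum of scores over all (query field, doc field) pairs."""
    n_docs = len(files)
    tokens = {}
    doc_freq = {f: Counter() for f in doc_fields}  # per-field document frequency
    for name, fields in files.items():
        tokens[name] = {f: tokenize(fields.get(f, "")) for f in doc_fields}
        for f in doc_fields:
            doc_freq[f].update(set(tokens[name][f]))
    query = {q: tokenize(report.get(q, "")) for q in query_fields}
    scores = {
        name: sum(
            field_score(query[q], tokens[name][f], doc_freq[f], n_docs)
            for q in query_fields
            for f in doc_fields
        )
        for name in files
    }
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy corpus and bug report, for illustration only.
files = {
    "FileParser.java": {"class": "FileParser",
                        "method": "parseFile readHeader",
                        "comment": "Parses input files"},
    "MenuBar.java": {"class": "MenuBar",
                     "method": "addItem render",
                     "comment": "Draws the menu bar"},
    "Logger.java": {"class": "Logger",
                    "method": "logError",
                    "comment": "Writes log messages to a file"},
}
report = {
    "summary": "Parser crashes on empty file",
    "description": "Calling parseFile on an empty file throws an exception "
                   "while reading the header",
}

ranked = rank_files(files, report)
print(ranked)
```

Because each code construct is scored separately, a report term that matches a class or method name contributes once per field it appears in, so files whose identifiers echo the report rise in the ranking. A real system would likely add stemming, stop-word removal, and tuned field weights, none of which are shown here.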

Published In

ASE '13: Proceedings of the 28th IEEE/ACM International Conference on Automated Software Engineering

November 2013

765 pages

ISBN:9781479902156

General Chair: Ewen Denney (SGT, NASA, Ames Research Center, Moffett Field)
Program Chairs: Tevfik Bultan (University of California, Santa Barbara); Andreas Zeller (Saarland University, Saarbrücken, Germany)

Sponsors

NASA: National Aeronautics and Space Administration
SIGAI: ACM Special Interest Group on Artificial Intelligence
IEEE: IEEE Computer Society Technical Committee on Design Automation
SIGSOFT: ACM Special Interest Group on Software Engineering
Microsoft: Microsoft

Publisher

IEEE Press

Qualifiers

Article

Conference

ASE '13

ASE '13: Automated Software Engineering

November 11 - 15, 2013

Silicon Valley, CA, USA

Acceptance Rates

Overall Acceptance Rate 82 of 337 submissions, 24%

Article Metrics

Total Citations: 93
Total Downloads: 94
Downloads (Last 12 months): 3
Downloads (Last 6 weeks): 0

Reflects downloads up to 12 Dec 2024

Cited By

Shen Y, Gao X, Sun H, Guo Y (2025). Understanding vulnerabilities in software supply chains. Empirical Software Engineering 30:1. Online publication date: 1-Feb-2025.
https://dl.acm.org/doi/10.1007/s10664-024-10581-2

Yoon D, Wang Y, Yu M, Huang E, Jones J, Kukkadapu A, Kocas O, Wiepert J, Goenka K, Chen S, Lin Y, Huang Z, Kong J, Chow M, Tang C, Witchel E, Arpaci-Dusseau A, Rossbach C, Keeton K (2024). FBDetect: Catching Tiny Performance Regressions at Hyperscale through In-Production Monitoring. Proceedings of the ACM SIGOPS 30th Symposium on Operating Systems Principles, pages 522-540. Online publication date: 4-Nov-2024.
https://dl.acm.org/doi/10.1145/3694715.3695977

Wu Y, Wen M, Yu Z, Guo X, Jin H, Filkov V, Ray B, Zhou M (2024). Effective Vulnerable Function Identification based on CVE Description Empowered by Large Language Models. Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering, pages 393-405. Online publication date: 27-Oct-2024.
https://dl.acm.org/doi/10.1145/3691620.3695013

Li X, Zhang Z, Qian Z, Jaeger T, Song C, Spinellis D, Constantinou E, Bacchelli A (2024). An Investigation of Patch Porting Practices of the Linux Kernel Ecosystem. Proceedings of the 21st International Conference on Mining Software Repositories, pages 63-74. Online publication date: 15-Apr-2024.
https://dl.acm.org/doi/10.1145/3643991.3644902

Zhang L, Roychoudhury A, Paiva A, Abreu R, Storey M (2024). Vulnerability Root Cause Function Locating For Java Vulnerabilities. Proceedings of the 2024 IEEE/ACM 46th International Conference on Software Engineering: Companion Proceedings, pages 444-446. Online publication date: 14-Apr-2024.
https://dl.acm.org/doi/10.1145/3639478.3641225

Wang D, Galster M, Morales-Trujillo M (2024). A systematic mapping study of bug reproduction and localization. Information and Software Technology 165:C. Online publication date: 1-Jan-2024.
https://dl.acm.org/doi/10.1016/j.infsof.2023.107338

Rezaalipour M, Furia C (2024). An empirical study of fault localization in Python programs. Empirical Software Engineering 29:4. Online publication date: 13-Jun-2024.
https://dl.acm.org/doi/10.1007/s10664-024-10475-3

Ma Y, Du Y, Li M, Elkind E (2023). Capturing the long-distance dependency in the control flow graph via structural-guided attention for bug localization. Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, pages 2242-2250. Online publication date: 19-Aug-2023.
https://dl.acm.org/doi/10.24963/ijcai.2023/249

Du Y, Yu Z, Chandra S, Blincoe K, Tonella P (2023). Pre-training Code Representation with Semantic Flow Graph for Effective Bug Localization. Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, pages 579-591. Online publication date: 30-Nov-2023.
https://dl.acm.org/doi/10.1145/3611643.3616338

Dao T, Meng N, Nguyen T, Chandra S, Blincoe K, Tonella P (2023). Triggering Modes in Spectrum-Based Multi-location Fault Localization. Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, pages 1774-1785. Online publication date: 30-Nov-2023.
https://dl.acm.org/doi/10.1145/3611643.3613884
