DOI: 10.5555/2487085.2487122

Search-based duplicate defect detection: an industrial experience

Published: 18 May 2013

Abstract

Duplicate defects impose extra overhead on software organizations, because the cost and effort of managing them are largely redundant. Because defects are described in natural language and in many different ways, it is usually hard to identify duplicate defects automatically. The problem is more severe in large software organizations with huge defect repositories and a massive number of defect reporters. Ideally, an efficient tool should prevent duplicate reports from reaching developers by automatically detecting and/or filtering duplicates. It should also offer defect triagers a list of the top-N similar bug reports and allow them to compare the similarity of incoming bug reports with the suggested duplicates. This demand motivated us to design and develop a search-based duplicate bug detection framework at BlackBerry. The approach follows a generalized process model to evaluate and tune the performance of the system in a systematic way. We have applied the framework to software projects at BlackBerry, in addition to the Mozilla defect repository. The experimental results demonstrate the performance of the developed framework and highlight the high impact of parameter tuning on its performance.
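
To make the idea of a top-N list of similar bug reports concrete, the following is a minimal sketch in Python. It is not the framework described in the abstract, which does not specify its retrieval model here; it assumes a plain TF-IDF weighting with cosine similarity as a stand-in. The function names, the report IDs, and the sample report texts are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it on non-alphanumeric characters."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in text.lower())
    return cleaned.split()


def tfidf_vectors(docs):
    """Build one sparse TF-IDF vector (a dict) per tokenized document."""
    n = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF (an assumption of this sketch) so that terms present in
    # every document still keep a small positive weight.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in doc_freq.items()}
    return [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_n_similar(query, repository, n=3):
    """Return up to n (report_id, score) pairs, most similar first."""
    ids = list(repository)
    docs = [tokenize(repository[i]) for i in ids] + [tokenize(query)]
    vectors = tfidf_vectors(docs)
    query_vec = vectors[-1]
    # zip stops at len(ids), so the query vector itself is never scored.
    scored = [(i, cosine(query_vec, v)) for i, v in zip(ids, vectors)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]


# Hypothetical repository of existing reports and one incoming report.
reports = {
    "BUG-1": "Browser crashes when opening a PDF attachment",
    "BUG-2": "Battery drains quickly after the latest update",
    "BUG-3": "Email sync fails on wifi",
}
incoming = "Crash in browser while opening PDF file"

for report_id, score in top_n_similar(incoming, reports, n=2):
    print(report_id, round(score, 3))
```

In a sketch like this, the tokenizer, the IDF smoothing, and the list length `n` are the kind of parameters whose settings could change the ranking, which is one way to read the abstract's emphasis on parameter tuning.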



Published In

MSR '13: Proceedings of the 10th Working Conference on Mining Software Repositories
May 2013
438 pages
ISBN: 9781467329361

Publisher

IEEE Press

