
ChangeLocator: locate crash-inducing changes based on crash reports

Published: 01 October 2018

Abstract

Software crashes are severe manifestations of software bugs. Debugging crashing bugs is tedious and time-consuming. Understanding the software changes that induce a crashing bug provides useful context for bug fixing and is in high demand among developers. Locating bug-inducing changes is also useful for automatic program repair, since it narrows down the root causes and reduces the search space of bug fix locations. However, there are currently no systematic studies on locating the changes to a source code repository that induce a crashing bug reflected by a bucket of crash reports. To tackle this problem, we first conducted an empirical study characterizing the bug-inducing changes for crashing bugs (denoted as crash-inducing changes). We also propose ChangeLocator, a method to automatically locate crash-inducing changes for a given bucket of crash reports. Our approach is based on a learning model that uses features derived from our empirical study, and we train the model on data from historical fixed crashes. We evaluated ChangeLocator on six release versions of the NetBeans project. The results show that it can locate the crash-inducing changes for 44.7%, 68.5%, and 74.5% of the bugs by examining only the top 1, 5, and 10 changes in the recommended list, respectively. It significantly outperforms the existing state-of-the-art approach.
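
The top 1, 5, and 10 figures above are a top-N hit rate: the share of crash buckets for which at least one true crash-inducing change appears among the first N recommended changes. The sketch below illustrates that metric only; it is not the authors' implementation, and the function name `recall_at_n`, the bucket names, and the change identifiers are hypothetical.

```python
from typing import Dict, List, Set


def recall_at_n(
    rankings: Dict[str, List[str]],
    inducing: Dict[str, Set[str]],
    n: int,
) -> float:
    """Fraction of crash buckets with at least one true
    crash-inducing change among the top n recommended changes."""
    if not rankings:
        return 0.0
    hits = sum(
        1
        for bucket, ranked in rankings.items()
        # A bucket counts as a hit if the top-n slice overlaps the truth set.
        if set(ranked[:n]) & inducing.get(bucket, set())
    )
    return hits / len(rankings)


# Hypothetical buckets and change ids, for illustration only.
rankings = {
    "bucket-a": ["c7", "c2", "c9"],
    "bucket-b": ["c4", "c1", "c3"],
    "bucket-c": ["c5", "c6", "c8"],
    "bucket-d": ["c0", "c11", "c12"],
}
inducing = {
    "bucket-a": {"c7"},
    "bucket-b": {"c3"},
    "bucket-c": {"c10"},
    "bucket-d": {"c11"},
}

for n in (1, 2, 3):
    print(f"top-{n}: {recall_at_n(rankings, inducing, n):.2f}")
```

On this toy data the rate rises as N grows, since a larger N can only add hits; the same monotonic pattern shows in the 44.7%, 68.5%, and 74.5% results reported above.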



Published In

Empirical Software Engineering, Volume 23, Issue 5
October 2018
551 pages

Publisher

Kluwer Academic Publishers

United States

Author Tags

  1. Bug localization
  2. Crash stack
  3. Crash-inducing change
  4. Software crash

Qualifiers

  • Article


Cited By

  • (2024) SymBisect. Proceedings of the 33rd USENIX Conference on Security Symposium, pp 2493-2510. DOI: 10.5555/3698900.3699040. Online publication date: 14-Aug-2024
  • (2024) How Well Industry-Level Cause Bisection Works in Real-World: A Study on Linux Kernel. Companion Proceedings of the 32nd ACM International Conference on the Foundations of Software Engineering, pp 62-73. DOI: 10.1145/3663529.3663828. Online publication date: 10-Jul-2024
  • (2024) The Impact Of Bug Localization Based on Crash Report Mining: A Developers' Perspective. Proceedings of the 46th International Conference on Software Engineering: Software Engineering in Practice, pp 13-24. DOI: 10.1145/3639477.3639730. Online publication date: 14-Apr-2024
  • (2024) Just-in-Time crash prediction for mobile apps. Empirical Software Engineering 29:3. DOI: 10.1007/s10664-024-10455-7. Online publication date: 8-May-2024
  • (2023) Pre-training Code Representation with Semantic Flow Graph for Effective Bug Localization. Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, pp 579-591. DOI: 10.1145/3611643.3616338. Online publication date: 30-Nov-2023
  • (2023) Code-line-level Bugginess Identification: How Far have We Come, and How Far have We Yet to Go? ACM Transactions on Software Engineering and Methodology 32:4, pp 1-55. DOI: 10.1145/3582572. Online publication date: 27-May-2023
  • (2023) Generic and robust root cause localization for multi-dimensional data in online service systems. Journal of Systems and Software 203:C. DOI: 10.1016/j.jss.2023.111748. Online publication date: 13-Jul-2023
  • (2023) Utilizing source code syntax patterns to detect bug inducing commits using machine learning models. Software Quality Journal 31:3, pp 775-807. DOI: 10.1007/s11219-022-09611-3. Online publication date: 1-Sep-2023
  • (2023) BTLink: automatic link recovery between issues and commits based on pre-trained BERT model. Empirical Software Engineering 28:4. DOI: 10.1007/s10664-023-10342-7. Online publication date: 12-Jul-2023
  • (2023) The impact of class imbalance techniques on crashing fault residence prediction models. Empirical Software Engineering 28:2. DOI: 10.1007/s10664-023-10294-y. Online publication date: 22-Feb-2023
