Abstract
Statistical fault localization is an easily deployed technique for quickly determining candidates for faulty code locations. If a human programmer has to search the fault beyond the top candidate locations, though, more traditional techniques of following dependencies along dynamic slices may be better suited. In a large study of 457 bugs (369 single faults and 88 multiple faults) in 46 open source C programs, we compare the effectiveness of statistical fault localization against dynamic slicing. For single faults, we find that dynamic slicing was eight percentage points more effective than the best performing statistical debugging formula; for 66% of the bugs, dynamic slicing finds the fault earlier than the best performing statistical debugging formula. In our evaluation, dynamic slicing is more effective for programs with single fault, but statistical debugging performs better on multiple faults. Best results, however, are obtained by a hybrid approach: If programmers first examine at most the top five most suspicious locations from statistical debugging, and then switch to dynamic slices, on average, they will need to examine 15% (30 lines) of the code. These findings hold for 18 most effective statistical debugging formulas and our results are independent of the number of faults (i.e. single or multiple faults) and error type (i.e. artificial or real errors).
Similar content being viewed by others
Notes
The scores for the faulty statement in Line 8 are tarantula\((s_{8})=\frac {1}{1}/\left (\frac {1}{1}+\frac {1}{5}\right )\), ochiai\((s_{8})=\frac {1}{\sqrt {1(1+1)}}\), and naish2\((s_{8})=1 - \frac {1}{1+4+1}\).
In ordinal ranking, lines with the same score are ranked by line number.
This is to avoid duplication of inspected statements, i.e. avoid double inspection.
The executable statements refers to statements for which coverage information are obtainable by Gcov, in particular, all program statements except spaces, blanks and comments.
To the best of our knowledge, there is no known benchmark of real-world programs containing multiple faults.
Note that all executable program statements are ranked in the suspiciousness rank, executable statements that are not contained in the dynamic slice are ranked lowest.
Ranking ties are broken in ascending order, i.e. if both lines 10 and 50 have the same score, then line number 10 is ranked above line number 50.
In this case, when several statements have the same rank as the faulty statement, we made the conservative assumption that a developer finds the faulty statement among other statements with the same rank.
Further evaluations (on single faults) in this paper use Kulczynski2 as the default “statistical debugging” formula.
Percentage improvement is measured as \(\frac {1-0.794}{1-0.737}\). Note that score by itself gives the number of statements that need not be examined.
Further evaluations of the hybrid approach use the best values of N (i.e. N = 2 and N = 5).
The SIR benchmark is the most used subject for the evaluation of AFL techniques, especially statistical fault localization (Wong et al. 2016).
Table 4 shows that Tarantula and Kulczynski2 performed best on small programs with effectiveness scores 0.76 and 0.70 for IntroClass and Codeflaws, respectively. Despite the fact that DStar performs best on large programs (i.e. CoREBench) by slightly outperforming Tarantula and Kulczynski2 (0.80 vs. 0.79); it performed significantly worse than Tarantula and Kulczynski2 on small programs, with effectiveness score 0.62 and 0.56 for IntroClass and Codeflaws, respectively.
References
Abreu R, Zoeteweij P, van Gemund AJC (2006) An evaluation of similarity coefficients for software fault localization. In: Proceedings of the 12th Pacific Rim international symposium on dependable computing, PRDC’06, pp 39–46
Abreu R, Zoeteweij P, van Gemund AJC (2007) On the accuracy of spectrum-based fault localization. In: Proceedings of the testing: academic and industrial conference practice and research techniques—MUTATION, TAICPART-MUTATION ’07, pp 89–98
Abreu R, Zoeteweij P, Golsteijn R, van Gemund A J C (2009a) A practical evaluation of spectrum-based fault localization. J Syst Softw 82 (11):1780–1792
Abreu R, Zoeteweij P, Van Gemund AJ (2009b) Spectrum-based multiple fault localization. In: 2009 IEEE/ACM international conference on automated software engineering. IEEE, pp 88–99
Agrawal H, Horgan JR (1990) Dynamic program slicing. In: Proceedings of the ACM SIGPLAN 1990 conference on programming language design and implementation, PLDI ’90, pp 246–256
Agrawal H, DeMillo R A, Spafford E H (1990) Efficient debugging with slicing and backtracking. Tech. rep., Purdue University
Agrawal H, Horgan JR, Krauser EW, London S (1993) Incremental regression testing. In: Proceedings of the conference on software maintenance, ICSM ’93, pp 348–357
Alves E, Gligoric M, Jagannath V, d’Amorim M (2011) Fault-localization using dynamic slicing and change impact analysis. In: 2011 26th IEEE/ACM international conference on automated software engineering (ASE 2011). IEEE, pp 520–523
Assiri F Y, Bieman J M (2017) Fault localization for automated program repair: effectiveness, performance, repair correctness. Software Quality Journal 25 (1):171–199. Springer
Binkley D, Gold N, Harman M (2007) An empirical study of static program slice size. ACM Trans Softw Eng Methodol 16(2)
Böhme M, Roychoudhury A (2014) Corebench: studying complexity of regression errors. In: Proceedings of the 23rd ACM/SIGSOFT international symposium on software testing and analysis, ISSTA, pp 105–115
Böhme M, Soremekun EO, Chattopadhyay S, Ugherughe E, Zeller A (2017) Where is the bug and how is it fixed? An experiment with practitioners. In: Proceedings of the 2017 11th joint meeting on foundations of software engineering, pp 117–128
Burger M, Zeller A (2011) Minimizing reproduction of software failures. In: Proceedings of the 2011 international symposium on software testing and analysis, ISSTA ’11, pp 221–231
Chen MY, Kiciman E, Fratkin E, Fox A, Brewer E (2002) Pinpoint: problem determination in large, dynamic internet services. In: Proceedings of the 2002 international conference on dependable systems and networks, DSN ’02, pp 595–604
Christi A, Olson M L, Alipour M A, Groce A (2018) Reduce before you localize: delta-debugging and spectrum-based fault localization. In: 2018 IEEE international symposium on software reliability engineering workshops. IEEE, ISSREW, pp 184–191
DiGiuseppe N, Jones JA (2011) On the influence of multiple faults on coverage-based fault localization. In: Proceedings of the 2011 international symposium on software testing and analysis, pp 210–220
Gopinath R, Kampmann A, Havrikov N, Soremekun EO, Zeller A (2020) Abstracting failure-inducing inputs. In: Proceedings of the 29th ACM SIGSOFT international symposium on software testing and analysis, pp 237–248
Grissom R, Kim J (2005) Effect sizes for research: a broad practical approach. Lawrence Erlbaum
Guo A, Mao X, Yang D, Wang S (2018) An empirical study on the effect of dynamic slicing on automated program repair efficiency. In: 2018 IEEE international conference on software maintenance and evolution (ICSME). IEEE, pp 554–558
Gyimóthy T, Beszédes A, Forgács I (1999) An efficient relevant slicing method for debugging. In: Proceedings of the 7th European software engineering conference held jointly with the 7th ACM SIGSOFT international symposium on foundations of software engineering, ESEC/FSE-7, pp 303–321
Hammoudi M, Burg B, Bae G, Rothermel G (2015) On the use of delta debugging to reduce recordings and facilitate debugging of web applications. In: Proceedings of the 2015 10th joint meeting on foundations of software engineering. ESEC/FSE 2015. https://doi.org/10.1145/2786805.2786846. ACM, New York, pp 333–344
Heiden S, Grunske L, Kehrer T, Keller F, van Hoorn A, Filieri A, Lo D (2019) An evaluation of pure spectrum-based fault localization techniques for large-scale software systems. Softw: Pract Exp 49(8):1197–1224
Hofer B, Wotawa F (2012) Spectrum enhanced dynamic slicing for better fault localization. In: ECAI, vol 12, pp 420–425
Hutchins M, Foster H, Goradia T, Ostrand T (1994) Experiments on the effectiveness of dataflow-and control-flow-based test adequacy criteria. In: Proceedings of 16th international conference on software engineering. IEEE, pp 191–200
Jones JA, Harrold MJ (2005) Empirical evaluation of the tarantula automatic fault-localization technique. In: Proceedings of the 20th IEEE/ACM international conference on automated software engineering, ASE ’05, pp 273–282
Jones JA, Harrold MJ, Stasko J (2002) Visualization of test information to assist fault localization. In: Proceedings of the 24th international conference on software engineering, ICSE ’02, pp 467–477
Jose M, Majumdar R (2011) Cause clue clauses: error localization using maximum satisfiability. In: Proceedings of the 32nd ACM SIGPLAN conference on programming language design and implementation, PLDI ’11. https://doi.org/10.1145/1993498.1993550. ACM, New York, pp 437–446
Kampmann A, Havrikov N, Soremekun EO, Zeller A (2020) When does my program do this? Learning circumstances of software behavior. In: Proceedings of the 28th ACM joint meeting on european software engineering conference and symposium on the foundations of software engineering, pp 1228–1239
Keller F, Grunske L, Heiden S, Filieri A, van Hoorn A, Lo D (2017) A critical evaluation of spectrum-based fault localization techniques on a large-scale software system. In: 2017 IEEE international conference on software quality, reliability and security (QRS). IEEE, pp 114–125
Kim D, Nam J, Song J, Kim S (2013) Automatic patch generation learned from human-written patches. In: Proceedings of the 2013 international conference on software engineering, ICSE ’13, pp 802–811
Kirschner L, Soremekun E, Zeller A (2020) Debugging inputs. In: 2020 IEEE/ACM 42nd international conference on software engineering (ICSE). IEEE
Kochhar PS, Xia X, Lo D, Li S (2016) Practitioners’ expectations on automated fault localization. In: Proceedings of the 25th international symposium on software testing and analysis. ACM, pp 165–176
Korel B, Laski J (1988) Dynamic program slicing. Inf Process Lett 29(3):155–163
Landsberg D (2016) Methods and measures for statistical fault localisation. PhD thesis, University of Oxford
Landsberg D, Chockler H, Kroening D, Lewis M (2015) Evaluation of measures for statistical fault localisation and an optimising scheme. In: International conference on fundamental approaches to software engineering. Springer, pp 115–129
Le TDB, Lo D, Le Goues C, Grunske L (2016) A learning-to-rank based fault localization approach using likely invariants. In: Proceedings of the 25th international symposium on software testing and analysis, pp 177–188
Le Goues C, Nguyen T, Forrest S, Weimer W (2012) Genprog: a generic method for automatic software repair. IEEE Trans Softw Eng 38(1):54–72
Le Goues C, Holtschulte N, Smith E K, Brun Y, Devanbu P, Forrest S, Weimer W (2015) The manybugs and introclass benchmarks for automated repair of c programs. IEEE Trans Softw Eng 41(12):1236– 1256
Lei Y, Mao X, Dai Z, Wang C (2012) Effective statistical fault localization using program slices. In: 2012 IEEE 36th annual computer software and applications conference. IEEE, pp 1–10
Li X, Orso A (2020) More accurate dynamic slicing for better supporting software debugging. In: 2020 IEEE 13th international conference on software testing, validation and verification (ICST). IEEE, pp 28–38
Liblit B, Naik M, Zheng AX, Aiken A, Jordan MI (2005) Scalable statistical bug isolation. In: Proceedings of the 2005 ACM SIGPLAN conference on programming language design and implementation, PLDI ’05, pp 15–26
Lin Y, Sun J, Tran L, Bai G, Wang H, Dong J (2018) Break the dead end of dynamic slicing: localizing data and control omission bug. In: Proceedings of the 33rd ACM/IEEE international conference on automated software engineering, pp 509–519
Liu B, Lucia Nejati S, Briand L C, Bruckmann T (2016) Simulink fault localization: an iterative statistical debugging approach. Softw Test Verif Reliab 26(6):431–459
Liu B, Nejati S, Briand L C, et al. (2017) Improving fault localization for simulink models using search-based testing and prediction models. In: 2017 IEEE 24th international conference on software analysis, evolution and reengineering (SANER). IEEE, pp 359–370
Mann H B, Whitney D R (1947) On a test of whether one of two random variables is stochastically larger than the other. Ann Math Stat 50–60
Naish L, Lee HJ (2013) Duals in spectral fault localization. In: 2013 22nd Australian software engineering conference. IEEE, pp 51–59
Naish L, Lee H J, Ramamohanarao K (2011) A model for spectra-based software diagnosis. ACM Trans Softw Eng Methodol (TOSEM) 20(3):11
Nguyen HDT, Qi D, Roychoudhury A, Chandra S (2013) Semfix: program repair via semantic analysis. In: Proceedings of the 2013 international conference on software engineering, ICSE ’13, pp 772–781
Parnin C, Orso A (2011) Are automated debugging techniques actually helping programmers?. In: Proceedings of the 2011 international symposium on software testing and analysis, ISSTA ’11. https://doi.org/10.1145/2001420.2001445. ACM, New York, pp 199–209
Pearson S, Campos J, Just R, Fraser G, Abreu R, Ernst M D, Pang D, Keller B (2017) Evaluating and improving fault localization. In: 2017 IEEE/ACM 39th International conference on software engineering (ICSE). IEEE, pp 609–620
Perez A, Abreu R, d’Amorim M (2017) Prevalence of single-fault fixes and its impact on fault localization. In: 2017 IEEE international conference on software testing, verification and validation (ICST). IEEE, pp 12–22
Qi D, Nguyen H D T, Roychoudhury A (2013) Path exploration based on symbolic output. ACM Trans Softw Eng Methodol 22(4):32:1–32:41
Qi Y, Mao X, Lei Y, Dai Z, Wang C (2014) The strength of random search on automated program repair. In: Proceedings of the 36th international conference on software engineering, ICSE 2014, pp 254– 265
Reis S, Abreu R, d’Amorim M (2019) Demystifying the combination of dynamic slicing and spectrum-based fault localization. In: IJCAI, pp 4760–4766
Renieris M, Reiss SP (2003) Fault localization with nearest neighbor queries. In: Proceedings of the 18th IEEE international conference on automated software engineering. IEEE, pp 30–39
Russel P F, Rao T R, et al. (1940) On habitat and association of species of anopheline larvae in south-eastern Madras. Journal of the Malaria Institute of India 3:153–178
Shu T, Wang L, Xia J (2017) Fault localization using a failed execution slice. In: 2017 International conference on software analysis, testing and evolution (SATE). IEEE, pp 37–44
Steimann F, Frenkel M, Abreu R (2013) Threats to the validity and value of empirical assessments of the accuracy of coverage-based fault locators. In: Proceedings of the 2013 international symposium on software testing and analysis, ISSTA 2013, pp 314–324
Tan S H, Yi J, Mechtaev S, Roychoudhury A, et al. (2017) Codeflaws: a programming competition benchmark for evaluating automated program repair tools. In: 2017 IEEE/ACM 39th international conference on software engineering companion (ICSE-C). IEEE, pp 180–182
Tip F (1995) A survey of program slicing techniques. J Program Lang 3(3):121–189
Weiser M (1981) Program slicing. In: Proceedings of the 5th international conference on software engineering, ICSE ’81, pp 439–449
Weiser M (1982) Programmers use slices when debugging. Commun ACM 25(7):446–452
Wong W E, Qi Y, Zhao L, Cai K Y (2007) Effective fault localization using code coverage. In: Computer software and applications conference, 2007. COMPSAC 2007. 31st annual international, vol 1. IEEE, pp 449–456
Wong WE, Debroy V, Li Y, Gao R (2012) Software fault localization using dstar (d*). In: 2012 IEEE sixth international conference on software security and reliability. IEEE, pp 21–30
Wong W E, Debroy V, Gao R, Li Y (2013) The dstar method for effective software fault localization. IEEE Trans Reliab 63(1):290–308
Wong W E, Gao R, Li Y, Abreu R, Wotawa F (2016) A survey on software fault localization. IEEE Trans Softw Eng p preprint
Wotawa F (2010) Fault localization based on dynamic slicing and hitting-set computation. In: 2010 10th international conference on quality software. IEEE, pp 161–170
Wotawa F, Stumptner M, Mayer W (2002) Model-based debugging or how to diagnose programs automatically. In: Proceedings of the 15th international conference on industrial and engineering applications of artificial intelligence and expert systems: developments in applied artificial intelligence, IEA/AIE ’02. http://dl.acm.org/citation.cfm?id=646864.708248. Springer, London, pp 746–757
Xie X, Chen T Y, Kuo F C, Xu B (2013a) A theoretical analysis of the risk evaluation formulas for spectrum-based fault localization. ACM Trans Softw Eng Methodol (TOSEM) 22(4):31
Xie X, Kuo FC, Chen TY, Yoo S, Harman M (2013b) Provably optimal and human-competitive results in sbse for spectrum based fault localisation. In: International symposium on search based software engineering. Springer, pp 224–238
Yang J, Zhikhartsev A, Liu Y, Tan L (2017) Better test cases for better automated program repair. In: Proceedings of the 2017 11th joint meeting on foundations of software engineering, pp 831–841
Yoo S (2012) Evolving human competitive spectra-based fault localisation techniques. In: International symposium on search based software engineering. Springer, pp 244–258
Yu Y, Jones J, Harrold MJ (2008) An empirical study of the effects of test-suite reduction on fault localization. In: 2008 ACM/IEEE 30th international conference on software engineering. IEEE, pp 201–210
Zeller A, Hildebrandt R (2002) Simplifying and isolating failure-inducing input. IEEE Trans Softw Eng 28(2):183–200. https://doi.org/10.1109/32.988498
Zhang X, He H, Gupta N, Gupta R (2005) Experimental evaluation of using dynamic slices for fault location. In: Proceedings of the sixth international symposium on automated analysis-driven debugging, pp 33–42
Zhang X, Gupta N, Gupta R (2007) A study of effectiveness of dynamic slicing in locating real faults. Empir Softw Eng 12(2):143–160
Zhang M, Li X, Zhang L, Khurshid S (2017) Boosting spectrum-based fault localization using pagerank. In: Proceedings of the 26th ACM SIGSOFT international symposium on software testing and analysis, pp 261–272
Zheng AX, Jordan MI, Liblit B, Aiken A (2003) Statistical debugging of sampled programs. In: Advances in neural information processing systems, pp 9–11
Zheng AX, Jordan MI, Liblit B, Naik M, Aiken A (2006) Statistical debugging: simultaneous identification of multiple bugs. In: Proceedings of the 23rd international conference on machine learning, pp 1105–1112
Zou D, Liang J, Xiong Y, Ernst MD, Zhang L (2019) An empirical study of fault localization families and their combinations. IEEE Trans Softw Eng 47(2, Feb. 1 2021):332–347
Acknowledgements
We thank the anonymous reviewers for their helpful comments. This work was funded by Deutsche Forschungsgemeinschaft, Project “Extracting and Mining of Probabilistic Event Structures from Software Systems (EMPRESS)”.
Author information
Authors and Affiliations
Corresponding author
Additional information
Communicated by: Hadi Hemmati
Additional material
All of our scripts, tools, benchmarks, and results are freely available as an artifact, in order to support scrutiny, evaluation, reproduction, and extension: https://tinyurl.com/HybridFaultLocalization
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
About this article
Cite this article
Soremekun, E., Kirschner, L., Böhme, M. et al. Locating faults with program slicing: an empirical analysis. Empir Software Eng 26, 51 (2021). https://doi.org/10.1007/s10664-020-09931-7
Accepted:
Published:
DOI: https://doi.org/10.1007/s10664-020-09931-7