Abstract
Fault localization is useful for reducing debugging effort. Such techniques require test cases with oracles, which can determine whether a program behaves correctly for every test input. Although most fault localization techniques can localize faults relatively accurately even with a small number of test cases, choosing the right test cases and creating oracles for them are not easy. Test oracle creation is expensive because it requires substantial manual labeling effort (i.e., effort needed to decide whether the test cases pass or fail). Given a number of test cases to be executed, it is challenging to minimize the number of test cases requiring manual labeling while at the same time achieving good fault localization accuracy. To address this challenge, this paper presents a novel test case selection strategy based on Diversity Maximization Speedup (Dms). Dms orders a set of unlabeled test cases in a way that maximizes the effectiveness of a fault localization technique, so that developers need to label only a much smaller number of test cases along this ordering to achieve good fault localization results. We evaluate the performance of Dms on two types of programs: single-fault and multi-fault programs. Our experiments with 411 faults from the Software-artifact Infrastructure Repository show (1) that Dms helps existing fault localization techniques achieve comparable accuracy with, on average, 67 and 6 % fewer labeled test cases than the previously best test case prioritization techniques for single-fault and multi-fault programs, respectively, and (2) that given a labeling budget (i.e., a fixed number of labeled test cases), Dms helps existing fault localization techniques reduce their debugging cost (i.e., the amount of code that must be inspected to locate faults). We conduct hypothesis tests and show that the savings in debugging cost that we achieve for the real C programs are statistically significant.
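To make the workflow concrete, the following is a high-level Python sketch (our illustration, not the paper's implementation) of the labeling loop described above; prioritize, ask_developer_to_label, and localize are hypothetical placeholders for a prioritization strategy such as Dms, the developer's manual pass/fail judgment, and a fault localization technique:

```python
def label_and_localize(unlabeled_tests, budget,
                       prioritize, ask_developer_to_label, localize):
    """Label at most `budget` test cases along a prioritized ordering,
    then run fault localization on the labeled subset."""
    labeled = []
    for test in prioritize(unlabeled_tests)[:budget]:
        verdict = ask_developer_to_label(test)  # manual pass/fail oracle
        labeled.append((test, verdict))
    return localize(labeled)  # ranked list of suspicious program elements
```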
Notes
Program elements with the same suspiciousness score are assigned the same rank, namely the rank of the last element in the tied group, since developers are expected to investigate all of the elements having the same score if they are ever going to investigate one. For example, if statements \(s_{1}\), \(s_{2}\), and \(s_{3}\) have the highest suspiciousness score, then all three statements receive rank 3.
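For illustration, a minimal Python sketch (ours, not from the paper) of this tie-aware ranking convention:

```python
def tie_aware_ranks(scores):
    """Assign each element the rank of the last element in its tied
    group (rank 1 = most suspicious), as in the note above."""
    # An element's rank is the number of elements scoring at least as high.
    return [sum(1 for s in scores if s >= x) for x in scores]

# s1, s2, s3 share the highest score, so all three receive rank 3.
print(tie_aware_ranks([0.9, 0.9, 0.9, 0.5]))  # [3, 3, 3, 4]
```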
We call such groups suspicious groups simply to state that every group may contain potentially suspicious elements. Some other studies (e.g., González-Sanchez et al. 2011a) call them ambiguity groups, a term that emphasizes that the elements in a group share the same, and hence ambiguous, suspiciousness score.
If more than one test case fails, Dms randomly selects one of them to begin with.
There are a number of alternative ways to define diagnostic costs and accuracies, such as using the number of faults located when up to a certain percentage of program elements is inspected (e.g., Wong et al. 2014; Debroy and Wong 2013; Cleve and Zeller 2005; Baah et al. 2010; Jones et al. 2002; Lucia et al. 2014), or assuming a random ordering for elements with the same score and incorporating their expected rank \(\frac{\left| \{j\ |\ f_{T_{\mathcal {S}}}(d_j) = f_{T_{\mathcal {S}}}(d_*)\}\right| + 1}{2}\) into the cost calculation (e.g., Ali et al. 2009; Steimann et al. 2013; Steimann and Frenkel 2012; Xu et al. 2011). We use the definitions in Eqs. 7 and 8 because they are commonly used and easy to understand, and because our focus is on evaluating whether different test case prioritization techniques change the diagnostic costs, rather than on measuring the absolute costs. We believe that if our technique shows significant improvements under one definition of diagnostic cost, it should also show improvements under the others.
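For concreteness, a minimal Python sketch (ours, not the paper's code) contrasting the two tie-handling conventions mentioned above, under the assumption that Eqs. 7 and 8 charge the entire tied group, consistent with the ranking convention in the first note:

```python
def diagnostic_ranks(scores, faulty_idx):
    """Return (worst_case, expected) rank of the faulty element d*:
    worst_case charges the entire tied group; expected assumes a
    random ordering of tied elements, i.e. the (|tied| + 1) / 2 term."""
    s_star = scores[faulty_idx]
    higher = sum(1 for s in scores if s > s_star)  # strictly more suspicious
    tied = sum(1 for s in scores if s == s_star)   # tied group, includes d*
    return higher + tied, higher + (tied + 1) / 2

scores = [0.9, 0.9, 0.7, 0.7, 0.7, 0.1]
print(diagnostic_ranks(scores, faulty_idx=3))  # (5, 4.0)
```

Dividing either rank by the total number of program elements yields the corresponding relative diagnostic cost.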
Fnr (false-negative rate) is the rate at which the program passes a test case even though that test case executes the program element that is the real fault. When Fnr is high, the fault is usually difficult for spectrum-based fault localization techniques to detect.
References
Abreu, R., Zoeteweij, P., Golsteijn, R., van Gemund, A.: A practical evaluation of spectrum-based fault localization. J. Syst. Softw. 82(11), 1780–1792 (2009)
Ali, S., Andrews, J., Dhandapani, T., Wang, W.: Evaluating the accuracy of fault localization techniques. In: IEEE/ACM International Conference on Automated Software Engineering (ASE), pp. 76–87 (2009)
Arcuri, A., Briand, L.C.: Adaptive random testing: an illusion of effectiveness? In: ISSTA, pp. 265–275 (2011)
Artzi, S., Dolby, J., Tip, F., Pistoia, M.: Directed test generation for effective fault localization. In: ISSTA, pp. 49–60 (2010)
Baah, G.K., Podgurski, A., Harrold, M.J.: Causal inference for statistical fault localization. In: ISSTA, pp. 73–84 (2010)
Baah, G.K., Podgurski, A., Harrold, M.J.: Mitigating the confounding effects of program dependences for effective fault localization. In: SIGSOFT FSE, pp. 146–156 (2011)
Baudry, B., Fleurey, F., Traon, Y.L.: Improving test suites for efficient fault localization. In: ICSE, pp. 82–91 (2006)
Beizer, B.: Software Testing Techniques, 2nd edn. International Thomson Computer Press, Boston (1990)
Bowring, J.F., Rehg, J.M., Harrold, M.J.: Active learning for automatic classification of software behavior. In: ISSTA, pp. 195–205 (2004)
Campos, J., Abreu, R., Fraser, G., d’Amorim, M.: Entropy-based test generation for improved fault localization. In: IEEE/ACM 28th International Conference on Automated Software Engineering (ASE), IEEE, pp. 257–267 (2013)
Cleve, H., Zeller, A.: Locating causes of program failures. In: ICSE (2005)
Debroy, V., Wong, W.E.: A consensus-based strategy to improve the quality of fault localization. Softw. Pract. Exp. 43(8), 989–1011 (2013)
Do, H., Elbaum, S.G., Rothermel, G.: Supporting controlled experimentation with testing techniques: an infrastructure and its potential impact. Empir. Softw. Eng. 10(4), 405–435 (2005)
Elbaum, S., Malishevsky, A.G., Rothermel, G.: Test case prioritization: a family of empirical studies. IEEE TSE 28(2), 159–182 (2002)
Godefroid, P., Klarlund, N., Sen, K.: DART: directed automated random testing. In: PLDI, pp. 213–223 (2005)
González-Sanchez, A., Abreu, R., Groß, H.G., van Gemund, A.J.C.: Prioritizing tests for fault localization through ambiguity group reduction. In: ASE, pp. 83–92 (2011a)
González-Sanchez, A., Piel, É., Abreu, R., Groß, H.G., van Gemund, A.J.C.: Prioritizing tests for software fault diagnosis. Softw. Pract. Exper. 41(10), 1105–1129 (2011b)
Graybill, F.A., Iyer, H.K.: Regression Analysis: Concepts and Applications. Duxbury Press, Belmont (1994)
Hamlet, R.: Testing programs with the aid of a compiler. IEEE TSE 3(4), 279–290 (1977)
Jiang, B., Zhang, Z., Chan, W.K., Tse, T.H.: Adaptive random test case prioritization. In: ASE, pp. 233–244 (2009)
Jiang, B., Chan, W.K., Tse, T.H.: On practical adequate test suites for integrated test case prioritization and fault localization. In: QSIC, pp. 21–30 (2011)
Jiang, L., Su, Z.: Context-aware statistical debugging: from bug predictors to faulty control flow paths. In: IEEE/ACM International Conference on Automated Software Engineering (ASE), pp. 184–193 (2007)
Jones, J., Harrold, M.: Empirical evaluation of the Tarantula automatic fault-localization technique. In: ASE (2005)
Jones, J., Harrold, M., Stasko, J.: Visualization of test information to assist fault localization. In: ICSE, pp. 467–477 (2002)
Li, Z., Harman, M., Hierons, R.: Search algorithms for regression test case prioritization. IEEE TSE 33(4), 225–237 (2007)
Liblit, B., Aiken, A., Zheng, A.X., Jordan, M.I.: Bug isolation via remote program sampling. In: PLDI, pp. 141–154 (2003)
Liblit, B., Naik, M., Zheng, A.X., Aiken, A., Jordan, M.I.: Scalable statistical bug isolation. In: ACM SIGPLAN International Conference on Programming Language Design and Implementation (PLDI) (2005)
Liu, C., Yan, X., Fei, L., Han, J., Midkiff, S.P.: SOBER: Statistical model-based bug localization. In: ESEC/FSE (2005)
Lucia, Lo, D., Jiang, L., Thung, F., Budi, A.: Extended comprehensive study of association measures for fault localization. J. Softw. Evolut. Process 26(2), 172–219 (2014)
Nainar, P.A., Chen, T., Rosin, J., Liblit, B.: Statistical debugging using compound boolean predicates. In: ISSTA, pp. 5–15 (2007)
Naish, L., Lee, H.J., Ramamohanarao, K.: A model for spectra-based software diagnosis. ACM TOSEM 20(3), 11:1–11:32 (2011)
National Institute of Standards and Technology (NIST): Software Errors Cost U.S. Economy $59.5 Billion Annually (2002)
Pacheco, C., Ernst, M.D.: Automatic generation and classification of test inputs. In: ECOOP, pp. 504–527 (2005)
Parnin, C., Orso, A.: Are automated debugging techniques actually helping programmers? In: ISSTA, pp. 199–209 (2011)
Renieris, M., Reiss, S.: Fault localization with nearest neighbor queries. In: ASE, pp. 141–154 (2003)
Rothermel, G., Untch, R.H., Chu, C., Harrold, M.J.: Prioritizing test cases for regression testing. IEEE TSE 27(10), 929–948 (2001)
Santelices, R.A., Jones, J.A., Yu, Y., Harrold, M.J.: Lightweight fault-localization using multiple coverage types. In: ICSE, pp. 56–66 (2009)
Sen, K., Marinov, D., Agha, G.: CUTE: a concolic unit testing engine for C. In: ESEC/SIGSOFT FSE, pp. 263–272 (2005)
Steimann, F., Frenkel, M.: Improving coverage-based localization of multiple faults using algorithms from integer linear programming. In: IEEE International Symposium on Software Reliability Engineering (ISSRE), pp. 121–130 (2012)
Steimann, F., Frenkel, M., Abreu, R.: Threats to the validity and value of empirical assessments of the accuracy of coverage-based fault locators. In: International Symposium on Software Testing and Analysis (ISSTA), pp. 314–324 (2013)
Wilcoxon, F.: Individual comparisons by ranking methods. Biometrics Bull. 1(6), 80–83 (1945)
Wong, W.E., Debroy, V., Gao, R., Li, Y.: The DStar method for effective software fault localization. IEEE Trans. Reliab. 63(1), 290–308 (2014)
Xie, T.: Augmenting automatically generated unit-test suites with regression oracle checking. In: ECOOP, pp. 380–403 (2006)
Xie, X., Chen, T.Y., Kuo, F.C., Xu, B.: A theoretical analysis of the risk evaluation formulas for spectrum-based fault localization. ACM TOSEM 22(4), 31:1–31:40 (2013)
Xu, X., Debroy, V., Wong, W.E., Guo, D.: Ties within fault localization rankings: exposing and addressing the problem. Int. J. Softw. Eng. Knowl. Eng. 21(6), 803–827 (2011)
Zeller, A.: Isolating cause-effect chains from computer programs. In: FSE, pp. 1–10 (2002). doi:10.1145/587051.587053
Zhang, L., Hao, D., Zhang, L., Rothermel, G., Mei, H.: Bridging the gap between the total and additional test-case prioritization strategies. In: ICSE, IEEE Press, pp. 192–201 (2013)
Acknowledgments
This work is partially supported by NSFC Program (Nos. 61073006 and 61103032), Tsinghua University project 2010THZ0, and National Key Technology R&D Program of the Ministry of Science and Technology of China (No. 2013BAH01B03). We thank researchers at University of Nebraska–Lincoln, Georgia Tech, and Siemens Corporate Research for the Software-artifact Infrastructure Repository. We would also like to thank the anonymous reviewers for providing us with constructive comments and suggestions.
Additional information
This work was done while the author was visiting Singapore Management University.
Xin Xia and Liang Gong contributed equally to this work.
Cite this article
Xia, X., Gong, L., Le, TD.B. et al. Diversity maximization speedup for localizing faults in single-fault and multi-fault programs. Autom Softw Eng 23, 43–75 (2016). https://doi.org/10.1007/s10515-014-0165-z