Research article · ICSE Conference Proceedings · DOI: 10.1145/2568225.2568271

Coverage is not strongly correlated with test suite effectiveness

Published: 31 May 2014

Abstract

The coverage of a test suite is often used as a proxy for its ability to detect faults. However, previous studies that investigated the correlation between code coverage and test suite effectiveness have failed to reach a consensus about the nature and strength of the relationship between these test suite characteristics. Moreover, many of the studies were done with small or synthetic programs, making it unclear whether their results generalize to larger programs, and some of the studies did not account for the confounding influence of test suite size. In addition, most of the studies were done with adequate suites, which are rare in practice, so the results may not generalize to typical test suites.
We have extended these studies by evaluating the relationship between test suite size, coverage, and effectiveness for large Java programs. Our study is the largest to date in the literature: we generated 31,000 test suites for five systems consisting of up to 724,000 lines of source code. We measured the statement coverage, decision coverage, and modified condition coverage of these suites and used mutation testing to evaluate their fault detection effectiveness.
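The sampling-and-scoring step described above can be sketched as follows. This is a minimal illustration, not the study's actual tooling (the paper generated and ran mutants with PIT): the test names and per-test kill sets below are invented, and, as in the paper, mutants that the full master suite never kills are treated as equivalent and excluded from the denominator.

```python
import random

# Hypothetical kill matrix: for each test method, the set of mutant IDs it kills.
# In the study this data came from PIT; here it is made up for illustration.
kill_sets = {
    "testParse": {1, 2, 3},
    "testFormat": {2, 4},
    "testEdgeCases": {5, 6, 7},
    "testRoundTrip": {1, 7, 8},
    "testNulls": {9},
}

# Mutants killed by the master suite; undetected mutants are assumed equivalent.
ALL_MUTANTS = set().union(*kill_sets.values())

def sample_suite(size, rng):
    """Randomly sample `size` test methods (without replacement) from the master suite."""
    return rng.sample(sorted(kill_sets), size)

def effectiveness(suite):
    """Fraction of non-equivalent mutants that this suite kills."""
    killed = set().union(*(kill_sets[t] for t in suite)) if suite else set()
    return len(killed) / len(ALL_MUTANTS)

rng = random.Random(0)
for size in (1, 3, 5):
    scores = [effectiveness(sample_suite(size, rng)) for _ in range(100)]
    print(f"size={size}: mean effectiveness={sum(scores) / len(scores):.2f}")
```

Repeating this for many suites per size yields the (size, coverage, effectiveness) triples that the correlation analysis operates on.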
We found that there is a low to moderate correlation between coverage and effectiveness when the number of test cases in the suite is controlled for. In addition, we found that stronger forms of coverage do not provide greater insight into the effectiveness of the suite. Our results suggest that coverage, while useful for identifying under-tested parts of a program, should not be used as a quality target because it is not a good indicator of test suite effectiveness.
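The "controlling for suite size" analysis can be made concrete with a small sketch. The data points below are invented and the plain Kendall tau-a is just one possible rank-correlation choice; the idea is to contrast the pooled coverage-effectiveness correlation with correlations computed within fixed-size groups.

```python
from itertools import combinations

def kendall_tau(xs, ys):
    """Naive Kendall tau-a: (concordant - discordant) / total pairs."""
    assert len(xs) == len(ys) and len(xs) > 1
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        s = (x1 - x2) * (y1 - y2)
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n = len(xs)
    return (concordant - discordant) / (n * (n - 1) / 2)

# Hypothetical per-suite measurements: (suite size, statement coverage, mutation score).
suites = [
    (10, 0.42, 0.30), (10, 0.45, 0.33), (10, 0.50, 0.31), (10, 0.55, 0.40),
    (50, 0.70, 0.62), (50, 0.72, 0.60), (50, 0.75, 0.68), (50, 0.80, 0.71),
]

# Pooled correlation ignores size, so size confounds the result:
# bigger suites have both more coverage and more kills.
cov = [c for _, c, _ in suites]
eff = [e for _, _, e in suites]
print("pooled tau:", kendall_tau(cov, eff))

# Controlling for size: compute the correlation within each size group.
for size in sorted({s for s, _, _ in suites}):
    xs, ys = zip(*[(c, e) for s, c, e in suites if s == size])
    print(f"size={size}: tau={kendall_tau(xs, ys):.2f}")
```

With real data, a high pooled correlation that shrinks within size groups is exactly the pattern the abstract describes.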



Reviews

Andrew Brooks

Should developers aim to write test suites with high coverage? To answer this question, the authors systematically explored the relationships between coverage, test suite size, and test suite effectiveness using five reasonably large Java programs that had existing master test suites. The tool CodeCover was used to measure statement, decision, and modified condition coverage; the tool PIT was used to generate mutants and to report mutant kills as the effectiveness measure. Test suite size was varied by randomly sampling from the existing master suites.

A moderate to high correlation was found between effectiveness and the number of test methods in a suite, and likewise between effectiveness and coverage when test suite size was ignored. When test suite size was controlled for, the correlation between effectiveness and coverage ranged from low to moderate. The authors therefore suggest that coverage should not be used as a quality target. Evidence was also found suggesting that the use of complex coverage measures such as modified condition coverage is not justified.

Since manually determining whether thousands of mutants are equivalent would be prohibitively costly, the authors simply assumed that all mutants not detected by the existing master suites were equivalent; a sampling strategy should have been adopted to at least gauge how well this assumption holds. Overall, there is much in this study to commend over previous research, and this paper is very strongly recommended to the software engineering community. Online Computing Reviews Service
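The review's point about modified condition coverage can be made concrete with a toy sketch (the predicate and helper names below are hypothetical, not from the paper): for the decision `a or b`, two tests suffice for decision coverage, but MC/DC additionally requires each condition to be shown to independently affect the outcome.

```python
def decision(a, b):
    # Toy two-condition decision under test.
    return a or b

def has_decision_coverage(tests):
    """Decision coverage: the decision must evaluate to both True and False."""
    return {decision(a, b) for a, b in tests} == {True, False}

def has_mcdc(tests):
    """MC/DC (simplified): for each condition, some pair of tests differs
    only in that condition and produces different decision outcomes."""
    tests = set(tests)
    for idx in (0, 1):  # condition a, then condition b
        shown = False
        for t in tests:
            flipped = list(t)
            flipped[idx] = not flipped[idx]
            if tuple(flipped) in tests and decision(*t) != decision(*tuple(flipped)):
                shown = True
                break
        if not shown:
            return False
    return True

two_tests = [(True, False), (False, False)]
print(has_decision_coverage(two_tests))  # True: both outcomes reached
print(has_mcdc(two_tests))               # False: b is never shown to matter

three_tests = two_tests + [(False, True)]
print(has_mcdc(three_tests))             # True
```

The gap between the two criteria is what makes MC/DC "stronger"; the study's finding is that, empirically, this extra strength did not translate into better prediction of effectiveness.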



Published In

cover image ACM Conferences
ICSE 2014: Proceedings of the 36th International Conference on Software Engineering
May 2014
1139 pages
ISBN: 9781450327565
DOI: 10.1145/2568225
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

In-Cooperation

  • TCSE: IEEE Computer Society's Technical Council on Software Engineering

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. coverage
  2. test suite effectiveness
  3. test suite quality

Qualifiers

  • Research-article

Conference

ICSE '14

Acceptance Rates

Overall Acceptance Rate 276 of 1,856 submissions, 15%



Bibliometrics

Article Metrics

  • Downloads (last 12 months): 262
  • Downloads (last 6 weeks): 57
Reflects downloads up to 13 Dec 2024


Cited By

  • (2025) Subsumption, correctness and relative correctness: Implications for software testing. Science of Computer Programming, 239 (103177). DOI: 10.1016/j.scico.2024.103177. Online publication date: Jan-2025.
  • (2025) Impact of methodological choices on the analysis of code metrics and maintenance. Journal of Systems and Software, 220 (112263). DOI: 10.1016/j.jss.2024.112263. Online publication date: Feb-2025.
  • (2025) Testability-driven development. Computer Standards & Interfaces, 91:C. DOI: 10.1016/j.csi.2024.103877. Online publication date: 1-Jan-2025.
  • (2024) An Automated Method for Checking and Debugging Test Scenarios Based on Formal Models. Control Systems and Computers (33-44). DOI: 10.15407/csc.2024.03.033. Online publication date: Oct-2024.
  • (2024) Improving the Comprehension of R Programs by Hybrid Dataflow Analysis. Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering (2490-2493). DOI: 10.1145/3691620.3695603. Online publication date: 27-Oct-2024.
  • (2024) Effective Unit Test Generation for Java Null Pointer Exceptions. Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering (1044-1056). DOI: 10.1145/3691620.3695484. Online publication date: 27-Oct-2024.
  • (2024) WhiteFox: White-Box Compiler Fuzzing Empowered by Large Language Models. Proceedings of the ACM on Programming Languages, 8:OOPSLA2 (709-735). DOI: 10.1145/3689736. Online publication date: 8-Oct-2024.
  • (2024) Software Engineering Methods for AI-Driven Deductive Legal Reasoning. Proceedings of the 2024 ACM SIGPLAN International Symposium on New Ideas, New Paradigms, and Reflections on Programming and Software (85-95). DOI: 10.1145/3689492.3690050. Online publication date: 17-Oct-2024.
  • (2024) Increasing The Thoroughness Of Data Flow Testing With The Required k-Use Chains. Proceedings of the 2024 10th International Conference on Computer Technology Applications (27-32). DOI: 10.1145/3674558.3674563. Online publication date: 15-May-2024.
  • (2024) Can Coverage Criteria Guide Failure Discovery for Image Classifiers? An Empirical Study. ACM Transactions on Software Engineering and Methodology, 33:7 (1-28). DOI: 10.1145/3672446. Online publication date: 13-Jun-2024.
