Research article · ICSE Conference Proceedings · DOI: 10.1145/2568225.2568271

Coverage is not strongly correlated with test suite effectiveness

Published: 31 May 2014

Abstract

The coverage of a test suite is often used as a proxy for its ability to detect faults. However, previous studies that investigated the correlation between code coverage and test suite effectiveness have failed to reach a consensus about the nature and strength of the relationship between these test suite characteristics. Moreover, many of the studies were done with small or synthetic programs, making it unclear whether their results generalize to larger programs, and some of the studies did not account for the confounding influence of test suite size. In addition, most of the studies were done with adequate suites, which are rare in practice, so the results may not generalize to typical test suites.
We have extended these studies by evaluating the relationship between test suite size, coverage, and effectiveness for large Java programs. Our study is the largest to date in the literature: we generated 31,000 test suites for five systems consisting of up to 724,000 lines of source code. We measured the statement coverage, decision coverage, and modified condition coverage of these suites and used mutation testing to evaluate their fault detection effectiveness.
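The sampling-and-scoring step described above can be sketched as follows. This is a minimal illustration, not the study's actual tooling (the paper generated and ran mutants with PIT): the test names and per-test kill sets below are invented, and, as in the paper, mutants that the full master suite never kills are treated as equivalent and excluded from the denominator.

```python
import random

# Hypothetical kill matrix: for each test method, the set of mutant IDs it kills.
# In the study this data came from PIT; here it is made up for illustration.
kill_sets = {
    "testParse": {1, 2, 3},
    "testFormat": {2, 4},
    "testEdgeCases": {5, 6, 7},
    "testRoundTrip": {1, 7, 8},
    "testNulls": {9},
}

# Mutants killed by the master suite; undetected mutants are assumed equivalent.
ALL_MUTANTS = set().union(*kill_sets.values())

def sample_suite(size, rng):
    """Randomly sample `size` test methods (without replacement) from the master suite."""
    return rng.sample(sorted(kill_sets), size)

def effectiveness(suite):
    """Fraction of non-equivalent mutants that this suite kills."""
    killed = set().union(*(kill_sets[t] for t in suite)) if suite else set()
    return len(killed) / len(ALL_MUTANTS)

rng = random.Random(0)
for size in (1, 3, 5):
    scores = [effectiveness(sample_suite(size, rng)) for _ in range(100)]
    print(f"size={size}: mean effectiveness={sum(scores) / len(scores):.2f}")
```

Repeating this for many suites per size yields the (size, coverage, effectiveness) triples that the correlation analysis operates on.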
We found that there is a low to moderate correlation between coverage and effectiveness when the number of test cases in the suite is controlled for. In addition, we found that stronger forms of coverage do not provide greater insight into the effectiveness of the suite. Our results suggest that coverage, while useful for identifying under-tested parts of a program, should not be used as a quality target because it is not a good indicator of test suite effectiveness.
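The "controlling for suite size" analysis can be made concrete with a small sketch. The data points below are invented and the plain Kendall tau-a is just one possible rank-correlation choice; the idea is to contrast the pooled coverage-effectiveness correlation with correlations computed within fixed-size groups.

```python
from itertools import combinations

def kendall_tau(xs, ys):
    """Naive Kendall tau-a: (concordant - discordant) / total pairs."""
    assert len(xs) == len(ys) and len(xs) > 1
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        s = (x1 - x2) * (y1 - y2)
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n = len(xs)
    return (concordant - discordant) / (n * (n - 1) / 2)

# Hypothetical per-suite measurements: (suite size, statement coverage, mutation score).
suites = [
    (10, 0.42, 0.30), (10, 0.45, 0.33), (10, 0.50, 0.31), (10, 0.55, 0.40),
    (50, 0.70, 0.62), (50, 0.72, 0.60), (50, 0.75, 0.68), (50, 0.80, 0.71),
]

# Pooled correlation ignores size, so size confounds the result:
# bigger suites have both more coverage and more kills.
cov = [c for _, c, _ in suites]
eff = [e for _, _, e in suites]
print("pooled tau:", kendall_tau(cov, eff))

# Controlling for size: compute the correlation within each size group.
for size in sorted({s for s, _, _ in suites}):
    xs, ys = zip(*[(c, e) for s, c, e in suites if s == size])
    print(f"size={size}: tau={kendall_tau(xs, ys):.2f}")
```

With real data, a high pooled correlation that shrinks within size groups is exactly the pattern the abstract describes.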



Reviews

Andrew Brooks

Should developers aim to write test suites with high coverage? To answer this question, the authors systematically explored the relationships between coverage, test suite size, and test suite effectiveness using five reasonably large Java programs that had existing master test suites. The tool CodeCover was used to measure statement, decision, and modified condition coverage; the tool PIT was used to generate mutants and to report mutant kills as the effectiveness measure. Test suite size was varied by randomly sampling from the existing master suites.

A moderate to high correlation was found between effectiveness and the number of test methods in a suite, and likewise between effectiveness and coverage when test suite size was ignored. When test suite size was controlled for, the correlation between effectiveness and coverage ranged from low to moderate. The authors therefore suggest that coverage should not be used as a quality target. Evidence was also found suggesting that the use of complex coverage measures such as modified condition coverage is not justified.

Since manually determining whether thousands of mutants are equivalent would be prohibitively costly, the authors simply assumed that all mutants not detected by the existing master suites were equivalent; a sampling strategy should have been adopted to at least gauge how well this assumption holds. Overall, there is much in this study to commend over previous research, and this paper is very strongly recommended to the software engineering community. Online Computing Reviews Service
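The review's point about modified condition coverage can be made concrete with a toy sketch (the predicate and helper names below are hypothetical, not from the paper): for the decision `a or b`, two tests suffice for decision coverage, but MC/DC additionally requires each condition to be shown to independently affect the outcome.

```python
def decision(a, b):
    # Toy two-condition decision under test.
    return a or b

def has_decision_coverage(tests):
    """Decision coverage: the decision must evaluate to both True and False."""
    return {decision(a, b) for a, b in tests} == {True, False}

def has_mcdc(tests):
    """MC/DC (simplified): for each condition, some pair of tests differs
    only in that condition and produces different decision outcomes."""
    tests = set(tests)
    for idx in (0, 1):  # condition a, then condition b
        shown = False
        for t in tests:
            flipped = list(t)
            flipped[idx] = not flipped[idx]
            if tuple(flipped) in tests and decision(*t) != decision(*tuple(flipped)):
                shown = True
                break
        if not shown:
            return False
    return True

two_tests = [(True, False), (False, False)]
print(has_decision_coverage(two_tests))  # True: both outcomes reached
print(has_mcdc(two_tests))               # False: b is never shown to matter

three_tests = two_tests + [(False, True)]
print(has_mcdc(three_tests))             # True
```

The gap between the two criteria is what makes MC/DC "stronger"; the study's finding is that, empirically, this extra strength did not translate into better prediction of effectiveness.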



Published In

cover image ACM Conferences
ICSE 2014: Proceedings of the 36th International Conference on Software Engineering
May 2014
1139 pages
ISBN: 9781450327565
DOI: 10.1145/2568225
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

In-Cooperation

  • TCSE: IEEE Computer Society's Technical Council on Software Engineering

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. coverage
  2. test suite effectiveness
  3. test suite quality

Qualifiers

  • Research-article

Conference

ICSE '14

Acceptance Rates

Overall Acceptance Rate 276 of 1,856 submissions, 15%



Bibliometrics

Article Metrics

  • Downloads (last 12 months): 262
  • Downloads (last 6 weeks): 57
Reflects downloads up to 13 Dec 2024


Cited By

  • (2025) Subsumption, correctness and relative correctness: Implications for software testing. Science of Computer Programming, 239 (103177). DOI: 10.1016/j.scico.2024.103177. Online publication date: Jan-2025.
  • (2025) Impact of methodological choices on the analysis of code metrics and maintenance. Journal of Systems and Software, 220 (112263). DOI: 10.1016/j.jss.2024.112263. Online publication date: Feb-2025.
  • (2025) Testability-driven development. Computer Standards & Interfaces, 91:C. DOI: 10.1016/j.csi.2024.103877. Online publication date: 1-Jan-2025.
  • (2024) An Automated Method for Checking and Debugging Test Scenarios Based on Formal Models. Control Systems and Computers (33-44). DOI: 10.15407/csc.2024.03.033. Online publication date: Oct-2024.
  • (2024) Improving the Comprehension of R Programs by Hybrid Dataflow Analysis. Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering (2490-2493). DOI: 10.1145/3691620.3695603. Online publication date: 27-Oct-2024.
  • (2024) Effective Unit Test Generation for Java Null Pointer Exceptions. Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering (1044-1056). DOI: 10.1145/3691620.3695484. Online publication date: 27-Oct-2024.
  • (2024) WhiteFox: White-Box Compiler Fuzzing Empowered by Large Language Models. Proceedings of the ACM on Programming Languages, 8:OOPSLA2 (709-735). DOI: 10.1145/3689736. Online publication date: 8-Oct-2024.
  • (2024) Software Engineering Methods for AI-Driven Deductive Legal Reasoning. Proceedings of the 2024 ACM SIGPLAN International Symposium on New Ideas, New Paradigms, and Reflections on Programming and Software (85-95). DOI: 10.1145/3689492.3690050. Online publication date: 17-Oct-2024.
  • (2024) Increasing The Thoroughness Of Data Flow Testing With The Required k-Use Chains. Proceedings of the 2024 10th International Conference on Computer Technology Applications (27-32). DOI: 10.1145/3674558.3674563. Online publication date: 15-May-2024.
  • (2024) Can Coverage Criteria Guide Failure Discovery for Image Classifiers? An Empirical Study. ACM Transactions on Software Engineering and Methodology, 33:7 (1-28). DOI: 10.1145/3672446. Online publication date: 13-Jun-2024.
