DOI: 10.1145/1062455.1062530
Article

Is mutation an appropriate tool for testing experiments?

Published: 15 May 2005

Abstract

The empirical assessment of test techniques plays an important role in software testing research. One common practice is to seed faults, either manually or by using mutation operators. The latter allows the systematic, repeatable seeding of large numbers of faults; however, we do not know whether empirical results obtained this way lead to valid, representative conclusions. This paper investigates this important question based on a number of programs with comprehensive pools of test cases and known faults. It is concluded that, based on the data available thus far, the use of mutation operators yields trustworthy results (generated mutants are similar to real faults). However, mutants appear to be different from hand-seeded faults, which seem to be harder to detect than real faults.
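
To make "mutation operators" concrete: an operator is a simple syntactic rule applied mechanically at every program site where it fits, which is what makes the seeding systematic and repeatable. Below is a minimal sketch, assuming a line-oriented textual tool, of one classic operator, relational-operator replacement (ROR); the operator table and function name are illustrative assumptions, not the tooling used in the study.

```python
import re

# Relational-operator replacement table (an illustrative assumption,
# not the paper's exact operator set).
ROR_SWAPS = {"==": "!=", "!=": "==", "<=": ">", ">=": "<", "<": ">=", ">": "<="}

# Longer operators listed first so '<' does not match inside '<='.
_REL_OP = re.compile("|".join(re.escape(op) for op in ("==", "!=", "<=", ">=", "<", ">")))

def generate_mutants(source_lines):
    """Yield (line_number, mutated_line) pairs, one single-operator change
    per mutant, so each mutant differs from the original by one edit.

    A real tool would work on tokens or an AST; this textual sketch
    ignores corner cases such as operators inside strings or comments.
    """
    for lineno, line in enumerate(source_lines, start=1):
        for m in _REL_OP.finditer(line):
            mutated = line[:m.start()] + ROR_SWAPS[m.group(0)] + line[m.end():]
            yield lineno, mutated
```

Each mutant is then compiled and run against the test pool; a test "kills" (detects) the mutant when it makes the mutant's output differ from the original program's.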



Reviews

Andrew Brooks

The effectiveness of a program's test suite can be measured by how many mutated versions of the program, each containing an injected defect, it detects. Mutations (defects) are injected through the application of simple rules, such as "negate decision" or "replace operator," a process that is objective and repeatable. But are the detection rates for mutations representative of the detection rates for real faults? An experiment conducted to answer this question was based on one program with real faults, seven programs with hand-seeded faults, and existing comprehensive pools of test cases. Mutants (mutated versions) were systematically created for each program. Then, 5,000 test suites of varying sizes were randomly selected from the existing comprehensive pool of test cases for each program, and the detection rates for mutants, real faults, and hand-seeded faults were determined. Figures 5 and 6 in this paper, which show how detection rates vary by size of test suite, provide convincing evidence that mutations are similar to real faults in terms of detection difficulty, but that the hand-seeded faults are more difficult to detect. The latter result is explained by the fact that those responsible for the hand-seeded faults differentially filtered out faults that were easy to detect. The authors rightly conclude that the hand-seeding of faults is a subjective and undefined process, and that replication studies are needed to confirm their result that detection rates for mutations are representative of detection rates of real faults. This paper is strongly recommended to those researching the effectiveness of software testing techniques.

Online Computing Reviews Service
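
As a rough illustration of the sampling procedure the review describes (a sketch under stated assumptions, not the authors' actual scripts): given a precomputed "kill matrix" recording which tests detect which faults, the detection rate of each randomly drawn suite reduces to a set intersection. The names `kill_matrix`, `detection_rate`, and `sample_rates` are hypothetical.

```python
import random

def detection_rate(kill_matrix, suite):
    """Fraction of faults detected by `suite`.

    kill_matrix maps each fault (mutant, real, or hand-seeded) to the
    set of test IDs that detect it, assumed precomputed by running the
    whole test pool once against each faulty version.
    """
    tests = set(suite)
    detected = sum(1 for killers in kill_matrix.values() if killers & tests)
    return detected / len(kill_matrix)

def sample_rates(kill_matrix, test_pool, suite_size, trials=5000, seed=0):
    """Detection rates for `trials` randomly drawn suites of one size,
    mirroring the 5,000 randomly selected suites per program."""
    rng = random.Random(seed)
    return [detection_rate(kill_matrix, rng.sample(test_pool, suite_size))
            for _ in range(trials)]
```

Comparing the distributions produced by `sample_rates` for mutants against those for real and hand-seeded faults, across suite sizes, is what the paper's Figures 5 and 6 visualize.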



Published In

ICSE '05: Proceedings of the 27th international conference on Software engineering
May 2005
754 pages
ISBN: 1581139632
DOI: 10.1145/1062455

Publisher

Association for Computing Machinery

New York, NY, United States



Author Tags

  1. hand-seeded faults
  2. mutants
  3. real faults


Conference

ICSE05

Acceptance Rates

Overall Acceptance Rate 276 of 1,856 submissions, 15%



Cited By

  • (2025) "Subsumption, correctness and relative correctness: Implications for software testing," Science of Computer Programming, vol. 239, article 103177, Jan. 2025. DOI: 10.1016/j.scico.2024.103177
  • (2024) "Static and Dynamic Comparison of Mutation Testing Tools for Python," Proceedings of the XXIII Brazilian Symposium on Software Quality, pp. 199--209, Nov. 2024. DOI: 10.1145/3701625.3701659
  • (2024) "Mutation Testing as a Quality Assurance Technique in a Fintech Company: An Experience Report in The Industry," Proceedings of the XXIII Brazilian Symposium on Software Quality, pp. 446--451, Nov. 2024. DOI: 10.1145/3701625.3701629
  • (2024) "Do not neglect what's on your hands: localizing software faults with exception trigger stream," Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering, pp. 982--994, Oct. 2024. DOI: 10.1145/3691620.3695479
  • (2024) "SURE: A Visualized Failure Indexing Approach Using Program Memory Spectrum," ACM Transactions on Software Engineering and Methodology, vol. 33 (8), pp. 1--43, Jul. 2024. DOI: 10.1145/3676958
  • (2024) "Empirical Evaluation of Frequency Based Statistical Models for Estimating Killable Mutants," Proceedings of the 18th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, pp. 61--71, Oct. 2024. DOI: 10.1145/3674805.3686669
  • (2024) "FRAFOL: FRAmework FOr Learning mutation testing," Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis, pp. 1846--1850, Sep. 2024. DOI: 10.1145/3650212.3685306
  • (2024) "Large Language Models for Equivalent Mutant Detection: How Far Are We?," Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis, pp. 1733--1745, Sep. 2024. DOI: 10.1145/3650212.3680395
  • (2024) "Mutating Matters: Analyzing the Influence of Mutation Testing in Programming Courses," Proceedings of the 2024 ACM Virtual Global Computing Education Conference V. 1, pp. 151--157, Dec. 2024. DOI: 10.1145/3649165.3690110
  • (2024) "ReClues: Representing and indexing failures in parallel debugging with program variables," Proceedings of the IEEE/ACM 46th International Conference on Software Engineering, pp. 1--13, May 2024. DOI: 10.1145/3597503.3639098
