DOI: 10.1145/1062455.1062530
Article

Is mutation an appropriate tool for testing experiments?

Published: 15 May 2005

Abstract

The empirical assessment of test techniques plays an important role in software testing research. One common practice is to seed faults, either manually or by using mutation operators. The latter allows the systematic, repeatable seeding of large numbers of faults; however, we do not know whether empirical results obtained this way lead to valid, representative conclusions. This paper investigates this important question based on a number of programs with comprehensive pools of test cases and known faults. It is concluded that, based on the data available thus far, the use of mutation operators yields trustworthy results (generated mutants are similar to real faults). However, mutants appear to be different from hand-seeded faults, which seem to be harder to detect than real faults.
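
To make "mutation operators" concrete: an operator is a simple syntactic rule applied mechanically at every program site where it fits, which is what makes the seeding systematic and repeatable. Below is a minimal sketch, assuming a line-oriented textual tool, of one classic operator, relational-operator replacement (ROR); the operator table and function name are illustrative assumptions, not the tooling used in the study.

```python
import re

# Relational-operator replacement table (an illustrative assumption,
# not the paper's exact operator set).
ROR_SWAPS = {"==": "!=", "!=": "==", "<=": ">", ">=": "<", "<": ">=", ">": "<="}

# Longer operators listed first so '<' does not match inside '<='.
_REL_OP = re.compile("|".join(re.escape(op) for op in ("==", "!=", "<=", ">=", "<", ">")))

def generate_mutants(source_lines):
    """Yield (line_number, mutated_line) pairs, one single-operator change
    per mutant, so each mutant differs from the original by one edit.

    A real tool would work on tokens or an AST; this textual sketch
    ignores corner cases such as operators inside strings or comments.
    """
    for lineno, line in enumerate(source_lines, start=1):
        for m in _REL_OP.finditer(line):
            mutated = line[:m.start()] + ROR_SWAPS[m.group(0)] + line[m.end():]
            yield lineno, mutated
```

Each mutant is then compiled and run against the test pool; a test "kills" (detects) the mutant when it makes the mutant's output differ from the original program's.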



Reviews

Andrew Brooks

The effectiveness of a program's test suite can be measured by how many mutated versions of the program, each containing an injected defect, it detects. Mutations (defects) are injected through the application of simple rules, such as "negate decision" or "replace operator," a process that is objective and repeatable. But are the detection rates for mutations representative of the detection rates for real faults? An experiment conducted to answer this question was based on one program with real faults, seven programs with hand-seeded faults, and existing comprehensive pools of test cases. Mutants (mutated versions) were systematically created for each program. Then, 5,000 test suites of varying sizes were randomly selected from the existing comprehensive pool of test cases for each program, and the detection rates for mutants, real faults, and hand-seeded faults were determined. Figures 5 and 6 in this paper, which show how detection rates vary by size of test suite, provide convincing evidence that mutations are similar to real faults in terms of detection difficulty, but that the hand-seeded faults are more difficult to detect. The latter result is explained by the fact that those responsible for the hand-seeded faults differentially filtered out faults that were easy to detect. The authors rightly conclude that the hand-seeding of faults is a subjective and undefined process, and that replication studies are needed to confirm their result that detection rates for mutations are representative of detection rates of real faults. This paper is strongly recommended to those researching the effectiveness of software testing techniques.

Online Computing Reviews Service
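
As a rough illustration of the sampling procedure the review describes (a sketch under stated assumptions, not the authors' actual scripts): given a precomputed "kill matrix" recording which tests detect which faults, the detection rate of each randomly drawn suite reduces to a set intersection. The names `kill_matrix`, `detection_rate`, and `sample_rates` are hypothetical.

```python
import random

def detection_rate(kill_matrix, suite):
    """Fraction of faults detected by `suite`.

    kill_matrix maps each fault (mutant, real, or hand-seeded) to the
    set of test IDs that detect it, assumed precomputed by running the
    whole test pool once against each faulty version.
    """
    tests = set(suite)
    detected = sum(1 for killers in kill_matrix.values() if killers & tests)
    return detected / len(kill_matrix)

def sample_rates(kill_matrix, test_pool, suite_size, trials=5000, seed=0):
    """Detection rates for `trials` randomly drawn suites of one size,
    mirroring the 5,000 randomly selected suites per program."""
    rng = random.Random(seed)
    return [detection_rate(kill_matrix, rng.sample(test_pool, suite_size))
            for _ in range(trials)]
```

Comparing the distributions produced by `sample_rates` for mutants against those for real and hand-seeded faults, across suite sizes, is what the paper's Figures 5 and 6 visualize.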



Published In

ICSE '05: Proceedings of the 27th international conference on Software engineering
May 2005
754 pages
ISBN: 1581139632
DOI: 10.1145/1062455

Publisher

Association for Computing Machinery

New York, NY, United States



Author Tags

  1. hand-seeded faults
  2. mutants
  3. real faults


Conference

ICSE05

Acceptance Rates

Overall Acceptance Rate 276 of 1,856 submissions, 15%



Cited By

  • (2025) "Subsumption, correctness and relative correctness: Implications for software testing," Science of Computer Programming, vol. 239, article 103177, Jan. 2025. DOI: 10.1016/j.scico.2024.103177
  • (2024) "Static and Dynamic Comparison of Mutation Testing Tools for Python," Proceedings of the XXIII Brazilian Symposium on Software Quality, pp. 199--209, Nov. 2024. DOI: 10.1145/3701625.3701659
  • (2024) "Mutation Testing as a Quality Assurance Technique in a Fintech Company: An Experience Report in The Industry," Proceedings of the XXIII Brazilian Symposium on Software Quality, pp. 446--451, Nov. 2024. DOI: 10.1145/3701625.3701629
  • (2024) "Do not neglect what's on your hands: localizing software faults with exception trigger stream," Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering, pp. 982--994, Oct. 2024. DOI: 10.1145/3691620.3695479
  • (2024) "SURE: A Visualized Failure Indexing Approach Using Program Memory Spectrum," ACM Transactions on Software Engineering and Methodology, vol. 33 (8), pp. 1--43, Jul. 2024. DOI: 10.1145/3676958
  • (2024) "Empirical Evaluation of Frequency Based Statistical Models for Estimating Killable Mutants," Proceedings of the 18th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, pp. 61--71, Oct. 2024. DOI: 10.1145/3674805.3686669
  • (2024) "FRAFOL: FRAmework FOr Learning mutation testing," Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis, pp. 1846--1850, Sep. 2024. DOI: 10.1145/3650212.3685306
  • (2024) "Large Language Models for Equivalent Mutant Detection: How Far Are We?," Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis, pp. 1733--1745, Sep. 2024. DOI: 10.1145/3650212.3680395
  • (2024) "Mutating Matters: Analyzing the Influence of Mutation Testing in Programming Courses," Proceedings of the 2024 ACM Virtual Global Computing Education Conference V. 1, pp. 151--157, Dec. 2024. DOI: 10.1145/3649165.3690110
  • (2024) "ReClues: Representing and indexing failures in parallel debugging with program variables," Proceedings of the IEEE/ACM 46th International Conference on Software Engineering, pp. 1--13, May 2024. DOI: 10.1145/3597503.3639098
