DOI: 10.1109/QSIC.2014.33
Article

Empirically Evaluating the Quality of Automatically Generated and Manually Written Test Suites

Published: 02 October 2014

Abstract

The creation, execution, and maintenance of tests are some of the most expensive tasks in software development. To help reduce the cost, automated test generation tools can be used to assist and guide developers in creating test cases. Yet, the tests that automated tools produce range from simple skeletons to fully executable test suites, hence their complexity and quality vary. This paper compares the complexity and quality of test suites created by sophisticated automated test generation tools to those of developer-written test suites. The empirical study in this paper examines ten real-world programs with existing test suites and applies two state-of-the-art automated test generation tools. The study measures the resulting test suite quality in terms of code coverage and fault-finding capability. On average, manual tests covered 31.5% of the branches while the automated tools covered 31.8% of the branches. In terms of mutation score, the tests generated by automated tools had an average mutation score of 39.8% compared to the average mutation score of 42.1% for manually written tests. Even though automatically created tests often contain more lines of source code than those written by developers, this paper's empirical results reveal that test generation tools can provide value by creating high quality test suites while reducing the cost and effort needed for testing.
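To make the two quality measures concrete, the following minimal Java sketch shows how branch coverage and mutation score are computed from raw counts. The class and method names are hypothetical illustrations, not the paper's code or the JaCoCo/MAJOR APIs, and the numbers in main are illustrative only.

```java
/**
 * Minimal sketch (hypothetical names) of the two suite-quality metrics
 * reported in the study: branch coverage and mutation score.
 */
public final class SuiteQualityMetrics {

    /** Fraction of program branches exercised by the test suite. */
    public static double branchCoverage(int coveredBranches, int totalBranches) {
        return totalBranches == 0 ? 0.0 : (double) coveredBranches / totalBranches;
    }

    /** Fraction of seeded mutants detected (killed) by the test suite. */
    public static double mutationScore(int killedMutants, int totalMutants) {
        return totalMutants == 0 ? 0.0 : (double) killedMutants / totalMutants;
    }

    public static void main(String[] args) {
        // Illustrative counts only, not the study's data.
        System.out.printf("branch coverage: %.1f%%%n", 100 * branchCoverage(318, 1000));
        System.out.printf("mutation score:  %.1f%%%n", 100 * mutationScore(398, 1000));
    }
}
```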



Reviews

Andrew Brooks

Are automatically generated test suites better than manually written test suites? To answer this question, ten Java applications with existing manually written test suites were tested using the EVOSUITE and CodePro automated test generation tools. Branch coverage and mutation scores were used to assess the quality of test suites. The JaCoCo and MAJOR tools were used to calculate these measures. EVOSUITE covered 31.86 percent of branches on average and had an average mutation score of 39.89 percent. For the manually written test suites, the figures were, respectively, 31.5 percent and 42.14 percent. The authors conclude that their results should encourage use of a tool such as EVOSUITE for test production. CodePro's test quality was found to be much lower, and this was attributed to absent or weaker oracles. Also investigated was the relationship between branch coverage and mutation score. By inspection, Figures 7 and 8 do indeed suggest correlations are present for EVOSUITE and the manually written test suites. It is unclear, however, what the actual correlation scores are. The investigators imply they calculated non-linear fits, but in Figures 7 and 8, straight lines are drawn. The analysis presented has two major weaknesses. First, there is no discussion or treatment of equivalent mutants. There can be sizable changes in mutation scores when equivalent mutants are factored out. Second, there is no discussion or treatment of the degree to which branches actually tested overlapped with branches actually containing mutations. Despite the shortcomings identified, this paper is strongly recommended to those working in software testing. Online Computing Reviews Service
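The reviewer's first criticism can be made concrete with a small sketch: equivalent mutants cannot be killed by any test, so excluding them from the denominator raises the reported mutation score. The class name and the counts below are hypothetical; the paper does not report equivalent-mutant data.

```java
/** Sketch of how equivalent mutants inflate the mutation-score denominator. */
public final class EquivalentMutantAdjustment {

    /** killed / total, ignoring equivalence. */
    static double rawScore(int killed, int total) {
        return (double) killed / total;
    }

    /** killed / (total - equivalent): equivalent mutants are excluded because no test can kill them. */
    static double adjustedScore(int killed, int total, int equivalent) {
        return (double) killed / (total - equivalent);
    }

    public static void main(String[] args) {
        int total = 1000, killed = 421, equivalent = 80; // illustrative counts only
        System.out.printf("raw mutation score:      %.1f%%%n", 100 * rawScore(killed, total));
        System.out.printf("adjusted mutation score: %.1f%%%n", 100 * adjustedScore(killed, total, equivalent));
    }
}
```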




Published In

Guide Proceedings
QSIC '14: Proceedings of the 2014 14th International Conference on Quality Software
October 2014
366 pages
ISBN:9781479971985

Publisher

IEEE Computer Society

United States

Publication History

Published: 02 October 2014


Cited By

  • (2023) NaNofuzz: A Usable Tool for Automatic Test Generation. Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 1114-1126. DOI: 10.1145/3611643.3616327. Online publication date: 30-Nov-2023.
  • (2022) Human-based Test Design versus Automated Test Generation: A Literature Review and Meta-Analysis. Proceedings of the 15th Innovations in Software Engineering Conference, 1-11. DOI: 10.1145/3511430.3511433. Online publication date: 24-Feb-2022.
  • (2020) How far are we from testing a program in a completely automated way, considering the mutation testing criterion at unit level? Proceedings of the XIX Brazilian Symposium on Software Quality, 1-9. DOI: 10.1145/3439961.3439977. Online publication date: 1-Dec-2020.
  • (2019) Is mutation score a fair metric? Proceedings Companion of the 2019 ACM SIGPLAN International Conference on Systems, Programming, Languages, and Applications: Software for Humanity, 41-43. DOI: 10.1145/3359061.3361084. Online publication date: 20-Oct-2019.
  • (2018) Can automated test case generation cope with extract method validation? Proceedings of the XXXII Brazilian Symposium on Software Engineering, 152-161. DOI: 10.1145/3266237.3266274. Online publication date: 17-Sep-2018.
  • (2016) The complementary aspect of automatically and manually generated test case sets. Proceedings of the 7th International Workshop on Automating Test Case Design, Selection, and Evaluation, 23-30. DOI: 10.1145/2994291.2994295. Online publication date: 18-Nov-2016.
  • (2015) Experience report: how is dynamic symbolic execution different from manual testing? A study on KLEE. Proceedings of the 2015 International Symposium on Software Testing and Analysis, 199-210. DOI: 10.1145/2771783.2771818. Online publication date: 13-Jul-2015.
