Abstract
Automated test generation tools automate testing and alleviate the inefficiency of writing test cases by hand. However, existing tools are not yet mature enough to be widely adopted by software testing groups. This paper presents an empirical study of six state-of-the-art automated test generation tools for Java: EvoSuite, Randoop, JDoop, JTeXpert, T3, and Tardis. We design a test workflow that automatically runs each tool, collects the generated test suites, and evaluates them against several metrics: code coverage, mutation score, test suite size, readability, and real-fault detection ability. Based on the experimental results, we discuss the benefits and drawbacks of hybrid techniques. We also report our experience in setting up and executing these tools, and summarize their usability and user-friendliness. Finally, we offer insights into improving automated tools in terms of test suite readability, meaningful assertion generation, test suite reduction for random testing tools, and symbolic execution integration.
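The readability gap the study measures can be made concrete with a small example. The sketch below is hand-written, not actual tool output: BoundedStack is a hypothetical class under test, test0 imitates the style typical of EvoSuite- or Randoop-generated suites (opaque names, arbitrary literals, assertions on whatever state the generator happened to observe), and the second test is a conventional hand-crafted equivalent. JUnit 4 on the classpath is assumed.

```java
import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertTrue;

import org.junit.Test;

public class ReadabilityExampleTest {

    /** Hypothetical class under test. */
    static class BoundedStack {
        private final int[] data;
        private int size;

        BoundedStack(int capacity) { data = new int[capacity]; }

        void push(int x) {
            if (size == data.length) throw new IllegalStateException("full");
            data[size++] = x;
        }

        int pop() {
            if (size == 0) throw new IllegalStateException("empty");
            return data[--size];
        }

        boolean isEmpty() { return size == 0; }
    }

    // Imitation of a tool-generated test: opaque name, magic values,
    // and assertions derived from observed state rather than intent.
    @Test
    public void test0() {
        BoundedStack boundedStack0 = new BoundedStack(91);
        boundedStack0.push(-1);
        int int0 = boundedStack0.pop();
        assertEquals(-1, int0);
        assertTrue(boundedStack0.isEmpty());
    }

    // A hand-written test of the same behavior: intent-revealing name,
    // meaningful literals, one scenario per test.
    @Test
    public void popReturnsLastPushedValueAndEmptiesStack() {
        BoundedStack stack = new BoundedStack(2);
        stack.push(42);
        assertEquals(42, stack.pop());
        assertTrue(stack.isEmpty());
    }
}
```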
Ethics declarations
Conflict of Interest: The authors declare that they have no conflict of interest.
Additional information
This work was supported by the National Natural Science Foundation of China under Grant Nos. 62072225 and 62025202.
Xiang-Jun Liu is currently pursuing her Master’s degree with the State Key Laboratory for Novel Software Technology and Department of Computer Science and Technology at Nanjing University, Nanjing. Her research interests include software testing, cloud computing, and big data technology.
Ping Yu received her Ph.D. degree in computer science and technology in 2008 from Nanjing University, Nanjing. She is an associate professor with the State Key Laboratory for Novel Software Technology and Department of Computer Science and Technology at Nanjing University, Nanjing. Her research interests include intelligent software engineering, cloud computing, and big data technology.
Xiao-Xing Ma is currently a full professor with the State Key Laboratory for Novel Software Technology and Department of Computer Science and Technology at Nanjing University, Nanjing. He received his Ph.D. degree in computer science from Nanjing University in 2003. His research interests include self-adaptive software systems, software architectures, and quality assurance for machine learning models used as software components. He has co-authored over 100 peer-reviewed papers and has served on the technical program committees of various international software engineering conferences.
Cite this article
Liu, XJ., Yu, P. & Ma, XX. An Empirical Study on Automated Test Generation Tools for Java: Effectiveness and Challenges. J. Comput. Sci. Technol. 39, 715–736 (2024). https://doi.org/10.1007/s11390-023-1935-5