DOI: 10.1145/3482909.3482911

On Using Decision Tree Coverage Criteria for Testing Machine Learning Models

Published: 12 October 2021

Abstract

Over the past decade, there has been growing interest in applying machine learning (ML) to a myriad of tasks. Owing to this interest, the adoption of ML-based systems has gone mainstream. This widespread adoption, however, poses new challenges for software testers, who must improve the quality and reliability of ML-based solutions. To cope with the challenges of testing ML-based systems, we propose novel test adequacy criteria based on decision tree models. Unlike the traditional approach to testing ML models, which relies on manual collection and labelling of data, our criteria leverage the internal structure of decision tree models to guide the selection of test inputs. Specifically, we introduce decision tree coverage (DTC) and boundary value analysis (BVA) as approaches to systematically guide the creation of effective test data that exercises key structural elements of a given decision tree model. To evaluate these criteria, we carried out an experiment using 12 datasets. We measured the effectiveness of test inputs in terms of the difference in the model's behavior between the test inputs and the training data. The experiment results indicate that our testing criteria can be used to guide the generation of effective test data.
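
To make these criteria concrete, the following Python sketch shows one plausible reading of them on a scikit-learn decision tree: DTC measured as the fraction of leaf nodes reached by a test set, and BVA realized by generating inputs just below and just above each internal node's split threshold. The dataset, the leaf-based coverage definition, and the epsilon perturbation are assumptions made for illustration, not details taken from the paper.

# Illustrative sketch only: leaf-based coverage and threshold-based boundary
# inputs are this example's assumptions, not the paper's exact definitions.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
tree = model.tree_

# Decision tree coverage (DTC) read as leaf coverage: the fraction of leaf
# nodes reached by at least one test input.
leaf_ids = np.where(tree.children_left == -1)[0]
reached = np.unique(model.apply(X_test))
dtc = len(np.intersect1d(reached, leaf_ids)) / len(leaf_ids)
print(f"Leaf coverage of the test set: {dtc:.0%}")

# Boundary value analysis (BVA): derive test inputs that sit just below and
# just above each internal node's split threshold on the tested feature.
eps = 1e-3
boundary_inputs = []
for node in np.where(tree.children_left != -1)[0]:
    feature, threshold = tree.feature[node], tree.threshold[node]
    for delta in (-eps, +eps):
        x = X_train.mean(axis=0)  # seed input; any representative point works
        x[feature] = threshold + delta
        boundary_inputs.append(x)
boundary_inputs = np.array(boundary_inputs)
print(f"Generated {len(boundary_inputs)} boundary-value test inputs")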

Published In

SAST '21: Proceedings of the 6th Brazilian Symposium on Systematic and Automated Software Testing
September 2021
63 pages
ISBN:9781450385039
DOI:10.1145/3482909
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Decision Tree
  2. Software Testing
  3. Testing Criterion

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Funding Sources

  • CAPES
  • FAPESP
  • CNPq

Conference

SAST '21

Acceptance Rates

Overall Acceptance Rate 45 of 92 submissions, 49%
