DOI: 10.1145/3368089.3409730
research-article

Model-based exploration of the frontier of behaviours for deep learning system testing

Published: 08 November 2020

Abstract

With the increasing adoption of Deep Learning (DL) for critical tasks, such as autonomous driving, the evaluation of the quality of systems that rely on DL has become crucial. Once trained, DL systems produce an output for any arbitrary numeric vector provided as input, regardless of whether it is within or outside the validity domain of the system under test. Hence, the quality of such systems is determined by the intersection between their validity domain and the regions where their outputs exhibit a misbehaviour.
In this paper, we introduce the notion of frontier of behaviours, i.e., the inputs at which the DL system starts to misbehave. If the frontier of misbehaviours is outside the validity domain of the system, the quality check is passed. Otherwise, the inputs at the intersection represent quality deficiencies of the system. We developed DeepJanus, a search-based tool that generates frontier inputs for DL systems. The experimental results obtained for the lane keeping component of a self-driving car show that the frontier of a well trained system contains almost exclusively unrealistic roads that violate the best practices of civil engineering, while the frontier of a poorly trained one includes many valid inputs that point to serious deficiencies of the system.
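
The frontier idea can be illustrated with a toy sketch. This is not DeepJanus's search algorithm: it swaps in a plain bisection over a single scalar input, and the names `find_frontier_pair`, `toy_lane_keeper`, and `is_valid_road`, together with the thresholds 0.08 and 0.05, are hypothetical stand-ins chosen only for illustration. It locates two nearby inputs, one on which the system still behaves and one on which it misbehaves, and then checks whether that frontier lies outside the validity domain.

```python
from typing import Callable, Tuple


def find_frontier_pair(
    behaves: Callable[[float], bool],
    good: float,
    bad: float,
    tol: float = 1e-3,
) -> Tuple[float, float]:
    """Bisect between an input that behaves and one that misbehaves.

    Returns (inner, outer): two inputs at most `tol` apart, where the
    system still behaves on `inner` and misbehaves on `outer`.
    """
    if not behaves(good) or behaves(bad):
        raise ValueError("need one behaving and one misbehaving input")
    while abs(bad - good) > tol:
        mid = (good + bad) / 2.0
        if behaves(mid):
            good = mid
        else:
            bad = mid
    return good, bad


# Hypothetical stand-in for the system under test: a lane keeper that
# stays in lane only while the road curvature is below 0.08.
def toy_lane_keeper(curvature: float) -> bool:
    return curvature < 0.08


# Hypothetical validity domain: curvatures assumed acceptable for a real road.
def is_valid_road(curvature: float) -> bool:
    return curvature <= 0.05


inner, outer = find_frontier_pair(toy_lane_keeper, good=0.0, bad=0.2)

# The quality check passes when the misbehaving side of the frontier
# falls outside the validity domain.
quality_check_passed = not is_valid_road(outer)
print(f"frontier between {inner:.4f} and {outer:.4f}; passed={quality_check_passed}")
```

In this toy setting the frontier sits near curvature 0.08, beyond the assumed validity limit of 0.05, so the check passes. Lowering the lane keeper's threshold below 0.05 would place the frontier inside the validity domain and expose a deficiency.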

Supplementary Material

Auxiliary Teaser Video (fse20main-p441-p-teaser.mp4)
This is a presentation video of my talk at FSE 2020 on our paper accepted in the research track. In this paper, we introduce the notion of frontier of behaviours, i.e., the inputs at which the deep learning system starts to misbehave. If the frontier of misbehaviours is outside the validity domain of the system, the quality check is passed. Otherwise, the inputs at the intersection represent quality deficiencies of the system. We developed DeepJanus, a search-based tool that generates frontier inputs for deep learning systems. The experimental results obtained for the lane keeping component of a self-driving car show that the frontier of a well trained system contains almost exclusively unrealistic roads that violate the best practices of civil engineering, while the frontier of a poorly trained one includes many valid inputs that point to serious deficiencies of the system.
Auxiliary Presentation Video (fse20main-p441-p-video.mp4)
This is a presentation video of my talk at FSE 2020 on our paper accepted in the research track.

    Information & Contributors

    Information

    Published In

    ESEC/FSE 2020: Proceedings of the 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering
    November 2020
    1703 pages
    ISBN:9781450370431
    DOI:10.1145/3368089
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 08 November 2020

    Author Tags

    1. deep learning
    2. model based testing
    3. search based software engineering
    4. software testing

    Qualifiers

    • Research-article

    Funding Sources

    • European Research Council - Advanced Grant (ERC-AdG)

    Conference

    ESEC/FSE '20

    Acceptance Rates

    Overall Acceptance Rate 112 of 543 submissions, 21%

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months): 133
    • Downloads (Last 6 weeks): 12
    Reflects downloads up to 31 Dec 2024

    Citations

    Cited By

    • (2025) Reinforcement learning for online testing of autonomous driving systems: a replication and extension study. Empirical Software Engineering. 10.1007/s10664-024-10562-5. 30:1. Online publication date: 1-Feb-2025
    • (2024) Present Development of Software for Railway Safety. DESIGN, CONSTRUCTION, MAINTENANCE. 10.37394/232022.2024.4.34(19-28). Online publication date: 25-Jun-2024
    • (2024) LeGEND: A Top-Down Approach to Scenario Generation of Autonomous Driving Systems Assisted by Large Language Models. Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering. 10.1145/3691620.3695520. (1497-1508). Online publication date: 27-Oct-2024
    • (2024) In-Simulation Testing of Deep Learning Vision Models in Autonomous Robotic Manipulators. Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering. 10.1145/3691620.3695281. (2187-2198). Online publication date: 27-Oct-2024
    • (2024) Bridging the Gap between Real-world and Synthetic Images for Testing Autonomous Driving Systems. Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering. 10.1145/3691620.3695067. (732-744). Online publication date: 27-Oct-2024
    • (2024) Neuron Semantic-Guided Test Generation for Deep Neural Networks Fuzzing. ACM Transactions on Software Engineering and Methodology. 10.1145/3688835. 34:1 (1-38). Online publication date: 14-Aug-2024
    • (2024) Reinforcement Learning Informed Evolutionary Search for Autonomous Systems Testing. ACM Transactions on Software Engineering and Methodology. 10.1145/3680468. 33:8 (1-45). Online publication date: 27-Jul-2024
    • (2024) GIST: Generated Inputs Sets Transferability in Deep Learning. ACM Transactions on Software Engineering and Methodology. 10.1145/3672457. 33:8 (1-38). Online publication date: 13-Jun-2024
    • (2024) Keeper: Automated Testing and Fixing of Machine Learning Software. ACM Transactions on Software Engineering and Methodology. 10.1145/3672451. 33:7 (1-33). Online publication date: 13-Jun-2024
    • (2024) Focused Test Generation for Autonomous Driving Systems. ACM Transactions on Software Engineering and Methodology. 10.1145/3664605. 33:6 (1-32). Online publication date: 27-Jun-2024
