research-article · DOI: 10.1145/3377930.3389805

Improving generalisation of AutoML systems with dynamic fitness evaluations

Published: 26 June 2020

Abstract

A common problem faced by machine learning developers is overfitting: fitting a pipeline so closely to the training data that performance degrades on unseen data. Automated machine learning aims to free (or at least ease) the developer from the burden of pipeline creation, but the overfitting problem can persist. In fact, it can worsen as we iteratively optimise performance on an internal cross-validation (most often k-fold). While this internal cross-validation is intended to reduce overfitting, we show that the search can still overfit to the particular folds used. In this work, we aim to remedy this problem by introducing dynamic fitness evaluations which approximate repeated k-fold cross-validation at little extra cost over a single k-fold, and at far lower cost than typical repeated k-fold. The results show that, when time is equated, the proposed fitness function yields a significant improvement over the current state-of-the-art baseline method, which uses a single internal k-fold. Furthermore, the proposed extension is very simple to implement on top of existing evolutionary computation methods, and can provide an essentially free boost in generalisation/testing performance.
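The abstract does not spell out the mechanism, but one plausible reading of "dynamic fitness evaluations" is that the fold split used for fitness is re-randomised each generation, so that over the course of the run the population is selected against many different splits, much like repeated k-fold CV, while each individual evaluation still costs only one k-fold. The sketch below illustrates that idea on a toy problem; the `kfold`, `fitness`, and `evolve` names, the single-feature nearest-centroid "pipeline", and the seed-per-generation scheme are all illustrative assumptions, not the paper's actual implementation.

```python
import random

def kfold(n, k, seed):
    """Shuffle indices with `seed` and deal them into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def fitness(feature, X, y, k, seed):
    """k-fold CV accuracy of a nearest-centroid classifier on one feature.

    Because the folds depend on `seed`, re-seeding each generation exposes
    the search to a fresh split on every evaluation.
    """
    correct, total = 0, 0
    for test in kfold(len(X), k, seed):
        held_out = set(test)
        train = [i for i in range(len(X)) if i not in held_out]
        # "Training": per-class mean of the chosen feature.
        means = {}
        for c in (0, 1):
            vals = [X[i][feature] for i in train if y[i] == c]
            means[c] = sum(vals) / len(vals)
        for i in test:
            pred = min((0, 1), key=lambda c: abs(X[i][feature] - means[c]))
            correct += pred == y[i]
            total += 1
    return correct / total

def evolve(X, y, k=5, generations=10, candidates=(0, 1)):
    """Toy selection loop: the fold seed changes every generation, so the
    overall winner is effectively chosen by something close to repeated
    k-fold CV, at the cost of a single k-fold per generation."""
    wins = {c: 0 for c in candidates}
    for g in range(generations):  # seed = generation index
        best = max(candidates, key=lambda c: fitness(c, X, y, k, seed=g))
        wins[best] += 1
    return max(wins, key=wins.get)
```

Under this reading, a static fitness function would reuse one fixed seed for every generation, letting the search exploit quirks of that particular split; varying the seed turns such quirks into noise that selection averages away.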


Cited By

  • (2024) SR-Forest: A Genetic Programming-Based Heterogeneous Ensemble Learning Method. IEEE Transactions on Evolutionary Computation 28(5), 1484-1498. DOI: 10.1109/TEVC.2023.3243172. Online publication date: Oct 2024.

Published In

GECCO '20: Proceedings of the 2020 Genetic and Evolutionary Computation Conference
June 2020
1349 pages
ISBN:9781450371285
DOI:10.1145/3377930

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. AutoML
  2. automated machine learning
  3. dynamic fitness evaluations
  4. generalisation
  5. regularization
  6. regularized evolution

Qualifiers

  • Research-article

Conference

GECCO '20

Acceptance Rates

Overall Acceptance Rate 1,669 of 4,410 submissions, 38%
