DOI: 10.1145/3512290.3528863
research-article

Effects of imputation strategy on genetic algorithms and neural networks on a binary classification problem

Published: 08 July 2022

Abstract

In this paper, we compare the performance of a canonical genetic algorithm (CGA), the Self Adaptive Genetic Algorithm (SAGA), and a feed-forward neural network (FFNN) on a predictive modeling problem with incomplete data. Predictive modeling involves learning relationships between the features and labels of the data points in a dataset. Datasets with missing input values may cause problems for some learning algorithms by biasing the learned models. Imputation refers to techniques that replace missing values with estimates derived from, for example, statistical estimates, multivariate analysis, machine learning, or K-nearest neighbors.
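To make the distinction between univariate and multivariate imputation concrete, the minimal Python sketch below (standard library only) contrasts column-mean imputation, a univariate strategy, with K-nearest-neighbor imputation, a multivariate one. The function names, the use of `None` as the missing-value marker, the Euclidean distance computed over the columns both rows observe, and the 0.0 fallback for a column with no observed values are illustrative assumptions; this is not the imputation pipeline used in the paper.

```python
import math
from statistics import mean


def mean_impute(rows):
    """Univariate strategy: replace each missing value (None) with its column mean."""
    if not rows:
        return []
    n_cols = len(rows[0])
    col_means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        # Hypothetical fallback: a column with no observed values is filled with 0.0.
        col_means.append(mean(observed) if observed else 0.0)
    return [[col_means[j] if v is None else v for j, v in enumerate(r)] for r in rows]


def knn_impute(rows, k=2):
    """Multivariate strategy: fill a missing value with the mean of that column
    over the k nearest rows that observe it."""

    def dist(a, b):
        # Euclidean distance over the columns observed in both rows.
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in shared))

    out = []
    for i, r in enumerate(rows):
        filled = list(r)  # copy so the input rows are not modified
        for j, v in enumerate(r):
            if v is not None:
                continue
            # Candidate donors: other rows that actually observe column j.
            donors = [o for t, o in enumerate(rows) if t != i and o[j] is not None]
            donors.sort(key=lambda o: dist(r, o))
            nearest = donors[:k]
            filled[j] = mean([o[j] for o in nearest]) if nearest else 0.0
        out.append(filled)
    return out


if __name__ == "__main__":
    data = [[1.0, 2.0], [2.0, None], [3.0, 6.0], [10.0, 20.0]]
    print(mean_impute(data)[1])      # [2.0, 9.333333333333334]
    print(knn_impute(data, k=2)[1])  # [2.0, 4.0]
```

On this toy data the column mean is pulled upward by the outlying last row, whereas the KNN estimate draws only on the two rows whose first feature is closest, which illustrates why the two families of strategies can produce quite different imputed datasets.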
We study how imputed datasets affect the ability of CGA, SAGA, and FFNN to learn effective models. Results indicate that the choice of imputation method has little effect on CGA and SAGA performance but a noticeable effect on FFNN performance. All three algorithms perform similarly on data imputed with univariate strategies, but FFNN performs noticeably worse on data imputed with trained multivariate strategies. As the amount of imputed data increases, test accuracy decreases for all three algorithms, while control accuracy remains surprisingly stable in all cases except FFNN on trained multivariate imputation. Interestingly, CGA and SAGA still identify the most relevant input values, even when a large fraction of the data is imputed.
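One way to see how a GA can both learn a binary classifier and expose which inputs matter is to evolve the weights of a linear threshold model directly. As a minimal sketch, assuming a genome of one weight per input plus a bias, the generational GA below uses tournament selection, one-point crossover, Gaussian mutation, and single-individual elitism, with training accuracy as fitness. The encoding, operators, and parameter values are illustrative assumptions, not the paper's CGA or SAGA configuration; self-adaptive GAs typically also evolve control parameters such as the mutation rate, which this sketch keeps fixed.

```python
import random


def predict(weights, x):
    """Linear threshold model: class 1 if w . x + bias > 0. The last gene is the bias."""
    s = sum(w * v for w, v in zip(weights[:-1], x)) + weights[-1]
    return 1 if s > 0 else 0


def accuracy(weights, X, y):
    """Fraction of examples classified correctly (used as the GA fitness)."""
    return sum(predict(weights, x) == t for x, t in zip(X, y)) / len(y)


def evolve(X, y, pop_size=40, generations=60, mut_rate=0.2, mut_sd=0.5, seed=0):
    """Minimal generational GA (illustrative assumption, not the paper's CGA/SAGA):
    tournament selection, one-point crossover, Gaussian mutation, and elitism
    (the best individual is copied unchanged into the next generation)."""
    rng = random.Random(seed)
    n_genes = len(X[0]) + 1  # one weight per feature plus a bias
    pop = [[rng.uniform(-1, 1) for _ in range(n_genes)] for _ in range(pop_size)]

    def tournament(scored, k=3):
        # Pick k random (fitness, individual) pairs and return the fittest individual.
        return max(rng.sample(scored, k), key=lambda s: s[0])[1]

    for _ in range(generations):
        scored = [(accuracy(ind, X, y), ind) for ind in pop]
        best = max(scored, key=lambda s: s[0])[1]
        children = [list(best)]  # elitism
        while len(children) < pop_size:
            a, b = tournament(scored), tournament(scored)
            cut = rng.randrange(1, n_genes)  # one-point crossover (n_genes >= 2)
            child = a[:cut] + b[cut:]
            # Each gene is perturbed with probability mut_rate.
            child = [g + rng.gauss(0, mut_sd) if rng.random() < mut_rate else g
                     for g in child]
            children.append(child)
        pop = children
    return max(pop, key=lambda ind: accuracy(ind, X, y))


if __name__ == "__main__":
    rng = random.Random(1)
    # Toy data: the label depends only on feature 0; feature 1 is noise.
    X = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(100)]
    y = [1 if x[0] > 0 else 0 for x in X]
    w = evolve(X, y)
    print("train accuracy:", accuracy(w, X, y))
    print("|weights|:", [round(abs(g), 2) for g in w[:-1]])
```

Because each input has its own weight, comparing absolute weight magnitudes after training gives a rough relevance ranking; in the toy usage above, the first feature should end up with a clearly larger weight than the noise feature.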

Published In

GECCO '22: Proceedings of the Genetic and Evolutionary Computation Conference
July 2022
1472 pages
ISBN:9781450392372
DOI:10.1145/3512290
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. data imputation
  2. evolutionary computation
  3. genetic algorithm
  4. neural networks
  5. real-world applications

Conference

GECCO '22

Acceptance Rates

Overall Acceptance Rate 1,669 of 4,410 submissions, 38%

Article Metrics

  • Total Citations: 0
  • Total Downloads: 86
  • Downloads (Last 12 months): 7
  • Downloads (Last 6 weeks): 2
Reflects downloads up to 02 Mar 2025