Abstract
There is growing interest in automating knowledge discovery in databases (KDD) processes. Thanks to the increasing power and decreasing cost of computing hardware, the search for the best features and model parameters can be tackled with various metaheuristics. As a result, researchers can focus on other important tasks, such as data wrangling or feature engineering. In this contribution, the GAparsimony R package is presented. This library implements the GA-PARSIMONY methodology, which has been published in previous journal articles and HAIS conference papers. The objective of this paper is to show how to use GAparsimony to search for accurate parsimonious models by combining feature selection, hyperparameter optimization, and parsimonious model search. To this end, the paper covers the precautions and considerations required to find a robust parsimonious model with this package, illustrated with a regression example that can easily be adapted to other problems, databases or algorithms.
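The way accuracy and parsimony are combined can be sketched briefly. In the GA-PARSIMONY publications, the individuals of each generation are first ranked by validation error and then re-ranked so that, among models whose errors are practically equal, the less complex one is preferred. The sketch below is a simplified illustration, not the GAparsimony package's code or API: Python is used only for illustration (the package itself is written in R), and it assumes that the error is a mean cross-validation RMSE, that complexity is the number of selected features, and that `tolerance` is a user-chosen threshold. The names `Individual` and `parsimony_rerank` are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Individual:
    name: str        # hypothetical label for a candidate model
    error: float     # assumed: validation error, e.g. mean CV RMSE (lower is better)
    complexity: int  # assumed: e.g. number of selected features (lower is simpler)


def parsimony_rerank(population: List[Individual], tolerance: float) -> List[Individual]:
    """Rank by error, then let a simpler model overtake its neighbour when its
    error is less than `tolerance` above the neighbour's error.

    Simplified illustration of the GA-PARSIMONY re-ranking idea; it is not the
    GAparsimony package's own implementation. The input list is not modified.
    """
    # Primary criterion: accuracy (ascending validation error).
    ranked = sorted(population, key=lambda ind: ind.error)
    swapped = True
    while swapped:
        swapped = False
        for i in range(len(ranked) - 1):
            cur, nxt = ranked[i], ranked[i + 1]
            # Secondary criterion: prefer the simpler model when the errors are
            # "practically equal". Each swap removes one complexity inversion,
            # so the loop always terminates.
            if nxt.error - cur.error < tolerance and nxt.complexity < cur.complexity:
                ranked[i], ranked[i + 1] = nxt, cur
                swapped = True
    return ranked


if __name__ == "__main__":
    # Hypothetical candidate models (made-up numbers for illustration only).
    pop = [
        Individual("A", 0.100, 10),
        Individual("B", 0.102, 4),
        Individual("C", 0.130, 2),
        Individual("D", 0.101, 8),
    ]
    for ind in parsimony_rerank(pop, tolerance=0.005):
        print(ind.name, ind.error, ind.complexity)
```

With these made-up numbers, ranking by error alone gives A, D, B, C, whereas the parsimony re-ranking gives B, D, A, C: model B is only 0.002 worse than A but uses 4 features instead of 10. Model C, although the simplest, stays last because its error exceeds the tolerance.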
References
Antonanzas-Torres, F., Urraca, R., Antonanzas, J., Fernandez-Ceniceros, J., Martinez-de-Pison, F.J.: Generation of daily global solar irradiation with support vector machines for regression. Energy Convers. Manag. 96, 277–286 (2015)
Bergstra, J., Komer, B., Eliasmith, C., Yamins, D., Cox, D.D.: Hyperopt: a python library for model selection and hyperparameter optimization. Comput. Sci. Discov. 8(1), 014008 (2015)
Bischl, B., Lang, M., Kotthoff, L., Schiffner, J., Richter, J., Studerus, E., Casalicchio, G., Jones, Z.M.: mlr: Machine learning in R. J. Mach. Learn. Res. 17(170), 1–5 (2016)
Fernandez-Ceniceros, J., Sanz-Garcia, A., Antonanzas-Torres, F., Martinez-de-Pison, F.J.: A numerical-informational approach for characterising the ductile behaviour of the T-stub component. Part 2: parsimonious soft-computing-based metamodel. Eng. Struct. 82, 249–260 (2015)
Gorissen, D., Couckuyt, I., Demeester, P., Dhaene, T., Crombecq, K.: A surrogate modeling and adaptive sampling toolbox for computer based design. J. Mach. Learn. Res. 11, 2051–2055 (2010)
Hashem, I.A., Yaqoob, I., Anuar, N.B., Mokhtar, S., Gani, A., Ullah Khan, S.: The rise of big data on cloud computing: review and open research issues. Inf. Syst. 47, 98–115 (2015)
Martinez-de-Pison, F.J.: GAparsimony package for R (2017). https://github.com/jpison/GAparsimony
Michalewicz, Z., Janikow, C.Z.: Handling constraints in genetic algorithms. In: ICGA, pp. 151–157 (1991)
Olson, R.S., Bartley, N., Urbanowicz, R.J., Moore, J.H.: Evaluation of a tree-based pipeline optimization tool for automating data science. In: Proceedings of the Genetic and Evolutionary Computation Conference 2016, GECCO ’16, pp. 485–492. ACM, New York (2016)
Sanz-Garcia, A., Fernandez-Ceniceros, J., Antonanzas-Torres, F., Pernia-Espinoza, A., Martinez-de-Pison, F.J.: GA-PARSIMONY: A GA-SVR approach with feature selection and parameter optimization to obtain parsimonious solutions for predicting temperature settings in a continuous annealing furnace. Appl. Soft Comput. 35, 13–28 (2015)
Sanz-García, A., Fernández-Ceniceros, J., Antoñanzas-Torres, F., Martínez-de-Pisón, F.J.: Parsimonious support vector machines modeling for set points in industrial processes based on genetic algorithm optimization. In: International Joint Conference SOCO13-CISIS13-ICEUTE13, Advances in Intelligent Systems and Computing, vol. 239, pp. 1–10. Springer, Cham (2014)
Thornton, C., Hutter, F., Hoos, H.H., Leyton-Brown, K.: Auto-WEKA: Combined selection and hyperparameter optimization of classification algorithms. In: Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’13, pp. 847–855. ACM, New York (2013)
Urraca, R., Sodupe-Ortega, E., Antonanzas, J., Antonanzas-Torres, F., Martinez-de-Pison, F.J.: Evaluation of a novel GA-based methodology for model structure selection: the GA-PARSIMONY. Neurocomputing 271, 9–17 (2018)
Urraca, R., Sanz-Garcia, A., Fernandez-Ceniceros, J., Sodupe-Ortega, E., Martinez-de-Pison, F.J.: Improving hotel room demand forecasting with a hybrid GA-SVR methodology based on skewed data transformation, feature selection and parsimony tuning. In: Onieva, E., Santos, I., Osaba, E., Quintián, H., Corchado, E. (eds.) HAIS 2015. LNCS (LNAI), vol. 9121, pp. 632–643. Springer, Cham (2015)
Ye, J.: On measuring and correcting the effects of data mining and model selection. J. Am. Stat. Assoc. 93(441), 120–131 (1998)
Acknowledgements
We are greatly indebted to Banco Santander for the APPI17/04 fellowship and to the University of La Rioja for the EGI16/19 fellowship. A. Pernia also wishes to express her gratitude to the Instituto de Estudios Riojanos (IER) for the fellowship. This work used the Beronia cluster (Universidad de La Rioja), which is supported by FEDER-MINECO grant number UNLR-094E-2C-225.
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Martinez-de-Pison, F.J., Gonzalez-Sendino, R., Ferreiro, J., Fraile, E., Pernia-Espinoza, A. (2018). GAparsimony: An R Package for Searching Parsimonious Models by Combining Hyperparameter Optimization and Feature Selection. In: de Cos Juez, F., et al. Hybrid Artificial Intelligent Systems. HAIS 2018. Lecture Notes in Computer Science, vol 10870. Springer, Cham. https://doi.org/10.1007/978-3-319-92639-1_6
DOI: https://doi.org/10.1007/978-3-319-92639-1_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-92638-4
Online ISBN: 978-3-319-92639-1
eBook Packages: Computer Science, Computer Science (R0)