Abstract
This paper presents a workbench for obtaining simple neural classification models based on evolutionary product-unit neural networks, preceded by data preparation at the attribute level by means of filter-based feature selection. As a result, the computation required to build a classifier is shorter than for a full model without data pre-processing, which is of utmost importance since evolutionary neural models are stochastic and several classifiers trained with different seeds are required to obtain reliable results. Feature selection is one of the most common techniques for pre-processing data in any kind of learning task. Six filters have been tested to assess the proposal, using as a test bed fourteen difficult (binary and multi-class) classification data sets from the University of California, Irvine (UCI) repository. An empirical study compares the evolutionary neural network models obtained with and without feature selection. The results, contrasted with nonparametric statistical tests, show that the current proposal significantly improves the test accuracy of the previous models. Moreover, the current proposal is much more efficient than the previous methodology, with an average time reduction above 40%. Our approach has also been compared with several classifiers, both with and without feature selection, to illustrate the performance of the different filters considered. Lastly, a statistical analysis for each feature selector provides a pairwise comparison between machine learning algorithms.
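The pre-processing scheme described above can be sketched as a two-step pipeline: a filter ranks attributes independently of any classifier, and only the selected attributes reach the learner. The paper trains evolutionary product-unit neural networks; in this minimal sketch a standard scikit-learn multilayer perceptron stands in purely for illustration, and mutual information is just one of the many possible filter criteria (the paper evaluates six).

```python
# Sketch of filter-based feature selection before training a neural
# classifier. Assumptions: scikit-learn's bundled breast-cancer data
# replaces the UCI test bed, and an MLP replaces the evolutionary
# product-unit network used in the paper.
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)  # 30 attributes, binary target
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Filter step: keep the 10 attributes with the highest mutual
# information with the class label, computed before (and independently
# of) the classifier, so training sees a reduced input space.
pipe = make_pipeline(
    StandardScaler(),
    SelectKBest(mutual_info_classif, k=10),
    MLPClassifier(hidden_layer_sizes=(5,), max_iter=2000, random_state=0),
)
pipe.fit(X_tr, y_tr)
acc = pipe.score(X_te, y_te)
print(f"test accuracy with 10/30 selected features: {acc:.3f}")
```

Because the filter is classifier-agnostic, it runs once per data set rather than once per seed, which is where the reported time savings come from when many stochastic runs are needed.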
Acknowledgements
This work has been partially subsidised by the TIN2011-28956-C02-02 and TIN2014-55894-C2-R projects of the Spanish Inter-Ministerial Commission of Science and Technology (MICYT) and by FEDER funds.
Tallón-Ballesteros, A.J., Riquelme, J.C. & Ruiz, R. Filter-based feature selection in the context of evolutionary neural networks in supervised machine learning. Pattern Anal Applic 23, 467–491 (2020). https://doi.org/10.1007/s10044-019-00798-z