Abstract
In classification and clustering problems, feature selection techniques can be used to reduce the dimensionality of the data and increase the performances. However, feature selection is a challenging task, especially when hundred or thousands of features are involved. In this framework, we present a new approach for improving the performance of a filter-based genetic algorithm. The proposed approach consists of two steps: first, the available features are ranked according to a univariate evaluation function; then the search space represented by the first M features in the ranking is searched using a filter-based genetic algorithm for finding feature subsets with a high discriminative power.
Experimental results demonstrated the effectiveness of our approach in dealing with high dimensional data, both in terms of recognition rate and feature number reduction.
Similar content being viewed by others
Notes
- 1.
Note that the same holds also for the feature-class correlation.
References
Nips 2003 workshop on feature extraction and feature selection challenge (2003). http://clopinet.com/isabelle/Projects/NIPS2003
Bermejo, P., Gámez, J.A., Puerta, J.M.: Improving incremental wrapper-based subset selection via replacement and early stopping. IJPRAI 25(5), 605–625 (2011)
Cordella, L.P., De Stefano, C., Fontanella, F., Marrocco, C., Scotto di Freca, A.: Combining single class features for improving performance of a two stage classifier. In: 20th International Conference on Pattern Recognition (ICPR 2010), pp. 4352–4355. IEEE Computer Society (2010)
Dash, M., Liu, H.: Feature selection for classification. Intell. Data Anal. 1(1–4), 131–156 (1997)
De Stefano, C., Fontanella, F., Marrocco, C.: A GA-based feature selection algorithm for remote sensing images. In: Giacobini, M., et al. (eds.) EvoWorkshops 2008. LNCS, vol. 4974, pp. 285–294. Springer, Heidelberg (2008). doi:10.1007/978-3-540-78761-7_29
De Stefano, C., Fontanella, F., Maniaci, M., Scotto di Freca, A.: A method for scribe distinction in medieval manuscripts using page layout features. In: Maino, G., Foresti, G.L. (eds.) ICIAP 2011. LNCS, vol. 6978, pp. 393–402. Springer, Heidelberg (2011). doi:10.1007/978-3-642-24085-0_41
Gütlein, M., Frank, E., Hall, M., Karwath, A.: Large scale attribute selection using wrappers. In: Proceedings of the IEEE Symposium on Computational Intelligence and Data Mining (CIDM 2009) (2009)
Guyon, I., Elisseeff, A.: An introduction to variable and feature selection. J. Mach. Learn. Res. 3, 1157–1182 (2003)
Hall, M.A.: Correlation-based feature selection for discrete and numeric class machine learning. In: Proceedings of the Seventeenth International Conference on Machine Learning, pp. 359–366. Morgan Kaufmann Publishers Inc., San Francisco (2000)
Huang, J., Cai, Y., Xu, X.: A hybrid genetic algorithm for feature selection wrapper based on mutual information. Pattern Recogn. Lett. 28(13), 1825–1844 (2007)
Lanzi, P.: Fast feature selection with genetic algorithms: a filter approach. In: IEEE International Conference on Evolutionary Computation, pp. 537–540, April 1997
Lee, J.S., Oh, I.S., Moon, B.R.: Hybrid genetic algorithms for feature selection. IEEE Trans. Pattern Anal. Mach. Intell. 26(11), 1424–1437 (2004)
Li, R., Lu, J., Zhang, Y., Zhao, T.: Dynamic adaboost learning with feature selection based on parallel genetic algorithm for image annotation. Knowl. Based Syst. 23(3), 195–201 (2010)
Lichman, M.: UCI machine learning repository (2013). http://archive.ics.uci.edu/ml
Liu, H., Setiono, R.: Chi2: Feature selection and discretization of numeric attributes. In: ICTAI, pp. 88–91. IEEE Computer Society, Washington, DC (1995)
Manimala, K., Selvi, K., Ahila, R.: Hybrid soft computing techniques for feature selection and parameter optimization in power quality data mining. Appl. Soft Comput. 11(8), 5485–5497 (2011). http://www.sciencedirect.com/science/article/pii/S1568494611001694
Ochoa, G.: Error thresholds in genetic algorithms. Evol. Comput. 14(2), 157–182 (2006)
Oreski, S., Oreski, G.: Genetic algorithm-based heuristic for feature selection in credit risk assessment. Expert Syst. Appl. 41(4, Part 2), 2052–2064 (2014)
Spolaôr, N., Lorena, A.C., Lee, H.D.: Multi-objective genetic algorithm evaluation in feature selection. In: Takahashi, R.H.C., Deb, K., Wanner, E.F., Greco, S. (eds.) EMO 2011. LNCS, vol. 6576, pp. 462–476. Springer, Heidelberg (2011). doi:10.1007/978-3-642-19893-9_32
Tan, F., Fu, X., Zhang, Y., Bourgeois, A.G.: A genetic algorithm-based method for feature subset selection. Soft Comput. 12(2), 111–120 (2007)
Xue, B., Zhang, M., Browne, W.N., Yao, X.: A survey on evolutionary computation approaches to feature selection. IEEE Trans. Evol. Comput. 20(4), 606–626 (2016)
Yusta, S.C.: Different metaheuristic strategies to solve the feature selection problem. Pattern Recogn. Lett. 30(5), 525–534 (2009)
Zhai, Y., Ong, Y.S., Tsang, I.: The emerging “big dimensionality”. IEEE Comput. Intell. Mag. 9(3), 14–26 (2014)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
De Stefano, C., Fontanella, F., Scotto di Freca, A. (2017). Feature Selection in High Dimensional Data by a Filter-Based Genetic Algorithm. In: Squillero, G., Sim, K. (eds) Applications of Evolutionary Computation. EvoApplications 2017. Lecture Notes in Computer Science(), vol 10199. Springer, Cham. https://doi.org/10.1007/978-3-319-55849-3_33
Download citation
DOI: https://doi.org/10.1007/978-3-319-55849-3_33
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-55848-6
Online ISBN: 978-3-319-55849-3
eBook Packages: Computer ScienceComputer Science (R0)