Abstract
Feature selection is a data pre-processing technique that selects the most relevant subset of features from a larger set, with the goal of improving classification performance. Evolutionary algorithms are commonly proposed for feature selection problems, but they can suffer from reduced population diversity and decreasing crowding distance, which can lead to suboptimal results. In this study, we propose a new evolutionary algorithm, the clustering strategy-based evolutionary algorithm (CEA), for feature selection in classification. CEA uses a clustering mechanism to group individuals into different clusters, and its crossover operation is dominated by parents from different clusters, which enhances the exploration ability of the algorithm and helps prevent the population from becoming trapped in a locally optimal region of the solution space. We evaluated CEA on 13 classification datasets and compared it with four mainstream evolutionary algorithms. The experimental results show that CEA achieves better classification performance using a similar or smaller number of features than the other algorithms.
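The core idea of mating parents from different clusters can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the clustering method, the crossover operator, or any parameter values, so the plain k-means routine, the uniform crossover, the population size, and the names `kmeans_labels` and `cross_cluster_crossover` are all assumptions made for illustration. Individuals are encoded as 0/1 masks over the features.

```python
import numpy as np

def kmeans_labels(pop, k, rng, iters=10):
    """Plain k-means on 0/1 feature masks (an assumed clustering choice)."""
    centers = pop[rng.choice(len(pop), size=k, replace=False)].astype(float)
    labels = np.zeros(len(pop), dtype=int)
    for _ in range(iters):
        # Squared Euclidean distance from every individual to every center.
        dists = ((pop[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for c in range(k):
            members = pop[labels == c]
            if len(members) > 0:  # keep the old center if a cluster empties
                centers[c] = members.mean(axis=0)
    return labels

def cross_cluster_crossover(pop, labels, rng):
    """Uniform crossover (assumed operator) between parents from different clusters."""
    clusters = np.unique(labels)
    if len(clusters) < 2:
        # Degenerate case: only one cluster, so pick any two distinct parents.
        i, j = rng.choice(len(pop), size=2, replace=False)
    else:
        ca, cb = rng.choice(clusters, size=2, replace=False)
        i = rng.choice(np.flatnonzero(labels == ca))
        j = rng.choice(np.flatnonzero(labels == cb))
    # Each bit of the child comes from one parent or the other.
    take_first = rng.random(pop.shape[1]) < 0.5
    child = np.where(take_first, pop[i], pop[j])
    return child, i, j

rng = np.random.default_rng(0)
pop = rng.integers(0, 2, size=(20, 12))   # 20 individuals, 12 candidate features
labels = kmeans_labels(pop, k=3, rng=rng)
child, i, j = cross_cluster_crossover(pop, labels, rng)
print("parents from clusters", labels[i], labels[j], "child mask", child)
```

Because the two parents come from different clusters, they tend to differ in more bits than two neighbours would, so the child is more likely to land in a region the population has not yet covered. A full algorithm would also need a fitness function (for example, classifier accuracy on the selected features), selection, and mutation, none of which are shown here.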
This research was partially supported by the Japan Society for the Promotion of Science (JSPS) KAKENHI under Grant JP22H03643, the Japan Science and Technology Agency (JST) Support for Pioneering Research Initiated by the Next Generation (SPRING) under Grant JPMJSP2145, JST through the Establishment of University Fellowships towards the Creation of Science Technology Innovation under Grant JPMJFS2115, and the Natural Science Foundation of Jiangsu Province (No. BK20210605).
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Zhang, B., Wang, Z., Lei, Z., Yu, J., Jin, T., Gao, S. (2023). A Clustering Strategy-Based Evolutionary Algorithm for Feature Selection in Classification. In: Fujita, H., Wang, Y., Xiao, Y., Moonis, A. (eds) Advances and Trends in Artificial Intelligence. Theory and Applications. IEA/AIE 2023. Lecture Notes in Computer Science, vol 13925. Springer, Cham. https://doi.org/10.1007/978-3-031-36819-6_5
DOI: https://doi.org/10.1007/978-3-031-36819-6_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-36818-9
Online ISBN: 978-3-031-36819-6
eBook Packages: Computer Science, Computer Science (R0)