
Feature selection using metaheuristics made easy: Open source MAFESE library in Python

Published: 18 October 2024

Abstract

Artificial intelligence (AI) systems often rely on feature selection (FS) to identify the most relevant features in a dataset, and training and optimizing such systems on these key data points is decisive for their development and efficacy. To address this challenge, the present study introduces MAFESE, an open-source Python library that employs metaheuristic algorithms to select optimal feature subsets, particularly for complex, high-dimensional data. MAFESE encompasses a wide range of feature selection techniques, including unsupervised, filter-based, embedded, and wrapper-based methods. Notably, within the wrapper-based category, MAFESE offers users access to over 200 metaheuristic algorithms, empowering them to choose the algorithm best suited to their dataset and requirements. Additionally, MAFESE incorporates built-in evaluation metrics that enable efficient comparison among algorithms. The open-source design of MAFESE encourages cooperation within the data science community and allows for continuous upgrades; this collaborative environment promotes the sharing of ideas, proposals, and changes, resulting in a stronger and more adaptive feature selection framework. MAFESE distinguishes itself with an easy-to-use Python interface that follows object-oriented programming concepts, supporting both experienced researchers and practitioners. It offers extensive resources, including documentation, examples, and test cases, for a smooth user onboarding experience. Its modular architecture enables users to extend its features and to interface with other tools, such as scikit-learn. By harnessing metaheuristic algorithms, MAFESE helps users tackle complex feature selection problems more efficiently and accurately, making it a valuable tool for identifying meaningful features in complicated, high-dimensional datasets and a significant addition to the feature selection landscape. Through openness and teamwork, MAFESE can grow into a comprehensive resource that responds to the evolving demands of the data science community. The source code and materials of MAFESE are publicly available in the official GitHub repository at https://github.com/thieu1995/mafese.
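
The abstract's claim of a scikit-learn-style, object-oriented interface is easiest to see in a short usage sketch. The snippet below is a hypothetical reconstruction modeled on the README of the linked repository; the names `get_dataset` and `MhaSelector` and the `problem`/`estimator`/`optimizer` parameters are assumptions, not a confirmed API, and should be verified against the current MAFESE documentation.

```python
# Hypothetical usage sketch modeled on the MAFESE README; the imports,
# class names, and parameters below are assumptions to be verified at
# https://github.com/thieu1995/mafese.
from mafese import get_dataset, MhaSelector

# Load a bundled benchmark dataset and split it.
data = get_dataset("Arrhythmia")
data.split_train_test(test_size=0.2)

# Wrapper-based selection: the chosen metaheuristic (here a Whale
# Optimization Algorithm variant) searches for the feature subset that
# maximizes the wrapped estimator's score.
selector = MhaSelector(problem="classification", estimator="knn",
                       optimizer="OriginalWOA")
selector.fit(data.X_train, data.y_train)

# scikit-learn transformer style: keep only the selected columns.
X_train_sel = selector.transform(data.X_train)
X_test_sel = selector.transform(data.X_test)
```

Because the selector behaves like a scikit-learn transformer, the reduced matrices can be fed directly into any downstream estimator or pipeline.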

Highlights

We propose MAFESE, a user-friendly and flexible feature selection library.
MAFESE provides unsupervised, filter-based, embedded, and wrapper-based feature selection.
MAFESE offers over 200 metaheuristic algorithms for tailored, dataset-specific selection (see the sketch after these highlights).
Built-in evaluation metrics for efficient comparison of feature selection algorithms.
Modular and well-documented, promoting community collaboration and growth.
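
To make the wrapper-based idea concrete without relying on MAFESE's own API, the following self-contained sketch scores candidate feature subsets by cross-validating an estimator and improves them with a deliberately simple bit-flip hill climber. This stand-in plays the role that MAFESE delegates to its 200+ metaheuristics (GA, PSO, WOA, and others); it is an illustration of the mechanism, not the library's algorithm.

```python
# Minimal, self-contained sketch of wrapper-based feature selection.
# The bit-flip hill climber stands in for a real metaheuristic; it is
# NOT MAFESE's algorithm, only an illustration of the fitness loop.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(42)
X, y = load_breast_cancer(return_X_y=True)
n_features = X.shape[1]

def fitness(mask):
    """Mean 5-fold CV accuracy of a KNN trained on the selected columns."""
    if not mask.any():                      # empty subsets are invalid
        return 0.0
    return cross_val_score(KNeighborsClassifier(), X[:, mask], y, cv=5).mean()

# Start from a random subset and greedily accept improving single-bit flips.
mask = rng.random(n_features) < 0.5
best = fitness(mask)
for _ in range(100):
    j = rng.integers(n_features)            # pick one feature to toggle
    mask[j] = not mask[j]
    score = fitness(mask)
    if score >= best:
        best = score                        # keep the flip
    else:
        mask[j] = not mask[j]               # revert the flip

print(f"selected {int(mask.sum())}/{n_features} features, CV accuracy {best:.3f}")
```

Real metaheuristics differ mainly in how they propose new masks (populations, crossover, swarm updates), not in this evaluate-and-compare loop, which is what makes the wrapper approach estimator-agnostic but comparatively expensive.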



Published In

Future Generation Computer Systems, Volume 160, Issue C, November 2024, 966 pages

Publisher

Elsevier Science Publishers B.V., Netherlands


          Author Tags

          1. Feature selection
          2. MAFESE Library
          3. Metaheuristic algorithms
          4. Wrapper-based methods
          5. Artificial intelligence
          6. Open source Python software
