1 Introduction

Membrane computing is a young branch of natural computing. Inspired by the structure of living cells and the way they process chemical substances, Gheorghe Păun first proposed the membrane computing model in a technical report of the Turku Centre for Computer Science, Finland, in 1998. The first paper on membrane computing (Păun 2000) was published in 2000. In recent years, research on membrane computing (Krishna 2007) has developed rapidly, and it has become one of the frontier research fields in theoretical computer science. Membrane computing mainly studies computational models abstracted from cells, tissues, or organs, known as P systems, which can be divided into three types: cell-like, tissue-like, and neural-like. P systems have been successfully applied in a variety of fields.

Cell-like P systems are abstracted from the structure and function of the cell and consist mainly of membranes, objects, and evolution rules. Gheorghe Păun introduced the formal definition and computability of cell-like P systems in detail, proving that they have the same computing power as Turing machines and exhibit a high degree of parallelism, distribution, and non-determinism. In 2006, Nishida (Nishida 2006) successfully solved the traveling salesman problem using membrane computing, showing that membrane computing can be applied to NP-hard problems. Since then, membrane computing has developed rapidly in the fields of evolutionary computing and engineering (Xiao et al. 2012; Xiao et al. 2014).

With the rapid development of machine learning and data processing technology, the curse of dimensionality (Dong et al. 2020) has become more serious. This problem can be addressed by dimension reduction, for which feature extraction and feature selection (FS) (Farahat et al. 2013) are the two most commonly used methods. Feature extraction maps the feature space to a smaller space, whereas feature selection directly reduces the number of features by selecting a sufficiently informative feature subset according to evaluation criteria. Feature selection has been applied in many engineering fields, including network anomaly detection (Emiro and Eduardo 2014), facial expression recognition (Mlakar and Fister 2017), face detection (Pan et al. 2013), and medical applications (Vivekanandan and Ch 2017), among others. It is an NP-hard problem. The simplest way to determine the optimal feature subset is the exhaustive method, which evaluates every feature subset; however, it is time-consuming and its assessment cost is high. Metaheuristic algorithms (Welikala et al. 2015; Singh and Singh 2019) can avoid this growth in the computational complexity of feature selection. Many feature selection methods based on metaheuristic algorithms have been proposed, such as particle swarm optimization (PSO) (Amoozegar and Minaei-Bidgoli 2018), ant colony optimization (Ghosh et al. 2019), artificial bee colony optimization (ABC) (Xue et al. 2018), differential evolution (DE) (Al-Ani et al. 2013), the firefly algorithm (FA) (Marie-Sainte and Alalyani 2020), the bat algorithm (BA) (Ab et al. 2020), the cuckoo optimization algorithm (COA) (Prabukumar et al. 2019), grey wolf optimization (GWO) (Tu et al. 2019), the bacterial foraging algorithm (Niu et al. 2021), and the whale optimization algorithm (WOA) (Aziz et al. 2018). Recently, metaheuristic feature selection methods have developed rapidly. The genetic algorithm (GA) (Dong et al. 2018) is particularly easy to apply to feature selection problems through binary coding (Xue et al. 2019). Raman et al. (Raman et al. 2017) used a genetic algorithm to build an intrusion detection system. Lin et al. (Lin et al. 2014) designed a genetic-algorithm-based feature selection method for image retrieval and classification. Das et al. (Das et al. 2017) combined a bi-objective formulation with a genetic algorithm for feature selection.

Inspired by membrane computing, this paper proposes a new feature selection method, the P-FS algorithm, which combines the advantages of the GA in feature selection with the parallelism of cell-like P systems to find the optimal feature subset. The P-FS and GAFS algorithms are both tested on five UCI (Dheeru and Karra 2017) datasets and an edible oil dataset, and their performance in practical applications is verified in terms of classification accuracy, number of selected features, stability, and convergence.

2 Methodology

2.1 Cell-like P systems

A cell-like P system of degree \(m\) can be defined as:

$$ \Pi = \left( V, O, H, \mu, \omega_{1}, \ldots, \omega_{m}, R_{1}, \ldots, R_{m}, i_{0} \right) $$
(1)

where: \(V\) is a nonempty finite alphabet whose elements are called objects; \(O \subseteq V\) is the set of output objects; \(H\) is the set of membrane labels, \(H = \{1, 2, \ldots, m\}\); \(\mu\) is a membrane structure with \(m\) membranes, where \(m\) is called the degree of \(\Pi\); \(\omega_{i} \in V^{*} \; (1 \le i \le m)\) is the multiset of objects in region \(i\) of the membrane structure \(\mu\), where \(V^{*}\) is the set of all multisets over \(V\); \(R_{i} \; (1 \le i \le m)\) is the set of evolution rules in region \(i\) of \(\mu\); \(i_{0} \in H\) is the label of the output membrane of the system.

When the cell-like P system starts running, the objects in each membrane evolve according to the evolution rules. The rules are executed in parallel, and the system terminates when no rule in the system can be applied.
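To make the definition concrete, the following minimal Python sketch represents the tuple of Eq. (1) together with a simplified run loop. The class and field names are our own illustration rather than an existing P-system library, and the nesting structure \(\mu\) is flattened into a label-indexed map for brevity.

```python
# Minimal sketch of a cell-like P system; illustrative only.
from dataclasses import dataclass
from typing import Callable, Dict, List

Multiset = Dict[str, int]          # object symbol -> multiplicity
Rule = Callable[[Multiset], bool]  # mutates a region's multiset; True if it fired

@dataclass
class CellLikePSystem:
    alphabet: set                   # V: nonempty finite alphabet of objects
    output_objects: set             # O, a subset of V
    labels: List[int]               # H = {1, ..., m}
    regions: Dict[int, Multiset]    # omega_i: objects in region i (mu is flattened)
    rules: Dict[int, List[Rule]]    # R_i: evolution rules of region i
    output_membrane: int            # i_0 in H

    def step(self) -> bool:
        """Apply every applicable rule in every region once -- a simplified
        stand-in for maximally parallel rule application."""
        fired = False
        for label in self.labels:
            for rule in self.rules.get(label, []):
                fired = rule(self.regions[label]) or fired
        return fired

    def run(self, max_steps: int = 1000) -> Multiset:
        """Run until no rule applies (the halting condition) or the step
        budget is exhausted, then return the output membrane's contents."""
        for _ in range(max_steps):
            if not self.step():
                break
        return self.regions[self.output_membrane]
```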

2.2 Genetic algorithm

GA is a metaheuristic algorithm belonging to the class of evolutionary algorithms (EA). It uses biologically inspired operators such as mutation, crossover, and selection to generate high-quality solutions to optimization and search problems. A GA for feature selection comprises population initialization, evaluation of individual fitness, selection, crossover, mutation, and a termination check. The flow chart of the algorithm is shown in Fig. 1, and a compact implementation sketch is given after the step descriptions below.

Fig. 1

Flow chart of the algorithm

Population initialization: set the maximum number of iterations and randomly generate n chromosomes. A chromosome is represented by a binary string of 0s and 1s whose length is the dimension of the dataset (the total number of features); 0 means the corresponding feature is not selected, and 1 means it is selected.

Evaluation of individual fitness: a fitness function is used to calculate the fitness value of the feature subset corresponding to each chromosome. In this paper, the Support Vector Machine (SVM) algorithm (Cortes and Vapnik 1995) is used to calculate the fitness value.

Selection: chromosomes with different fitness values have different probabilities of being selected. This paper uses roulette-wheel selection to choose chromosomes.

Crossover: with a certain probability, two chromosomes exchange the genes after a chosen position to produce new chromosomes.

Mutation: with a certain probability, a gene in a chromosome flips from 0 to 1 or from 1 to 0 to produce a new individual.

Termination check: when the current number of iterations equals the maximum number of iterations, the algorithm stops.
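The following sketch assembles the six steps above into a compact GA feature-selection loop, assuming scikit-learn and NumPy. The population size, crossover and mutation rates, and the use of cross-validation as the fitness estimate are our own illustrative choices; the paper itself evaluates fitness with an SVM on a 3:1 train/validation split (Sect. 4.1).

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

def fitness(chrom, X, y):
    """SVM accuracy on the feature subset encoded by a 0/1 chromosome."""
    if chrom.sum() == 0:
        return 0.0                     # an empty subset carries no information
    return cross_val_score(SVC(), X[:, chrom.astype(bool)], y, cv=3).mean()

def ga_feature_selection(X, y, n_pop=20, n_iter=100, pc=0.8, pm=0.05, seed=0):
    rng = np.random.default_rng(seed)
    n_feat = X.shape[1]
    pop = rng.integers(0, 2, size=(n_pop, n_feat))           # initialization
    for _ in range(n_iter):
        fit = np.array([fitness(c, X, y) for c in pop])      # fitness evaluation
        probs = fit / fit.sum() if fit.sum() > 0 else None   # roulette wheel
        pop = pop[rng.choice(n_pop, size=n_pop, p=probs)]    # selection
        for i in range(0, n_pop - 1, 2):                     # one-point crossover
            if rng.random() < pc:
                pt = rng.integers(1, n_feat)
                pop[i, pt:], pop[i + 1, pt:] = (
                    pop[i + 1, pt:].copy(), pop[i, pt:].copy())
        flips = rng.random(pop.shape) < pm                   # bit-flip mutation
        pop = np.where(flips, 1 - pop, pop)
    fit = np.array([fitness(c, X, y) for c in pop])
    return pop[fit.argmax()], fit.max()                      # best subset, accuracy
```

Calling `ga_feature_selection(X, y)` returns the best chromosome found and its estimated accuracy.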

3 Feature selection algorithm based on P systems

GA has a powerful search ability in feature selection, but it is computationally complex and takes a long time to process high-dimensional data. Moreover, since the mutation rate is fixed, the effectiveness of the mutation operator at escaping local optima is inconsistent. Exploiting the parallel processing ability of P systems, the P-FS algorithm is proposed. Combining the computation rules of cell-like P systems with the GA yields a strong global search ability for feature selection, which helps the search escape local optima.

3.1 Algorithm design

The designed P-FS algorithm adopts the membrane structure of cell-like P systems. Its structure is defined as:

$$ \Pi = \left( V, O, H, \mu, \omega_{1}, \ldots, \omega_{4}, R_{1}, \ldots, R_{4}, i_{0} \right) $$
(2)

where:

(1) \(V\) is a nonempty finite alphabet whose objects are the feature subsets corresponding to the chromosomes of the genetic algorithm;

(2) \(O \subseteq V\) is the output alphabet, which holds the results of the algorithm;

(3) \(H\) is the set of membrane labels, \(H = \{1, 2, 3, 4\}\);

(4) \(\mu\) is a membrane structure with \(m\) membranes, \(m = 4\);

(5) \(\omega_{i} \in V^{*} \; (1 \le i \le m)\) is the multiset of objects in region \(i\) of the membrane structure \(\mu\), corresponding to the initial binary chromosome populations in membranes 3 and 4;

(6) \(R_{i} \; (1 \le i \le m)\) is the set of evolution rules in each region of the membrane structure, including the computation of chromosome fitness values, the selection, crossover, and mutation operations of the GA, and the communication rules that transfer objects from a membrane to adjacent regions;

(7) \(i_{0}\) is the label of the output membrane of the system, \(i_{0} = 2\).
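As a bridge from this definition to the implementation, the hypothetical sketch below spells out Eq. (2) with plain Python containers. The nesting recorded in `mu` is one plausible reading of Fig. 2 (membrane 1 outermost, membranes 3 and 4 inside membrane 2), and all strings are informal placeholders rather than actual objects.

```python
# Hypothetical instantiation of Eq. (2); every value is illustrative.
p_fs_structure = {
    "V": "binary chromosomes encoding feature subsets",  # alphabet (informal)
    "O": "best chromosome and its fitness value",        # output objects
    "H": [1, 2, 3, 4],                                   # membrane labels
    "mu": {1: [2], 2: [3, 4]},     # assumed nesting: 1 contains 2; 2 contains 3 and 4
    "omega": {3: "initial population A", 4: "initial population B"},
    "R": {2: ["r21", "r22"], 3: ["r31", "r32"], 4: ["r41", "r42"]},
    "i0": 2,                                             # output membrane
}
```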

3.2 Evolution rules

In the cell-like P system, the objects in each region evolve according to that region's own evolution rules, and the rules are executed in parallel. The system terminates when no rule in the system can be applied. In Fig. 2, membrane 1 performs no operation of its own; it only collects the chromosomes discarded by membrane 2. As the main membrane, membrane 2 receives the chromosome populations and corresponding fitness values transmitted by membranes 3 and 4. It then transfers the population with the better fitness value back to membranes 3 and 4, and the other population to membrane 1. Membranes 3 and 4 mainly search the space globally to find the region where the optimal solution is located.

The chromosomes in membrane 3 are updated by selection, crossover, and mutation. At each iteration, the chromosomes are sorted by fitness value in descending order. After sorting, the population and the optimal fitness value are transferred to membrane 2. The evolution rules of membrane 3 are as follows:

$$ \begin{array}{l} r_{31}: C_{31}^{t}, C_{32}^{t}, \ldots, C_{3k}^{t} \to C_{31}^{t'}, C_{32}^{t'}, \ldots, C_{3k}^{t'} \\ \qquad\; f_{31}^{t}, f_{32}^{t}, \ldots, f_{3k}^{t} \to f_{31}^{t'}, f_{32}^{t'}, \ldots, f_{3k}^{t'} \\ r_{32}: C_{31}^{t'} C_{32}^{t'} \cdots C_{3k}^{t'} \to C_{31}^{t'} C_{32}^{t'} \cdots C_{3k}^{t'} \left( C_{31}^{t'} C_{32}^{t'} \cdots C_{3k}^{t'} \right)_{in2} \\ \qquad\; f_{31}^{t'} f_{32}^{t'} \cdots f_{3k}^{t'} \to f_{31}^{t'} f_{32}^{t'} \cdots f_{3k}^{t'} \left( f_{31}^{t'} \right)_{in2} \end{array} $$
(3)

where: \(k\) is the population size in a membrane, and the population sizes in membranes 3 and 4 are equal; \(C_{31}^{t'} C_{32}^{t'} \cdots C_{3k}^{t'}\) is the chromosome sequence in membrane 3 after selection, crossover, mutation, and sorting at iteration \(t\); \(f_{31}^{t'}, f_{32}^{t'}, \ldots, f_{3k}^{t'}\) are the fitness values of the chromosomes in membrane 3 at iteration \(t\).

The chromosomes in membrane 4 are likewise updated by selection, crossover, and mutation. At each iteration, the chromosomes are sorted by fitness value in descending order. After sorting, the population and the optimal fitness value are transferred to membrane 2. The evolution rules of membrane 4 are as follows:

$$ \begin{array}{l} r_{41}: C_{41}^{t}, C_{42}^{t}, \ldots, C_{4k}^{t} \to C_{41}^{t'}, C_{42}^{t'}, \ldots, C_{4k}^{t'} \\ \qquad\; f_{41}^{t}, f_{42}^{t}, \ldots, f_{4k}^{t} \to f_{41}^{t'}, f_{42}^{t'}, \ldots, f_{4k}^{t'} \\ r_{42}: C_{41}^{t'} C_{42}^{t'} \cdots C_{4k}^{t'} \to C_{41}^{t'} C_{42}^{t'} \cdots C_{4k}^{t'} \left( C_{41}^{t'} C_{42}^{t'} \cdots C_{4k}^{t'} \right)_{in2} \\ \qquad\; f_{41}^{t'} f_{42}^{t'} \cdots f_{4k}^{t'} \to f_{41}^{t'} f_{42}^{t'} \cdots f_{4k}^{t'} \left( f_{41}^{t'} \right)_{in2} \end{array} $$
(4)

where: \(C_{41}^{t'} C_{42}^{t'} \cdots C_{4k}^{t'}\) is the chromosome sequence in membrane 4 after selection, crossover, mutation, and sorting at iteration \(t\); \(f_{41}^{t'} f_{42}^{t'} \cdots f_{4k}^{t'}\) are the fitness values of the chromosomes in membrane 4 at iteration \(t\).

In membrane 2, the optimal fitness values transmitted from membranes 3 and 4 are sorted in descending order; the population corresponding to the larger fitness value is transmitted back to membranes 3 and 4, and the population corresponding to the smaller fitness value is transmitted to membrane 1. The evolution rules of membrane 2 are as follows:

$$ \begin{array}{l} r_{21}: f_{31}^{t'} f_{41}^{t'} \to f_{1}^{t'} f_{2}^{t'} \\ r_{22}: f_{1}^{t'} f_{2}^{t'} \to f_{1}^{t'} f_{2}^{t'} \left( C_{1}^{t'}, C_{2}^{t'}, \ldots, C_{k}^{t'} \right)_{in3,4} \\ \qquad\; f_{1}^{t'} f_{2}^{t'} \to f_{1}^{t'} f_{2}^{t'} \left( C_{1'}^{t'}, C_{2'}^{t'}, \ldots, C_{k'}^{t'} \right)_{in1} \end{array} $$
(5)

where: \(f_{1}^{t'} f_{2}^{t'}\) is the sorted sequence of fitness values at iteration \(t\); \(C_{1}^{t'} C_{2}^{t'} \cdots C_{k}^{t'}\) is the population corresponding to \(f_{1}^{t'}\) at iteration \(t\); \(C_{1'}^{t'} C_{2'}^{t'} \cdots C_{k'}^{t'}\) is the population corresponding to \(f_{2}^{t'}\) at iteration \(t\).

The structure of the algorithm is shown in Fig. 2, and the flow chart of the algorithm is shown in Fig. 3.

Fig. 2

Schematic diagram of membrane structure

Fig. 3

Flow chart of P-FS algorithm

In Fig. 2, the initial objects \(C_{31} C_{32} \cdots C_{3k}\) and \(C_{41} C_{42} \cdots C_{4k}\) are chromosome populations composed of 0s and 1s, where the length of each chromosome equals the number of features in the dataset. \(r_{21}\) is the sorting rule, and \(r_{22}\) is the exchange rule between membrane 2 and membranes 1, 3, and 4. \(r_{31}\) and \(r_{41}\) are the selection, crossover, mutation, and fitness-calculation rules. \(r_{32}\) is the exchange rule between membranes 3 and 2, and \(r_{42}\) is the exchange rule between membranes 4 and 2. When the system starts running, membranes 3 and 4 contain initial objects and applicable rules, so the rules in membranes 3 and 4 are executed in parallel. Each population executes the selection, crossover, and mutation rules to obtain a new population. Then, according to the positions and number of 1s in each chromosome, a set of feature subsets equal in number to the population size is obtained. For each feature subset, the SVM is used to calculate the fitness, and the fitness values and corresponding chromosomes are sorted. Finally, membranes 3 and 4 simultaneously send their optimal fitness value and population to membrane 2, where the rules of membrane 2 are executed. Membrane 2 sorts the optimal fitness values transmitted from membranes 3 and 4, sends the larger fitness value and its corresponding population back to membranes 3 and 4, and sends the smaller fitness value and its corresponding population to membrane 1. The objects in membranes 3 and 4 can then continue to evolve according to the rules. When the system reaches the specified number of execution steps, its output is the larger fitness value and the corresponding population in membrane 2.
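A sequential Python sketch of this communication scheme is given below. It compresses the GA generation into a mutation-only placeholder and runs the two membranes one after the other rather than truly in parallel, so it illustrates the exchange logic of \(r_{21}\)/\(r_{22}\) under those simplifications rather than reproducing the paper's implementation.

```python
import numpy as np

def evolve(pop, fitness_fn, rng, pm=0.05):
    """Simplified stand-in for r31/r41: bit-flip mutation only (the full
    selection and crossover steps are as in Sect. 2.2), then sort by fitness."""
    children = np.where(rng.random(pop.shape) < pm, 1 - pop, pop)
    fit = np.array([fitness_fn(c) for c in children])
    order = np.argsort(fit)[::-1]                    # descending fitness
    return children[order], fit[order]

def p_fs(pop3, pop4, fitness_fn, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    membrane1 = []                                   # membrane 1: discarded populations
    for _ in range(n_iter):
        pop3, fit3 = evolve(pop3, fitness_fn, rng)   # membrane 3
        pop4, fit4 = evolve(pop4, fitness_fn, rng)   # membrane 4
        # r32/r42: each membrane reports its best fitness and population;
        # r21/r22: membrane 2 keeps the winner and discards the loser.
        if fit3[0] >= fit4[0]:
            winner, loser = pop3, pop4
        else:
            winner, loser = pop4, pop3
        membrane1.append(loser)
        pop3, pop4 = winner.copy(), winner.copy()    # winner back to membranes 3 and 4
    fit = np.array([fitness_fn(c) for c in pop3])
    return pop3[fit.argmax()], fit.max()             # output of membrane 2
```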

4 Experiments and results

4.1 Dataset

The purpose of feature selection is to reduce the dimension of a dataset and remove its redundant information. The dimension of a dataset equals its number of features, and the features have actual physical meaning; feature selection reduces the dimension without changing that meaning. Selecting a feature subset is an NP-hard problem, and the number of features in a dataset strongly affects the computational complexity of the algorithm. We therefore selected datasets of different dimensions to observe the impact of data dimensionality on the model. We collected five datasets from different fields of the UCI Machine Learning Repository and used laser-induced fluorescence technology to collect the fluorescence spectrum data of edible oil. To calculate the fitness, we randomly divided each dataset into a training set and a validation set in a 3:1 ratio.
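For reference, the split can be reproduced as in the following short sketch, assuming scikit-learn; the stand-in data and the random seed are our own choices.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.rand(200, 60)               # stand-in data: 200 samples, 60 features
y = np.random.randint(0, 2, 200)          # stand-in binary labels
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=42)  # 3:1 train/validation split
```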

Table 1 shows the statistics of the six datasets. The Oil dataset has 2048 features, representing the fluorescence intensity at different wavelengths. The Gas dataset has 128 features, representing the reaction results between the sensor surface and chemical substances at different times. The Musk dataset has 166 features, describing the shape or conformation of different molecules. The Sonar dataset has 60 features, representing sonar signals bounced off a metal cylinder from different positions and angles. The Ulc dataset has 147 features, representing image information of urban land cover in different cities. The Wine dataset has 13 features, giving the chemical analysis results of different wines. Feature selection on these six datasets can reveal the impact of individual features on model performance and remove redundant features. As Table 1 shows, the six datasets differ significantly in the number of features, categories, and total samples, which makes it possible to verify the performance of the P-FS algorithm on different types of datasets.

Table 1 Statistics of datasets

4.2 Evaluation metrics

To demonstrate the efficiency of the proposed method, the confusion matrices of the six datasets were calculated; they comprise the following basic indicators (note that in this paper the positive class is coded as 0):

TP (True Positive): The real label and the prediction label are both 0;

FN (False Negative): The real label is 0, and the prediction label is 1;

FP (False Positive): The real label is 1, and the prediction label is 0;

TN (True Negative): The real label and the prediction label are both 1.

The P-FS algorithm and GAFS algorithm are evaluated according to the following metrics.

Classification Accuracy: classification accuracy is the percentage of correctly classified samples among all samples. Equation 6 gives the classification accuracy calculated from the confusion matrix.

$$ Accuracy = \frac{TP + TN}{TP + FN + FP + TN} $$
(6)

Feature Reduction Rate (FRR): FRR is the percentage of features that are not selected relative to the number of original features. Equation 7 gives the formula for the feature reduction rate.

$$ FRR = 1 - \frac{\text{total number of selected features}}{\text{number of original features}} $$
(7)

F1_Score: for a given classifier, the F1_Score is an evaluation indicator that combines Precision and Recall. Equations 8, 9, and 10 give the formulas for Precision, Recall, and F1_Score, respectively.

$$ Precision = \frac{TP}{TP + FP} $$
(8)
$$ Recall = Sensitivity = \frac{TP}{TP + FN} $$
(9)
$$ F1\_score = \frac{2 \times Precision \times Recall}{Precision + Recall} $$
(10)

Receiver Operating Characteristic (ROC) curve: the ROC curve is drawn with (1 − Specificity) as the abscissa and Sensitivity as the ordinate. Equation 11 gives the formula for specificity, and Eq. 9 gives the formula for sensitivity.

$$ Specificity = \frac{TN}{TN + FP} $$
(11)
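The metrics of Eqs. (6)-(11) can be computed directly from the paper's confusion-matrix convention (positive class coded as 0), as in the following sketch; the function name and signature are our own.

```python
def evaluate(y_true, y_pred, n_selected, n_original):
    """Metrics of Eqs. (6)-(11); the positive class is coded as 0."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 0 and p == 0 for t, p in pairs)   # true positive
    fn = sum(t == 0 and p == 1 for t, p in pairs)   # false negative
    fp = sum(t == 1 and p == 0 for t, p in pairs)   # false positive
    tn = sum(t == 1 and p == 1 for t, p in pairs)   # true negative
    accuracy = (tp + tn) / (tp + fn + fp + tn)                     # Eq. (6)
    frr = 1 - n_selected / n_original                              # Eq. (7)
    precision = tp / (tp + fp) if tp + fp else 0.0                 # Eq. (8)
    recall = tp / (tp + fn) if tp + fn else 0.0                    # Eq. (9)
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)                          # Eq. (10)
    specificity = tn / (tn + fp) if tn + fp else 0.0               # Eq. (11)
    return {"accuracy": accuracy, "FRR": frr, "F1": f1,
            "recall": recall, "specificity": specificity}
```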

4.3 Performance of the algorithm

The P-FS and GAFS algorithms are tested on the six datasets with the parameters shown in Table 2. To compare the two algorithms fairly, the number of chromosomes, the number of iterations, and the mutation rate are set to the same values.

Table 2 Parameter settings of the two algorithms

Table 3 shows the experimental results of the P-FS and GAFS algorithms. The number of internal iterations is 100, and each algorithm is run 100 times. Table 3 reports the optimal accuracy (OPA), worst accuracy (WOA), average accuracy (AVA), and standard deviation (STD) over the 100 runs. Figure 4 shows the iteration curves of the optimal accuracy of the P-FS and GAFS algorithms.

Table 3 Experimental results of the P-FS algorithm and GAFS algorithm
Fig. 4

Optimal accuracy of P-FS algorithm and GAFS algorithm. a Oil dataset b Gas dataset c Musk dataset d Sonar dataset e Ulc dataset f Wine dataset

From Table 3, the optimal accuracy of the P-FS and GAFS algorithms is the same on the Gas, Musk, Sonar, and Wine datasets, while that of the P-FS algorithm is higher on the Oil and Ulc datasets. The average accuracy of the P-FS algorithm is higher than that of the GAFS algorithm on all six datasets, showing that the overall performance of the P-FS algorithm is better. The worst accuracy of the P-FS algorithm is also higher on all six datasets, indicating that the P-FS algorithm raises the lower bound of the search. The standard deviation of the P-FS algorithm is lower on all six datasets, indicating that the P-FS algorithm is more stable than the GAFS algorithm.

Figure 4 shows that both the P-FS and GAFS algorithms converge. Compared with the GAFS algorithm, the P-FS algorithm converges faster. Except on the Musk dataset, the P-FS algorithm finds the optimal value first on the remaining five datasets, indicating that the search ability of the P-FS algorithm is stronger than that of the GAFS algorithm.

Table 4 gives the feature reduction rate (FRR) of the P-FS and GAFS algorithms at their optimal accuracy, including the total number of features selected by each algorithm and the corresponding feature reduction rate.

Table 4 FRR of P-FS algorithm and GAFS algorithm

Table 4 shows that both the P-FS and GAFS algorithms can effectively reduce the number of original features and thus achieve the goal of dimension reduction. Except on the Gas dataset, the FRR of the P-FS algorithm is higher than that of the GAFS algorithm, showing that the P-FS algorithm removes redundant information more effectively.

Table 5 shows the F1_Score of the P-FS and GAFS algorithms; the value range of the F1_Score is [0, 1]. An F1_Score of 0 indicates the worst possible performance, and an F1_Score of 1 the best.

Table 5 F1_Score of P-FS algorithm and GAFS algorithm

In Table 5, the F1_Scores of the P-FS and GAFS algorithms are equal on Musk, Sonar, and Wine, and the F1_Score of the P-FS algorithm is higher on the remaining three datasets, indicating that the performance of the P-FS algorithm is better than that of the GAFS algorithm.

Figure 5 shows the ROC curves of the P-FS and GAFS algorithms; the AUC reported in the figure is the area under the ROC curve, with a value range of [0.5, 1]. The higher the AUC, the better the performance of the algorithm. In Fig. 5, the AUCs of the two algorithms are equal on Musk, Sonar, and Wine, and the AUC of the P-FS algorithm is higher on the remaining three datasets. Moreover, the AUC of the P-FS algorithm exceeds 0.9 on all six datasets, which indicates that the accuracy of the P-FS algorithm is better than that of the GAFS algorithm.

Fig. 5

ROC of P-FS algorithm and GAFS algorithm. a Oil dataset b Gas dataset c Musk dataset d Sonar dataset e Ulc dataset f Wine dataset

The P-FS algorithm is parallel: it supports two GAFS searches computing and exchanging information at the same time. The experimental results above show that the average accuracy of P-FS is higher than that of GAFS, indicating a stronger search ability, and the standard deviation of P-FS is lower, indicating better stability. P-FS also outperforms GAFS on the other performance indexes, which shows that the parallelism of P-FS improves the performance of the algorithm.

5 Discussion

In this section, we discuss the advantages and applicability of the P-FS algorithm, whose benefits for feature selection were shown above. On the one hand, the experiments show that the P-FS algorithm uses the parallel processing ability of cell-like P systems to broaden the search of the feature space, while the communication between membranes lets it reach the optimal region faster. On the other hand, the mutation factor helps the algorithm escape local optima and further improves the feature-space search. Beyond these advantages, the P-FS algorithm also has limitations. First, the initialization of the population strongly influences the search results. Second, the algorithm consumes considerable computing resources and takes a long time. Finally, the algorithm uses only the fitness value as the evaluation criterion and therefore tends to select many features, whereas a smaller number of selected features would be preferable.

In future work, we plan to optimize the P-FS algorithm in the following ways. First, the initialization method: in the experiments we found that the initial subsets strongly influence the final result of the algorithm, so we plan to initialize the population with a filter method to improve the stability of the algorithm. Second, the algorithm has only been tested on six datasets and compared only with the GAFS algorithm, so the validation is relatively limited; in the future we will use more public datasets and fitness functions to verify the feasibility of the algorithm as comprehensively as possible, and compare it with other feature selection algorithms to study its advantages. Finally, the core of the proposed method is the GA; we hope to design a feature selection algorithm built directly on membrane structures, which would provide a new idea for applied research on membrane computing.