CN107016073B - A kind of text classification feature selection approach - Google Patents
A kind of text classification feature selection approach
- Publication number
- CN107016073B (application CN201710181572.8A)
- Authority
- CN
- China
- Prior art keywords
- feature
- degree
- sel
- exc
- class
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
- G06F16/353—Clustering; Classification into predefined classes
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention provides a text classification feature selection method that can reduce feature dimensionality and classification complexity while improving classification accuracy. The method includes: obtaining a feature set S and a target class C, calculating the degree of association R_c(x^(i)) between each feature x^(i) in the feature set S and the target class C, and sorting the feature set S in descending order of R_c(x^(i)); calculating the redundancy R_x and the degree of synergy S_x between every two features in the feature set S, calculating the sensitivity Sen of each feature by combining them with the degree of association R_c(x^(i)) between the feature and the target class, comparing the sensitivity with a preset threshold th and, using the descending ordering of the feature set S, dividing the feature set S into a candidate set S_sel and an exclusion set S_exc according to the threshold th; calculating the sensitivity Sen between the features in the candidate set S_sel and the exclusion set S_exc, comparing it with the preset threshold th, and adjusting the candidate set S_sel and the exclusion set S_exc according to the threshold th. The present invention is applicable to the field of machine learning text classification.
Description
Technical Field
The invention relates to the field of machine learning text classification, in particular to a text classification feature selection method.
Background
With the continuous expansion of the Internet, the information resources gathered on it keep growing. Content-based information retrieval and data mining have therefore attracted attention as means of managing and exploiting these information resources effectively. Text classification is an important basis for information retrieval and text data mining; its main task is to assign documents of unknown class to one or more predefined classes according to their content. However, the large number of training samples and the high dimensionality of the feature vectors make text classification a machine learning problem with high time and space complexity. Feature selection is therefore needed to reduce the feature dimensionality while preserving classification performance as much as possible.
Feature selection is an important data preprocessing step. Among common text classification feature selection methods, the Chi-Square test establishes a null hypothesis that a word is unrelated to the target classes and selects as features the words that deviate most from this hypothesis. It only counts whether a word appears in a document, regardless of how often, which biases it towards low-frequency words. Mutual Information methods select features by the amount of information that the presence of a word brings to the target class, but they only consider the degree of association between words and target classes and neglect possible dependencies between the words themselves. The TF-IDF (Term Frequency-Inverse Document Frequency) method evaluates the importance of a word by jointly considering its frequency within a document and its distribution across all documents, and selects features accordingly; however, it simply assumes that words with low document frequency are more important and words with high document frequency are less useful, so its accuracy is limited. Other feature selection methods, such as information gain, odds ratio, weight of text evidence and expected cross entropy, mostly consider only the degree of association between words and the target classes, or only the correlation between words, which easily leads to insufficient dimensionality reduction or low classification precision.
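To make the document-frequency weighting described above concrete, the following Python sketch computes TF-IDF weights for a toy corpus; the corpus, tokenization and exact normalization are illustrative assumptions, not part of the prior-art methods being summarized.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Minimal TF-IDF: term frequency within a document times the
    inverse document frequency of the term across all documents."""
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))          # count documents containing each term
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in tf.items()
        })
    return weights

docs = [["cheap", "offer", "buy"], ["meeting", "agenda", "offer"]]
print(tf_idf(docs))  # terms appearing in fewer documents receive larger weights
```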
Disclosure of Invention
The invention aims to provide a text classification feature selection method to solve the problems of high feature dimension or low classification precision in the prior art.
In order to solve the above technical problem, an embodiment of the present invention provides a text classification feature selection method, including:
step 1: acquiring a feature set S and a target class C, calculating the degree of association R_c(x^(i)) between each feature x^(i) in the feature set S and the target class C, and sorting the feature set S in descending order of the degree of association R_c(x^(i));
step 2: calculating the redundancy R_x and the degree of synergy S_x between every two features in the feature set S, calculating the sensitivity Sen of each feature by combining them with the degree of association R_c(x^(i)) between the feature and the target class, comparing the sensitivity Sen with a preset threshold th and, using the descending ordering of the feature set S, dividing the feature set S into a candidate set S_sel and an exclusion set S_exc according to the threshold th;
step 3: calculating the sensitivity Sen between the features in the candidate set S_sel and the exclusion set S_exc, comparing it with the preset threshold th, and adjusting the candidate set S_sel and the exclusion set S_exc according to the threshold th.
Further, the step 1 comprises:
step 11: for each feature x^(i) in the feature set S, calculating the degree of association R_c(x^(i)) between the feature x^(i) and the target class C according to the formula R_c(x^(i)) = I(x^(i); C), where I(x^(i); C) denotes the mutual information between the feature x^(i) and the target class C;
step 12: sorting the feature set S in descending order of the degree of association R_c(x^(i));
where x^(i) denotes the i-th feature in the feature set S and R_c(x^(i)) denotes the degree of association between the feature x^(i) and the target class C.
Further, I(x^(i); C) is expressed as:
I(x^(i); C) = Σ_k p(x^(i), c_k) · log( p(x^(i) | c_k) / p(x^(i)) )
where c_k denotes the k-th class of the target class C, p(x^(i), c_k) denotes the probability that the feature x^(i) and the class c_k occur together, p(x^(i) | c_k) denotes the probability that the feature x^(i) occurs in class c_k, and p(x^(i)) denotes the probability that the feature x^(i) occurs in the feature set S.
Further, the redundancy R_x is expressed as:
R_x(x^(i); x^(j)) = min(0, IG(x^(i); x^(j); C)), i ≠ j
where IG(x^(i); x^(j); C) denotes the correlation gain between the i-th feature x^(i) and the j-th feature x^(j) in the feature set S, R_x(x^(i); x^(j)) denotes the redundancy between the feature x^(i) and the feature x^(j), and R_x(x^(i); x^(j)) is the smaller of 0 and the correlation gain.
Further, the degree of synergy S_x is expressed as:
S_x(x^(i); x^(j)) = max(0, IG(x^(i); x^(j); C)), i ≠ j
where IG(x^(i); x^(j); C) denotes the correlation gain between the i-th feature x^(i) and the j-th feature x^(j) in the feature set S, S_x(x^(i); x^(j)) denotes the degree of synergy between the feature x^(i) and the feature x^(j), and S_x(x^(i); x^(j)) is the larger of 0 and the correlation gain.
Further, IG(x^(i); x^(j); C) is expressed as:
IG(x^(i); x^(j); C) = I((x^(i), x^(j)); C) - I(x^(i); C) - I(x^(j); C)
where I(x^(i); C) denotes the mutual information between the feature x^(i) and the target class C; I(x^(j); C) denotes the mutual information between the feature x^(j) and the target class C; and I((x^(i), x^(j)); C) denotes the mutual information between the feature pair (x^(i), x^(j)) and the target class C.
Further, I((x^(i), x^(j)); C) is expressed as:
I((x^(i), x^(j)); C) = Σ_k p(x^(i), x^(j), c_k) · log( p((x^(i), x^(j)) | c_k) / p(x^(i), x^(j)) )
where c_k denotes the k-th class of the target class C, p(x^(i), x^(j), c_k) denotes the probability that the feature x^(i), the feature x^(j) and the class c_k occur together, p((x^(i), x^(j)) | c_k) denotes the probability that the feature x^(i) and the feature x^(j) occur together in class c_k, and p(x^(i), x^(j)) denotes the probability that the feature x^(i) and the feature x^(j) occur together in the feature set S.
Further, the step 2 comprises:
step 21: adding the first feature in the feature set S to the candidate set S_sel and setting the exclusion set S_exc to the empty set, i.e. S_sel = {x^(1)}, S_exc = {}; the first feature is the one with the largest degree of association R_c(x^(i));
step 22: starting from the second feature in the feature set S, with x^(i) denoting the current feature, calculating the redundancy R_x and the degree of synergy S_x between the feature x^(i) and all features in the candidate set S_sel, and, combining them with the degree of association R_c(x^(i)) between the feature and the target class, calculating the sensitivity Sen(x^(i)) of the feature x^(i);
step 23: comparing the sensitivity Sen(x^(i)) with the preset threshold th; if Sen(x^(i)) > th, adding the feature x^(i) to the candidate set S_sel; otherwise, adding the feature x^(i) to the exclusion set S_exc;
step 24: if x^(i) is the last feature in the feature set S, ending the division; otherwise, setting x^(i) to the next feature in the feature set S and returning to step 22.
Further, the sensitivity Sen(x^(i)) is expressed as:
Sen(x^(i)) = R_c(x^(i)) + α·min(R_x(x^(i); x^(j))) + β·max(S_x(x^(i); x^(j))), j ≠ i
where α and β are the weights of the redundancy R_x and the degree of synergy S_x respectively, min(R_x(x^(i); x^(j))) denotes the minimum of the redundancies between the feature x^(i) and the remaining features, max(S_x(x^(i); x^(j))) denotes the maximum of the degrees of synergy between the feature x^(i) and the remaining features, Sen(x^(i)) denotes the sensitivity of the feature x^(i) to the target class C, and R_c(x^(i)) denotes the degree of association between the feature x^(i) and the target class C.
Further, the step 3 comprises:
step 31: setting the pending set S_tbd to the empty set, i.e. S_tbd = {}; letting x^(k) be the first feature of the exclusion set S_exc and x^(m) be the first feature of the candidate set S_sel;
step 32: for the feature x^(k) of the exclusion set S_exc, calculating the maximum of the degrees of synergy between the feature x^(m) of the candidate set S_sel and all features in the feature set S other than x^(m), i.e. max(S_x(x^(m); x^(i))), x^(i) ∈ S, i ≠ m;
step 33: if the maximum degree of synergy of the feature x^(m) is attained with x^(k), adding x^(m) to the pending set S_tbd;
step 34: if the feature x^(m) is the last feature in the candidate set S_sel and the pending set S_tbd is empty, going to step 36; if the pending set S_tbd is not empty, letting x^(j) be the first feature of the pending set S_tbd and going to step 35; if the feature x^(m) is not the last feature in the candidate set S_sel, setting the feature x^(m) to the next feature of the candidate set S_sel and returning to step 32;
step 35: for the feature x^(j) of the pending set S_tbd, updating the sensitivity of the feature x^(j) according to the following formula:
Sen(x^(j)) = R_c(x^(j)) + α·min(R_x(x^(j); x^(n))) + β·max(S_x(x^(j); x^(n))), x^(n) ∈ S, n ≠ j, n ≠ k
comparing the sensitivity Sen(x^(j)) of the feature x^(j) with the preset threshold th; if Sen(x^(j)) < th, removing the feature x^(k) from the exclusion set S_exc, adding it to the candidate set S_sel, and going to step 36; otherwise, if the feature x^(j) is the last element of the pending set S_tbd, going directly to step 36; otherwise, setting the feature x^(j) to the next element of the pending set S_tbd and returning to step 35;
step 36: if the feature x^(k) is the last element in the exclusion set S_exc, returning the current candidate set S_sel and exclusion set S_exc as the result of the final feature selection; otherwise, setting the feature x^(k) to the next element of the exclusion set S_exc and returning to step 31.
The technical scheme of the invention has the following beneficial effects:
in this scheme, the degree of association R_c(x^(i)) between the features and the target classes, the redundancy R_x between features and the degree of synergy S_x between features are calculated from the feature set S and the target class C, and from these the sensitivity Sen of each feature is calculated; the features are screened according to a preset threshold th, the feature set is divided into a candidate set and an exclusion set, and the candidate set and the exclusion set are then continuously adjusted and optimized. In this way, the interrelations between features and target classes and among the features themselves are considered comprehensively, features are selected through the degree of association, the redundancy and the degree of synergy, the features that play a key role in classification are retained, the feature dimensionality and the classification complexity are reduced, and the classification accuracy can be improved.
Drawings
Fig. 1 is a schematic flowchart of a text classification feature selection method according to an embodiment of the present invention;
fig. 2 is a detailed flowchart of a text classification feature selection method according to an embodiment of the present invention;
fig. 3 is a schematic flow chart illustrating a feature selection method for partitioning a candidate set and an exclusion set according to an embodiment of the present invention;
fig. 4 is a schematic flow chart of adjusting a candidate set and an exclusion set by a feature selection method according to an embodiment of the present invention.
Detailed Description
In order to make the technical problems, technical solutions and advantages of the present invention more apparent, the following detailed description is given with reference to the accompanying drawings and specific embodiments.
The invention provides a text classification feature selection method aiming at the problems of high feature dimension or low classification precision in the prior art.
As shown in fig. 1, the method for selecting text classification features provided in the embodiment of the present invention includes:
step 1: acquiring a feature set S and a target class C, calculating the degree of association R_c(x^(i)) between each feature x^(i) in the feature set S and the target class C, and sorting the feature set S in descending order of the degree of association R_c(x^(i));
step 2: calculating the redundancy R_x and the degree of synergy S_x between every two features in the feature set S, calculating the sensitivity Sen of each feature by combining them with the degree of association R_c(x^(i)) between the feature and the target class, comparing the sensitivity Sen with a preset threshold th and, using the descending ordering of the feature set S, dividing the feature set S into a candidate set S_sel and an exclusion set S_exc according to the threshold th;
step 3: calculating the sensitivity Sen between the features in the candidate set S_sel and the exclusion set S_exc, comparing it with the preset threshold th, and adjusting the candidate set S_sel and the exclusion set S_exc according to the threshold th.
The text classification feature selection method provided by the embodiment of the invention calculates, from the feature set S and the target class C, the degree of association R_c(x^(i)) between the features and the target class, the redundancy R_x between features and the degree of synergy S_x between features, and from these calculates the sensitivity Sen of each feature; the features are screened according to a preset threshold th, the feature set is divided into a candidate set and an exclusion set, and the candidate set and the exclusion set are continuously adjusted and optimized in the subsequent steps. In this way, the interrelations between features and target classes and among the features themselves are considered comprehensively, features are selected through the degree of association, the redundancy and the degree of synergy, the features that play a key role in classification are retained, the feature dimensionality and the classification complexity are reduced, and the classification accuracy can be improved.
In this embodiment, as shown in fig. 2, to acquire the feature set S and the target class C, the feature set S = (x^(1), x^(2), ..., x^(n)) and the target class C need to be input first.
In this embodiment, the feature set S represents the set of all features in the text classification process (a single feature, i.e. a word vector, is denoted by x^(i)), that is S = (x^(1), x^(2), ..., x^(n)), where n denotes the number of features in the feature set S; the feature x^(i) is the column vector formed by the number of occurrences, in each text file, of the word corresponding to the feature; the target class C represents the column vector formed by the category corresponding to each text file, the categories being drawn from a category set.
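As an illustration of these inputs, the sketch below builds the count-based feature vectors x^(i) (the columns of X) and the class vector C from a small corpus; the toy documents and labels are assumptions used only for illustration.

```python
import numpy as np

# Toy corpus: each document is a list of words, with one class label per document.
docs = [["cheap", "offer", "buy", "offer"],
        ["meeting", "agenda", "minutes"],
        ["cheap", "buy", "deal"]]
labels = ["spam", "work", "spam"]

vocab = sorted({w for d in docs for w in d})        # the features x^(1..n)
word_index = {w: i for i, w in enumerate(vocab)}

# X[:, i] is the column vector x^(i): occurrence counts of word i in each file.
X = np.zeros((len(docs), len(vocab)), dtype=int)
for row, doc in enumerate(docs):
    for w in doc:
        X[row, word_index[w]] += 1

C = np.array(labels)                                # target class column vector
print(vocab)
print(X)
print(C)
```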
In this embodiment, the degree of association R_c(x^(i)) between the feature x^(i) and the target class C is the mutual information between the feature x^(i) and the target class C.
In this embodiment, as an optional embodiment, calculating the degree of association R_c(x^(i)) between each feature x^(i) in the feature set S and the target class C and sorting the feature set S in descending order of R_c(x^(i)) (step 1) comprises:
step 11: for each feature x^(i) in the feature set S, calculating the degree of association R_c(x^(i)) between the feature x^(i) and the target class C according to the formula R_c(x^(i)) = I(x^(i); C), where I(x^(i); C) denotes the mutual information between the feature x^(i) and the target class C;
step 12: sorting the feature set S in descending order of the degree of association R_c(x^(i));
where x^(i) denotes the i-th feature in the feature set S and R_c(x^(i)) denotes the degree of association between the feature x^(i) and the target class C.
In this embodiment, the mutual information is expressed as:
I(x^(i); C) = Σ_k p(x^(i), c_k) · log( p(x^(i) | c_k) / p(x^(i)) )
where I(x^(i); C) denotes the mutual information between the feature x^(i) and the target class C, c_k denotes the k-th class of the target class C, p(x^(i), c_k) denotes the probability that the feature x^(i) and the class c_k occur together, p(x^(i) | c_k) denotes the probability that the feature x^(i) occurs in class c_k, and p(x^(i)) denotes the probability that the feature x^(i) occurs in the feature set S.
In this embodiment, preferably, the probability p(x^(i), c_k) that the feature x^(i) and the class c_k occur together is approximated by the frequency with which the word corresponding to x^(i) occurs in files of class c_k relative to all files, i.e.:
p(x^(i), c_k) ≈ Σ_m x^(i)_{k,m} / Σ_j x^(i)_j
where x^(i)_j denotes the j-th element of the feature x^(i) (i.e. the number of times the word corresponding to the feature x^(i) appears in the j-th file), and x^(i)_{k,m} denotes the m-th element of x^(i) whose corresponding target class is c_k (i.e. the number of times the word corresponding to x^(i) appears in the m-th file of class c_k).
In this embodiment, preferably, the probability p(x^(i) | c_k) that the feature x^(i) occurs in class c_k is approximated by the frequency with which the word corresponding to x^(i) occurs in the files of class c_k.
In this embodiment, preferably, the probability p(x^(i)) that the feature x^(i) occurs in the feature set S is approximated by the frequency with which the word corresponding to x^(i) occurs in all files.
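To make the frequency-based estimates concrete, the following sketch estimates p(x^(i), c_k), p(x^(i) | c_k) and p(x^(i)) from the count matrix X and class vector C of the previous sketch and combines them into the mutual-information relevance R_c(x^(i)); the exact normalizations are assumptions consistent with the frequency approximations described here, not a verbatim transcription of the patent's formulas.

```python
import numpy as np

def relevance(X, C, i):
    """Estimate R_c(x^(i)) = I(x^(i); C) from word counts.

    X : (n_docs, n_features) count matrix; C : array of class labels.
    Probabilities are approximated by word-occurrence frequencies
    (assumed normalizations)."""
    total = X.sum()                       # all word occurrences in all files
    col = X[:, i]
    p_x = col.sum() / total               # p(x^(i)): frequency over all files
    mi = 0.0
    for c in np.unique(C):
        in_class = (C == c)
        p_xc = col[in_class].sum() / total                    # joint frequency
        p_x_given_c = col[in_class].sum() / max(X[in_class].sum(), 1)
        if p_xc > 0 and p_x_given_c > 0:
            mi += p_xc * np.log(p_x_given_c / p_x)
    return mi

# print(relevance(X, C, word_index["cheap"]))  # using X, C from the earlier sketch
```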
in this embodiment, as a further alternative, as shown in fig. 3, the redundancy R between every two features in the feature set S is calculatedxAnd degree of synergy SxAssociation degree R between combined features and object classesc(x(i)) Calculating the sensitivity Sen of the feature, comparing the sensitivity Sen with a preset threshold th, and dividing the feature set S into candidate sets S according to the threshold thselAnd exclusion set Sexc(step 2) comprising:
step 21: adding the first feature in the feature set S to the candidate set SselWill exclude the set SexcSet to an empty set, i.e. Ssel={x(1)},Sexc-the first feature corresponds to a degree of association R { }c(x(i)) Maximum;
step 22: starting with the second feature in the feature set S, with x(i)Representing said second feature, calculating feature x(i)And candidate set SselRedundancy R between all features inxAnd degree of synergy SxAnd combining the relevance R between the characteristics and the target categoryc(x(i)) Computing feature x(i)Sensitivity Sen (x) of (2)(i));
Step 23: sensitivity Sen (x)(i)) Comparing with a preset threshold th, if Sen (x)(i)) > th, then the feature x(i)Add candidate set Ssel(ii) a Else, the feature x(i)Addition of exclusion set Sexc;
Step 24: if x(i)If the last feature in the feature set S is adopted, the division is ended; otherwise, x is(i)Is arranged asThe next feature in the set S is returned to step 22.
In an embodiment of the foregoing text classification feature selection method, further, the redundancy R_x is expressed as:
R_x(x^(i); x^(j)) = min(0, IG(x^(i); x^(j); C)), i ≠ j
where IG(x^(i); x^(j); C) denotes the correlation gain between the i-th feature x^(i) and the j-th feature x^(j) in the feature set S, R_x(x^(i); x^(j)) denotes the redundancy between the feature x^(i) and the feature x^(j), and R_x(x^(i); x^(j)) is the smaller of 0 and the correlation gain.
In an embodiment of the foregoing text classification feature selection method, further, the degree of synergy S_x is expressed as:
S_x(x^(i); x^(j)) = max(0, IG(x^(i); x^(j); C)), i ≠ j
where IG(x^(i); x^(j); C) denotes the correlation gain between the i-th feature x^(i) and the j-th feature x^(j) in the feature set S, S_x(x^(i); x^(j)) denotes the degree of synergy between the feature x^(i) and the feature x^(j), and S_x(x^(i); x^(j)) is the larger of 0 and the correlation gain.
In an embodiment of the foregoing text classification feature selection method, further, IG(x^(i); x^(j); C) is expressed as:
IG(x^(i); x^(j); C) = I((x^(i), x^(j)); C) - I(x^(i); C) - I(x^(j); C)
where I(x^(i); C) and I(x^(j); C) are computed with the same formula as the mutual information between the feature x^(i) and the target class C above; I(x^(i); C) denotes the mutual information between the feature x^(i) and the target class C; I(x^(j); C) denotes the mutual information between the feature x^(j) and the target class C; and I((x^(i), x^(j)); C) denotes the mutual information between the feature pair (x^(i), x^(j)) and the target class C.
In an embodiment of the foregoing text classification feature selection method, further, I((x^(i), x^(j)); C) is expressed as:
I((x^(i), x^(j)); C) = Σ_k p(x^(i), x^(j), c_k) · log( p((x^(i), x^(j)) | c_k) / p(x^(i), x^(j)) )
where c_k denotes the k-th class of the target class C, p(x^(i), x^(j), c_k) denotes the probability that the feature x^(i), the feature x^(j) and the class c_k occur together, p((x^(i), x^(j)) | c_k) denotes the probability that the feature x^(i) and the feature x^(j) occur together in class c_k, and p(x^(i), x^(j)) denotes the probability that the feature x^(i) and the feature x^(j) occur together in the feature set S.
In this embodiment, preferably, the probability p(x^(i), x^(j), c_k) that the feature x^(i), the feature x^(j) and the class c_k occur together is approximated by the frequency with which the words corresponding to x^(i) and x^(j) occur together in the files of class c_k relative to all files, where the number of simultaneous occurrences in the m-th file of class c_k is taken as the smaller of the occurrence counts of the two words in that file.
In this embodiment, preferably, the probability p((x^(i), x^(j)) | c_k) that the feature x^(i) and the feature x^(j) occur together in class c_k is approximated by the frequency with which the corresponding words occur together in the files of class c_k.
In this embodiment, preferably, the probability p(x^(i), x^(j)) that the feature x^(i) and the feature x^(j) occur together in the feature set S is approximated by the frequency with which the corresponding words occur together in all files.
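The sketch below turns these definitions into code, building on the relevance() helper above: the co-occurrence count of two words in a file is taken as the smaller of their two counts, the joint mutual information I((x^(i), x^(j)); C) is estimated from those counts, and the correlation gain, redundancy R_x and degree of synergy S_x follow directly; the normalizations are assumptions in line with the single-feature case.

```python
import numpy as np

def joint_relevance(X, C, i, j):
    """Estimate I((x^(i), x^(j)); C), using min(count_i, count_j) per file
    as the co-occurrence count (assumed normalizations)."""
    co = np.minimum(X[:, i], X[:, j])     # simultaneous occurrences per file
    total = X.sum()
    p_xy = co.sum() / total
    mi = 0.0
    for c in np.unique(C):
        in_class = (C == c)
        p_xyc = co[in_class].sum() / total
        p_xy_given_c = co[in_class].sum() / max(X[in_class].sum(), 1)
        if p_xyc > 0 and p_xy_given_c > 0:
            mi += p_xyc * np.log(p_xy_given_c / p_xy)
    return mi

def correlation_gain(X, C, i, j):
    """IG(x^(i); x^(j); C) = I((x^(i), x^(j)); C) - I(x^(i); C) - I(x^(j); C)."""
    return joint_relevance(X, C, i, j) - relevance(X, C, i) - relevance(X, C, j)

def redundancy(X, C, i, j):
    """R_x: the smaller of 0 and the correlation gain."""
    return min(0.0, correlation_gain(X, C, i, j))

def synergy(X, C, i, j):
    """S_x: the larger of 0 and the correlation gain."""
    return max(0.0, correlation_gain(X, C, i, j))
```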
In an embodiment of the foregoing text classification feature selection method, further, the sensitivity Sen(x^(i)) is expressed as:
Sen(x^(i)) = R_c(x^(i)) + α·min(R_x(x^(i); x^(j))) + β·max(S_x(x^(i); x^(j))), j ≠ i
where α and β are the weights of the redundancy R_x and the degree of synergy S_x respectively, min(R_x(x^(i); x^(j))) denotes the minimum of the redundancies between the feature x^(i) and the remaining features, max(S_x(x^(i); x^(j))) denotes the maximum of the degrees of synergy between the feature x^(i) and the remaining features, Sen(x^(i)) denotes the sensitivity of the feature x^(i) to the target class C, and R_c(x^(i)) denotes the degree of association between the feature x^(i) and the target class C.
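A minimal sketch of the sensitivity computation and the step 21-24 partition follows, reusing the relevance and correlation-gain helpers above; comparing each new feature only against the current candidate set when computing Sen mirrors step 22, while the parameter names alpha, beta and th are assumptions.

```python
import numpy as np

def sensitivity(X, C, i, others, alpha=0.5, beta=0.5):
    """Sen(x^(i)) = R_c + alpha*min(R_x) + beta*max(S_x) over the given features."""
    gains = [correlation_gain(X, C, i, j) for j in others if j != i]
    r_min = min((min(0.0, g) for g in gains), default=0.0)   # most redundant partner
    s_max = max((max(0.0, g) for g in gains), default=0.0)   # most synergistic partner
    return relevance(X, C, i) + alpha * r_min + beta * s_max

def partition(X, C, alpha=0.5, beta=0.5, th=0.01):
    """Steps 21-24: split features into a candidate set and an exclusion set,
    scanning features in descending order of relevance R_c."""
    order = sorted(range(X.shape[1]), key=lambda i: relevance(X, C, i), reverse=True)
    s_sel, s_exc = [order[0]], []         # most relevant feature seeds the candidates
    for i in order[1:]:
        if sensitivity(X, C, i, s_sel, alpha, beta) > th:
            s_sel.append(i)
        else:
            s_exc.append(i)
    return s_sel, s_exc

# s_sel, s_exc = partition(X, C)  # using X, C from the earlier sketches
```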
In this embodiment, as shown in fig. 4, as an optional embodiment, calculating the sensitivity Sen between the features in the candidate set S_sel and the exclusion set S_exc, comparing it with the preset threshold th, and adjusting the candidate set S_sel and the exclusion set S_exc according to the threshold th (step 3) comprises:
step 31: setting the pending set S_tbd to the empty set, i.e. S_tbd = {}; letting x^(k) be the first feature of the exclusion set S_exc and x^(m) be the first feature of the candidate set S_sel;
step 32: for the feature x^(k) of the exclusion set S_exc, calculating the maximum of the degrees of synergy between the feature x^(m) of the candidate set S_sel and all features in the feature set S other than x^(m), i.e. max(S_x(x^(m); x^(i))), x^(i) ∈ S, i ≠ m;
step 33: if the maximum degree of synergy of the feature x^(m) is attained with x^(k), adding x^(m) to the pending set S_tbd;
step 34: if the feature x^(m) is the last feature in the candidate set S_sel and the pending set S_tbd is empty, going to step 36; if the pending set S_tbd is not empty, letting x^(j) be the first feature of the pending set S_tbd and going to step 35; if the feature x^(m) is not the last feature in the candidate set S_sel, setting the feature x^(m) to the next feature of the candidate set S_sel and returning to step 32;
step 35: for the feature x^(j) of the pending set S_tbd, updating the sensitivity of the feature x^(j) according to the following formula:
Sen(x^(j)) = R_c(x^(j)) + α·min(R_x(x^(j); x^(n))) + β·max(S_x(x^(j); x^(n))), x^(n) ∈ S, n ≠ j, n ≠ k
comparing the sensitivity Sen(x^(j)) of the feature x^(j) with the preset threshold th; if Sen(x^(j)) < th, removing the feature x^(k) from the exclusion set S_exc, adding it to the candidate set S_sel, and going to step 36; otherwise, if the feature x^(j) is the last element of the pending set S_tbd, going directly to step 36; otherwise, setting the feature x^(j) to the next element of the pending set S_tbd and returning to step 35;
step 36: if the feature x^(k) is the last element in the exclusion set S_exc, returning the current candidate set S_sel and exclusion set S_exc as the result of the final feature selection; otherwise, setting the feature x^(k) to the next element of the exclusion set S_exc and returning to step 31.
In this embodiment, according to steps 31-36, the sensitivity Sen between the features in the candidate set S_sel and the exclusion set S_exc is calculated and compared with the preset threshold th, and the candidate set S_sel and the exclusion set S_exc are adjusted according to the threshold th to obtain a new candidate set S_sel and exclusion set S_exc, so that the effect of the removal or addition of individual features on the classification result can be reduced.
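The adjustment of steps 31-36 can be sketched as follows, building on the helpers above; the reading of step 33 (a candidate enters the pending set when its strongest synergy partner is the excluded feature) and of the re-admission test in step 35 follows the reconstruction given in this description and should be taken as an assumption rather than a verbatim transcription of the claims.

```python
import numpy as np

def adjust(X, C, s_sel, s_exc, alpha=0.5, beta=0.5, th=0.01):
    """Steps 31-36: re-admit an excluded feature x^(k) when some candidate x^(j)
    relies on it for synergy and would fall below th without it."""
    n = X.shape[1]
    for k in list(s_exc):
        # Steps 32-33: candidates whose maximal synergy over all other features
        # is attained with the excluded feature k go into the pending set.
        s_tbd = []
        for m in s_sel:
            partners = [i for i in range(n) if i != m]
            best = max(partners, key=lambda i: synergy(X, C, m, i))
            if best == k:
                s_tbd.append(m)
        # Step 35: recompute the sensitivity of each pending feature without k.
        for j in s_tbd:
            others = [i for i in range(n) if i != j and i != k]
            if sensitivity(X, C, j, others, alpha, beta) < th:
                s_exc.remove(k)           # bring x^(k) back into the candidates
                s_sel.append(k)
                break
    return s_sel, s_exc

# s_sel, s_exc = adjust(X, C, s_sel, s_exc)  # using results of the earlier sketches
```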
In this embodiment, the weight α of the redundancy R_x may default to 0.5, the weight β of the degree of synergy S_x may default to 0.5, and the preset threshold th may default to 0.01; the weight α of the redundancy R_x, the weight β of the degree of synergy S_x and the preset threshold th can then be optimized and updated by a genetic algorithm during the subsequent training and testing processes.
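Putting the pieces together, a full run of the sketched pipeline with the default parameters mentioned here could look like the following; the function names come from the earlier sketches and are assumptions, not an API defined by the patent.

```python
# X: (n_docs, n_features) word-count matrix, C: per-document class labels,
# vocab: word list (see the first sketch for how these are built from documents).
s_sel, s_exc = partition(X, C, alpha=0.5, beta=0.5, th=0.01)             # steps 1-2
s_sel, s_exc = adjust(X, C, s_sel, s_exc, alpha=0.5, beta=0.5, th=0.01)  # step 3
selected_words = [vocab[i] for i in s_sel]
print("selected features:", selected_words)
```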
While the foregoing is directed to the preferred embodiment of the present invention, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention as defined in the appended claims.
Claims (7)
1. A text classification feature selection method, comprising:
step 1: acquiring a feature set S and a target class C, calculating the degree of association R_c(x^(i)) between each feature x^(i) in the feature set S and the target class C, and sorting the feature set S in descending order of the degree of association R_c(x^(i));
step 2: calculating the redundancy R_x and the degree of synergy S_x between every two features in the feature set S, calculating the sensitivity Sen of each feature by combining them with the degree of association R_c(x^(i)) between the feature and the target class, comparing the sensitivity Sen with a preset threshold th and, using the descending ordering of the feature set S, dividing the feature set S into a candidate set S_sel and an exclusion set S_exc according to the threshold th;
step 3: calculating the sensitivity Sen between the features in the candidate set S_sel and the exclusion set S_exc, comparing it with the preset threshold th, and adjusting the candidate set S_sel and the exclusion set S_exc according to the threshold th;
wherein the redundancy R_x is expressed as:
R_x(x^(i); x^(j)) = min(0, IG(x^(i); x^(j); C)), i ≠ j
where IG(x^(i); x^(j); C) denotes the correlation gain between the i-th feature x^(i) and the j-th feature x^(j) in the feature set S, R_x(x^(i); x^(j)) denotes the redundancy between the feature x^(i) and the feature x^(j), and R_x(x^(i); x^(j)) is the smaller of 0 and the correlation gain;
wherein the degree of synergy S_x is expressed as:
S_x(x^(i); x^(j)) = max(0, IG(x^(i); x^(j); C)), i ≠ j
where IG(x^(i); x^(j); C) denotes the correlation gain between the i-th feature x^(i) and the j-th feature x^(j) in the feature set S, S_x(x^(i); x^(j)) denotes the degree of synergy between the feature x^(i) and the feature x^(j), and S_x(x^(i); x^(j)) is the larger of 0 and the correlation gain;
wherein the sensitivity Sen(x^(i)) is expressed as:
Sen(x^(i)) = R_c(x^(i)) + α·min(R_x(x^(i); x^(j))) + β·max(S_x(x^(i); x^(j))), j ≠ i
where α and β are the weights of the redundancy R_x and the degree of synergy S_x respectively, min(R_x(x^(i); x^(j))) denotes the minimum of the redundancies between the feature x^(i) and the remaining features, max(S_x(x^(i); x^(j))) denotes the maximum of the degrees of synergy between the feature x^(i) and the remaining features, Sen(x^(i)) denotes the sensitivity of the feature x^(i) to the target class C, and R_c(x^(i)) denotes the degree of association between the feature x^(i) and the target class C.
2. The text classification feature selection method according to claim 1, wherein the step 1 comprises:
step 11: for each feature x^(i) in the feature set S, calculating the degree of association R_c(x^(i)) between the feature x^(i) and the target class C according to the formula R_c(x^(i)) = I(x^(i); C), where I(x^(i); C) denotes the mutual information between the feature x^(i) and the target class C;
step 12: sorting the feature set S in descending order of the degree of association R_c(x^(i));
where x^(i) denotes the i-th feature in the feature set S and R_c(x^(i)) denotes the degree of association between the feature x^(i) and the target class C.
3. The text classification feature selection method according to claim 2, wherein I(x^(i); C) is expressed as:
I(x^(i); C) = Σ_k p(x^(i), c_k) · log( p(x^(i) | c_k) / p(x^(i)) )
where c_k denotes the k-th class of the target class C, p(x^(i), c_k) denotes the probability that the feature x^(i) and the class c_k occur together, p(x^(i) | c_k) denotes the probability that the feature x^(i) occurs in class c_k, and p(x^(i)) denotes the probability that the feature x^(i) occurs in the feature set S.
4. The text classification feature selection method according to claim 1, wherein IG(x^(i); x^(j); C) is expressed as:
IG(x^(i); x^(j); C) = I((x^(i), x^(j)); C) - I(x^(i); C) - I(x^(j); C)
where I(x^(i); C) denotes the mutual information between the feature x^(i) and the target class C; I(x^(j); C) denotes the mutual information between the feature x^(j) and the target class C; and I((x^(i), x^(j)); C) denotes the mutual information between the feature pair (x^(i), x^(j)) and the target class C.
5. The text classification feature selection method according to claim 4, wherein I((x^(i), x^(j)); C) is expressed as:
I((x^(i), x^(j)); C) = Σ_k p(x^(i), x^(j), c_k) · log( p((x^(i), x^(j)) | c_k) / p(x^(i), x^(j)) )
where c_k denotes the k-th class of the target class C, p(x^(i), x^(j), c_k) denotes the probability that the feature x^(i), the feature x^(j) and the class c_k occur together, p((x^(i), x^(j)) | c_k) denotes the probability that the feature x^(i) and the feature x^(j) occur together in class c_k, and p(x^(i), x^(j)) denotes the probability that the feature x^(i) and the feature x^(j) occur together in the feature set S.
6. The text classification feature selection method according to claim 1, wherein the step 2 comprises:
step 21: adding the first feature in the feature set S to the candidate set S_sel and setting the exclusion set S_exc to the empty set, i.e. S_sel = {x^(1)}, S_exc = {}; the first feature is the one with the largest degree of association R_c(x^(i));
step 22: starting from the second feature in the feature set S, with x^(i) denoting the current feature, calculating the redundancy R_x and the degree of synergy S_x between the feature x^(i) and all features in the candidate set S_sel, and, combining them with the degree of association R_c(x^(i)) between the feature and the target class, calculating the sensitivity Sen(x^(i)) of the feature x^(i);
step 23: comparing the sensitivity Sen(x^(i)) with the preset threshold th; if Sen(x^(i)) > th, adding the feature x^(i) to the candidate set S_sel; otherwise, adding the feature x^(i) to the exclusion set S_exc;
step 24: if x^(i) is the last feature in the feature set S, ending the division; otherwise, setting x^(i) to the next feature in the feature set S and returning to step 22.
7. The text classification feature selection method according to claim 1, wherein the step 3 comprises:
step 31: setting the pending set S_tbd to the empty set, i.e. S_tbd = {}; letting x^(k) be the first feature of the exclusion set S_exc and x^(m) be the first feature of the candidate set S_sel;
step 32: for the feature x^(k) of the exclusion set S_exc, calculating the maximum of the degrees of synergy between the feature x^(m) of the candidate set S_sel and all features in the feature set S other than x^(m), i.e. max(S_x(x^(m); x^(i))), x^(i) ∈ S, i ≠ m;
step 33: if the maximum degree of synergy of the feature x^(m) is attained with x^(k), adding x^(m) to the pending set S_tbd;
step 34: if the feature x^(m) is the last feature in the candidate set S_sel and the pending set S_tbd is empty, going to step 36; if the pending set S_tbd is not empty, letting x^(j) be the first feature of the pending set S_tbd and going to step 35; if the feature x^(m) is not the last feature in the candidate set S_sel, setting the feature x^(m) to the next feature of the candidate set S_sel and returning to step 32;
step 35: for the feature x^(j) of the pending set S_tbd, updating the sensitivity of the feature x^(j) according to the following formula:
Sen(x^(j)) = R_c(x^(j)) + α·min(R_x(x^(j); x^(n))) + β·max(S_x(x^(j); x^(n))), x^(n) ∈ S, n ≠ j, n ≠ k
comparing the sensitivity Sen(x^(j)) of the feature x^(j) with the preset threshold th; if Sen(x^(j)) < th, removing the feature x^(k) from the exclusion set S_exc, adding it to the candidate set S_sel, and going to step 36; otherwise, if the feature x^(j) is the last element of the pending set S_tbd, going directly to step 36; otherwise, setting the feature x^(j) to the next element of the pending set S_tbd and returning to step 35;
step 36: if the feature x^(k) is the last element in the exclusion set S_exc, returning the current candidate set S_sel and exclusion set S_exc as the result of the final feature selection; otherwise, setting the feature x^(k) to the next element of the exclusion set S_exc and returning to step 31.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710181572.8A CN107016073B (en) | 2017-03-24 | 2017-03-24 | A kind of text classification feature selection approach |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710181572.8A CN107016073B (en) | 2017-03-24 | 2017-03-24 | A kind of text classification feature selection approach |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107016073A CN107016073A (en) | 2017-08-04 |
CN107016073B true CN107016073B (en) | 2019-06-28 |
Family
ID=59445053
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710181572.8A Active CN107016073B (en) | 2017-03-24 | 2017-03-24 | A kind of text classification feature selection approach |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107016073B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109934251B (en) * | 2018-12-27 | 2021-08-06 | 国家计算机网络与信息安全管理中心广东分中心 | Method, system and storage medium for recognizing text in Chinese language |
CN111612385B (en) * | 2019-02-22 | 2024-04-16 | 北京京东振世信息技术有限公司 | Method and device for clustering articles to be distributed |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105184323A (en) * | 2015-09-15 | 2015-12-23 | 广州唯品会信息科技有限公司 | Feature selection method and system |
CN105260437A (en) * | 2015-09-30 | 2016-01-20 | 陈一飞 | Text classification feature selection method and application thereof to biomedical text classification |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8473451B1 (en) * | 2004-07-30 | 2013-06-25 | At&T Intellectual Property I, L.P. | Preserving privacy in natural language databases |
-
2017
- 2017-03-24 CN CN201710181572.8A patent/CN107016073B/en active Active
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105184323A (en) * | 2015-09-15 | 2015-12-23 | 广州唯品会信息科技有限公司 | Feature selection method and system |
CN105260437A (en) * | 2015-09-30 | 2016-01-20 | 陈一飞 | Text classification feature selection method and application thereof to biomedical text classification |
Non-Patent Citations (1)
Title |
---|
Research on feature selection in Chinese text classification; Zhou Qian et al.; Journal of Chinese Information Processing (中文信息学报); 2004-12-31; Vol. 18, No. 3; pp. 17-23
Also Published As
Publication number | Publication date |
---|---|
CN107016073A (en) | 2017-08-04 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US7610282B1 (en) | Rank-adjusted content items | |
US20200057958A1 (en) | Identification and application of hyperparameters for machine learning | |
US7680768B2 (en) | Information processing apparatus and method, program, and storage medium | |
CN108228541B (en) | Method and device for generating document abstract | |
US9256649B2 (en) | Method and system of filtering and recommending documents | |
CN109948125B (en) | Method and system for improved Simhash algorithm in text deduplication | |
US10353925B2 (en) | Document classification device, document classification method, and computer readable medium | |
CN105760526B (en) | A kind of method and apparatus of news category | |
CN111428138A (en) | Course recommendation method, system, equipment and storage medium | |
CN106708929B (en) | Video program searching method and device | |
US10387805B2 (en) | System and method for ranking news feeds | |
WO2022089467A1 (en) | Video data sorting method and apparatus, computer device, and storage medium | |
CN110334356A (en) | Article matter method for determination of amount, article screening technique and corresponding device | |
CN103593425A (en) | Intelligent retrieval method and system based on preference | |
US9269057B1 (en) | Using specialized workers to improve performance in machine learning | |
WO2011130526A1 (en) | Ascribing actionable attributes to data that describes a personal identity | |
CN106570196B (en) | Video program searching method and device | |
CN111680152A (en) | Method and device for extracting abstract of target text, electronic equipment and storage medium | |
CN107016073B (en) | A kind of text classification feature selection approach | |
US11663520B1 (en) | Regularization relaxation scheme | |
CN109716660A (en) | Data compression device and method | |
CN109933691A (en) | Method, apparatus, equipment and storage medium for content retrieval | |
CN107133321B (en) | Method and device for analyzing search characteristics of page | |
KR20130045054A (en) | Keyword extracting and refining system, and method thereof | |
CN112733006B (en) | User portrait generation method, device and equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||