CN108038511A - Semi-supervised classification method combining a modified cluster hypothesis with pairwise constraints - Google Patents
- Publication number: CN108038511A
- Application number: CN201711421475.8A
- Authority: CN (China)
- Prior art keywords: function, membership, semi, membership function, samples
- Legal status: Pending (status as listed by Google; it is an assumption, not a legal conclusion, and its accuracy is not guaranteed)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
Landscapes
- Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Other Investigation Or Analysis Of Materials By Electrical Means (AREA)
Abstract
The invention discloses a semi-supervised classification method that combines a modified cluster hypothesis with pairwise constraints, and relates to semi-supervised learning algorithms. Its steps are: initialize the class memberships of the unlabeled samples by the FCM (fuzzy C-means) method; select appropriate parameters λ1 and λ2; compute the initial α, the membership function v(x), and the new objective function M from the given formulas; check whether the iteration termination criterion is met. If it is, return the membership function v(x) and obtain the classification decision function f(x) from α; if not, recompute α, the membership function v(x), and the objective function M, and check again. The invention combines the modified cluster hypothesis's exploration of the unlabeled samples with the use of pairwise constraints on the supervision information, so that together they form a more complete empirical risk term. This further mines the knowledge contained in the supervision information and improves algorithm performance, giving higher effectiveness and correctness.
Description
Technical Field
The invention relates to semi-supervised learning algorithms, and in particular to a semi-supervised classification method combining a modified cluster hypothesis with pairwise constraints.
Background
Semi-supervised learning is a learning paradigm between supervised and unsupervised learning. Its basic setting is that, in addition to a large number of unlabeled samples, supervision information such as class labels is available for a set of labeled samples. Semi-supervised learning differs from supervised learning in that it can augment the training set with a large number of unlabeled samples. Viewed from supervised learning, its main question is how to automatically exploit the information in many unlabeled samples to improve classifier performance when the labeled samples carrying supervision information are too few to train a good model.
Semi-supervised classification generally improves classifier performance in two ways. On the one hand, for the labeled samples, efficient learning techniques, mostly borrowed from supervised learning, are used to mine the knowledge, such as supervision information, contained in the small number of labeled samples. On the other hand, unsupervised learning methods are used to capture the data-distribution information contained in the large number of unlabeled samples. Regarding supervision information, class labels are widely used as one of the most common and direct forms of prior knowledge. Pairwise constraints, also known as must-link and cannot-link constraints, are another type of supervision information, with the advantage of being more flexible and practical than other forms. In some practical applications only pairwise constraints are available and the class labels of the samples are not; conversely, pairwise constraints can also be derived from class labels. The data-distribution information in the unlabeled samples is mined mainly by relying on the three basic assumptions of semi-supervised learning: the manifold assumption, the cluster assumption, and the smoothness assumption. The main idea of the cluster assumption is that "samples that are close to each other tend to belong to the same class". Under this assumption, the classification boundary should pass through sparse (low-density) regions of the data as far as possible, so that dense groups of samples are not split across the decision boundary.
Under this assumption, a learning algorithm can use the large amount of unlabeled data to analyze how the samples are distributed in the sample space, and adjust the classification boundary accordingly so that it passes through regions where the data are as sparse as possible, which ultimately improves learning performance.
The core question of semi-supervised learning is how to use the knowledge in a small number of labeled samples and a large number of unlabeled samples to improve the learning ability of the algorithm. Current mainstream semi-supervised algorithms mainly extract knowledge from the unlabeled samples, mining the data distribution to improve the classifier, but they neglect deeper use of supervision information such as the labeled samples. They therefore lose, to some extent, important information contained in the labeled samples, do not make full use of the available knowledge, and leave room for improvement in effectiveness, correctness, and performance. For example, one improved cluster-hypothesis approach modifies the cluster hypothesis by introducing the concept of membership: the common hypothesis that "samples in the same cluster are likely to share the same class label" becomes "samples in the same cluster have similar class memberships". On this basis a new semi-supervised classification method was proposed, the semi-supervised classification method based on class membership (SSCCM). SSCCM, however, relies mainly on the modified cluster hypothesis and makes little use of the supervision information. Motivated by this, the present invention combines the modified cluster hypothesis with pairwise constraints to obtain a new semi-supervised classification method.
Disclosure of Invention
To address these shortcomings of the prior art, the invention provides a semi-supervised classification method combining a modified cluster hypothesis with pairwise constraints. The method further mines the knowledge contained in the supervision information, improves algorithm performance with higher effectiveness and correctness, and is practical, reliable, and easy to adopt.
To achieve this purpose, the invention adopts the following technical solution: a semi-supervised classification method combining a modified cluster hypothesis with pairwise constraints comprises the following steps:
Input: l labeled samples, u unlabeled samples, an iteration termination threshold ε, and a maximum number of iterations Maxiter;
Output: a classification decision function f(x) and a membership function v(x);
(1) Initialize the class memberships of the unlabeled samples by the FCM method, select appropriate parameters λ1 and λ2, and compute the initial α and M from the formulas;
(2) Update α according to its update formula;
(3) Update the membership function v(x) according to its update formula;
(4) Update the objective function M according to its update formula;
(5) Check whether the iteration termination condition is met; if so, go to step (6), otherwise return to step (2);
(6) Return the membership function v(x), and obtain the classification decision function f(x) from α.
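Step (1) states only that the memberships of the unlabeled samples are initialized "by the FCM method". The following is a minimal pure-Python sketch of standard fuzzy C-means under assumptions the text does not fix: Euclidean distance, fuzzifier m = 2, a farthest-point choice of starting centers, and stopping once the centers move less than `tol`. The helper names (`fcm`, `fcm_memberships`, `fcm_centers`) are hypothetical. Here the number of clusters would be the number of classes C; how the clusters are matched to class indices is not specified in the text.

```python
# Minimal fuzzy C-means (FCM) sketch for initializing class memberships.
# Assumptions (not from the patent): Euclidean distance, fuzzifier m = 2,
# farthest-point center initialization, stop when centers move less than tol.

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fcm_memberships(points, centers, m=2.0, eps=1e-12):
    """u[i][k] = 1 / sum_j (d_ik^2 / d_ij^2)^(1/(m-1)); each row sums to 1."""
    u = []
    for x in points:
        d2 = [sq_dist(x, c) + eps for c in centers]  # eps avoids division by zero
        u.append([1.0 / sum((d2[k] / d2[j]) ** (1.0 / (m - 1.0)) for j in range(len(centers)))
                  for k in range(len(centers))])
    return u


def fcm_centers(points, u, m=2.0):
    """Each center is the mean of all points weighted by u^m."""
    dim = len(points[0])
    centers = []
    for k in range(len(u[0])):
        w = [row[k] ** m for row in u]
        total = sum(w)
        centers.append([sum(wi * p[d] for wi, p in zip(w, points)) / total for d in range(dim)])
    return centers


def fcm(points, n_clusters, m=2.0, max_iter=100, tol=1e-6):
    """Return (memberships, centers) after alternating membership/center updates."""
    centers = [list(points[0])]
    while len(centers) < n_clusters:  # farthest-point initialization
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(far))
    for _ in range(max_iter):
        u = fcm_memberships(points, centers, m)
        new_centers = fcm_centers(points, u, m)
        shift = max(sq_dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return fcm_memberships(points, centers, m), centers


unlabeled = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
u0, centers = fcm(unlabeled, n_clusters=2)
print([round(max(row), 3) for row in u0])
```

The rows of `u0` would serve as the initial class memberships of the unlabeled samples before the iteration in steps (2) to (5) begins.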
The beneficial effects of the invention are: (1) It inherits the advantages of the modified cluster hypothesis model. After the concept of sample membership is introduced, the hard assignment of data lying across the classification boundary, as done by ordinary classification methods, is turned into a fuzzy problem, which gives better fuzzy classification of boundary-crossing data.
(2) The pairwise constraints and the decision function's predictions on the labeled samples together form a more complete empirical risk term, making full use of the supervision information. By converting the sample class labels into pairwise constraints and combining this expanded knowledge with the loss function of the modified cluster hypothesis framework, the knowledge contained in the supervision information is mined further, improving algorithm performance with higher effectiveness and correctness.
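Effect (2) relies on converting class labels into pairwise constraints. A minimal sketch of that conversion follows, assuming every pair of labeled samples is used (the text does not say whether all pairs or only a subset are kept); the function name `labels_to_pairwise_constraints` is hypothetical.

```python
from itertools import combinations


def labels_to_pairwise_constraints(labels):
    """Turn class labels of labeled samples into must-link / cannot-link index pairs.

    Every pair (i, j), i < j, with equal labels becomes a must-link constraint;
    every pair with different labels becomes a cannot-link constraint.
    """
    must_link, cannot_link = [], []
    for i, j in combinations(range(len(labels)), 2):
        (must_link if labels[i] == labels[j] else cannot_link).append((i, j))
    return must_link, cannot_link


ml, cl = labels_to_pairwise_constraints(["a", "a", "b", "b"])
print(ml)  # [(0, 1), (2, 3)]
print(cl)  # [(0, 2), (0, 3), (1, 2), (1, 3)]
```

With l labeled samples this yields l(l-1)/2 constraints, which is the sense in which the pairwise form expands the supervision information available from the labels.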
Detailed Description
To make the technical means, inventive features, objectives, and effects of the invention easy to understand, the invention is further described below with reference to a specific embodiment.
The technical solution adopted in this embodiment is as follows: a semi-supervised classification method combining a modified cluster hypothesis with pairwise constraints comprises the following steps:
Input: l labeled samples, u unlabeled samples, an iteration termination threshold ε, and a maximum number of iterations Maxiter;
Output: a classification decision function f(x) and a membership function v(x);
(1) Initialize the class memberships of the unlabeled samples by the FCM method, select appropriate parameters λ1 and λ2, and compute the initial α and M from the formulas;
(2) Update α according to its update formula;
(3) Update the membership function v(x) according to its update formula;
(4) Update the objective function M according to its update formula;
(5) Check whether the iteration termination condition is met; if so, go to step (6), otherwise return to step (2);
(6) Return the membership function v(x), and obtain the classification decision function f(x) from α.
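Steps (1) to (6) follow an alternating-optimization pattern: α is updated with the memberships fixed, the memberships are updated with α fixed, and the objective M is re-evaluated until a termination condition holds. The sketch below shows only that control flow. The update rules are passed in as callables because the update formulas themselves are not reproduced in this text; the stopping rule |M_t - M_(t-1)| < ε (or Maxiter iterations) is an assumption, and `alternate` is a hypothetical name. A toy biconvex objective stands in for M in the usage example.

```python
from typing import Callable, Tuple, TypeVar

A = TypeVar("A")  # type of alpha (hypothetical)
V = TypeVar("V")  # type of the memberships v (hypothetical)


def alternate(alpha: A, v: V,
              update_alpha: Callable[[V], A],
              update_v: Callable[[A], V],
              objective: Callable[[A, V], float],
              eps: float = 1e-6, max_iter: int = 100) -> Tuple[A, V, int]:
    """Generic alternating-optimization loop mirroring steps (1)-(6).

    Assumed stopping rule: |M_t - M_(t-1)| < eps, or max_iter (Maxiter) reached.
    """
    m_prev = objective(alpha, v)          # step (1): initial objective M
    for it in range(1, max_iter + 1):
        alpha = update_alpha(v)           # step (2): update alpha with v fixed
        v = update_v(alpha)               # step (3): update v with alpha fixed
        m_cur = objective(alpha, v)       # step (4): update M
        if abs(m_prev - m_cur) < eps:     # step (5): termination test
            return alpha, v, it           # step (6): return the result
        m_prev = m_cur
    return alpha, v, max_iter


def toy_objective(a: float, v: float) -> float:
    """Stand-in for M: J(a, v) = (a-1)^2 + (v-2)^2 + (a-v)^2, minimized at (4/3, 5/3)."""
    return (a - 1) ** 2 + (v - 2) ** 2 + (a - v) ** 2


a_opt, v_opt, iters = alternate(
    0.0, 0.0,
    update_alpha=lambda v: (1 + v) / 2,   # exact argmin over a with v fixed
    update_v=lambda a: (2 + a) / 2,       # exact argmin over v with a fixed
    objective=toy_objective, eps=1e-12)
print(a_opt, v_opt, iters)
```

Because each update exactly minimizes the objective over one block of variables, M never increases, which is why a change-in-objective test is a natural (assumed) choice for step (5).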
This embodiment combines the modified cluster hypothesis's exploration of the unlabeled samples with the use of pairwise constraints on the supervision information, and proposes a semi-supervised classification method combining the two that improves classifier performance. Regarding the use of supervision information, the SSCCM algorithm, like many mainstream semi-supervised methods, includes in its optimization problem an empirical-risk term obtained from the data labels. Combining this term with the pairwise constraints forms a new empirical-risk term:
which makes further use of the supervision information. Regarding the use of unsupervised information, the SSCCM algorithm introduces the concept of fuzzy membership and, in its optimization problem, uses the memberships as weight coefficients on the label-prediction differences; the third term of SSCCM can therefore be regarded as an unsupervised component. This term resembles the FCM (fuzzy C-means) objective function, with one difference: FCM measures dissimilarity as the membership-weighted sum of the distances from the samples to the cluster centers, whereas SSCCM computes the membership-weighted sum of the samples' label-prediction differences, which gives it the ability to partition boundary-crossing data fuzzily.
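The contrast drawn above between the FCM objective and the third term of SSCCM can be made concrete with a single weighted-sum function applied to different inputs. This is a sketch under assumptions: an FCM-style exponent m on the memberships, squared Euclidean distances, and one-hot class vectors r_k for the SSCCM-style case; `fuzzy_weighted_sum` is a hypothetical name.

```python
def fuzzy_weighted_sum(vectors, anchors, memberships, m=2.0):
    """Return sum_j sum_k memberships[j][k]**m * ||vectors[j] - anchors[k]||^2."""
    total = 0.0
    for vec, row in zip(vectors, memberships):
        for anchor, weight in zip(anchors, row):
            total += weight ** m * sum((a - b) ** 2 for a, b in zip(vec, anchor))
    return total


weights = [[0.9, 0.1], [0.2, 0.8]]

# FCM: vectors are the samples x_j, anchors are the cluster centers.
fcm_term = fuzzy_weighted_sum([(0.0, 0.0), (4.0, 4.0)], [(0.5, 0.0), (4.0, 3.5)], weights)

# SSCCM-style term (assumed form): vectors are predictions f(x_j), anchors are
# one-hot class vectors r_k, i.e. a weighted sum of label-prediction differences.
label_term = fuzzy_weighted_sum([(1.0, 0.0), (0.0, 1.0)], [(1.0, 0.0), (0.0, 1.0)], weights)
print(fcm_term, label_term)
```

The only change between the two uses is what is measured: distances in input space to cluster centers for FCM, versus distances in label space from each prediction f(x_j) to each class vector for the SSCCM-style term.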
The optimization problem of the semi-supervised classification method combining the modified cluster hypothesis with pairwise constraints is then:
In equation (2), the first term controls the classifier complexity, the second term controls the empirical risk by learning from the labeled samples and the pairwise constraints, and the third term explores the unlabeled samples in an unsupervised way through the concept of label membership.
Since the variables in equation (2) are mainly in vector form, the problem can be rewritten in matrix form:
where $\alpha = [\alpha_1, \alpha_2, \ldots, \alpha_n] \in \mathbb{R}^{C \times n}$ is the matrix of Lagrange multipliers; $K = \begin{bmatrix} K_{ll} & K_{lu} \\ K_{ul} & K_{uu} \end{bmatrix}$ is the kernel matrix, with $K_{ll} = \langle \phi(X_l), \phi(X_l) \rangle_H$, $K_{lu} = \langle \phi(X_l), \phi(X_u) \rangle_H$, $K_{ul} = \langle \phi(X_u), \phi(X_l) \rangle_H$, and $K_{uu} = \langle \phi(X_u), \phi(X_u) \rangle_H$; $L_k$ is a $C \times n_u$ matrix whose k-th row elements are all 1 and whose remaining elements are 0; $V_k$ is the label-membership matrix of the unlabeled samples for the k-th class; and a diagonal matrix is also formed from the membership matrix $V_k$.
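To make the block structure of K and the definition of L_k concrete, the sketch below assembles K from its four blocks with the labeled samples ordered first, and builds L_k. The Gaussian (RBF) kernel and its `gamma` value are assumptions (the text states only that K is a kernel matrix), the function names are hypothetical, and k is 0-based in the code.

```python
import math


def rbf(x, z, gamma=0.5):
    """Gaussian (RBF) kernel exp(-gamma * ||x - z||^2); the kernel choice is an assumption."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def gram(A, B, kernel=rbf):
    """Matrix of kernel values <phi(a), phi(b)>_H for a in A (rows), b in B (columns)."""
    return [[kernel(a, b) for b in B] for a in A]


def block_kernel(X_l, X_u, kernel=rbf):
    """Assemble K = [[K_ll, K_lu], [K_ul, K_uu]] with labeled samples first."""
    K_ll, K_lu = gram(X_l, X_l, kernel), gram(X_l, X_u, kernel)
    K_ul, K_uu = gram(X_u, X_l, kernel), gram(X_u, X_u, kernel)
    top = [r1 + r2 for r1, r2 in zip(K_ll, K_lu)]      # concatenate rows: [K_ll | K_lu]
    bottom = [r1 + r2 for r1, r2 in zip(K_ul, K_uu)]   # concatenate rows: [K_ul | K_uu]
    return top + bottom


def L_matrix(k, C, n_u):
    """C x n_u matrix whose k-th row (0-based here) is all ones, all other entries zero."""
    return [[1.0 if row == k else 0.0 for _ in range(n_u)] for row in range(C)]


X_l = [(0.0, 0.0), (1.0, 1.0)]
X_u = [(0.2, 0.1), (0.9, 1.2), (3.0, 3.0)]
K = block_kernel(X_l, X_u)
print(len(K), len(K[0]))      # 5 5
print(L_matrix(1, 3, 4)[1])   # [1.0, 1.0, 1.0, 1.0]
```

Because the kernel is symmetric, K_ul is the transpose of K_lu and the assembled K is symmetric, which is what allows the matrix-form rewriting of equation (2).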
Derivation and proof of the semi-supervised classification algorithm combining the modified cluster hypothesis with pairwise constraints:
Using the Lagrange multiplier method and setting the partial derivative of $M_1$ with respect to $\alpha$ to 0 gives:
Solving equation (4) yields $\alpha$ in the following form:
With $\alpha$ thus determined and then held fixed, the optimization problem of the semi-supervised classification combining the modified cluster hypothesis with pairwise constraints becomes:
$0 \le v_k(x_j) \le 1, \quad k = 1, \ldots, C, \quad j = n_l + 1, n_l + 2, \ldots, n. \qquad (6)$
Similarly, using the Lagrange multiplier method gives:
Setting the partial derivative of $M_2$ in equation (7) with respect to $v_k(x_j)$ to 0, that is:
and solving gives:
Using the constraint on the memberships, it then follows that:
Thus, for any sample $x$, the membership function has the form:
From the solution of the optimization problem, the predictions of the algorithm can be obtained either from the decision function or by solving for the membership function. Specifically, assigning $x$ to class $X_k$ by means of $f(x)$ requires the corresponding decision condition on $f(x)$ to hold, and assigning $x$ to class $X_k$ by means of $v(x)$ requires the corresponding condition on the memberships to hold.
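As an illustration of how a membership function of the kind given in equation (11) can arise, the derivation below assumes (this is not stated in the text) that the terms of $M_2$ depending on $v_k(x_j)$ take the FCM-style form $\lambda_2 \sum_k v_k(x_j)^m \|f(x_j) - r_k\|^2$ with fuzzifier $m > 1$, subject to $\sum_k v_k(x_j) = 1$:

```latex
\begin{aligned}
&d_{kj} = \|f(x_j) - r_k\|^2, \qquad
\mathcal{L}_j = \lambda_2 \sum_{k=1}^{C} v_k(x_j)^m\, d_{kj} - \eta_j \Big( \sum_{k=1}^{C} v_k(x_j) - 1 \Big) \\
&\frac{\partial \mathcal{L}_j}{\partial v_k(x_j)} = \lambda_2\, m\, v_k(x_j)^{m-1} d_{kj} - \eta_j = 0
\;\Rightarrow\; v_k(x_j) = \Big( \frac{\eta_j}{\lambda_2\, m\, d_{kj}} \Big)^{1/(m-1)} \\
&\sum_{k=1}^{C} v_k(x_j) = 1
\;\Rightarrow\; v_k(x_j) = \frac{d_{kj}^{-1/(m-1)}}{\sum_{t=1}^{C} d_{tj}^{-1/(m-1)}}
\end{aligned}
```

Under these assumptions every membership lies in (0, 1), consistent with constraint (6), and the largest membership goes to the class whose vector $r_k$ is closest to $f(x_j)$.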
Regarding the consistency of the predictions made by the decision function $f(x)$ and the membership function $v(x)$: for any sample $x_i$, its class prediction is obtained from the classification decision function by solving an optimization problem, whose solution indicates that $x_i$ belongs to the k-th class. When labels are instead predicted by the membership function, whose solved form is given in equation (11), $x_i$ belonging to class k is equivalent to $\|f(x_i) - r_k\|^2 < \|f(x_i) - r_j\|^2$ for all $j \ne k$. Expanding the squared norms, this is equivalent to $f(x_i)^T r_k > f(x_i)^T r_j$ whenever all class vectors $r_k$ have the same norm, which is the same condition as the one obtained from the decision function. Therefore, for any sample, the classification decision function and the membership function give consistent label predictions.
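The consistency argument can be checked numerically. The sketch below assumes one-hot class vectors r_k and an FCM-style membership v_k(x) proportional to ||f(x) - r_k||^(-2/(m-1)); neither is confirmed by the text, and all function names are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def one_hot(k, C):
    """Class vector r_k, assumed one-hot (so every r_k has the same norm)."""
    return [1.0 if i == k else 0.0 for i in range(C)]


def membership(f_x, C, m=2.0, eps=1e-12):
    """Assumed FCM-style form: v_k proportional to ||f(x) - r_k||^(-2/(m-1)), normalized."""
    w = [(sq_dist(f_x, one_hot(k, C)) + eps) ** (-1.0 / (m - 1.0)) for k in range(C)]
    total = sum(w)
    return [wk / total for wk in w]


def predict_by_f(f_x, C):
    """Class k maximizing f(x)^T r_k."""
    return max(range(C), key=lambda k: sum(a * b for a, b in zip(f_x, one_hot(k, C))))


def predict_by_v(f_x, C, m=2.0):
    """Class k with the largest membership v_k(x)."""
    v = membership(f_x, C, m)
    return max(range(C), key=lambda k: v[k])


for f_x in ([0.2, 0.7, 0.1], [-0.3, 0.1, 0.5], [1.4, -0.2, 0.9]):
    print(predict_by_f(f_x, 3), predict_by_v(f_x, 3))  # 1 1, then 2 2, then 0 0
```

For a one-hot r_k, ||f - r_k||^2 = ||f||^2 - 2 f_k + 1, so the nearest class vector is the one with the largest inner product f^T r_k, and a membership that decreases with distance picks the same class.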
Building on the modified cluster hypothesis model, this embodiment combines the modified cluster hypothesis with the use of supervision information and proposes a semi-supervised classification algorithm combining the modified cluster hypothesis with pairwise constraints. On the one hand, the algorithm inherits the fuzzy partitioning ability of the membership function for boundary-crossing data that the modified cluster hypothesis provides. On the other hand, by converting sample class labels into pairwise constraints and combining this expanded knowledge with the loss function of the modified cluster hypothesis framework, the pairwise-constraint information and the decision function's predictions on the labeled samples jointly form a more complete empirical risk term. The knowledge in the supervision information is thereby mined further, improving algorithm performance with higher effectiveness and correctness and giving the method broad application prospects.
The foregoing shows and describes the basic principles, main features, and advantages of the invention. Those skilled in the art will understand that the invention is not limited to the embodiments described above, which merely illustrate its principles; various changes and improvements may be made without departing from the spirit and scope of the invention, and all such changes and improvements fall within the claimed scope. The scope of protection of the invention is defined by the appended claims and their equivalents.
Claims (1)
1. A semi-supervised classification method combining a modified cluster hypothesis with pairwise constraints, characterized by comprising the following steps:
Input: l labeled samples, u unlabeled samples, an iteration termination threshold ε, and a maximum number of iterations Maxiter;
Output: a classification decision function f(x) and a membership function v(x);
(1) Initialize the class memberships of the unlabeled samples by the FCM method, select appropriate parameters λ1 and λ2, and compute the initial α and M from the formulas;
(2) Update α according to its update formula;
(3) Update the membership function v(x) according to its update formula;
(4) Update the objective function M according to its update formula;
(5) Check whether the iteration termination condition is met; if so, go to step (6), otherwise return to step (2);
(6) Return the membership function v(x), and obtain the classification decision function f(x) from α.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711421475.8A (CN108038511A) | 2017-12-25 | 2017-12-25 | Semi-supervised classification method combining a modified cluster hypothesis with pairwise constraints |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108038511A | 2018-05-15 |
Family ID: 62101070
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711421475.8A (CN108038511A, Pending) | Semi-supervised classification method combining a modified cluster hypothesis with pairwise constraints | 2017-12-25 | 2017-12-25 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108038511A (en) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111612735A (en) * | 2020-04-08 | 2020-09-01 | 杭州电子科技大学 | Lung nodule image classification method based on information fusion safety semi-supervised clustering |
CN111695612A (en) * | 2020-05-26 | 2020-09-22 | 东南大学 | Semi-supervised identification method based on clustering |
CN112766382A (en) * | 2021-01-22 | 2021-05-07 | 南京航空航天大学 | Method for representing and correcting condition distribution deviation between two groups of data |
CN113052512A (en) * | 2021-05-12 | 2021-06-29 | 中国工商银行股份有限公司 | Risk prediction method and device and electronic equipment |
CN113435900A (en) * | 2021-07-12 | 2021-09-24 | 中国工商银行股份有限公司 | Transaction risk determination method and device and server |
CN114266321A (en) * | 2021-12-31 | 2022-04-01 | 广东泰迪智能科技股份有限公司 | Weak supervision fuzzy clustering algorithm based on unconstrained prior information mode |
CN113435900B (en) * | 2021-07-12 | 2024-11-15 | 中国工商银行股份有限公司 | Transaction risk determination method, device and server |
Similar Documents
Publication | Title |
---|---|
CN108038511A | Semi-supervised classification method combining a modified cluster hypothesis with pairwise constraints |
Dong-DongChen et al. | Tri-net for semi-supervised deep learning | |
Zhang et al. | A novel automatic modulation classification scheme based on multi-scale networks | |
Lin et al. | Ru-net: Regularized unrolling network for scene graph generation | |
CN113378913B (en) | Semi-supervised node classification method based on self-supervised learning | |
CN110851662B (en) | Heterogeneous information network link prediction method based on meta-path | |
CN103489033A (en) | Incremental type learning method integrating self-organizing mapping and probability neural network | |
CN105404783A (en) | Blind source separation method | |
Li et al. | Class balanced adaptive pseudo labeling for federated semi-supervised learning | |
Wu et al. | Automl with parallel genetic algorithm for fast hyperparameters optimization in efficient iot time series prediction | |
Takezoe et al. | Deep active learning for computer vision: Past and future | |
Feng et al. | A novel community detection method based on whale optimization algorithm with evolutionary population | |
CN115311605B (en) | Semi-supervised video classification method and system based on neighbor consistency and contrast learning | |
CN117272195A (en) | Block chain abnormal node detection method and system based on graph convolution attention network | |
Ghorpade-Aher et al. | PSO based multidimensional data clustering: A survey | |
Guo et al. | THGNCDA: circRNA–disease association prediction based on triple heterogeneous graph network | |
Shi et al. | Competitive ensembling teacher-student framework for semi-supervised left atrium MRI segmentation | |
Yan et al. | Federated learning model training method based on data features perception aggregation | |
Xu et al. | Implementation and performance optimization of dynamic random forest | |
Tian et al. | Distributed learning over networks with graph-attention-based personalization | |
CN115761654B (en) | Vehicle re-identification method | |
Banerjee et al. | Boosting exploration in actor-critic algorithms by incentivizing plausible novel states | |
CN107679326A (en) | A kind of two-value FPRM circuit areas and delay comprehensive optimization method | |
Sun et al. | Robust multi‐user detection based on hybrid Grey wolf optimization | |
CN111860755A (en) | Improved particle swarm algorithm based on regression of support vector machine |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20180515 |