
CN111738343A - An image annotation method based on semi-supervised learning - Google Patents

An image annotation method based on semi-supervised learning

Info

Publication number
CN111738343A
Authority
CN
China
Prior art keywords: sub, class, voter, different, prediction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010589985.1A
Other languages
Chinese (zh)
Other versions
CN111738343B (en)
Inventor
宫恩来
杭丽君
熊攀
何远彬
沈磊
丁明旭
张尧
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Dianzi University
Original Assignee
Hangzhou Dianzi University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Dianzi University filed Critical Hangzhou Dianzi University
Priority to CN202010589985.1A priority Critical patent/CN111738343B/en
Publication of CN111738343A publication Critical patent/CN111738343A/en
Application granted granted Critical
Publication of CN111738343B publication Critical patent/CN111738343B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/254 Fusion techniques of classification results, e.g. of results related to same input data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/259 Fusion by voting

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an image labeling method based on semi-supervised learning. Different classifiers are designed for different types of samples, the classifiers are trained on the labeled portion of the samples, the results of the different classifiers are put to a vote, and the class with the highest accuracy is selected, thereby labeling unknown samples. To reduce the influence of misclassification, the samples assigned to each category by the classifiers are randomly and linearly blended with labeled samples of the corresponding category, so that even a misclassified result still contains features of its assigned category. This provides a new approach for semi-supervised deep learning and machine learning.

Description

An image annotation method based on semi-supervised learning

Technical Field

The invention belongs to the field of semi-supervised learning and relates to an image labeling method based on semi-supervised learning.

Background

In recent years, the introduction of deep learning and the maturation of machine learning have produced breakthroughs in computer vision; many traditional problems in the field, such as classification, detection, and semantic segmentation, are now solved much better than before. However, most computer vision tasks are supervised learning tasks, which means all input data must be labeled by hand: people must collect the relevant images and annotate them carefully with dedicated software, a process that consumes enormous human and material resources. In object detection, for example, the COCO dataset contains 80 classes and about 170,000 images; in every one of these images, the objects to be labeled must be found one by one and marked with bounding boxes. Annotating a large dataset of this kind typically takes hundreds of people several weeks or even months. Mistakes are inevitable in manual labeling, labeling errors affect subsequent training, and correcting them is difficult. A method is therefore needed that improves dataset labeling, raises annotation accuracy, and saves labor cost.

Semi-supervised learning has long been a research hotspot in machine learning. It combines supervised and unsupervised learning, training a model from a small amount of labeled data together with a large amount of unlabeled data, which greatly reduces the annotators' burden. Current semi-supervised methods, however, still cannot match the results of fully supervised training, so supervised learning remains the mainstream of computer vision. The semi-supervised process shares common ground with supervised learning: part of the data must be labeled, and this small labeled portion can be used to classify and label the unlabeled data. With current techniques, a model trained on a small number of samples can already achieve good classification performance, so the invention proposes a method that uses semi-supervised learning to assist annotation.

Summary of the Invention

To solve the above problems, the technical solution of the present invention is an image labeling method based on semi-supervised learning, comprising the following steps:


S10, add a background class: the goal is to classify samples into classes A and B; first introduce a background class constructed by randomly sampling from classes other than A and B.

S20, build a cross-network classification model: label the image data of the M classes. A deep learning network is trained into a model for every pair of classes, giving M*(M-1) models in total; different networks are selected for the two training orders of classes A and B. The labeled data are used to train the M*(M-1) models, and the M-1 independent classification models that concern a given class are grouped into one class learner, giving M class learners in total.

S30, form sub-voters: the unlabeled data are passed through the M class learners to predict their categories, and all results that involve a given class form one voting subset. According to the number of classes there are M sub-voters in total, and each sub-voter contains 2M-2 groups of different prediction results, each group containing a prediction for that class. Where the cases "class A vs. class B" and "class B vs. class A" overlap, the two models were trained with different networks, so their predictions also differ.

S40, vote with the mutual-exclusion voter: each of the M sub-voters produces 2M-2 prediction probabilities for the same image. A set of rules and a threshold are defined; when the voting result of a sub-voter exceeds the threshold, the sample is considered to belong to the class of that sub-voter. Only images whose voting result yields exactly one predicted class are kept, and they are given the predicted label.

S50, correct wrong labels by random linear blending: newly labeled samples are blended in random proportion with original samples of the same label, suppressing the interference that wrongly labeled samples would bring to network training.

Preferably, the voting comprises the following steps:

S41, each sub-voter computes an accuracy-weighted average; since the accuracy of each classification network differs, a weight coefficient is computed from the accuracy so that the predictions of more accurate models have a larger influence on the final label;

S42, each sub-voter scores the image, and the number N of sub-voters whose score exceeds the threshold is counted;

S43, only the data with N = 1 are kept and given a predicted label.

During voting, when the probabilities of two sub-voters both exceed the threshold, the sample may belong to two different labels; the voter judges the vote to have failed and discards the sample.

Preferably, the random linear blending fuses data of the same class without producing new labels, and the formula is:

x̃ = β·x_i + (1 − β)·x_j

where x_i is a labeled image of the same class as x_j, x_j is the annotated image output by the model, and β is a random parameter with a value in the range 0 to 1.

Preferably, the threshold is 0.95.

The beneficial effects of the invention are as follows: for image annotation settings in which a small amount of labeled data coexists with a large amount of unlabeled data, the invention proposes a semi-supervised ensemble learning method that predicts labels for the large unlabeled portion, achieving accurate class prediction for that data. Based on the way deep learning networks learn from image data, a random linear blending method is further proposed, achieving more accurate image annotation.

Brief Description of the Drawings

Fig. 1 is a flowchart of the steps of the image labeling method based on semi-supervised learning according to a specific embodiment of the method of the present invention;

Fig. 2 is a schematic diagram of the image labeling method based on semi-supervised learning according to a specific embodiment of the method of the present invention.

Detailed Description

To make the objectives, technical solutions, and advantages of the present invention clearer, the invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here only explain the invention and do not limit it.

On the contrary, the invention covers any alternatives, modifications, equivalent methods, and schemes within its spirit and scope as defined by the claims. Further, to give the public a better understanding of the invention, some specific details are described at length below; those skilled in the art can fully understand the invention even without these details.

Referring to Fig. 1, a flowchart of the steps of the image labeling method based on semi-supervised learning according to an embodiment of the present invention, the method comprises the following steps:

S10, add a background class: the goal is to classify samples into classes A and B; first introduce a background class constructed by randomly sampling from classes other than A and B.

First, a background class is introduced into every pairwise base classifier to counter the prediction interference caused by samples of unknown classes. The binary classification model of each base learner thus becomes a three-class model; the ratio of the three classes A : B : background is 1:1:1, and the background-class images are drawn randomly, in equal proportion, from all the other classes.
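The dataset construction for one such pairwise classifier can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name, the dict-of-lists data representation, and the truncation policy for uneven class sizes are all assumptions; only the 1:1:1 A/B/background ratio and the equal-proportion sampling from the remaining classes come from the text.

```python
import random

def build_pairwise_dataset(samples_by_class, a, b, seed=0):
    """Assemble the A / B / background training set for one pairwise classifier.

    samples_by_class maps a class name to its list of labeled samples.
    The background class is drawn in equal proportion from every class other
    than `a` and `b`, and is sized to match class A, giving a 1:1:1 ratio.
    """
    rng = random.Random(seed)
    n = len(samples_by_class[a])                 # target size per class
    others = [c for c in samples_by_class if c not in (a, b)]
    per_class = max(1, n // len(others))         # equal share from each other class
    background = []
    for c in others:
        background.extend(
            rng.sample(samples_by_class[c], min(per_class, len(samples_by_class[c]))))
    return ([(x, "A") for x in samples_by_class[a][:n]]
            + [(x, "B") for x in samples_by_class[b][:n]]
            + [(x, "background") for x in background[:n]])
```

In practice the samples would be image tensors and the three labels would feed a three-way softmax head.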

S20, build a cross-network classification model: label the image data of the M classes. A deep learning network is trained into a model for every pair of classes, giving M*(M-1) models in total; different networks are selected for the two training orders of classes A and B. The labeled data are used to train the M*(M-1) models, and the M-1 independent classification models that concern a given class are grouped into one class learner, giving M class learners in total.

Taking M = 20 as a specific embodiment, the data labeling system is built on ensemble learning: for every pair of distinct classes A-B, a weak classifier trained by deep learning is produced, and the images are finally labeled on the basis of the ensembled models. The image data of the 20 classes are labeled, and a model is trained between every two classes, giving 380 models in total (20*19). The weak classifiers taking part in the ensemble should be as independent as possible; to avoid training the repeated pairs A-B and B-A with the same network, the classes are indexed. When the index of A is greater than that of B, the deep learning model PnasNet (batch_size = 64) is chosen; when the index of A is smaller than that of B, the deep learning model SeNet (batch_size = 32) is chosen. Different batch sizes make the features learned by the networks differ, which is why different batch sizes are used for the two cases. The small amount of labeled data is used to train the 380 models, and the 19 independent classification models concerning a given class are grouped into one class learner, so there are 20 class learners in total.
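The pair enumeration and network assignment described above can be sketched as a small planning function. The function name and tuple layout are assumptions for illustration; the index rule (PnasNet with batch size 64 when index(A) > index(B), SeNet with batch size 32 otherwise) and the 380-model count follow the text.

```python
def plan_models(num_classes=20):
    """Enumerate every ordered class pair (A, B), A != B, together with the
    network and batch size it would be trained with, per the index rule."""
    plans = []
    for a in range(num_classes):
        for b in range(num_classes):
            if a == b:
                continue
            if a > b:
                plans.append((a, b, "PnasNet", 64))
            else:
                plans.append((a, b, "SeNet", 32))
    return plans

plans = plan_models(20)
print(len(plans))  # 380 models = 20 * 19
```

Grouping the plans by their first element would then give the 19 models of each class learner.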

S30, form sub-voters: the unlabeled data are passed through the M class learners to predict their categories, and all results that involve a given class form one voting subset. According to the number of classes there are M sub-voters in total, and each sub-voter contains 2M-2 groups of different prediction results, each group containing a prediction for that class. Where the cases "class A vs. class B" and "class B vs. class A" overlap, the two models were trained with different networks, so their predictions also differ.

The large amount of unlabeled data is passed through the 20 class learners to predict categories. All results that involve a given class form one sub-voter, so, according to the number of classes, there are 20 sub-voters in total, each containing 38 groups of different prediction results, and each of the 38 groups contains a prediction for that class. As noted above, where "class A vs. class B" and "class B vs. class A" overlap, the two models were trained with different networks, so their predictions also differ.
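The grouping of pairwise predictions into sub-voters can be sketched as follows. The dict keyed by ordered pairs and the per-class probability tuple (p_A, p_B, p_background) are assumed representations, not specified by the patent; the structure shows why each sub-voter ends up with 2M-2 entries — class c appears first in M-1 pairs and second in another M-1.

```python
def group_subvoters(predictions):
    """predictions maps an ordered class pair (a, b) to the three-way
    probabilities (p_a, p_b, p_background) one trained model assigns to a
    single image.  Every result that concerns class c, whether c is first or
    second in the pair, goes into c's sub-voter."""
    subvoters = {}
    for (a, b), (p_a, p_b, _p_bg) in predictions.items():
        subvoters.setdefault(a, []).append(p_a)  # this model's vote for class a
        subvoters.setdefault(b, []).append(p_b)  # and its vote for class b
    return subvoters
```

With M = 20 classes and 380 models, each of the 20 sub-voters collects 38 probabilities per image.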

S40, vote with the mutual-exclusion voter: each of the M sub-voters produces 2M-2 prediction probabilities for the same image. A set of rules and a threshold are defined; when the voting result of a sub-voter exceeds the threshold, the sample is considered to belong to the class of that sub-voter. Only images whose voting result yields exactly one predicted class are kept, and they are given the predicted label.

Each sub-voter contains 38 prediction probabilities for the same image. Each sub-voter computes an accuracy-weighted average; since the accuracy of each classification network differs, a weight coefficient is computed from the accuracy so that the predictions of more accurate models have a larger influence on the final label. Next, each sub-voter scores the image, and the number N of sub-voters whose score exceeds the threshold of 0.95 is counted. Finally, only the data with N = 1 are kept and given a predicted label, completing the mutual-exclusion voting prediction.
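The weighted vote and the N = 1 rule can be sketched in a few lines. The exact weight normalization is an assumption (the patent only says weights are derived from accuracy); here each sub-voter's score is the accuracy-weighted mean of its probabilities, and an image is kept only when exactly one sub-voter clears the 0.95 threshold.

```python
def mutual_exclusion_vote(subvoter_scores, accuracies, threshold=0.95):
    """subvoter_scores: class -> list of predicted probabilities for one image.
    accuracies: class -> list of validation accuracies of the same models,
    used as weights so that more accurate models influence the label more.
    Returns the single winning class, or None when zero or several sub-voters
    exceed the threshold (the image is then discarded, per the mutual-exclusion
    rule)."""
    winners = []
    for cls, probs in subvoter_scores.items():
        w = accuracies[cls]
        score = sum(p * a for p, a in zip(probs, w)) / sum(w)  # weighted mean
        if score > threshold:
            winners.append(cls)
    return winners[0] if len(winners) == 1 else None
```

When two sub-voters both exceed the threshold, the function returns None, matching the rule that such a vote fails and the sample is abandoned.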

S50, correct wrong labels by random linear blending: newly labeled samples are blended in random proportion with original samples of the same label, suppressing the interference that wrongly labeled samples would bring to network training.

The accuracy of the newly generated sample labels can exceed 94%, but some labels are still wrong. To correct the adverse effect that this small portion of wrong labels could have on possible future network training, the invention proposes the RLB (random linear blending) data augmentation algorithm: newly labeled samples are blended in random proportion with original samples of the same label, suppressing the interference that wrongly labeled samples would bring to network training.
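A minimal sketch of the RLB blend follows, operating on flat lists of pixel values for simplicity (the patent works on images; the function name and representation are assumptions). The label is left unchanged, since both inputs carry the same class; only the pixels are mixed with a single random β per image, per the formula x̃ = β·x_i + (1 − β)·x_j.

```python
import random

def random_linear_blend(x_labeled, x_predicted, rng=None):
    """Blend an already-labeled image x_i with a newly auto-labeled image x_j
    of the same class.  A wrong pseudo-label is diluted because the blended
    sample still contains features of the class it is labeled with."""
    rng = rng or random.Random()
    beta = rng.random()  # one random mixing coefficient in [0, 1) per image
    return [beta * xi + (1.0 - beta) * xj
            for xi, xj in zip(x_labeled, x_predicted)]
```

In a real pipeline the same line would apply elementwise to image arrays, e.g. NumPy tensors.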

The voting comprises the following steps:

S41, each sub-voter computes an accuracy-weighted average; since the accuracy of each classification network differs, a weight coefficient is computed from the accuracy so that the predictions of more accurate models have a larger influence on the final label;

S42, each sub-voter scores the image, and the number N of sub-voters whose score exceeds the threshold is counted;

S43, only the data with N = 1 are kept and given a predicted label.

During voting, when the probabilities of two sub-voters both exceed the threshold, the sample may belong to two different labels; the voter judges the vote to have failed and discards the sample.

The random linear blending fuses data of the same class without producing new labels, and the formula is:

x̃ = β·x_i + (1 − β)·x_j

where x_i is a labeled image of the same class as x_j, x_j is the annotated image output by the model, and β is a random parameter with a value in the range 0 to 1. Fig. 2 shows the specific effect of the operation of the method of the present invention.

The above are only preferred embodiments of the present invention and are not intended to limit it; any modification, equivalent replacement, or improvement made within the spirit and principles of the invention shall fall within its scope of protection.

Claims (4)

1. An image labeling method based on semi-supervised learning is characterized by comprising the following steps:
S10, add a background class: the goal is to classify samples into class A and class B, and a background class formed by randomly sampling from classes other than A and B is introduced first;
S20, construct a cross-network classification model: label the image data of M classes; a deep learning network is used to train a model between every two classes, so there are M*(M-1) models in total; different networks are selected when training classes A and B in different orders; the labeled data are trained through the M*(M-1) models; the M-1 independent classification models of a given class are recorded as one class learner, giving M class learners in total;
S30, form sub-voters: the classes of unlabeled data are predicted through the M class learners, and all results containing a given class form one sub-voter; according to the number of classes there are M sub-voters in total, and the M sub-voters form the voter; each sub-voter contains 2M-2 groups of different prediction results, each group containing a prediction for a given class; when the cases class A vs. class B and class B vs. class A overlap, the training networks of the two are different, and the predictions also differ;
S40, vote according to the mutual-exclusion voter: each of the M sub-voters generates 2M-2 prediction probabilities for the same picture; a series of rules and a threshold are set; when the voting result generated by a sub-voter exceeds the threshold, the sample is considered to belong to the class corresponding to that sub-voter; only pictures whose voting result yields exactly one predicted class are kept and given a predicted label;
S50, correct wrong labels based on random linear blending: the newly labeled sample is blended with original samples of the same label in random proportion, suppressing the interference of wrongly labeled samples on network training.
2. The method of claim 1, wherein the voting comprises the steps of:
S41, each sub-voter computes an accuracy-weighted average; the accuracy of each classification network differs, and a corresponding weight coefficient is computed from the accuracy so that the result predicted by a model with high accuracy has a larger influence on the final label;
S42, each sub-voter scores the picture, and the number N of sub-voters exceeding the threshold is counted;
S43, only the data with N = 1 are kept and given a predicted label;
during voting, when the probabilities of two sub-voters both exceed the threshold, the sample may belong to two different labels, and the voter judges that the vote has failed and discards the sample.
3. The method of claim 1, wherein the random linear blending fuses data of the same class without generating new labels, and the formula is:
x̃ = β·x_i + (1 − β)·x_j
wherein x_i is a labeled picture of the same class as x_j, x_j is the annotated picture output by the model, and β is a random parameter with a value range of 0 to 1.
4. The method of claim 1, wherein the threshold is 0.95.
CN202010589985.1A 2020-06-24 2020-06-24 Image labeling method based on semi-supervised learning Active CN111738343B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010589985.1A CN111738343B (en) 2020-06-24 2020-06-24 Image labeling method based on semi-supervised learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010589985.1A CN111738343B (en) 2020-06-24 2020-06-24 Image labeling method based on semi-supervised learning

Publications (2)

Publication Number Publication Date
CN111738343A true CN111738343A (en) 2020-10-02
CN111738343B CN111738343B (en) 2024-07-26

Family

ID=72650996

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010589985.1A Active CN111738343B (en) 2020-06-24 2020-06-24 Image labeling method based on semi-supervised learning

Country Status (1)

Country Link
CN (1) CN111738343B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113254641A (en) * 2021-05-27 2021-08-13 中国电子科技集团公司第十五研究所 Information data fusion method and device

Citations (3)

Publication number Priority date Publication date Assignee Title
CN107644235A (en) * 2017-10-24 2018-01-30 广西师范大学 Automatic image annotation method based on semi-supervised learning
CN108764281A (en) * 2018-04-18 2018-11-06 华南理工大学 A kind of image classification method learning across task depth network based on semi-supervised step certainly
US20190164086A1 (en) * 2017-11-30 2019-05-30 Palo Alto Networks (Israel Analytics) Ltd. Framework for semi-supervised learning when no labeled data is given

Patent Citations (3)

Publication number Priority date Publication date Assignee Title
CN107644235A (en) * 2017-10-24 2018-01-30 广西师范大学 Automatic image annotation method based on semi-supervised learning
US20190164086A1 (en) * 2017-11-30 2019-05-30 Palo Alto Networks (Israel Analytics) Ltd. Framework for semi-supervised learning when no labeled data is given
CN108764281A (en) * 2018-04-18 2018-11-06 华南理工大学 A kind of image classification method learning across task depth network based on semi-supervised step certainly

Non-Patent Citations (1)

Title
Rui Xiaoguang et al., "A Novel Automatic Image Annotation Method", Proceedings of the 13th National Conference on Image and Graphics, 30 November 2006, pages 580-584 *

Cited By (2)

Publication number Priority date Publication date Assignee Title
CN113254641A (en) * 2021-05-27 2021-08-13 中国电子科技集团公司第十五研究所 Information data fusion method and device
CN113254641B (en) * 2021-05-27 2021-11-16 中国电子科技集团公司第十五研究所 Information data fusion method and device

Also Published As

Publication number Publication date
CN111738343B (en) 2024-07-26

Similar Documents

Publication Publication Date Title
CN109344884B (en) Media information classification method, method and device for training picture classification model
CN111368886A (en) Sample screening-based label-free vehicle picture classification method
CN112862093B (en) Graphic neural network training method and device
CN109902202B (en) A video classification method and device
CN112365007B (en) Model parameter determining method, device, equipment and storage medium
CN110516098A (en) An Image Annotation Method Based on Convolutional Neural Network and Binary Coded Features
CN117454426A (en) Method, device and system for desensitizing and collecting information of claim settlement data
CN114863091A (en) Target detection training method based on pseudo label
CN112800232B (en) Case automatic classification method based on big data
CN113920379A (en) Zero sample image classification method based on knowledge assistance
CN116977710A (en) Remote sensing image long tail distribution target semi-supervised detection method
CN111738343A (en) An image annotation method based on semi-supervised learning
WO2023155727A1 (en) Automatic labeling model generation method, data processing method and electronic device
Yang et al. Mind the boundary: Coreset selection via reconstructing the decision boundary
WO2020135054A1 (en) Method, device and apparatus for video recommendation and storage medium
CN112508900B (en) Cytopathological image segmentation method and device
CN113111869B (en) Method and system for extracting text images and their descriptions
Geldenhuys et al. Deep learning approaches to landmark detection in tsetse wing images
CN110046657B (en) A social security character portrait method based on multi-view learning
CN117727017A (en) Verification method for mouse behavior feature verification code
CN115019175B (en) A pest identification method based on transfer meta-learning
CN110175531B (en) Attitude-based examinee position positioning method
CN113705720A (en) Method for reducing weighted training deviation by applying weight correction in machine learning
CN116228729A (en) Heterogeneous robust federal learning method based on self-step learning
CN108197663A (en) Based on the calligraphy work image classification method to pairing set Multi-label learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant