CN116206106A - Semi-supervised image segmentation method based on uncertain pseudo tag correction - Google Patents
Semi-supervised image segmentation method based on uncertain pseudo tag correction
- Publication number
- CN116206106A (application number CN202310042767.XA)
- Authority
- CN
- China
- Prior art keywords
- labeling
- pseudo
- uncertainty
- training
- mask
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/774—Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
- G06V10/7753—Incorporation of unlabelled data, e.g. multiple instance learning [MIL]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Evolutionary Computation (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Computing Systems (AREA)
- Databases & Information Systems (AREA)
- Multimedia (AREA)
- General Health & Medical Sciences (AREA)
- Software Systems (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Medical Informatics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Molecular Biology (AREA)
- General Engineering & Computer Science (AREA)
- Mathematical Physics (AREA)
- Image Analysis (AREA)
Abstract
The invention provides a semi-supervised image segmentation method based on uncertain pseudo-label correction. An initialized semantic segmentation model is first trained on a labeled data set to obtain a teacher model. The teacher model then generates pseudo labels for an unlabeled data set, forming a pseudo-labeled unlabeled data set that is mixed with the original labeled data set to obtain a mixed labeled training set. A student model is trained with the mixed labeled training set and the labeled data set through an uncertainty pseudo-label correction algorithm. An original image to be classified is then input into the trained student model, which generates a mask segmentation result, and the original image is classified according to that result. By introducing labeled images and their reliable annotation information, the method effectively removes noise from the pseudo labels, generates unlabeled training samples with different scale effects, and suppresses pseudo-label noise and errors at different region scales.
Description
Technical Field
The invention relates to the technical field of image segmentation, and in particular to a semi-supervised image segmentation method based on uncertain pseudo-label correction.
Background
Image semantic segmentation is an important branch of image understanding in image processing, machine vision, and the broader field of artificial intelligence (AI). Semantic segmentation classifies each pixel in an image, determining the class of every pixel so as to divide the image into regions. The rapid development of deep learning over the past decade has also driven advances in image semantic segmentation. A key enabler of semantic segmentation in the deep-learning era has been the emergence of large-scale datasets with fine pixel-level annotations. However, the cost of producing such large-scale pixel-level annotations is very high, which limits the ability of semantic segmentation methods to generalize to many scenarios. Semi-supervised semantic segmentation aims to ease this data bottleneck by exploiting large-scale unlabeled data to relieve the dependence on labeled data.
Recent semi-supervised semantic segmentation algorithms fall mainly into two categories: consistency regularization and entropy minimization. Consistency-regularization methods constrain the network to produce similar predictions for copies of the same unlabeled image under different transformations, thereby making effective use of unlabeled images; they typically require complex regularization techniques such as contrastive learning and class-balancing strategies. Entropy-minimization methods are realized through self-training, in which the model is retrained with pseudo labels generated for the unlabeled data. However, such retraining easily overfits the noise in the pseudo labels, which degrades performance.
An existing approach to alleviating this pseudo-label noise is to erase the noisy regions. Its drawbacks are that noise in the pseudo label is difficult to locate directly, and random erasure may also remove valid annotated regions, losing effective supervision information in the pseudo label and reducing performance.
Disclosure of Invention
The embodiments of the invention provide a semi-supervised image segmentation method based on uncertain pseudo-label correction, in order to realize a semi-supervised image semantic segmentation algorithm that is robust to noise and generalizes well.
In order to achieve the above purpose, the present invention adopts the following technical scheme.
A semi-supervised image segmentation method based on uncertain pseudo-label correction, comprising:
training an initialized semantic segmentation model with a labeled data set to obtain a teacher model;
generating pseudo labels for an unlabeled data set with the teacher model to form a pseudo-labeled unlabeled data set, and mixing the pseudo-labeled data set with the original labeled data set to obtain a mixed labeled training set;
training a student model with the mixed labeled training set and the labeled data set through an uncertainty pseudo-label correction algorithm to obtain a trained student model;
and inputting an original image to be classified into the trained student model, the student model generating a mask segmentation result, and classifying the original image according to the mask segmentation result.
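As an illustration of this last step, a minimal inference sketch in PyTorch (the framework used in the experiments described later) is shown below; the dictionary-style `["out"]` output follows the torchvision segmentation-model convention and is an assumption rather than part of the claimed method.

```python
import torch

@torch.no_grad()
def segment_image(student, image):
    """Run the trained student model on one image tensor of shape (1, 3, H, W)
    and return the per-pixel class mask (H, W) used to classify the image."""
    student.eval()
    logits = student(image)["out"]   # (1, C, H, W); "out" per the torchvision convention
    return logits.argmax(dim=1)[0]   # mask segmentation result
```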
Preferably, training the student model with the mixed labeled training set and the labeled data set through an uncertainty pseudo-label correction algorithm to obtain a trained student model includes:
sampling a number of images from the labeled data set as labeled samples, the labeled samples using manual annotation masks as supervision information; sampling the same number of images from the unlabeled data set as unlabeled samples, the unlabeled samples using the pseudo labels generated for the mixed labeled training set as supervision information; enhancing the unlabeled samples and their corresponding pseudo labels with the uncertainty pseudo-label correction algorithm, which computes a pixel-level uncertainty score for the pseudo label, merges the pixel-level scores within each region to obtain the uncertainty of each region of the pseudo label, and replaces the pseudo-label regions with the highest uncertainty by introducing the accurate mask annotations of a labeled sample, thereby generating enhanced samples;
and performing supervised training of the student model with the labeled samples and the enhanced samples, constrained by a cross-entropy loss function, to obtain the trained student model.
Preferably, enhancing the unlabeled samples and their corresponding pseudo labels with the uncertainty pseudo-label correction algorithm, computing a pixel-level uncertainty score for the pseudo label, merging the pixel-level uncertainty scores within each region to obtain the uncertainty of each region of the pseudo label, and replacing the pseudo-label regions with the highest uncertainty by introducing the accurate mask annotations of a labeled sample to generate enhanced samples, includes:
each batch used to train the student model containing labeled and unlabeled samples in a 1:1 ratio, the unlabeled samples using the noisy pseudo-label masks generated for the mixed labeled training set as supervision information; for each unlabeled sample and its pseudo-label mask, the uncertainty pseudo-label correction algorithm computes the uncertainty of every pixel of the pseudo-label mask matrix to obtain an uncertainty matrix; the algorithm partitions the unlabeled image matrix, the pseudo-label mask matrix, and the uncertainty matrix into blocks in the same way, dividing each matrix into a 4x4 grid of 16 sub-regions whose height and width are 1/4 of those of the original matrix; and, for each sub-region of the uncertainty matrix, aggregating the uncertainties of all pixels in that sub-region to obtain the sub-region's uncertainty;
randomly scaling the image and annotation mask of a labeled sample by a factor of 0.5-1.0 with the uncertainty pseudo-label correction algorithm, randomly selecting k non-overlapping regions of the same size from the scaled labeled sample, and using these k regions to replace the unlabeled image regions and pseudo-label mask regions corresponding to the k sub-regions with the highest uncertainty;
and selecting a different labeled sample for each value of k, thereby generating multiple groups of differently enhanced unlabeled samples.
Preferably, inputting the original image to be classified into the trained student model, the student model generating a mask segmentation result, and classifying the original image according to the mask segmentation result includes:
storing the trained student model, inputting the original image to be classified into the trained student model, generating with the trained student model a mask segmentation result for the input original image, and classifying the original image according to the mask segmentation result.
According to the technical scheme provided by the embodiments of the invention, the uncertainty-guided image-content stitching enhancement introduces labeled images and their reliable annotation information to effectively remove noise from the pseudo labels; through multiple groups of transformation combinations, the method generates unlabeled training samples with different scale effects and suppresses pseudo-label noise and errors at different region scales.
Additional aspects and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments are briefly introduced below. The drawings in the following description show only some embodiments of the present invention; other drawings may be obtained from these drawings by a person skilled in the art without inventive effort.
Fig. 1 is a flowchart of a semi-supervised image segmentation method based on uncertain pseudo label correction according to an embodiment of the present invention.
Fig. 2 is a flowchart of the uncertainty pseudo-label correction (UAC) algorithm according to an embodiment of the present invention.
Fig. 3 is an effect diagram of the semi-supervised segmentation method based on uncertainty pseudo-label correction according to an embodiment of the present invention.
Fig. 4 compares the effect of the semi-supervised image segmentation method based on uncertain pseudo-label correction according to an embodiment of the present invention (denoted UAC in the figure) with that of a fully supervised segmentation method.
Detailed Description
Embodiments of the present invention are described in detail below, examples of which are illustrated in the accompanying drawings, wherein the same or similar reference numerals refer to the same or similar elements or elements having the same or similar functions throughout. The embodiments described below by referring to the drawings are exemplary only for explaining the present invention and are not to be construed as limiting the present invention.
As used herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless expressly stated otherwise, as understood by those skilled in the art. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or coupled. The term "and/or" as used herein includes any and all combinations of one or more of the associated listed items.
It will be understood by those skilled in the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
For the purpose of facilitating an understanding of the embodiments of the invention, reference will now be made to several specific embodiments illustrated in the accompanying drawings; these embodiments should in no way be taken to limit the invention.
An embodiment of the invention provides a semi-supervised image segmentation method based on uncertain pseudo-label correction (UAC), whose processing flow is shown in Fig. 1 and comprises the following steps:
Step S1: train the initialized semantic segmentation model with the labeled data set to obtain a teacher model.
The invention is not limited to a specific structure of the semantic segmentation model. In the method-verification experiments, two common semantic segmentation networks, DeepLab v3+ and Mask2Former, are selected. Both are encoder-decoder architectures whose encoders use a ResNet-101 network, while their decoders differ: DeepLab v3+ uses a decoder design based on dilated (atrous) convolution, whereas Mask2Former uses a Transformer-based decoder.
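As a concrete illustration, a minimal PyTorch sketch of the supervised teacher training in step S1 is given below. It uses torchvision's off-the-shelf DeepLabV3/ResNet-101 model as a readily available stand-in for the DeepLab v3+ structure described above, and assumes a `labeled_loader` yielding image/mask pairs; these names and hyper-parameters are illustrative assumptions, not taken from the patent.

```python
import torch
import torch.nn as nn
from torchvision.models.segmentation import deeplabv3_resnet101

def train_teacher(labeled_loader, num_classes, epochs=80, device="cuda"):
    """Step S1 (sketch): fully supervised training of the teacher on the labeled set."""
    model = deeplabv3_resnet101(num_classes=num_classes).to(device)
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9, weight_decay=1e-4)
    criterion = nn.CrossEntropyLoss(ignore_index=255)   # 255 = ignored pixels, a common convention

    model.train()
    for _ in range(epochs):
        for images, masks in labeled_loader:             # images: (B,3,H,W); masks: (B,H,W), long
            images, masks = images.to(device), masks.to(device)
            logits = model(images)["out"]                 # torchvision returns a dict with key "out"
            loss = criterion(logits, masks)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model
```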
Step S2: generate pseudo labels for the unlabeled data set with the teacher model from step S1, forming a pseudo-labeled unlabeled data set, and mix it with the original labeled data set to obtain a mixed labeled training set.
The mixed labeled training set provides the originally unlabeled data with pseudo labels predicted by the teacher model, and these pseudo labels serve as supervision information. The pseudo labels carry a large amount of information but also contain significant noise in some regions. In the subsequent semi-supervised training, the pseudo-labeled unlabeled data in the mixed labeled training set are used in the same supervised form as labeled samples to train the student model.
Step S3: train a student model with the mixed labeled training set.
In each training step, a number of image training samples are sampled from the labeled data set as labeled samples, which use manual annotation masks as supervision information. The same number of unlabeled samples are sampled from the unlabeled data set and use the pseudo labels generated for the mixed labeled training set as supervision information. The proposed uncertainty pseudo-label correction algorithm (UAC) applies an enhancement transformation to each unlabeled image sample and its pseudo label. UAC computes a pixel-level uncertainty score for the pseudo-label mask and merges the pixel-level scores within each region to obtain a per-region uncertainty. The pseudo-label mask regions with the highest uncertainty are replaced by the accurate mask annotations of an introduced labeled sample, producing an enhanced sample with less noise. The same transformation is applied simultaneously to the unlabeled image itself, so the replacement is carried out on both the image and the pseudo-label mask.
The enhanced unlabeled samples and their pseudo-label masks obtained in this way contain less noise and are more reliable, and can be used together with the labeled samples for fully supervised training of the semantic segmentation model. When training the student model, the labeled samples in the original batch and the enhanced unlabeled samples are both used for supervised training, constrained by a cross-entropy loss function, yielding the trained student model.
Step S4: save the trained student model. The student model is the final output of the semi-supervised semantic segmentation algorithm. For an input original image, the trained student model generates a mask segmentation result, the original image is classified according to that result, and the model can be deployed in various application scenarios.
In step S1, when the teacher model is trained with the labeled data set, the labeled data samples do not need the uncertainty-guided image stitching enhancement algorithm (UAC).
In step S2, when the teacher model generates pseudo labels for the unlabeled data, each unlabeled image is scaled to several sizes and flipped, forming multiple image samples with different resolutions and flipped content, in order to improve pseudo-label accuracy and reduce the influence of overfitting in the segmentation algorithm. The teacher model predicts the classes of each of these image samples, and the class predictions are averaged to generate the pseudo label for the unlabeled image.
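A minimal sketch of this multi-scale and flip averaging is given below. The specific scale factors and the softmax averaging are illustrative assumptions, since the patent only states that several scaled and flipped copies are predicted and averaged; the `["out"]` output again follows the torchvision convention assumed earlier.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def generate_pseudo_label(teacher, image, scales=(0.75, 1.0, 1.25), device="cuda"):
    """Step S2 (sketch): average teacher predictions over scaled and flipped copies."""
    teacher.eval()
    image = image.to(device)                           # (1, 3, H, W)
    _, _, H, W = image.shape
    prob_sum = 0.0
    for s in scales:
        scaled = F.interpolate(image, scale_factor=s, mode="bilinear", align_corners=False)
        for flip in (False, True):
            inp = torch.flip(scaled, dims=[3]) if flip else scaled
            logits = teacher(inp)["out"]               # (1, C, h, w)
            if flip:
                logits = torch.flip(logits, dims=[3])
            logits = F.interpolate(logits, size=(H, W), mode="bilinear", align_corners=False)
            prob_sum = prob_sum + logits.softmax(dim=1)
    prob = prob_sum / (len(scales) * 2)
    return prob.argmax(dim=1), prob                    # pseudo-label mask and averaged probabilities
```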
In step S3, the processing flow of the uncertainty pseudo-label correction (UAC) algorithm according to an embodiment of the present invention is shown in Fig. 2. Each batch used to train the student model contains labeled and unlabeled samples in a 1:1 ratio, and the unlabeled samples use the noisy pseudo-label masks from step S2 as supervision information. For each unlabeled sample and its pseudo-label mask, UAC computes the uncertainty of every pixel of the pseudo-label mask matrix, yielding an uncertainty matrix. UAC then partitions the unlabeled image matrix, the pseudo-label mask matrix, and the uncertainty matrix into blocks in the same way: each matrix is divided into a 4x4 grid of 16 sub-regions whose height and width are 1/4 of those of the original matrix, so the three matrices correspond one-to-one. For each sub-region of the uncertainty matrix, the uncertainties of all pixels in that sub-region are aggregated to obtain the sub-region's uncertainty. The unlabeled image regions and pseudo-label mask regions corresponding to the k sub-regions with the highest uncertainty are then removed by replacement.
UAC randomly selects one labeled sample from the labeled samples in the batch; this sample uses a manual annotation mask as supervision information. The labeled image and its annotation mask are randomly scaled by a factor of 0.5-1.0, and k non-overlapping regions of the same size are randomly selected from the scaled labeled sample to replace the corresponding regions of the unlabeled image and of its pseudo-label mask, respectively. Furthermore, the invention applies several UAC enhancement passes with different values of k to each unlabeled image sample, selecting a different labeled sample for each k, and thereby generates multiple groups of differently enhanced unlabeled samples. This multi-group enhancement strategy is more effective at improving student-model performance. One set of k values used in the invention is 2, 3, and 5.
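The sketch below illustrates the UAC replacement on a single unlabeled sample. The pixel-level uncertainty measure is assumed here to be the entropy of the teacher's averaged class probabilities (this passage of the patent does not commit to a specific measure), and for brevity the k crops taken from the labeled sample are not forced to be mutually non-overlapping as described above; all helper names are illustrative.

```python
import torch
import torch.nn.functional as F

def uac_enhance(unl_img, pseudo_mask, probs, lab_img, lab_mask, k=3, grid=4):
    """UAC sketch: replace the k most uncertain grid cells of an unlabeled sample
    (image + pseudo-label mask) with random crops from a rescaled labeled sample.
    unl_img/lab_img: (3, H, W); pseudo_mask/lab_mask: (H, W); probs: (C, H, W)."""
    _, H, W = unl_img.shape
    bh, bw = H // grid, W // grid

    # 1) pixel-level uncertainty (assumed: entropy of the averaged class probabilities)
    unc = -(probs * probs.clamp_min(1e-8).log()).sum(dim=0)                      # (H, W)
    # 2) aggregate the uncertainty inside each of the 16 (4x4) sub-regions
    region_unc = unc[:grid * bh, :grid * bw].reshape(grid, bh, grid, bw).mean(dim=(1, 3)).flatten()
    topk = region_unc.topk(k).indices                                            # k noisiest cells

    # 3) randomly rescale the labeled sample by a factor in [0.5, 1.0]
    s = 0.5 + 0.5 * torch.rand(1).item()
    sh, sw = max(int(s * H), bh), max(int(s * W), bw)
    lab_img_s = F.interpolate(lab_img[None], size=(sh, sw), mode="bilinear", align_corners=False)[0]
    lab_mask_s = F.interpolate(lab_mask[None, None].float(), size=(sh, sw), mode="nearest")[0, 0].long()

    # 4) paste one labeled crop into each selected cell, on both image and pseudo-label mask
    out_img, out_mask = unl_img.clone(), pseudo_mask.clone()
    for idx in topk.tolist():
        gy, gx = divmod(idx, grid)
        y0, x0 = gy * bh, gx * bw
        sy = torch.randint(0, sh - bh + 1, (1,)).item()
        sx = torch.randint(0, sw - bw + 1, (1,)).item()
        out_img[:, y0:y0 + bh, x0:x0 + bw] = lab_img_s[:, sy:sy + bh, sx:sx + bw]
        out_mask[y0:y0 + bh, x0:x0 + bw] = lab_mask_s[sy:sy + bh, sx:sx + bw]
    return out_img, out_mask
```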
The effect of the enhancement procedure described above is shown in Fig. 3. The error operation refers to the loss-function computation during training of the image semantic segmentation network: the class-probability map produced by the final prediction is compared against the mask annotation of the input sample, and the resulting error value is minimized by back-propagation to reach the optimal training state. This error computation is typically realized by a cross-entropy loss function.
The general expression of the cross-entropy loss function is:
$$L = -\frac{1}{N}\sum_{i=1}^{N}\sum_{c=1}^{M} y_{ic}\,\log\left(p_{ic}\right)$$
where $y_{ic}$ is the label indicating whether sample $i$ belongs to class $c$, $p_{ic}$ is the predicted probability that sample $i$ belongs to class $c$, $N$ is the number of samples, and $M$ is the number of classes.
In the invention, the cross-entropy function computes the error between the student model's prediction and the pseudo label of the corrected enhanced sample, and this error value is minimized by back-propagation, thereby training the student model.
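A minimal sketch of one student-training step under this loss is shown below, assuming the 1:1 batch composition described earlier, the `uac_enhance` helper sketched above, the torchvision-style `["out"]` output, and an equal weighting of the two cross-entropy terms (the weighting is an assumption).

```python
import torch.nn as nn

criterion = nn.CrossEntropyLoss(ignore_index=255)

def student_step(student, optimizer, lab_imgs, lab_masks, enh_imgs, enh_masks):
    """One training step (sketch): cross-entropy on labeled samples plus
    cross-entropy on UAC-enhanced unlabeled samples with corrected pseudo masks."""
    student.train()
    loss = criterion(student(lab_imgs)["out"], lab_masks) \
         + criterion(student(enh_imgs)["out"], enh_masks)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```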
The UAC semi-supervised image segmentation algorithm is a plug-and-play image data enhancement method: it is not tied to a specific network structure and achieves excellent semi-supervised segmentation simply by enhancing the input data. It also requires no repeated, redundant retraining rounds or complex structural designs, reflecting a concise and efficient design principle.
Experimental results
(1) Training and testing procedures
Experiments were performed with the PyTorch framework. The base segmentation networks are the commonly used DeepLab v3+ and Mask2Former. The experiments cover both identically distributed and differently distributed unlabeled data. Under the identically distributed setting, experiments are performed on the visual-challenge dataset PascalVOC and the street-scene dataset Cityscapes. The Pascal VOC dataset originally consists of 1464 training images and 1449 validation images. Following previous semi-supervised image segmentation algorithms, the invention uses the SBD (Semantic Boundaries Dataset) as an augmentation set, containing 9118 training images, for a total of 10582 annotated training images. Labeled data are sampled not only from the original annotated training set but also from the mixed training set of 10582 images, with the remaining images of the 10582-image set used as unlabeled data. The Cityscapes dataset contains 2975 training images and 500 validation images. For both identically distributed datasets, 1/2, 1/4, 1/8, and 1/16 of the full training set are extracted as labeled samples for the identically distributed semi-supervised semantic segmentation experiments. The compared algorithms are evaluated on the PascalVOC and Cityscapes evaluation sets.
Under the differently distributed unlabeled-data setting, the experiments use the complete PascalVOC training set as the labeled training set and the large-scale, differently distributed MSCOCO training set as the unlabeled data set. Under this setting the experiments use 10582 annotated images and 118000 unlabeled images, and the methods are evaluated on the PascalVOC evaluation set.
In terms of network structure, the invention uses the DeepLab v3+ and Mask2Former semantic segmentation architectures with a ResNet-101 backbone, enabling a fair comparison with other methods. The experiments adopt the mean Intersection over Union (mIoU) as the evaluation metric, which measures the overlap between the mask output by the model and the ground-truth annotation mask, i.e., the accuracy.
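For reference, a conventional way to compute mIoU from a class-confusion matrix is sketched below; this is the standard definition of the metric, not text taken from the patent.

```python
import torch

def mean_iou(conf_matrix: torch.Tensor) -> float:
    """mIoU from a (C, C) confusion matrix whose entry [t, p] counts pixels with
    ground-truth class t predicted as class p."""
    tp = conf_matrix.diag().float()
    fp = conf_matrix.sum(dim=0).float() - tp
    fn = conf_matrix.sum(dim=1).float() - tp
    denom = (tp + fp + fn).clamp_min(1)   # classes absent from both GT and prediction get IoU 0
    return (tp / denom).mean().item()
```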
(2) Comparison of experimental results
First, we compare the UAC method with recent state-of-the-art semi-supervised image segmentation algorithms, including Mean-Teacher, CCT, GCT, PseudoSeg, CPS, PC²Seg, AEL, U²PL, and ST++ (ordered from earliest to latest publication). Among them, PseudoSeg and ST++ adopt entropy-minimization methods, while the remaining methods adopt consistency regularization. In this section we use ResNet-101 as the backbone network and DeepLab v3+ as the base segmentation structure, achieving a fair comparison with the above methods. We also provide a comparison using Mask2Former (M2F) as the base segmentation network.
We first report semi-supervised performance comparisons on the PascalVOC dataset. Table 1 gives an objective comparison of mIoU values, with the best results highlighted in bold. "1/16 (92)" in the first row of Table 1 indicates that 1/16 of the annotation data source is extracted as labeled data, amounting to 92 images; the other columns are analogous, and Tables 2, 3, and 4 use the same notation.
Table 1. Semi-supervised image semantic segmentation performance comparison with the Pascal VOC original annotation dataset as the labeled data source.
As shown in Table 1, UAC holds a clear lead across the various labeled-data settings on PascalVOC; the only exception is the 1/16 semi-supervised setting, where it falls slightly behind U²PL. The UAC method also maintains a clear lead under both network structures, verifying its effectiveness. Notably, with a simple and effective design, UAC not only surpasses ST++, a self-training method with a complex and time-consuming training process, but also surpasses the consistency-regularization methods with complex structural designs, CPS, U²PL, and PC²Seg. This verifies that the image denoising achieved by the uncertainty-guided image enhancement can deliver excellent semi-supervised image segmentation performance.
Table 2. Semi-supervised image semantic segmentation performance comparison with the Pascal VOC mixed annotation dataset as the labeled data source.
As shown in Table 2, when the mixed annotation dataset is used as the labeled data source, UAC again outperforms the comparison methods across multiple semi-supervised settings, consistent with the advantage shown in Table 1. Notably, in this set of semi-supervised experiments, UAC with Mask2Former (M2F) as the base segmentation network surpasses the recent best methods U²PL and ST++ in all settings.
UAC is also compared with the U²PL and ST++ methods on the street-scene Cityscapes dataset. In this set of comparisons, UAC, ST++, and U²PL all use ResNet-101 as the backbone network and Mask2Former (M2F) as the base segmentation network, and performance is compared under the 1/8, 1/4, and 1/2 semi-supervised settings.
Table 3. Semi-supervised image semantic segmentation performance comparison on the Cityscapes dataset with identically distributed unlabeled data.
As shown in Table 3, on the Cityscapes dataset UAC leads the comparison methods U²PL and ST++ under all three semi-supervised settings (1/8, 1/4, and 1/2). Moreover, UAC's margin tends to grow as the amount of labeled data increases; under the 1/2 setting, UAC leads U²PL by 2.18 mIoU.
Tables 1, 2, and 3 report semi-supervised image semantic segmentation performance with identically distributed data, where UAC performs excellently on identically distributed unlabeled data. Table 4 reports the comparison under the more challenging semi-supervised segmentation task in which the unlabeled data follow a different distribution.
Table 4. Semi-supervised image semantic segmentation performance comparison with Pascal VOC labeled data and differently distributed MSCOCO unlabeled data.
As shown in Table 4, the labeled training data come from the Pascal VOC training set and the unlabeled training data come from the differently distributed MSCOCO dataset, whose size is about 10 times that of the labeled set. All methods are tested on the PascalVOC validation set. UAC is compared with PseudoSeg using the DeepLab v3+ network, and with U²PL and ST++ using the Mask2Former network; UAC substantially surpasses the comparison methods under both networks, verifying its superior noise resistance and enhancement performance when the unlabeled data follow a different distribution.
As shown in Fig. 4, we compare the predictions of the proposed UAC method with those of a fully supervised algorithm and with the ground-truth annotation masks. In Fig. 4, from left to right, are the input images, the ground-truth annotation masks, the fully supervised results, the UAC predictions with identically distributed unlabeled data, and the UAC predictions with differently distributed unlabeled data. Comparing UAC's predicted masks with the fully supervised predictions and the ground-truth masks, UAC performs better with respect to class misclassification and contour smoothness, showing a clear improvement over the fully supervised algorithm.
In summary, the method of the embodiments of the invention uses uncertainty-guided image-content stitching enhancement to introduce labeled images and their reliable annotation information, effectively removing noise from the pseudo labels; through multiple groups of transformation combinations, it generates unlabeled training samples with different scale effects and suppresses pseudo-label noise and errors at different region scales.
By accurately and effectively replacing the noise in the pseudo labels, the invention preserves the valid information of the pseudo labels, reduces the overfitting caused by pseudo-label noise, and achieves stronger noise resistance. The method outperforms classical semi-supervised image segmentation algorithms under various semi-supervised performance measurement schemes, on both identically distributed and differently distributed unlabeled data. Moreover, the invention is a general semi-supervised image segmentation technique: it requires no modification of the image segmentation network, operates only on the data, and is in principle applicable to all current segmentation models.
The method of the embodiments of the invention realizes a heuristic denoising mechanism from the perspective of pseudo-label noise elimination, effectively alleviating the influence of pseudo-label noise in semi-supervised image segmentation. The algorithm is simple and efficient, achieving effective semi-supervised image segmentation without complex hyper-parameter tuning or repeated, redundant training. It delivers excellent performance with both identically distributed and differently distributed unlabeled data, leading comparable methods by a large margin. Thanks to its simple and efficient design, the method is plug-and-play and applicable to a wide range of image segmentation networks.
Those of ordinary skill in the art will appreciate that: the drawing is a schematic diagram of one embodiment and the modules or flows in the drawing are not necessarily required to practice the invention.
From the above description of embodiments, it will be apparent to those skilled in the art that the present invention may be implemented in software plus a necessary general hardware platform. Based on such understanding, the technical solution of the present invention may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the embodiments or some parts of the embodiments of the present invention.
In this specification, the embodiments are described in a progressive manner; identical and similar parts of the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, for apparatus or system embodiments, which are substantially similar to the method embodiments, the description is relatively brief, and reference may be made to the description of the method embodiments. The apparatus and system embodiments described above are merely illustrative: units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the embodiment's solution. Those of ordinary skill in the art can understand and implement this without inventive effort.
The present invention is not limited to the above-mentioned embodiments, and any changes or substitutions that can be easily understood by those skilled in the art within the technical scope of the present invention are intended to be included in the scope of the present invention. Therefore, the protection scope of the present invention should be subject to the protection scope of the claims.
Claims (4)
1. A semi-supervised image segmentation method based on uncertain pseudo label correction, comprising:
training an initialized semantic segmentation model with a labeled data set to obtain a teacher model;
generating pseudo labels for an unlabeled data set with the teacher model to form a pseudo-labeled unlabeled data set, and mixing the pseudo-labeled data set with the original labeled data set to obtain a mixed labeled training set;
training a student model with the mixed labeled training set and the labeled data set through an uncertainty pseudo-label correction algorithm to obtain a trained student model;
and inputting an original image to be classified into the trained student model, the student model generating a mask segmentation result, and classifying the original image according to the mask segmentation result.
2. The method of claim 1, wherein training the student model with the mixed labeled training set and the labeled data set through an uncertainty pseudo-label correction algorithm to obtain a trained student model comprises:
sampling a number of images from the labeled data set as labeled samples, the labeled samples using manual annotation masks as supervision information; sampling the same number of images from the unlabeled data set as unlabeled samples, the unlabeled samples using the pseudo labels generated for the mixed labeled training set as supervision information; enhancing the unlabeled samples and their corresponding pseudo labels with the uncertainty pseudo-label correction algorithm, which computes a pixel-level uncertainty score for the pseudo label, merges the pixel-level scores within each region to obtain the uncertainty of each region of the pseudo label, and replaces the pseudo-label regions with the highest uncertainty by introducing the accurate mask annotations of a labeled sample, thereby generating enhanced samples;
and performing supervised training of the student model with the labeled samples and the enhanced samples, constrained by a cross-entropy loss function, to obtain the trained student model.
3. The method of claim 2, wherein enhancing the unlabeled samples and their corresponding pseudo labels with the uncertainty pseudo-label correction algorithm, computing a pixel-level uncertainty score for the pseudo label, merging the pixel-level uncertainty scores within each region to obtain the uncertainty of each region of the pseudo label, and replacing the pseudo-label regions with the highest uncertainty by introducing the accurate mask annotations of a labeled sample to generate enhanced samples, comprises:
each batch used to train the student model containing labeled and unlabeled samples in a 1:1 ratio, the unlabeled samples using the noisy pseudo-label masks generated for the mixed labeled training set as supervision information; for each unlabeled sample and its pseudo-label mask, the uncertainty pseudo-label correction algorithm computing the uncertainty of every pixel of the pseudo-label mask matrix to obtain an uncertainty matrix; the algorithm partitioning the unlabeled image matrix, the pseudo-label mask matrix, and the uncertainty matrix into blocks in the same way, dividing each matrix into a 4x4 grid of 16 sub-regions whose height and width are 1/4 of those of the original matrix; and, for each sub-region of the uncertainty matrix, aggregating the uncertainties of all pixels in that sub-region;
randomly scaling the image and annotation mask of a labeled sample by a factor of 0.5-1.0 with the uncertainty pseudo-label correction algorithm, randomly selecting k non-overlapping regions of the same size from the scaled labeled sample, and using these k regions to replace the unlabeled image regions and pseudo-label mask regions corresponding to the k sub-regions with the highest uncertainty;
and selecting a different labeled sample for each value of k, thereby generating multiple groups of differently enhanced unlabeled samples.
4. A method according to claim 1, 2 or 3, wherein inputting the original image to be classified into the trained student model, the student model generating a mask segmentation result, and classifying the original image according to the mask segmentation result comprises:
storing the trained student model, inputting the original image to be classified into the trained student model, generating with the trained student model a mask segmentation result for the input original image, and classifying the original image according to the mask segmentation result.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310042767.XA CN116206106A (en) | 2023-01-28 | 2023-01-28 | Semi-supervised image segmentation method based on uncertain pseudo tag correction |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310042767.XA CN116206106A (en) | 2023-01-28 | 2023-01-28 | Semi-supervised image segmentation method based on uncertain pseudo tag correction |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116206106A true CN116206106A (en) | 2023-06-02 |
Family
ID=86508799
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310042767.XA Pending CN116206106A (en) | 2023-01-28 | 2023-01-28 | Semi-supervised image segmentation method based on uncertain pseudo tag correction |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116206106A (en) |
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116740117A (en) * | 2023-06-09 | 2023-09-12 | 华东师范大学 | Stomach cancer pathological image segmentation method based on unsupervised domain adaptation |
CN116740117B (en) * | 2023-06-09 | 2024-02-06 | 华东师范大学 | Stomach cancer pathological image segmentation method based on unsupervised domain adaptation |
CN117077673A (en) * | 2023-07-17 | 2023-11-17 | 南京工业大学 | Semi-supervised entity alignment method based on noise student self-training |
CN117036372A (en) * | 2023-07-24 | 2023-11-10 | 河北大学 | Robust laser speckle image blood vessel segmentation system and segmentation method |
CN117036372B (en) * | 2023-07-24 | 2024-02-06 | 河北大学 | Robust laser speckle image blood vessel segmentation system and segmentation method |
CN117437426A (en) * | 2023-12-21 | 2024-01-23 | 苏州元瞰科技有限公司 | Semi-supervised semantic segmentation method for high-density representative prototype guidance |
CN117830332A (en) * | 2024-01-09 | 2024-04-05 | 四川大学 | Medical image segmentation method based on weak supervision |
CN118037651A (en) * | 2024-01-29 | 2024-05-14 | 浙江工业大学 | Medical image segmentation method based on multi-teacher network and pseudo-label contrast generation |
CN118115516A (en) * | 2024-03-20 | 2024-05-31 | 山东大学 | Semi-supervised medical image segmentation method and system based on visual language model |
CN118115516B (en) * | 2024-03-20 | 2024-08-30 | 山东大学 | Semi-supervised medical image segmentation method and system based on visual language model |
CN117975241A (en) * | 2024-03-29 | 2024-05-03 | 厦门大学 | Directional target segmentation-oriented semi-supervised learning method |
CN118262117A (en) * | 2024-05-30 | 2024-06-28 | 贵州大学 | Semi-supervised medical image semantic segmentation method based on hybrid enhancement and cross EMA |
CN118262117B (en) * | 2024-05-30 | 2024-08-02 | 贵州大学 | Semi-supervised medical image semantic segmentation method based on hybrid enhancement and cross EMA |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |