CN115861333A - Medical image segmentation model training method and device based on scribble annotation, and terminal - Google Patents
Medical image segmentation model training method and device based on scribble annotation, and terminal
- Publication number
- CN115861333A (application CN202211489694.0A)
- Authority
- CN
- China
- Prior art keywords
- medical image
- image segmentation
- loss
- segmentation model
- annotation
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Abstract
The invention discloses a method, a device, and a terminal for training a medical image segmentation model based on scribble annotation. An input medical image is processed by enhancement operations to obtain a standard enhanced view and a composite enhanced view, which are fed as the inputs of a dual-network structure to produce a pseudo label and a prediction label, respectively. A consistency regularization loss measuring the similarity between the pseudo label and the prediction label is computed, and during model training a loss function including this consistency regularization loss is minimized to optimize the model parameters. Compared with the prior art, consistency regularization training with a weight-shared dual-network structure maximizes the similarity of the predictions for different enhanced views of the same image; the optimization effect is good, network parameter optimization is completed in a single training run without repeated iterative training, and training efficiency is high.
Description
Technical Field
The invention relates to the technical field of medical image segmentation, and in particular to a method, a device, and a terminal for training a medical image segmentation model based on scribble annotation.
Background
Data-driven deep learning methods are currently the norm in medical image segmentation and require large amounts of fully annotated data (i.e., complete delineation of the target organ's shape) to train a medical image segmentation model. However, full annotation of three-dimensional medical images must be done slice by slice by imaging specialists, making large-scale annotation expensive, time-consuming, and labor-intensive. Scribble annotation offers high annotation freedom, contains accurately annotated pixels, introduces no annotation noise, and is well suited to annotating medical anatomical structures. Supervising the training of a medical image segmentation model with scribble annotation data can therefore effectively reduce the manual annotation burden.
A variety of scribble-annotation-based deep learning training methods for image segmentation currently exist, mainly: iterative training methods, training methods based on regularization constraints, and training methods based on generative adversarial networks (GANs).
Iterative training methods cause the network model to fit erroneous annotation information and require multiple rounds of training, so they are inefficient and optimize poorly. Because small pixel-value differences between adjacent tissues are very common in medical images, regularization-constraint-based training methods have limited optimization effect on them. GAN-based training methods still require fully annotated data as positive samples for discriminator training and are prone to training collapse.
The prior art therefore optimizes poorly when training a medical image segmentation model on scribble annotation data.
Disclosure of Invention
The invention mainly aims to provide a medical image segmentation model training method based on scribble annotation, to solve the prior-art problem of poor optimization when training a medical image segmentation model on scribble annotation data.
To achieve the above object, the present invention provides a medical image segmentation model training method based on scribble annotation, the training method comprising:
performing a first image enhancement operation on the scribble-annotated medical image to obtain a standard enhanced view, and performing a second image enhancement operation on the standard enhanced view to obtain a composite enhanced view;
inputting the standard enhanced view and the composite enhanced view each into one network of a dual-network structure to obtain a pseudo label and a prediction label respectively, wherein each network in the dual-network structure is a medical image segmentation model and the two medical image segmentation models share weights;
calculating the similarity between the pseudo label and the prediction label to obtain a consistency regularization loss;
constructing a loss function comprising the consistency regularization loss;
and training the medical image segmentation model according to the loss function until the loss function converges, obtaining the trained medical image segmentation model.
Optionally, the method further includes calculating an entropy minimization loss of the pseudo label, and the constructed loss function further includes the entropy minimization loss.
Optionally, when constructing the loss function, the consistency regularization loss and the entropy minimization loss are weighted by an exponential ramp-up function whose expression is r(t) = exp(−η(1 − t/T)), where t is the training round and T and η are hyperparameters.
Optionally, the method further includes calculating the cross entropy loss of the features extracted at annotated pixels in the standard enhanced view, obtaining a partial cross entropy loss; the constructed loss function further includes the partial cross entropy loss.
Optionally, performing a first image enhancement operation on the scribble-annotated medical image to obtain a standard enhanced view, and performing a second image enhancement operation on the standard enhanced view to obtain a composite enhanced view, comprises:
performing geometric enhancement and noise enhancement operations on the scribble-annotated medical image to obtain the standard enhanced view;
and performing a color distortion enhancement operation on the standard enhanced view to obtain the composite enhanced view.
Optionally, a feature pool storing the features of annotated pixels is further provided, and the training method further comprises:
performing feature extraction on the annotated pixels in the standard enhanced view to obtain hidden features;
obtaining the annotated-pixel features based on the hidden features and the weights of the annotated pixels;
dynamically updating the feature pool based on the annotated-pixel features;
based on the same mapping module, calculating the partial cross entropy between the hidden features and the scribble annotations, and the cross entropy between the feature pool and an identity matrix formed from the feature pool's labels, obtaining an auxiliary loss and a feature pool loss respectively; the constructed loss function further includes the auxiliary loss and the feature pool loss.
Optionally, dynamically updating the feature pool based on the annotated-pixel features comprises:
updating the feature pool by a momentum moving average according to a set momentum coefficient and the annotated-pixel features.
To achieve the above object, the present invention further provides a medical image segmentation model training apparatus based on scribble annotation, the apparatus comprising:
an enhancement operation module, configured to perform a first image enhancement operation on the scribble-annotated medical image to obtain a standard enhanced view, and perform a second image enhancement operation on the standard enhanced view to obtain a composite enhanced view;
a network mapping module, configured to input the standard enhanced view and the composite enhanced view each into one network of a dual-network structure to obtain a pseudo label and a prediction label respectively, wherein each network in the dual-network structure is a medical image segmentation model and the two medical image segmentation models share weights;
a consistency regularization loss module, configured to calculate the similarity between the pseudo label and the prediction label to obtain a consistency regularization loss;
a loss function module, configured to construct a loss function comprising the consistency regularization loss;
and a network parameter updating module, configured to train the medical image segmentation model according to the loss function until the loss function converges, obtaining the trained medical image segmentation model.
To achieve the above object, the present invention further provides an intelligent terminal comprising a memory, a processor, and a scribble-annotation-based medical image segmentation model training program stored in the memory and executable on the processor; when executed by the processor, the program implements the steps of any one of the scribble-annotation-based medical image segmentation model training methods.
To achieve the above object, the present invention further provides a computer-readable storage medium storing a scribble-annotation-based medical image segmentation model training program; when executed by a processor, the program implements the steps of any one of the scribble-annotation-based medical image segmentation model training methods.
It can be seen from the above that the invention constructs a dual-network structure from two medical image segmentation models that share weights. Enhancement operations on the scribble-annotated medical image yield a standard enhanced view and a composite enhanced view, each of which is input into one medical image segmentation model to obtain a pseudo label and a prediction label; a consistency regularization loss measuring the similarity between the pseudo label and the prediction label is calculated, and the medical image segmentation model is trained and optimized according to a loss function that includes this consistency regularization loss. Compared with the prior art, consistency regularization training with the weight-shared dual-network structure maximizes the similarity of the predictions for different enhanced views of the same image during training; the optimization effect is good, network parameter optimization is completed in a single training run without repeated iterative training, and training efficiency is high.
Drawings
To illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a schematic illustration of scribble annotation of abdominal organs and cardiac structures;
FIG. 2 is a schematic flowchart of an embodiment of the scribble-annotation-based medical image segmentation model training method provided by the present invention;
FIG. 3 is an architectural diagram of the embodiment of FIG. 2;
FIG. 4 is a schematic flowchart of obtaining the feature pool loss in the embodiment of FIG. 2;
FIG. 5 is a schematic structural diagram of the scribble-annotation-based medical image segmentation model training apparatus provided by an embodiment of the present invention;
FIG. 6 is a schematic block diagram of the internal structure of an intelligent terminal according to an embodiment of the present invention.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the invention. It will be apparent, however, to one skilled in the art that the present invention may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present invention with unnecessary detail.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the specification of the present invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
As used in this specification and the appended claims, the term "if" may be interpreted contextually as "when …" or "upon" or "in response to a determination" or "in response to a detection". Similarly, the phrase "if it is determined" or "if a [ described condition or event ] is detected" may be interpreted depending on the context to mean "upon determining" or "in response to determining" or "upon detecting [ described condition or event ]" or "in response to detecting [ described condition or event ]".
The technical solutions in the embodiments of the present invention are clearly and completely described below with reference to the drawings of the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, but the present invention may be practiced in other ways than those specifically described and will be readily apparent to those of ordinary skill in the art without departing from the spirit of the present invention, and therefore the present invention is not limited to the specific embodiments disclosed below.
Because of the large amount of data to annotate, full annotation is expensive, time-consuming, and labor-intensive. Supervising the training of a medical image segmentation model with weak annotation data can effectively reduce the manual annotation burden. Statistically, fully annotating an image takes about 240 seconds, while weakly annotating it takes only 20 to 35 seconds, reducing annotation time by a factor of roughly 9 to 12.
As shown in Fig. 1, a scribble annotation is a weak annotation: compared with full annotation data, where every pixel has a corresponding label, scribble annotation data contains only a small number of annotated pixels, and most pixels have no label. Scribble annotation offers high annotation freedom, contains accurately annotated pixels, introduces no annotation noise, and is well suited to annotating medical anatomical structures.
Because pixel-value differences between adjacent tissues in medical images are small, organs or structures with similar pixel-value distributions are hard to distinguish. When training a medical image segmentation model on scribble data, existing training methods include erroneously labeled pixels during iterative updates, and therefore fit to wrong information and optimize poorly.
The invention provides a medical image segmentation model training method based on scribble annotation, which supervises the training of a medical image segmentation model without fully annotated data and can effectively reduce the cost of medical image data annotation. Using a weight-shared dual-network structure, consistency regularization training maximizes the similarity of the predictions for different views of the same image during training. The optimization effect is good, network parameter optimization is completed in a single training run without repeated iterative training, and training efficiency is high. The method is applicable to segmenting various medical anatomical structures, such as abdominal organs, head and neck organs, the prostate, and cardiac structures.
Exemplary method
The embodiment of the invention provides a medical image segmentation model training method based on scribble annotation, deployed on an intelligent terminal and used for segmenting medical images of abdominal organs. Specifically, as shown in Fig. 2, the training method includes the following steps:
Step S100: performing a first image enhancement operation on the scribble-annotated medical image to obtain a standard enhanced view, and performing a second image enhancement operation on the standard enhanced view to obtain a composite enhanced view;
Specifically, because medical image data sets are small, training samples are generally insufficient. To expand the training data set, the scribble-annotated medical image is first subjected to image enhancement operations.
In the prior art, iterative training methods, regularization-constraint-based training methods, and GAN-based training methods all apply an image enhancement operation to the same medical image to obtain a single enhanced view. Under the concept of a dual-network structure, the invention applies two different image enhancement operations to the same medical image and constrains the semantic consistency between the different views: the enhanced views obtained by the two different enhancement operations should yield the same semantic labels during subsequent feature extraction, which guarantees that consistency regularization effectively regularizes the optimization of the network parameters.
Common types of image enhancement operations include geometric enhancement, noise enhancement, and pixel enhancement. Geometric enhancement, also called spatial enhancement, applies geometric transformations to an image, such as scaling, flipping, rotation, shearing, and translation. Pixel enhancement adjusts pixel attributes of an image, such as brightness, contrast, saturation, and hue. Noise enhancement adds noise to an image. An enhancement pipeline may combine several types of operations and may also apply several operations of the same type.
In this embodiment, a standard enhancement operation comprising geometric enhancement and noise enhancement is applied to the scribble-annotated medical image, yielding the standard enhanced view; a pixel enhancement operation, i.e., the composite enhancement operation, is then applied to the standard-enhanced image, yielding the composite enhanced view. That is, the composite enhancement operation applies an additional enhancement on top of the standard enhanced view ω(x).
Referring to Fig. 3, the standard enhancement function ω(·) denotes the standard enhancement operation. The image enhancement steps are as follows: first, the medical image is normalized to mean 0 and variance 1; then several operations are applied to the image in sequence, where each operation's parameter value is drawn from a uniform distribution over a given range and each operation occurs with a given probability. The operations are: scaling, elastic deformation, rotation, horizontal and vertical mirroring, and Gaussian noise.
The composite enhancement function β(·) denotes the composite enhancement operation; this embodiment adopts a color distortion enhancement, applying color distortion to the consistency-training views of the medical image. The specific steps are: adjust the brightness of the image, i.e., add to the pixel intensity values a random value drawn from the uniform distribution U(−0.8δ, +0.8δ); adjust the contrast of the image, i.e., multiply the pixel intensity values by a random value drawn from U(1 − 0.8δ, 1 + 0.8δ), then clip them to the lower and upper bounds of the original intensity values; and gamma-enhance the image, i.e., first apply min-max normalization, then raise the pixel values to the power of a random number drawn from U(1 − 0.8δ, 1 + 0.8δ). Here δ = 1 is the enhancement strength. These three operations are performed in sequence, each occurring with probability 0.8.
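To make the two-stage pipeline concrete, the following sketch implements ω(·) and β(·) under stated assumptions: PyTorch tensors of shape (B, 1, H, W) with square slices, a reduced set of geometric operations (scaling and elastic deformation omitted), and our own function names. Since ω(·) contains geometric operations, the sketch also transforms the scribble map identically so supervision stays aligned; the method implies this but does not spell it out.

```python
import torch

def standard_enhance(x: torch.Tensor, y: torch.Tensor):
    """Standard enhancement ω(·): normalize, geometric ops, Gaussian noise.

    x: image (B, 1, H, W); y: one-hot scribble map (B, K, H, W).
    Geometric ops are applied to image and scribbles jointly; noise only to x.
    """
    x = (x - x.mean()) / (x.std() + 1e-8)                  # mean 0, variance 1
    if torch.rand(1) < 0.5:                                # horizontal mirror
        x, y = torch.flip(x, dims=[-1]), torch.flip(y, dims=[-1])
    if torch.rand(1) < 0.5:                                # vertical mirror
        x, y = torch.flip(x, dims=[-2]), torch.flip(y, dims=[-2])
    k = int(torch.randint(0, 4, (1,)))                     # rotation in 90-degree steps
    x, y = torch.rot90(x, k, dims=[-2, -1]), torch.rot90(y, k, dims=[-2, -1])
    x = x + 0.1 * torch.randn_like(x)                      # Gaussian noise (image only)
    return x, y

def composite_enhance(x: torch.Tensor, delta: float = 1.0) -> torch.Tensor:
    """Composite enhancement β(·): color distortion, each op with probability 0.8."""
    lo, hi = float(x.min()), float(x.max())
    if torch.rand(1) < 0.8:                                # brightness shift, U(-0.8δ, 0.8δ)
        x = x + (torch.rand(1).item() * 1.6 - 0.8) * delta
    if torch.rand(1) < 0.8:                                # contrast scale, then clip
        x = (x * (1.0 + (torch.rand(1).item() * 1.6 - 0.8) * delta)).clamp(min=lo, max=hi)
    if torch.rand(1) < 0.8:                                # gamma after min-max normalization
        x = (x - x.min()) / (x.max() - x.min() + 1e-8)
        x = x ** (1.0 + (torch.rand(1).item() * 1.6 - 0.8) * delta)
    return x
```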
The input image x passes through the standard enhancement function ω(·) and the composite enhancement function β(·) to produce two enhanced views: the standard enhanced view ω(x) and the composite enhanced view β∘ω(x).
In this embodiment, the same image thus generates two different views through ω(·) and β(·), which guarantees that consistency regularization effectively regularizes network parameter optimization.
Optionally, the composite enhancement function may include one or more additional color enhancement functions, or a composition of them, such as a Gaussian blur function or a mix enhancement function. The Gaussian blur function smooths the image with a Gaussian convolution kernel; the mix enhancement function linearly blends the target image with another random image from the training data set.
Step S200: inputting the standard enhanced view and the composite enhanced view each into one network of a dual-network structure to obtain a pseudo label and a prediction label, wherein each network in the dual-network structure is a medical image segmentation model and the two medical image segmentation models share weights;
Specifically, the prior art iteratively optimizes the annotation data in two alternating steps: fix the annotation data and update the network parameters; then fix the network parameters and update the annotation data. Even when the prediction labels of the training images are refined by traditional image processing algorithms, they still contain erroneously labeled pixels, so the network fits to wrong information. Moreover, the two steps of updating network parameters and refining prediction labels must be repeated many times, involving many rounds of training and of processing and accessing prediction labels, making the process complex and time-consuming.
In view of this problem, the training framework of this embodiment comprises two medical image segmentation models forming a dual-network structure, where each network contains one medical image segmentation model, shown as f_θ(·) in Fig. 3. With the dual-network structure, network parameter optimization is completed in a single training run, enabling non-iterative training; the network is allowed to generalize across multiple annotations (i.e., each training image corresponds to multiple annotation data), which effectively limits the network's fitting of erroneous annotation information.
During training, the network weight parameters of the two medical image segmentation models are identical, i.e., the two models share weights. The standard enhanced view ω(x) is input into one medical image segmentation model, and its prediction is the pseudo label (i.e., predicted full annotation data); the composite enhanced view β∘ω(x) is input into the other medical image segmentation model, and its prediction is the prediction label.
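Because the two networks share all weights, applying a single model to both views is equivalent to the dual-network structure; a minimal sketch (names ours, continuing the enhancement functions above):

```python
import torch

def dual_forward(model: torch.nn.Module, v_std: torch.Tensor):
    """Weight-shared dual network: the same f_θ maps both enhanced views.

    v_std is the standard enhanced view ω(x); the composite view β∘ω(x) is
    derived from it, so both predictions refer to the same geometry.
    """
    v_comp = composite_enhance(v_std.clone())   # β∘ω(x), intensity-only
    pseudo_logits = model(v_std)                # f_θ(ω(x)) -> pseudo label
    pred_logits = model(v_comp)                 # f_θ(β∘ω(x)) -> prediction label
    return pseudo_logits, pred_logits
```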
It should be noted that the network type and parameters of the medical image segmentation model are not specifically limited and may be adjusted to the attributes of different medical image data sets. Applicable network types include the U-shaped network (UNet), the V-shaped network (VNet), and the residual network (ResNet), among others. Modifiable network parameters include: the number of network layers, the number of feature channels, convolution kernel size and stride, the normalization algorithm (e.g., batch, layer, or group normalization), the rectification function (e.g., ReLU or LeakyReLU), the downsampling algorithm (e.g., max pooling or strided convolution) and scale, and the upsampling algorithm (e.g., linear interpolation or transposed convolution) and scale.
Step S300: calculating the similarity between the pseudo label and the prediction label to obtain a consistency regularization loss;
Step S400: constructing a loss function comprising the consistency regularization loss;
specifically, in the prior art, a training method based on regularization constraint, such as Graph Cut (Graph Cut) and Dense Conditional Random Field (Dense Conditional Random Field), adds a regularization term to a loss function of network training by constructing the regularization term. In the loss function, a partial cross entropy function penalizes the prediction of marked pixels (i.e. graffiti marked pixels) and limits the prediction of unlabeled pixels. Wherein the regularization term mainly adopts: constructing a Normalized Cut (Normalized Cut) as a regularization term; or constructing a dense conditional random field and a segmentation technique (Kernel Cut) combining the dense conditional random field and the normalized Cut as regularization terms. Since the regularization term depends on the regional characteristics of the image (such as the probability density function of pixel values) to distinguish different targets, it is difficult to distinguish organs or structures with similar pixel value distribution; and the smoothness of the prediction labels is ensured by depending on the pixel value difference between the adjacent pixels, namely when the adjacent pixels have different labels, the penalty obtained by the pixel value difference is small, and the penalty obtained by the pixel value difference is large. However, it is very common that the difference of pixel values between adjacent tissues in the medical image is small, and therefore, the regularization term based on graph cut may cause overflow or defect of the edge region of the prediction label (the regularization term is excessively smooth due to the small difference of pixel values). The regularization item based on the dense conditional random field algorithm is optimized on the basis of the prediction label of the network, and the optimization effect of the regularization item is limited due to the fact that the prediction label of the network has error information.
This embodiment uses a consistency regularization loss $\mathcal{L}_{cons}$ to measure the similarity between the prediction label and the pseudo label. Using $\mathcal{L}_{cons}$ in network optimization maximizes the cross-entropy similarity between the pseudo label $\hat{y}$ and the prediction for the composite enhanced view; it directly supervises the semantic labels of the predicted pixels, and is therefore superior to a regularization term, which constrains the semantic labels of predicted pixels only through pixel-value characteristics without directly penalizing the semantics of the prediction label. The consistency regularization loss is expressed as:

$$\mathcal{L}_{cons} = \frac{1}{N}\sum_{i=1}^{N} \mathrm{CrossEntropy}\big(\hat{y}_i,\ f_\theta(\beta \circ \omega(x))_i\big),$$

where β(·) is the composite enhancement function and $\hat{y} = f_\theta(\omega(x))$ is the pseudo label. In this formula the pseudo label is not detached, so it too back-propagates gradients for parameter updates.
Currently, when consistency regularization is applied, the pseudo label is detached from the computation graph (i.e., no edge points to the pseudo label), so no gradient can be back-propagated through it. The invention extends consistency regularization to a scribble-annotation-based image segmentation algorithm: consistency regularization is applied to each training image, the pseudo label is allowed to back-propagate gradients, and the parameters of the corresponding network are updated. Both inputs of the cross entropy function in the consistency regularization loss (the pseudo label and the prediction of the composite enhanced view) can back-propagate gradients, so the two inputs of the loss are driven toward agreement, ensuring consistent and effective network parameter updates.
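A sketch of the consistency regularization loss as reconstructed above, assuming logits of shape (B, K, H, W); the key detail is that the pseudo-label branch is not detached, so gradients flow through both inputs:

```python
import torch
import torch.nn.functional as F

def consistency_loss(pseudo_logits: torch.Tensor,
                     pred_logits: torch.Tensor) -> torch.Tensor:
    """L_cons = mean CrossEntropy(softmax(pseudo), pred); no .detach() anywhere."""
    p = F.softmax(pseudo_logits, dim=1)                        # pseudo label
    return -(p * F.log_softmax(pred_logits, dim=1)).sum(dim=1).mean()
```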
Alternatively, besides the cross-entropy similarity form, the consistency regularization loss may use the ℓ1 loss, the mean squared error loss, or the KL divergence.
The consistency regularization loss is the key to constraining the semantic consistency of different predictions of the same image, updating the network parameters, and fitting multiple annotations, so it must be included in the loss function used to train and optimize the network model. Of course, other loss terms may be included in the loss function alongside the consistency regularization loss.
Further, the method also includes calculating the cross entropy loss of the features extracted at annotated pixels in the standard enhanced view, obtaining a partial cross entropy loss, which is used in the loss function; the partial cross entropy loss trains the medical image segmentation model to generate the pseudo label.
Training a single network with only the partial cross entropy loss $\mathcal{L}_{pCE}$ is the baseline algorithm of scribble-annotation-based segmentation. The summation in $\mathcal{L}_{pCE}$ covers the annotated pixels (i.e., scribbled pixels), while unannotated pixels are masked out (their penalty term is zero and back-propagates no gradient). The partial cross entropy loss is expressed as:

$$\mathcal{L}_{pCE} = \frac{1}{N}\sum_{i=1}^{N} \mathrm{CrossEntropy}\big(y_i,\ \hat{y}_i\big),$$

where $x \in \mathbb{R}^{N}$ is the input image and N is the number of pixels in x; $y \in \{0,1\}^{N \times K}$ is the scribble annotation, with K the number of classes (including the background class). In the scribble annotation y, the entry $y_i$ of an annotated pixel is a one-hot vector, while that of an unannotated pixel is a zero vector; under this setting, the cross entropy of unannotated pixels in $\mathcal{L}_{pCE}$ is zero and back-propagates no gradient. $\hat{y} = f_\theta(\omega(x))$ is the output vector, where f_θ(·) is the network with parameters θ and ω(·) the standard enhancement function; the prediction of the standard enhanced view is the generated pseudo label. CrossEntropy(p, q) = −p · log softmax(q) is the cross entropy function, where p and q are K-dimensional vectors. Letting q′ = softmax(q), the k-th component of q′ is:

$$q'_k = \frac{\exp(q_k)}{\sum_{j=1}^{K} \exp(q_j)}.$$
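Under the encoding above (one-hot rows for scribbled pixels, zero rows elsewhere), the partial cross entropy needs no explicit mask, since zero rows contribute nothing; a sketch, averaging over all N pixels as in the reconstructed formula:

```python
import torch
import torch.nn.functional as F

def partial_cross_entropy(logits: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """L_pCE: logits and y are (B, K, H, W); zero rows of y vanish from the sum."""
    ce = -(y * F.log_softmax(logits, dim=1)).sum(dim=1)   # (B, H, W), zero if unlabeled
    return ce.mean()                                       # 1/N over all pixels
```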
Using the weight-shared dual-network structure to optimize the network with the consistency regularization loss while training pseudo-label generation with the partial cross entropy loss compensates well for the poor performance of training the network f_θ(·) with the partial cross entropy loss $\mathcal{L}_{pCE}$ alone.
Further, unlike existing approaches that obtain low-entropy labels through predefined processing algorithms (such as thresholding or sharpening the probability distribution) when applying consistency regularization, this method trains the medical image segmentation model end-to-end with an entropy minimization loss, so that the model itself outputs low-entropy pseudo labels, ensuring the validity of the pseudo labels. Under the supervision of low-entropy labels, the decision boundary of the medical image segmentation model lies in a low-density region, which aids classification. This embodiment therefore further computes an entropy minimization loss $\mathcal{L}_{ent}$ of the pseudo label and uses it in the loss function.

Specifically, a low-entropy pseudo label is one whose per-pixel prediction vector is close to one-hot encoding. Using low-entropy pseudo labels for consistency training pushes the model's decision boundary toward low-density regions, which makes the mapped vectors of pixels from different classes clearly distinct. To obtain low-entropy pseudo labels, the entropy minimization loss $\mathcal{L}_{ent}$ measures the uncertainty of the probability distribution: when the distribution approaches one-hot encoding (strong certainty), the penalty of $\mathcal{L}_{ent}$ is small; when it tends toward a uniform distribution (the channel values nearly equal, weak certainty), the penalty of $\mathcal{L}_{ent}$ is large. The entropy minimization loss is expressed as:

$$\mathcal{L}_{ent} = -\frac{1}{N}\sum_{i=1}^{N}\sum_{k=1}^{K} \hat{y}'_{ik}\,\log \hat{y}'_{ik}, \qquad \hat{y}' = \mathrm{softmax}(\hat{y}).$$

The penalty terms applied to the pseudo label during training thus comprise the partial cross entropy loss $\mathcal{L}_{pCE}$ and the entropy minimization loss $\mathcal{L}_{ent}$, giving the loss function

$$\mathcal{L} = \mathcal{L}_{pCE} + r(t)\,\big(\mathcal{L}_{cons} + \mathcal{L}_{ent}\big),$$

where r(t) is the ramp-up function, r(t) = exp(−η(1 − t/T)), with t the training round and T and η hyperparameters. During the first T rounds, r(t) rises from a small positive value to 1 and then stays at 1 until training ends; η controls the speed of the ramp, with larger η giving a slower rise. In this embodiment, T is set to 80 and η to 8.
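A sketch of the entropy minimization loss and the ramp-up function with the stated hyperparameters T = 80 and η = 8; the clamp keeps r(t) at 1 after round T:

```python
import math
import torch
import torch.nn.functional as F

def entropy_loss(pseudo_logits: torch.Tensor) -> torch.Tensor:
    """L_ent: mean per-pixel Shannon entropy of the softmaxed pseudo label."""
    p = F.softmax(pseudo_logits, dim=1)
    return -(p * torch.log(p.clamp(min=1e-8))).sum(dim=1).mean()

def ramp(t: int, T: int = 80, eta: float = 8.0) -> float:
    """r(t) = exp(-eta * (1 - t/T)), rising to 1 at round T and held there."""
    return math.exp(-eta * (1.0 - min(t, T) / T))
```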
Step S500: training the medical image segmentation model according to the loss function until the loss function converges, obtaining the trained medical image segmentation model.
Specifically, after the loss function is constructed, the medical image segmentation model is trained by gradient descent until the loss function converges, yielding the trained medical image segmentation model. At inference time, a medical image of the target anatomical structure is input into the trained model to obtain its segmentation result.
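Tying the pieces together, one training step might look as follows (a sketch; optimizer choice, batching, and learning-rate schedule are our assumptions, not the patent's):

```python
import torch

def train_step(model: torch.nn.Module, opt: torch.optim.Optimizer,
               x: torch.Tensor, y: torch.Tensor, t: int) -> float:
    """One step of the loss L = L_pCE + r(t) * (L_cons + L_ent)."""
    v_std, y_std = standard_enhance(x, y)            # ω on image and scribbles jointly
    pseudo_logits, pred_logits = dual_forward(model, v_std)
    loss = (partial_cross_entropy(pseudo_logits, y_std)
            + ramp(t) * (consistency_loss(pseudo_logits, pred_logits)
                         + entropy_loss(pseudo_logits)))
    opt.zero_grad()
    loss.backward()                                  # gradients flow through BOTH views
    opt.step()
    return float(loss)

# usage sketch:
#   opt = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
#   for t in range(num_rounds):
#       for x, y in loader: train_step(model, opt, x, y, t)
```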
In summary, this embodiment constructs a dual-network structure from two weight-shared medical image segmentation models; the scribble-annotated medical image undergoes a standard enhancement operation followed by a composite enhancement operation, so that the same image generates two different views. The two enhanced views are then each input into a medical image segmentation model to obtain a pseudo label and a prediction label; a consistency regularization loss measuring the similarity between the pseudo label and the prediction label is calculated, and the medical image segmentation model is trained and optimized accordingly. With the dual-network structure, pseudo-label generation and network optimization proceed simultaneously during training, the network is allowed to generalize across multiple annotations, and repeated iterative training is avoided. In addition, both inputs of the cross entropy function in the consistency regularization loss (the pseudo label and the prediction of the composite enhanced view) back-propagate gradients, driving the two inputs toward agreement and ensuring consistent and effective network parameter updates. Finally, the medical image segmentation model outputs low-entropy pseudo labels, under whose supervision the network's decision boundary lies in a low-density region, which aids classification.
It should be noted that the weight-shared dual-network structure and the consistency regularization loss of the invention already optimize the medical image segmentation model well; the entropy minimization loss, the partial cross entropy loss, and the composite enhancement operation may be combined or added in any fashion (e.g., applying all penalty terms, or using a single penalty term alone), each improving the performance of training the medical image segmentation model.
Because annotated pixels are scarce in scribble annotation data, training tends to overfit to them, which affects the shape of the final prediction label (i.e., the model's decision boundary overfits the scribbled pixels, so the prediction label of a training image appears only near the scribbles and has an irregular shape).
To address this, on top of the consistency regularization training framework above and as shown in Fig. 3, this embodiment further introduces a feature pool after the encoder output of the left network (i.e., in an auxiliary path) to regularize feature learning. As shown in Fig. 4, this involves the following steps:
step A100: performing feature extraction on marked pixels in the standard enhanced view to obtain hidden features;
step A200: obtaining the characteristic of the marked pixel based on the hidden characteristic and the weight of the marked pixel;
specifically, the features Chi Zhongbao contains an integrated feature vector for each semantic class. These feature vectors are moving averages of momentum with the labeled pixel features. Characteristic poolBased on encoder characteristics>Wherein f is e (. -) represents an encoder.
First of all characteristic f e (ω (x)) is upsampled and then mapped to hidden featuresConsider a set of indices in the image annotation y that point to annotated pixels as @>Then the feature of the pixel is labeled->Calculated from the following equation:
wherein s is ik Is a representation of a pixelWeight scalar of importance. sim (p, q) = p/| p | q/| q | represents l 2 Normalized inner product of p and q, i.e. cosine similarity. As can be seen from equation (5), s ik And M k And z i Is inversely proportional to the cosine similarity of (c). This means that the feature M is compared with k Feature vector z of pixel with low similarity i With a higher weight.
Step A300: dynamically updating the feature pool based on the annotated-pixel features;
specifically, the formula for feature pool update is as follows:
M k ←αM k +(1-α)m k
whereinInitialized to zero vector, alpha =0.9 is momentum coefficient, m k The pixel is labeled for features. Using m in the form of a moving average of momentum during training k And updating the feature pool.
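A sketch of steps A200 and A300: per-class weighting of the hidden features of scribbled pixels, followed by the momentum update. The exact weighting formula sits in an omitted figure, so the normalized `1 - cosine similarity` weight below is one plausible reading of "s_ik is inversely related to the cosine similarity", not the patent's definitive formula:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def update_feature_pool(M: torch.Tensor, z: torch.Tensor, y: torch.Tensor,
                        alpha: float = 0.9) -> torch.Tensor:
    """M: (K, C) pool; z: (N, C) per-pixel hidden features; y: (N, K) scribble one-hots."""
    for k in range(M.shape[0]):
        idx = y[:, k].bool()                        # pixels scribbled with class k
        if not idx.any():
            continue
        z_k = z[idx]                                # (n_k, C)
        s = 1.0 - F.cosine_similarity(z_k, M[k].expand_as(z_k), dim=1)
        s = s / s.sum().clamp(min=1e-8)             # normalize weights per class (assumed)
        m_k = (s.unsqueeze(1) * z_k).sum(dim=0)     # integrated class feature m_k
        M[k] = alpha * M[k] + (1.0 - alpha) * m_k   # momentum moving average
    return M
```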
Step A400: based on the same mapping module, calculating the partial cross entropy between the hidden features and the scribble annotations, and the cross entropy between the feature pool and an identity matrix formed from the feature pool's labels, obtaining an auxiliary loss and a feature pool loss respectively; the constructed loss function includes the auxiliary loss and the feature pool loss.
Specifically, the hidden features z and the feature pool M are mapped by the same mapping module g(·); the mapping results are then fed into two loss functions, yielding the auxiliary loss and the feature pool loss respectively, and both are added to the loss function to optimize the training of the medical image segmentation model.
The two loss functions are the auxiliary loss $\mathcal{L}_{aux}$ and the feature pool loss $\mathcal{L}_{fp}$:

$$\mathcal{L}_{aux} = \mathrm{pCE}\big(g(z),\ y\big), \qquad \mathcal{L}_{fp} = \mathrm{CrossEntropy}\big(I_K,\ g(M)\big),$$

where the identity matrix $I_K$ serves as the label of the feature pool (row k of the pool is labeled as class k). The feature pool loss affects the weights of g(·) and the output matrix g(z), and ultimately the hidden features z, thereby regularizing feature learning.
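A sketch of the two regularization losses through a shared mapping module g(·), assumed here to be a linear classifier from C channels to K classes (the patent does not fix its form):

```python
import torch
import torch.nn.functional as F

def feature_regularization(g: torch.nn.Module, z: torch.Tensor,
                           y: torch.Tensor, M: torch.Tensor):
    """z: (N, C) hidden features; y: (N, K) scribble one-hots; M: (K, C) pool."""
    # auxiliary loss: partial cross entropy between g(z) and the scribbles
    l_aux = -(y * F.log_softmax(g(z), dim=1)).sum(dim=1).mean()
    # feature pool loss: cross entropy between identity labels and g(M)
    eye = torch.eye(M.shape[0], device=M.device)
    l_fp = -(eye * F.log_softmax(g(M), dim=1)).sum(dim=1).mean()
    return l_aux, l_fp

# usage sketch: g = torch.nn.Linear(C, K)
#               loss = loss + lambda1 * l_aux + lambda2 * l_fp
```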
Optionally, although the feature pool of this embodiment uses the output of the left backbone network's encoder as the hidden features, it may also use multi-scale encoder features; regularized feature learning with a feature pool may likewise be added to the right backbone network; and other similarity measures and feature pool update schemes may be used.
Taking the auxiliary loss function and the feature pool loss function into account, the loss function of this embodiment is

$$\mathcal{L} = \mathcal{L}_{pCE} + r(t)\,\big(\mathcal{L}_{cons} + \mathcal{L}_{ent}\big) + \lambda_1 \mathcal{L}_{aux} + \lambda_2 \mathcal{L}_{fp},$$

where r(t) is the ramp-up function, and λ1 and λ2 are factors balancing the importance of the loss terms.
In view of the above, unlike existing iterative training methods, regularization-constraint-based training methods, and GAN-based training methods, the feature pool introduced in this embodiment stores integrated features as momentum moving averages and uses them to regularize the encoder's feature learning, reducing the overfitting of network parameters to the scribbled pixels.
Exemplary device
As shown in Fig. 5, corresponding to the scribble-annotation-based medical image segmentation model training method, an embodiment of the present invention further provides a scribble-annotation-based medical image segmentation model training apparatus, the apparatus comprising:
an enhancement operation module 600, configured to perform a first image enhancement operation on the scribble-annotated medical image to obtain a standard enhanced view, and perform a second image enhancement operation on the standard enhanced view to obtain a composite enhanced view;
a network mapping module 610, configured to input the standard enhanced view and the composite enhanced view each into one network of a dual-network structure to obtain a pseudo label and a prediction label respectively, wherein each network in the dual-network structure is a medical image segmentation model and the two medical image segmentation models share weights;
a consistency regularization loss module 620, configured to calculate the similarity between the pseudo label and the prediction label to obtain a consistency regularization loss;
a loss function module 630, configured to construct a loss function comprising the consistency regularization loss;
and a network parameter updating module 640, configured to train the medical image segmentation model according to the loss function until the loss function converges, obtaining the trained medical image segmentation model.
For details of the scribble-annotation-based medical image segmentation model training apparatus in this embodiment, reference may be made to the corresponding description of the scribble-annotation-based medical image segmentation model training method, which is not repeated here.
Based on the above embodiments, the present invention further provides an intelligent terminal, whose schematic block diagram may be as shown in Fig. 6. The intelligent terminal comprises a processor, a memory, a network interface, and a display screen connected through a system bus. The processor of the intelligent terminal provides computation and control capability. The memory of the intelligent terminal comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a scribble-annotation-based medical image segmentation model training program. The internal memory provides an environment for running the operating system and the training program in the non-volatile storage medium. The network interface of the intelligent terminal is used to connect and communicate with external terminals over a network. When executed by the processor, the scribble-annotation-based medical image segmentation model training program implements the steps of any one of the scribble-annotation-based medical image segmentation model training methods. The display screen of the intelligent terminal may be a liquid crystal display or an electronic ink display.
Those skilled in the art will understand that the block diagram shown in Fig. 6 depicts only a part of the structure related to the solution of the present invention and does not limit the intelligent terminals to which the solution applies; a specific intelligent terminal may include more or fewer components than shown, combine certain components, or arrange components differently.
In one embodiment, an intelligent terminal is provided, comprising a memory, a processor, and a scribble-annotation-based medical image segmentation model training program stored in the memory and executable on the processor; when executed by the processor, the training program performs the following operations:
performing a first image enhancement operation on the scribble-annotated medical image to obtain a standard enhanced view, and performing a second image enhancement operation on the standard enhanced view to obtain a composite enhanced view;
inputting the standard enhanced view and the composite enhanced view each into one network of a dual-network structure to obtain a pseudo label and a prediction label respectively, wherein each network in the dual-network structure is a medical image segmentation model and the two medical image segmentation models share weights;
calculating the similarity between the pseudo label and the prediction label to obtain a consistency regularization loss;
constructing a loss function comprising the consistency regularization loss;
and training the medical image segmentation model according to the loss function until the loss function converges, obtaining the trained medical image segmentation model.
Optionally, the method further includes calculating an entropy minimization loss of the pseudo label, and the constructed loss function further includes the entropy minimization loss.
Optionally, when constructing the loss function, the consistency regularization loss and the entropy minimization loss are weighted by an exponential ramp-up function whose expression is r(t) = exp(−η(1 − t/T)), where t is the training round and T and η are hyperparameters.
Optionally, the method further includes calculating the cross entropy loss of the features extracted at annotated pixels in the standard enhanced view, obtaining a partial cross entropy loss; the constructed loss function further includes the partial cross entropy loss.
Optionally, performing a first image enhancement operation on the scribble-annotated medical image to obtain a standard enhanced view, and performing a second image enhancement operation on the standard enhanced view to obtain a composite enhanced view, comprises:
performing geometric enhancement and noise enhancement operations on the scribble-annotated medical image to obtain the standard enhanced view;
and performing a color distortion enhancement operation on the standard enhanced view to obtain the composite enhanced view.
Optionally, a feature pool storing the features of annotated pixels is further provided, and the training method further comprises:
performing feature extraction on the annotated pixels in the standard enhanced view to obtain hidden features;
obtaining the annotated-pixel features based on the hidden features and the weights of the annotated pixels;
dynamically updating the feature pool based on the annotated-pixel features;
based on the same mapping module, calculating the partial cross entropy between the hidden features and the scribble annotations, and the cross entropy between the feature pool and an identity matrix formed from the feature pool's labels, obtaining an auxiliary loss and a feature pool loss respectively; the constructed loss function further includes the auxiliary loss and the feature pool loss.
Optionally, dynamically updating the feature pool based on the annotated-pixel features comprises:
updating the feature pool by a momentum moving average according to a set momentum coefficient and the annotated-pixel features.
An embodiment of the present invention further provides a computer-readable storage medium storing a scribble-annotation-based medical image segmentation model training program; when executed by a processor, the program implements the steps of any one of the scribble-annotation-based medical image segmentation model training methods provided by the embodiments of the present invention.
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present invention.
It should be clear to those skilled in the art that, for convenience and simplicity of description, the foregoing division of the functional units and modules is only used for illustration, and in practical applications, the above functions may be distributed as different functional units and modules according to needs, that is, the internal structure of the apparatus may be divided into different functional units or modules to implement all or part of the above described functions. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present invention. For the specific working processes of the units and modules in the system, reference may be made to the corresponding processes in the foregoing method embodiments, which are not described herein again.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
Those of ordinary skill in the art would appreciate that the elements and algorithm steps of the examples described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the embodiments provided by the present invention, it should be understood that the disclosed apparatus/terminal device and method may be implemented in other ways. For example, the apparatus/terminal device embodiments described above are merely illustrative: the division into modules or units is only a logical function division, and other divisions are possible in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed.
The integrated modules/units described above, if implemented in the form of software functional units and sold or used as separate products, may be stored in a computer-readable storage medium. Based on this understanding, all or part of the flow of the methods of the embodiments of the present invention may also be implemented by a computer program, which may be stored in a computer-readable storage medium and, when executed by a processor, implements the steps of the method embodiments. The computer program comprises computer program code, which may be in source code form, object code form, executable file form, or some intermediate form. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like. It should be noted that the contents of the computer-readable storage medium may be expanded or restricted as required by legislation and patent practice in the relevant jurisdiction.
The above embodiments are only intended to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some technical features thereof may be equivalently replaced; such modifications and substitutions do not depart from the spirit and scope of the embodiments of the present invention and should be construed as being included therein.
Claims (10)
1. A medical image segmentation model training method based on scribble annotation, characterized by comprising the following steps:
performing a first image enhancement operation on the scribble-annotated medical image to obtain a standard enhanced view, and performing a second image enhancement operation on the standard enhanced view to obtain a composite enhanced view;
inputting the standard enhanced view and the composite enhanced view into the two networks of a dual-network structure, respectively, to obtain a pseudo label and a prediction label, wherein each network in the dual-network structure is a medical image segmentation model and the two medical image segmentation models share weights;
calculating the similarity between the pseudo label and the prediction label to obtain a consistency regularization loss;
constructing a loss function comprising the consistency regularization loss;
and training the medical image segmentation model according to the loss function until the loss function converges, to obtain the trained medical image segmentation model.
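To make the flow of claim 1 concrete, the following is a minimal PyTorch-style sketch of one training step. Because the two networks share weights, applying a single model twice is an equivalent reading of the dual-network structure; the augmentation callables and the use of cross entropy as the similarity measure are assumptions made for the example (the claim does not fix the concrete similarity):

```python
import torch
import torch.nn.functional as F

def consistency_step(model, image, standard_aug, composite_aug):
    """Sketch: one training step of the weight-shared dual-network structure."""
    standard_view = standard_aug(image)             # first image enhancement
    composite_view = composite_aug(standard_view)   # second image enhancement

    with torch.no_grad():
        # Pseudo label from the standard enhanced view (no gradient).
        pseudo = model(standard_view).argmax(dim=1)     # (B, H, W)
    pred = model(composite_view)                        # (B, K, H, W) logits

    # Consistency regularization loss: the prediction on the composite view
    # should agree with the pseudo label from the standard view. The views
    # stay spatially aligned because the second enhancement is color-only.
    return F.cross_entropy(pred, pseudo)
```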
2. The scribble-annotation-based medical image segmentation model training method of claim 1, further comprising calculating an entropy minimization loss of the pseudo labels, wherein the constructed loss function further comprises the entropy minimization loss.
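As a sketch, the entropy minimization loss can be written over the soft predictions that produce the pseudo labels, pushing each pixel's class distribution toward a confident, low-entropy one; the `eps` stabilizer is an implementation assumption:

```python
import torch

def entropy_minimization_loss(logits, eps=1e-8):
    """Sketch: mean per-pixel entropy of the softmax predictions."""
    p = torch.softmax(logits, dim=1)                   # (B, K, H, W)
    return -(p * torch.log(p + eps)).sum(dim=1).mean()
```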
3. The scribble-annotation-based medical image segmentation model training method of claim 2, wherein, when constructing the loss function, the consistency regularization loss and the entropy minimization loss are weighted according to an exponential ascending function whose expression is r(t) = exp(-η(1 - t/T)), where t is the current training round, and T and η are hyperparameters.
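The ascending function is straightforward to evaluate; for instance, with the illustrative settings η = 5 and T = 100, the weight ramps from exp(-5) ≈ 0.0067 at round 0 up to 1.0 at round T:

```python
import math

def ramp_up(t, T, eta):
    """r(t) = exp(-eta * (1 - t / T)): exponential ascending weight."""
    return math.exp(-eta * (1.0 - t / T))

print(ramp_up(0, 100, 5.0))    # ~0.0067 at the start of training
print(ramp_up(100, 100, 5.0))  # 1.0 once t reaches T
```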
4. The scribble-annotation-based medical image segmentation model training method of claim 1, further comprising calculating a cross-entropy loss on the extracted features of the annotated pixels in the standard enhanced view to obtain a partial cross-entropy loss, wherein the constructed loss function further comprises the partial cross-entropy loss.
5. The scribble-annotation-based medical image segmentation model training method of claim 1, wherein performing the first image enhancement operation on the scribble-annotated medical image to obtain the standard enhanced view, and performing the second image enhancement operation on the standard enhanced view to obtain the composite enhanced view, comprises:
performing a geometric enhancement operation and a noise enhancement operation on the scribble-annotated medical image to obtain the standard enhanced view;
and performing a color distortion enhancement operation on the standard enhanced view to obtain the composite enhanced view.
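The claim leaves the concrete transforms open. As an illustrative sketch, the first enhancement could combine random flips/rotations with additive Gaussian noise, and the second a random gamma-style color distortion; all parameter ranges below are assumptions. Note that in practice the geometric operations must also be applied to the scribble labels so that pixels and annotations stay aligned:

```python
import numpy as np

def standard_enhance(image, rng):
    """Sketch of the first enhancement: geometric + noise operations."""
    if rng.random() < 0.5:                        # geometric: random horizontal flip
        image = np.flip(image, axis=-1)
    k = int(rng.integers(0, 4))                   # geometric: random 90-degree rotation
    image = np.rot90(image, k, axes=(-2, -1))
    noise = rng.normal(0.0, 0.05, image.shape)    # noise enhancement
    return image + noise

def composite_enhance(standard_view, rng):
    """Sketch of the second enhancement: color distortion on the standard view."""
    v = standard_view - standard_view.min()       # normalize intensities to [0, 1]
    v = v / (v.max() + 1e-8)
    gamma = rng.uniform(0.7, 1.5)                 # random gamma shift
    return v ** gamma

# Usage: rng = np.random.default_rng(0); view = standard_enhance(img, rng)
```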
6. The scribble-annotation-based medical image segmentation model training method of claim 1, wherein a feature pool for storing the features of annotated pixels is further provided, and the training method further comprises:
performing feature extraction on the annotated pixels in the standard enhanced view to obtain hidden features;
obtaining the features of the annotated pixels based on the hidden features and the weights of the annotated pixels;
dynamically updating the feature pool based on the features of the annotated pixels;
and, based on the same mapping module, calculating a partial cross entropy between the hidden features and the scribble labels, and a cross entropy between the feature pool and an identity matrix formed from the labels of the feature pool, to obtain an auxiliary loss and a feature pool loss respectively, wherein the auxiliary loss and the feature pool loss are also included when constructing the loss function.
7. The scribble-annotation-based medical image segmentation model training method of claim 6, wherein dynamically updating the feature pool based on the features of the annotated pixels comprises:
updating the feature pool with a momentum moving average according to a set momentum coefficient and the features of the annotated pixels.
8. A medical image segmentation model training apparatus based on scribble annotation, characterized in that the apparatus comprises:
an enhancement operation module, configured to perform a first image enhancement operation on the scribble-annotated medical image to obtain a standard enhanced view, and to perform a second image enhancement operation on the standard enhanced view to obtain a composite enhanced view;
a network mapping module, configured to input the standard enhanced view and the composite enhanced view into the two networks of a dual-network structure, respectively, to obtain a pseudo label and a prediction label, wherein each network in the dual-network structure is a medical image segmentation model and the two medical image segmentation models share weights;
a consistency regularization loss module, configured to calculate the similarity between the pseudo label and the prediction label to obtain a consistency regularization loss;
a loss function module, configured to construct a loss function comprising the consistency regularization loss;
and a network parameter updating module, configured to train the medical image segmentation model according to the loss function until the loss function converges, to obtain the trained medical image segmentation model.
9. An intelligent terminal, characterized in that the intelligent terminal comprises a memory, a processor, and a scribble-annotation-based medical image segmentation model training program stored in the memory and executable on the processor, wherein the scribble-annotation-based medical image segmentation model training program, when executed by the processor, implements the steps of the scribble-annotation-based medical image segmentation model training method according to any one of claims 1 to 7.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a scribble-annotation-based medical image segmentation model training program which, when executed by a processor, implements the steps of the scribble-annotation-based medical image segmentation model training method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211489694.0A CN115861333A (en) | 2022-11-25 | 2022-11-25 | Medical image segmentation model training method and device based on doodling annotation and terminal |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211489694.0A CN115861333A (en) | 2022-11-25 | 2022-11-25 | Medical image segmentation model training method and device based on doodling annotation and terminal |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115861333A true CN115861333A (en) | 2023-03-28 |
Family
ID=85666424
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211489694.0A Pending CN115861333A (en) | 2022-11-25 | 2022-11-25 | Medical image segmentation model training method and device based on doodling annotation and terminal |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115861333A (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116721250A (en) * | 2023-04-17 | 2023-09-08 | 重庆邮电大学 | Medical image graffiti segmentation algorithm based on low-quality pseudo tag refinement |
CN116342888A (en) * | 2023-05-25 | 2023-06-27 | 之江实验室 | Method and device for training segmentation model based on sparse labeling |
CN116342888B (en) * | 2023-05-25 | 2023-08-11 | 之江实验室 | Method and device for training segmentation model based on sparse labeling |
CN117611605A (en) * | 2023-11-24 | 2024-02-27 | 北京建筑大学 | Method, system and electronic equipment for segmenting heart medical image |
CN117611605B (en) * | 2023-11-24 | 2024-04-16 | 北京建筑大学 | Method, system and electronic equipment for segmenting heart medical image |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109977918B (en) | Target detection positioning optimization method based on unsupervised domain adaptation | |
CA3091035C (en) | Systems and methods for polygon object annotation and a method of training an object annotation system | |
CN115861333A (en) | Medical image segmentation model training method and device based on doodling annotation and terminal | |
CN111507993A (en) | Image segmentation method and device based on generation countermeasure network and storage medium | |
CN109886121A (en) | A kind of face key independent positioning method blocking robust | |
CN111127364B (en) | Image data enhancement strategy selection method and face recognition image data enhancement method | |
Cong et al. | Image segmentation algorithm based on superpixel clustering | |
CN109685743A (en) | Image mixed noise removing method based on noise learning neural network model | |
CN114863348B (en) | Video target segmentation method based on self-supervision | |
CN114692732B (en) | Method, system, device and storage medium for updating online label | |
CN111259768A (en) | Image target positioning method based on attention mechanism and combined with natural language | |
Qu et al. | Perceptual-DualGAN: perceptual losses for image to image translation with generative adversarial nets | |
CN111259733A (en) | Point cloud image-based ship identification method and device | |
WO2023207389A1 (en) | Data processing method and apparatus, program product, computer device, and medium | |
Liu et al. | AHU-MultiNet: Adaptive loss balancing based on homoscedastic uncertainty in multi-task medical image segmentation network | |
CN117437423A (en) | Weak supervision medical image segmentation method and device based on SAM collaborative learning and cross-layer feature aggregation enhancement | |
You et al. | FMWDCT: Foreground mixup into weighted dual-network cross training for semisupervised remote sensing road extraction | |
Henderson et al. | Automatic identification of segmentation errors for radiotherapy using geometric learning | |
Qiao et al. | Semi-supervised CT lesion segmentation using uncertainty-based data pairing and SwapMix | |
CN110111300B (en) | Image change detection method | |
CN114332117B (en) | Post-earthquake landform segmentation method based on UNET < 3+ > and full-connection conditional random field fusion | |
CN115439669A (en) | Feature point detection network based on deep learning and cross-resolution image matching method | |
CN118115507A (en) | Image segmentation method based on cross-domain class perception graph convolution alignment | |
CN118196418A (en) | Semi-supervised embryo image cell debris segmentation method based on average teacher | |
CN117314763A (en) | Oral hygiene management method and system based on machine learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |