CN115018863A - Image segmentation method and device based on deep learning - Google Patents
Image segmentation method and device based on deep learning
- Publication number
- CN115018863A (application CN202210677793.5A)
- Authority
- CN
- China
- Prior art keywords
- image
- deep learning
- neural network
- net model
- training set
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
- 238000013135 deep learning Methods 0.000 title claims abstract description 53
- 238000000034 method Methods 0.000 title claims abstract description 39
- 238000003709 image segmentation Methods 0.000 title claims abstract description 18
- 238000013528 artificial neural network Methods 0.000 claims abstract description 48
- 238000004422 calculation algorithm Methods 0.000 claims abstract description 44
- 238000012549 training Methods 0.000 claims abstract description 42
- 238000007781 pre-processing Methods 0.000 claims abstract description 27
- 230000011218 segmentation Effects 0.000 claims abstract description 19
- 230000000302 ischemic effect Effects 0.000 claims abstract description 14
- 238000009792 diffusion process Methods 0.000 claims abstract description 9
- 238000012545 processing Methods 0.000 claims description 21
- 238000005070 sampling Methods 0.000 claims description 14
- 238000013519 translation Methods 0.000 claims description 10
- 230000004913 activation Effects 0.000 claims description 9
- 238000004590 computer program Methods 0.000 claims description 9
- 230000006870 function Effects 0.000 claims description 7
- 238000011176 pooling Methods 0.000 claims description 7
- 238000013507 mapping Methods 0.000 claims description 6
- 238000010606 normalization Methods 0.000 claims description 6
- 238000011426 transformation method Methods 0.000 claims description 5
- 230000009466 transformation Effects 0.000 claims description 4
- 230000000007 visual effect Effects 0.000 claims description 2
- 238000011156 evaluation Methods 0.000 abstract description 6
- 238000002597 diffusion-weighted imaging Methods 0.000 description 8
- 238000004393 prognosis Methods 0.000 description 8
- 230000008569 process Effects 0.000 description 7
- 208000032382 Ischaemic stroke Diseases 0.000 description 5
- 238000010586 diagram Methods 0.000 description 5
- 230000000694 effects Effects 0.000 description 5
- 230000002776 aggregation Effects 0.000 description 4
- 238000004220 aggregation Methods 0.000 description 4
- 238000005516 engineering process Methods 0.000 description 4
- 230000007654 ischemic lesion Effects 0.000 description 4
- 238000004891 communication Methods 0.000 description 3
- 238000002591 computed tomography Methods 0.000 description 3
- 238000012952 Resampling Methods 0.000 description 2
- 238000013461 design Methods 0.000 description 2
- 238000003384 imaging method Methods 0.000 description 2
- 238000007917 intracranial administration Methods 0.000 description 2
- 208000028867 ischemia Diseases 0.000 description 2
- 238000002610 neuroimaging Methods 0.000 description 2
- 230000004044 response Effects 0.000 description 2
- 208000006011 Stroke Diseases 0.000 description 1
- 238000013473 artificial intelligence Methods 0.000 description 1
- 239000000090 biomarker Substances 0.000 description 1
- 210000004556 brain Anatomy 0.000 description 1
- 210000005013 brain tissue Anatomy 0.000 description 1
- 238000004364 calculation method Methods 0.000 description 1
- 230000002490 cerebral effect Effects 0.000 description 1
- 238000013527 convolutional neural network Methods 0.000 description 1
- 238000013500 data storage Methods 0.000 description 1
- 238000001514 detection method Methods 0.000 description 1
- 238000010191 image analysis Methods 0.000 description 1
- 230000007774 longterm Effects 0.000 description 1
- 239000011159 matrix material Substances 0.000 description 1
- 210000003657 middle cerebral artery Anatomy 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 238000003062 neural network model Methods 0.000 description 1
- 230000003287 optical effect Effects 0.000 description 1
- 230000001575 pathological effect Effects 0.000 description 1
- 230000035945 sensitivity Effects 0.000 description 1
- 238000010561 standard procedure Methods 0.000 description 1
- 238000006467 substitution reaction Methods 0.000 description 1
- 238000012546 transfer Methods 0.000 description 1
- 230000001131 transforming effect Effects 0.000 description 1
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/11—Region-based segmentation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/136—Segmentation; Edge detection involving thresholding
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10072—Tomographic images
- G06T2207/10088—Magnetic resonance imaging [MRI]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30004—Biomedical image processing
- G06T2207/30016—Brain
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Evolutionary Computation (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Computing Systems (AREA)
- General Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Databases & Information Systems (AREA)
- Medical Informatics (AREA)
- Multimedia (AREA)
- Life Sciences & Earth Sciences (AREA)
- Data Mining & Analysis (AREA)
- Molecular Biology (AREA)
- Computational Linguistics (AREA)
- General Engineering & Computer Science (AREA)
- Mathematical Physics (AREA)
- Biophysics (AREA)
- Biomedical Technology (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses an image segmentation device and method based on deep learning. An image acquisition unit acquires an original image according to image acquisition parameters, where the parameters include field strength, slice thickness, voxel size, field of view, echo time, repetition time, and diffusion gradient factor. An image preprocessing module preprocesses the original image using an automatic image threshold algorithm and a linear interpolation algorithm, and takes the first image obtained by preprocessing as a training set, where the automatic image threshold algorithm includes the Otsu binarization algorithm. A segmentation module trains the deep learning neural network on the training set and segments the target region with the trained deep learning neural network. The invention can segment the ischemic region automatically and accurately, while avoiding the subjective errors introduced by different users and ensuring consistency of evaluation.
Description
Technical Field
The present application relates to the field of image analysis processing technologies, and in particular, to an image segmentation method and apparatus based on deep learning, a computer device, and a storage medium.
Background
Acute ischemic stroke (AIS) has high morbidity and mortality worldwide, and about 29.3% of AIS cases involve intracranial large vessel occlusion (LVO) causing ischemia, which mostly occurs in the anterior circulation. Mortality in LVO is 4.5-fold higher than in other AIS cases. Existing prediction tools aim to identify LVO stroke patients but have low sensitivity and specificity, and they cannot accurately locate the ischemic region by machine.
The Alberta Stroke Program Early CT Score (ASPECTS) is currently the most widely used neuroimaging biomarker for assessing LVO; it evaluates ischemic lesions at 10 specific sites in the middle cerebral artery territory. The score was initially applied to computed tomography (CT) images and later to diffusion-weighted imaging (DWI). As a surrogate index for evaluating the ischemic region, the ASPECTS score correlates with clinical outcome, can be assessed conveniently and quickly by physicians, and is widely used for medical decision-making in clinical practice. However, the ASPECTS score requires a physician to manually evaluate the visualized intracranial images, and subjective variability is likely among different users.
For the problems in the related art that the ischemic region cannot be accurately located by machine, that a physician must manually analyze images and evaluate prognosis, and that bias caused by subjective differences readily arises and hinders accurate evaluation, no effective solution has yet been proposed.
Disclosure of Invention
The embodiments of the invention provide an image segmentation device, an image segmentation method, a computer device, and a storage medium based on deep learning, to solve the problems in the related art that the ischemic region cannot be accurately located by machine, that a physician must manually analyze images and evaluate prognosis, and that bias caused by subjective differences readily arises and hinders accurate evaluation.
In order to achieve the above object, a first aspect of embodiments of the present invention provides an image segmentation apparatus based on deep learning, including:
the image acquisition unit is used for acquiring an original image according to image acquisition parameters, wherein the image acquisition parameters comprise field strength, slice thickness, voxel size, field of view, echo time, repetition time and diffusion gradient factor;
the image preprocessing module is used for preprocessing the original image by utilizing an automatic image threshold algorithm and a linear interpolation algorithm, and taking a first image obtained by preprocessing as a training set, wherein the automatic image threshold algorithm comprises an Otsu binarization algorithm;
and the segmentation module is used for training the deep learning neural network according to the training set and segmenting the target area based on the trained deep learning neural network.
Optionally, in a possible implementation manner of the first aspect, the apparatus further includes:
and the data expansion module is used for expanding the data of the training set by carrying out random transformation processing on each image in the training set by using three image transformation methods of scaling, translation and rotation, wherein the scaling and the translation are carried out between-10% and 10% of the image size, and the rotation is carried out between-5 degrees and 5 degrees.
Optionally, in one possible implementation manner of the first aspect, the deep learning neural network includes:
adopting a U-Net model as the segmentation network, wherein the encoder layers and decoder layers in the U-Net model pass information through skip connections;
replacing all two-dimensional convolution layers in the U-Net model with three-dimensional convolution layers with a filter size of 3 × 3 × 3, wherein the network has a four-stage structure comprising three down-sampling layers and three up-sampling layers;
and performing residual block processing on the input data of each stage, with each encoding convolution block and each decoding convolution block replaced by a residual block.
Optionally, in a possible implementation manner of the first aspect, the residual block of the U-Net model includes:
F(x_l) = f(x_l) + x_l

x_{l+1} = F(x_l)

wherein x_l is the input unit, x_{l+1} is the output unit, and f(·) is a nonlinear mapping function consisting of 6 mapping layers: per-image instance normalization, leaky rectified linear unit (ReLU) activation with a slope (alpha value) of 0.2, 3 × 3 × 3 convolution, instance normalization, a further leaky ReLU activation, and 3 × 3 × 3 convolution.
In a second aspect of the embodiments of the present invention, there is provided an image classification device based on deep learning, including:
the classification module is used for classifying the target images by using the improved U-Net model as a classification network;
wherein the improved U-Net model comprises:
replacing all two-dimensional convolution layers in the U-Net model with three-dimensional convolution layers with a filter size of 3 × 3 × 3, wherein the network has a four-stage structure comprising three down-sampling layers and three up-sampling layers;
performing residual block processing on each level of input data, and replacing each coding convolution block and each decoding convolution block with a residual block;
extracting high-level image features from the intermediate layer of the U-Net model to form a path with a 2-layer neural network, adding the path to the bridge layer, processing the features into a preset number of neural network units through three-dimensional global average pooling, and connecting the flattened information of the preset number of neural network units to a classification unit for binary classification.
In a third aspect of an embodiment of the present invention, an image segmentation method based on deep learning is provided, where the method includes:
acquiring an original image according to image acquisition parameters, wherein the image acquisition parameters comprise field strength, slice thickness, voxel size, field of view, echo time, repetition time and diffusion gradient factor;
preprocessing the original image by using an automatic image threshold algorithm and a linear interpolation algorithm, and taking a first image obtained by preprocessing as a training set, wherein the automatic image threshold algorithm comprises an Otsu binarization algorithm;
and training the deep learning neural network according to the training set, and segmenting the target ischemic region based on the trained deep learning neural network.
Optionally, in a possible implementation manner of the third aspect, after preprocessing the original image by using an automatic image threshold algorithm and a linear interpolation algorithm, the method further includes:
and (3) carrying out random transformation processing on each image in the training set by using three image transformation methods of scaling, translation and rotation to expand the training set data, wherein the scaling and the translation are carried out between-10% and 10% of the image size, and the rotation is carried out between-5 degrees and 5 degrees.
In a fourth aspect of an embodiment of the present invention, there is provided an image classification method based on deep learning, including:
classifying the target images by using the improved U-Net model as a classification network;
wherein the improved U-Net model comprises:
replacing all two-dimensional convolution layers in the U-Net model with three-dimensional convolution layers with a filter size of 3 × 3 × 3, wherein the network has a four-stage structure comprising three down-sampling layers and three up-sampling layers;
performing residual block processing on each level of input data, and replacing each coding convolution block and each decoding convolution block with a residual block;
extracting high-level image features from the intermediate layer of the U-Net model to form a path with a 2-layer neural network, adding the path to the bridge layer, processing the features into a preset number of neural network units through three-dimensional global average pooling, and connecting the flattened information of the preset number of neural network units to a classification unit for binary classification.
In a fifth aspect of the embodiments of the present invention, a computer device is provided, which includes a memory and a processor, the memory stores a computer program that can be run on the processor, and the processor executes the computer program to implement the steps in the above method embodiments.
A sixth aspect of the embodiments of the present invention provides a readable storage medium, in which a computer program is stored, and the computer program is used for implementing the steps of the method according to the first aspect and various possible designs of the first aspect of the present invention when the computer program is executed by a processor.
The invention provides an image segmentation device, method, computer device, and storage medium based on deep learning. An image acquisition unit acquires an original image according to image acquisition parameters, where the parameters include field strength, slice thickness, voxel size, field of view, echo time, repetition time, and diffusion gradient factor. An image preprocessing module preprocesses the original image using an automatic image threshold algorithm and a linear interpolation algorithm, and takes the first image obtained by preprocessing as a training set, where the automatic image threshold algorithm includes the Otsu binarization algorithm. A segmentation module trains the deep learning neural network on the training set and segments the target region with the trained deep learning neural network. The invention can segment the ischemic region automatically and accurately, avoid the subjective errors of different users, and ensure consistency of evaluation.
Drawings
Fig. 1 is a structural diagram of an image segmentation apparatus based on deep learning according to an embodiment of the present invention;
fig. 2 is a flowchart of an image segmentation method based on deep learning according to an embodiment of the present invention;
FIG. 3 is a network architecture diagram of a deep learning neural network;
FIG. 4 is a schematic diagram of residual block processing;
fig. 5 is a network structure diagram of the classification network.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The terms "first," "second," "third," "fourth," and the like in the description and in the claims, as well as in the drawings, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein.
It should be understood that, in various embodiments of the present invention, the sequence numbers of the processes do not mean the execution sequence, and the execution sequence of the processes should be determined by the functions and the internal logic of the processes, and should not constitute any limitation on the implementation process of the embodiments of the present invention.
It should be understood that in the present application, "comprising" and "having" and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
It should be understood that, in the present invention, "a plurality" means two or more. "And/or" merely describes an association between objects, meaning that three relationships may exist; for example, "A and/or B" may mean: A exists alone, A and B exist simultaneously, or B exists alone. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship. "Comprises A, B and C" and "comprises A, B, C" mean that all three of A, B, and C are comprised; "comprises A, B or C" means that one of A, B, and C is comprised; and "comprises A, B and/or C" means that any one, any two, or all three of A, B, and C are comprised.
It should be understood that, in the present invention, "B corresponding to A", "A corresponds to B", or "B corresponds to A" means that B is associated with A, and B can be determined from A. Determining B from A does not mean determining B from A alone; B may be determined from A and/or other information. The matching of A and B means that the similarity between A and B is greater than or equal to a preset threshold.
As used herein, "if" may be interpreted as "at … …" or "when … …" or "in response to a determination" or "in response to a detection", depending on the context.
The technical solution of the present invention will be described in detail below with specific examples. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments.
Example 1:
The present invention provides an image segmentation device based on deep learning. As shown in the structure diagram of Fig. 1, the device comprises:
the image acquisition unit is used for acquiring an original image according to image acquisition parameters, wherein the image acquisition parameters comprise field strength, slice thickness, voxel size, field of view, echo time, repetition time and diffusion gradient factor;
the image preprocessing module is used for preprocessing the original image by utilizing an automatic image threshold algorithm and a linear interpolation algorithm, and taking a first image obtained by preprocessing as a training set, wherein the automatic image threshold algorithm comprises an Otsu binarization algorithm;
and the segmentation module is used for training the deep learning neural network according to the training set and segmenting the target area based on the trained deep learning neural network.
Example 2:
The present invention provides an image classification device based on deep learning, comprising:
and the classification module is used for classifying the target image by using the improved U-Net model as a classification network.
Example 3:
The present invention provides an image segmentation method based on deep learning, shown in the flowchart of Fig. 2, comprising the following steps:
and step S110, acquiring an original image according to the image acquisition parameters.
In this step, the original image is a magnetic resonance diffusion-weighted imaging (DWI) image. The image acquisition parameters comprise: field strength, 1.5-3 T; slice thickness, 3-6 mm; voxel size, 1.4-2.1 × 1.2-1.7 × 3.0-6.0 mm; field of view, 220 × 220 or 200 × 200 mm; echo time, 77-97 ms; repetition time, 3500-5400 ms; and diffusion gradient factor (b value), 1000 s/mm². An ADC map is calculated from the b1000 and b0 images, and each ischemic region is manually segmented on the original DWI image with reference to the ADC map. Following the standard method used by automatic segmentation software to estimate ischemic lesion volume, a binary map of brain tissue is created with an ADC threshold of 620 × 10⁻⁶ mm²/s, and the ischemic lesion volume is calculated after manual removal of artifacts.
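As an illustration of the ADC computation described above, here is a minimal sketch assuming the standard two-point monoexponential diffusion model; the function names and the clipping epsilon are illustrative, not taken from this disclosure:

```python
import numpy as np

def adc_map(b0, b1000, b=1000.0, eps=1e-6):
    """Two-point ADC estimate from the model S_b = S_0 * exp(-b * ADC)."""
    ratio = np.clip(b1000, eps, None) / np.clip(b0, eps, None)
    return -np.log(ratio) / b  # in mm^2/s when b is given in s/mm^2

def ischemic_binary_map(adc):
    """Binary brain-tissue map at the 620 x 10^-6 mm^2/s ADC threshold."""
    return adc < 620e-6
```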
Step S120: preprocess the original image using an automatic image threshold algorithm and a linear interpolation algorithm, and take the first image obtained by preprocessing as a training set.
In step S120, after the DWI images are acquired and converted into the Neuroimaging Informatics Technology Initiative (NIfTI) standard format, the b1000 image is separated from each DWI image using the Otsu binarization algorithm (an automatic image threshold algorithm whose threshold minimizes the intra-class variance). The images are then resampled by linear interpolation to a uniform field of view of 224 × 224 × 144 mm and a matrix size of 128 × 128 × 32, with the affine center of each image maintained during resampling. Thereafter, the signal intensity is centered on the mean value within the skull-stripped brain region, and the data are scaled by the standard deviation of the signal intensity in that region, yielding a distribution with zero mean and unit variance. Negative voxels are set to zero and positive voxels are scaled by half.
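A minimal sketch of this preprocessing step is given below, assuming scikit-image for the Otsu threshold and SciPy for the linear interpolation; the affine-center bookkeeping of the resampling is omitted, and the function name and epsilon are illustrative:

```python
import numpy as np
from skimage.filters import threshold_otsu
from scipy.ndimage import zoom

def preprocess(volume, target_shape=(128, 128, 32)):
    # Otsu binarization separates brain foreground from background
    mask = volume > threshold_otsu(volume)
    # Linear interpolation (order=1) resamples to the uniform matrix size
    factors = [t / s for t, s in zip(target_shape, volume.shape)]
    vol = zoom(volume, factors, order=1)
    brain = zoom(mask.astype(float), factors, order=1) > 0.5
    # Zero-mean, unit-variance scaling using brain-region statistics
    vals = vol[brain]
    vol = (vol - vals.mean()) / (vals.std() + 1e-6)
    vol[vol < 0] = 0.0   # negative voxels set to zero
    return vol * 0.5     # positive voxels scaled by half
```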
Step S130: train the deep learning neural network on the training set, and segment the target ischemic region with the trained deep learning neural network.
In this step, a training set is constructed from the acquired DWI images, and the improved neural network is trained on that training set, so that the target ischemic region is segmented by the trained deep learning neural network. The network structure of the deep learning neural network is shown in Fig. 3. A U-Net model is adopted as the segmentation network, in which the encoder layers and decoder layers pass information through skip connections (the "3" arrows in Fig. 3); all two-dimensional convolutional layers in the U-Net model are replaced with three-dimensional convolutional layers with a filter size of 3 × 3 × 3, giving the network a four-stage structure with three down-sampling steps and three up-sampling steps (the "2" arrows in Fig. 3); and the input data of each stage is processed by a residual block (the "1" arrows in Fig. 3), with each encoding convolution block and each decoding convolution block replaced by a residual block. The residual block F(·) of the U-Net is defined as follows, where x_l is the input, x_{l+1} is the output unit, and f(·) is a nonlinear mapping function:

F(x_l) = f(x_l) + x_l

x_{l+1} = F(x_l)
in the model, f (x) consists of 6 mapping layers: single image instance normalization with an alpha value of 0.2, leaky linear unit (ReLU) activation, 3 x 3 convolution, instance normalization, further leaky ReLU activation, and 3 x 3 convolution, as shown in particular in fig. 4. The encoding uses a 3 × 3 × 3 convolutional layer, the step is 2 × 2 × 2, the decoding uses 2 × 2 × 2 3D upsampling, padding is performed to ensure that the input data and the output data are the same in size (128 × 128 × 32), and the final segmentation output is generated by performing 1 × 1 × 1 convolution using a sigmoid function, so that the number of channels is reduced to one.
In addition, the deep learning neural network can also be used as a classification network to classify target images and accurately evaluate prognosis. The classification network differs from the segmentation network in the following way: the middle layer of the U-Net model (the deepest convolutional layer of the U) holds the most aggregated context information, and high-level image features are extracted from it to form a second, shorter path with a 2-layer neural network, which is added at the bridge layer (at the bottom of the U). The feature channels (of size 16 × 16 × 4 × 128) are processed into 128 neural network units by three-dimensional global average pooling, and the flattened information is then connected to two units of the neural network for binary classification (i.e., the outcome is pre-binarized as good (score 0-2) or poor (score 3-6) according to the patient's modified Rankin Scale 90 days after ischemic stroke). A Softmax activation function is used at the connections, and L2 regularization is applied to avoid overfitting.
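The classification branch can be sketched as follows, again as an assumption-laden illustration rather than the disclosed implementation: bridge features of shape (N, 128, 16, 16, 4) in PyTorch channels-first order are pooled to 128 units and mapped to two output units, and the L2 regularization would be supplied through the optimizer's weight decay:

```python
import torch
import torch.nn as nn

class PrognosisHead(nn.Module):
    """Bridge-layer features -> 3D global average pooling -> 2-unit softmax."""
    def __init__(self, bridge_channels=128):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool3d(1)      # (N,128,16,16,4) -> (N,128,1,1,1)
        self.fc = nn.Linear(bridge_channels, 2)  # good (mRS 0-2) vs poor (mRS 3-6)

    def forward(self, bridge):
        x = self.pool(bridge).flatten(1)         # flattened 128 units
        return torch.softmax(self.fc(x), dim=1)

# L2 regularization via weight decay (value illustrative):
# optimizer = torch.optim.Adam(model.parameters(), weight_decay=1e-4)
```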
In one embodiment, after preprocessing the original image using the automatic image threshold algorithm and the linear interpolation algorithm, the method further comprises:
and randomly transforming each image in the training set by using three image transformation methods of scaling, translation and rotation to expand the training set data, wherein the scaling and the translation are performed between 10 percent and 10 percent of the image size, and the rotation is performed between 5 degrees and 5 degrees, and the training data can be randomly transformed by the techniques, so that the size of the training sample in the training set is increased by 10 times.
In one embodiment, the image segmentation method based on deep learning further includes judging the segmentation effect for the ischemic region: the segmentation result obtained with the deep learning neural network is compared with the manual segmentation result, the Dice coefficient of overlap between the two segmentation results is calculated, and the segmentation effect of the deep learning neural network is judged to be good when the Dice coefficient is greater than a preset threshold (the threshold can be set autonomously according to actual requirements). The calculation formula is as follows:

Dice = 2TP / (2TP + FP + FN)

where TP, FP, and FN are the numbers of true positive, false positive, and false negative voxels, respectively.
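For reference, a minimal computation of this overlap measure on binary voxel masks (illustrative, not from the disclosure):

```python
import numpy as np

def dice_coefficient(pred, truth):
    """Dice = 2*TP / (2*TP + FP + FN) over binary voxel masks."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    tp = np.logical_and(pred, truth).sum()
    fp = np.logical_and(pred, ~truth).sum()
    fn = np.logical_and(~pred, truth).sum()
    return 2.0 * tp / (2.0 * tp + fp + fn)
```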
The image segmentation method based on deep learning provided by the invention acquires an original image according to image acquisition parameters, where the parameters include field strength, slice thickness, voxel size, field of view, echo time, repetition time, and diffusion gradient factor; preprocesses the original image using an automatic image threshold algorithm and a linear interpolation algorithm, taking the first image obtained by preprocessing as a training set, where the automatic image threshold algorithm includes the Otsu binarization algorithm; and trains the deep learning neural network on the training set, segmenting the target region with the trained deep learning neural network. The invention can segment the ischemic region automatically and accurately, avoid the subjective errors of different users, and ensure consistency of evaluation.
Example 4:
The present invention provides an image classification method based on deep learning, comprising the following steps:
and classifying the target images by using the improved U-Net model as a classification network.
Wherein the improved U-Net model comprises: replacing all two-dimensional convolution layers in the U-Net model with three-dimensional convolution layers with a filter size of 3 × 3 × 3, wherein the network has a four-stage structure comprising three down-sampling layers and three up-sampling layers; performing residual block processing on each stage's input data, and replacing each encoding convolution block and each decoding convolution block with a residual block; and extracting high-level image features from the intermediate layer of the U-Net model to form a path with a 2-layer neural network, adding the path to the bridge layer, processing the features into a preset number of neural network units through three-dimensional global average pooling, and connecting the flattened information of the preset number of neural network units to a classification unit for binary classification.
In this embodiment, the classification network differs slightly from the segmentation network of Embodiment 3 and is described with reference to Fig. 5, as follows:
the classification network and the segmentation network have the same point that: also, a U-Net model is adopted as a segmentation network, wherein an encoder layer and a decoder layer in the U-Net model are transmitted through a jump connection (indicated by a 3 arrow in FIG. 3); replacing all two-dimensional convolutional layers in the U-Net model with three-dimensional convolutional layers with the filter size of 3 x 3, wherein the network has a four-stage structure and comprises three down-samples (shown by a '2' arrow in figure 3) and three up-samples (shown by a '2' arrow in figure 3); each level of input data is subjected to residual block processing (indicated by the "1" arrow in fig. 3), and each of the encoded convolutional block and the decoded convolutional block is replaced with a residual block.
The difference lies in the following: the middle layer of the U-Net model (the deepest convolutional layer of the U) holds the most aggregated context information, and high-level image features are extracted from it to form a second, shorter path with a 2-layer neural network, which is added at the bridge layer (at the bottom of the U). The feature channels (of size 16 × 16 × 4 × 128) are processed into 128 neural network units by three-dimensional global average pooling, and the flattened information is then connected to two units of the neural network for binary classification (i.e., the outcome is pre-binarized as good (score 0-2) or poor (score 3-6) according to the patient's modified Rankin Scale 90 days after ischemic stroke). A Softmax activation function is used at the connections, and L2 regularization is applied to avoid overfitting.
The technical effects are as follows:
the application applies deep learning to obtain advanced imaging characteristics from preprocessing DWI, and utilizes the advanced imaging characteristics to accurately evaluate the prognosis image, and has the following advantages:
the method designs a dual-output deep learning neural network model, can simultaneously segment the ischemic region, further predict the prognosis effect, assist a doctor to quickly locate the ischemic region and accurately evaluate the prognosis.
The method can accurately predict the clinical prognosis through the high-level image features obtained by deep learning, and it provides long-term clinical prognosis information for LVO patients better than estimates based on ASPECTS and the ischemic lesion area. The CNN model predicts clinical outcome by deep learning of pathological damage characteristics from relatively limited samples, and its activation patterns may inspire further application of artificial intelligence algorithms in clinical prediction models.
The readable storage medium may be a computer storage medium or a communication medium. Communication media include any medium that facilitates transfer of a computer program from one place to another. Computer storage media may be any available media that can be accessed by a general-purpose or special-purpose computer. For example, a readable storage medium is coupled to a processor such that the processor can read information from, and write information to, the readable storage medium. Of course, the readable storage medium may also be an integral part of the processor. The processor and the readable storage medium may reside in an application-specific integrated circuit (ASIC). Additionally, the ASIC may reside in user equipment. Of course, the processor and the readable storage medium may also reside as discrete components in a communication device. The readable storage medium may be a read-only memory (ROM), a random-access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
The present invention also provides a program product comprising execution instructions stored in a readable storage medium. The at least one processor of the device may read the execution instructions from the readable storage medium, and the execution of the execution instructions by the at least one processor causes the device to implement the methods provided by the various embodiments described above.
In the above embodiments of the terminal or the server, it should be understood that the Processor may be a Central Processing Unit (CPU), other general-purpose processors, a Digital Signal Processor (DSP), etc. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. The steps of a method disclosed in connection with the present invention may be embodied directly in a hardware processor, or in a combination of hardware and software modules.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.
Claims (10)
1. An image segmentation apparatus based on deep learning, comprising:
the image acquisition unit is used for acquiring an original image according to image acquisition parameters, wherein the image acquisition parameters comprise field strength, slice thickness, voxel size, field of view, echo time, repetition time and diffusion gradient factor;
the image preprocessing module is used for preprocessing the original image by utilizing an automatic image threshold algorithm and a linear interpolation algorithm, and taking a first image obtained by preprocessing as a training set, wherein the automatic image threshold algorithm comprises an Otsu binarization algorithm;
and the segmentation module is used for training the deep learning neural network according to the training set and segmenting the target area based on the trained deep learning neural network.
2. The apparatus according to claim 1, further comprising:
and the data expansion module is used for expanding the data of the training set by carrying out random transformation processing on each image in the training set by using three image transformation methods of scaling, translation and rotation, wherein the scaling and the translation are carried out between-10% and 10% of the image size, and the rotation is carried out between-5 degrees and 5 degrees.
3. The apparatus according to claim 1, wherein the deep learning neural network comprises:
adopting a U-Net model as a segmentation network, wherein the encoder layers and decoder layers in the U-Net model pass information through skip connections;
replacing all two-dimensional convolution layers in the U-Net model with three-dimensional convolution layers with a filter size of 3 × 3 × 3, wherein the network has a four-stage structure comprising three down-sampling layers and three up-sampling layers;
and performing residual block processing on the input data of each stage, and replacing each encoding convolution block and each decoding convolution block with a residual block.
4. The apparatus according to claim 3, wherein the residual block of the U-Net model comprises:
F(x_l) = f(x_l) + x_l

x_{l+1} = F(x_l)

wherein x_l is the input unit, x_{l+1} is the output unit, and f(·) is a nonlinear mapping function consisting of 6 mapping layers: per-image instance normalization, leaky rectified linear unit (ReLU) activation with a slope (alpha value) of 0.2, 3 × 3 × 3 convolution, instance normalization, a further leaky ReLU activation, and 3 × 3 × 3 convolution.
5. An image classification device based on deep learning, comprising:
the classification module is used for classifying the target images by using the improved U-Net model as a classification network;
wherein the improved U-Net model comprises:
replacing all two-dimensional convolution layers in the U-Net model with three-dimensional convolution layers with a filter size of 3 × 3 × 3, wherein the network has a four-stage structure comprising three down-sampling layers and three up-sampling layers;
performing residual block processing on each level of input data, and replacing each coding convolution block and each decoding convolution block with a residual block;
extracting high-level image features from the intermediate layer of the U-Net model to form a path with a 2-layer neural network, adding the path to the bridge layer, processing the features into a preset number of neural network units through three-dimensional global average pooling, and connecting the flattened information of the preset number of neural network units to a classification unit for binary classification.
6. An image segmentation method based on deep learning is characterized by comprising the following steps:
acquiring an original image according to image acquisition parameters, wherein the image acquisition parameters comprise field strength, slice thickness, voxel size, field of view, echo time, repetition time and diffusion gradient factor;
preprocessing the original image by using an automatic image threshold algorithm and a linear interpolation algorithm, and taking a first image obtained by preprocessing as a training set, wherein the automatic image threshold algorithm comprises the Otsu binarization algorithm;
and training the deep learning neural network according to the training set, and segmenting the target ischemic region based on the trained deep learning neural network.
7. The image segmentation method based on deep learning of claim 6, wherein, after preprocessing the original image by using an automatic image threshold algorithm and a linear interpolation algorithm, the method further comprises:
and (3) carrying out random transformation processing on each image in the training set by using three image transformation methods of scaling, translation and rotation to expand the training set data, wherein the scaling and the translation are carried out between-10% and 10% of the image size, and the rotation is carried out between-5 degrees and 5 degrees.
8. An image classification method based on deep learning is characterized by comprising the following steps:
classifying the target images by using the improved U-Net model as a classification network;
wherein the improved U-Net model comprises:
replacing all two-dimensional convolution layers in the U-Net model with three-dimensional convolution layers with a filter size of 3 × 3 × 3, wherein the network has a four-stage structure comprising three down-sampling layers and three up-sampling layers;
performing residual block processing on each level of input data, and replacing each coding convolution block and each decoding convolution block with a residual block;
extracting high-level image features from the intermediate layer of the U-Net model to form a path with a 2-layer neural network, adding the path to the bridge layer, processing the features into a preset number of neural network units through three-dimensional global average pooling, and connecting the flattened information of the preset number of neural network units to a classification unit for binary classification.
9. A computer device comprising a memory and a processor, the memory storing a computer program operable on the processor, wherein the processor implements the method of any one of claims 6 to 7 or the steps of the method of claim 8 when executing the computer program.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the method of any one of claims 6 to 7 or the steps of the method of claim 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210677793.5A CN115018863A (en) | 2022-06-15 | 2022-06-15 | Image segmentation method and device based on deep learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210677793.5A CN115018863A (en) | 2022-06-15 | 2022-06-15 | Image segmentation method and device based on deep learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115018863A | 2022-09-06
Family
ID=83075631
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210677793.5A Pending CN115018863A (en) | 2022-06-15 | 2022-06-15 | Image segmentation method and device based on deep learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115018863A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116012385A (en) * | 2023-03-28 | 2023-04-25 | 同心智医科技(北京)有限公司 | Cerebral ischemia segmentation method, device and storage medium of MR perfusion image |
CN116071555A (en) * | 2023-03-15 | 2023-05-05 | 同心智医科技(北京)有限公司 | Method for establishing WMHs segmentation model, WMHs segmentation method and device |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110689547A (en) * | 2019-09-25 | 2020-01-14 | 重庆大学 | Pulmonary nodule segmentation method based on three-dimensional CT image |
CN111951235A (en) * | 2020-07-31 | 2020-11-17 | 湘潭大学 | Skin image processing method based on deep learning |
CN113378933A (en) * | 2021-06-11 | 2021-09-10 | 合肥合滨智能机器人有限公司 | Thyroid ultrasound image classification and segmentation network, training method, device and medium |
- 2022-06-15: CN application CN202210677793.5A filed; patent CN115018863A, status Pending
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110689547A (en) * | 2019-09-25 | 2020-01-14 | 重庆大学 | Pulmonary nodule segmentation method based on three-dimensional CT image |
CN111951235A (en) * | 2020-07-31 | 2020-11-17 | 湘潭大学 | Skin image processing method based on deep learning |
CN113378933A (en) * | 2021-06-11 | 2021-09-10 | 合肥合滨智能机器人有限公司 | Thyroid ultrasound image classification and segmentation network, training method, device and medium |
Non-Patent Citations (2)
Title |
---|
Cao Yang et al., "Medical Imaging Examination Technology", China Medical Science and Technology Press, 30 April 2021, pages 211-213 *
Li Changyun et al., "Intelligent Sensing Technology and Its Applications in Electrical Engineering", University of Electronic Science and Technology of China Press, 31 May 2017, page 212 *
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116071555A (en) * | 2023-03-15 | 2023-05-05 | 同心智医科技(北京)有限公司 | Method for establishing WMHs segmentation model, WMHs segmentation method and device |
CN116012385A (en) * | 2023-03-28 | 2023-04-25 | 同心智医科技(北京)有限公司 | Cerebral ischemia segmentation method, device and storage medium of MR perfusion image |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112862830B (en) | Multi-mode image segmentation method, system, terminal and readable storage medium | |
CN114926477B (en) | Brain tumor multi-mode MRI image segmentation method based on deep learning | |
CN111553892B (en) | Lung nodule segmentation calculation method, device and system based on deep learning | |
KR20230059799A (en) | A Connected Machine Learning Model Using Collaborative Training for Lesion Detection | |
CN111640120A (en) | Pancreas CT automatic segmentation method based on significance dense connection expansion convolution network | |
CN111612756B (en) | Coronary artery specificity calcification detection method and device | |
CN109767448B (en) | Segmentation model training method and device | |
CN115018863A (en) | Image segmentation method and device based on deep learning | |
CN113674291A (en) | Full-type aortic dissection real-false lumen image segmentation method and system | |
Irene et al. | Segmentation and approximation of blood volume in intracranial hemorrhage patients based on computed tomography scan images using deep learning method | |
CN113592879A (en) | Carotid plaque segmentation method and device based on artificial intelligence and storage medium | |
CN116883341A (en) | Liver tumor CT image automatic segmentation method based on deep learning | |
CN114511581B (en) | Multi-task multi-resolution collaborative esophageal cancer lesion segmentation method and device | |
CN112529886A (en) | Attention DenseUNet-based MRI glioma segmentation method | |
CN118247284B (en) | Training method of image processing model and image processing method | |
Delmoral et al. | Segmentation of pathological liver tissue with dilated fully convolutional networks: A preliminary study | |
CN113192067A (en) | Intelligent prediction method, device, equipment and medium based on image detection | |
Radhi et al. | An automatic segmentation of breast ultrasound images using u-net model | |
CN116309385B (en) | Abdominal fat and muscle tissue measurement method and system based on weak supervision learning | |
CN112862785B (en) | CTA image data identification method, device and storage medium | |
CN112862787B (en) | CTA image data processing method, device and storage medium | |
ABOUDI et al. | A Hybrid Model for Ischemic Stroke Brain Segmentation from MRI Images using CBAM and ResNet50-UNet. | |
CN112862786A (en) | CTA image data processing method, device and storage medium | |
CN117115187B (en) | Carotid artery wall segmentation method, carotid artery wall segmentation device, carotid artery wall segmentation computer device, and carotid artery wall segmentation storage medium | |
CN118037755B (en) | Focus segmentation domain generalization method and system based on double space constraint |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||