CN111369563B - Semantic segmentation method based on pyramid void convolutional network - Google Patents
- Publication number: CN111369563B (application CN202010108637.8A)
- Authority: CN (China)
- Prior art keywords: convolution, image, pyramid, module, layer
- Legal status: Active (an assumption, not a legal conclusion)
Classifications
- G06T7/10 — Image analysis: segmentation; edge detection
- G06N3/045 — Neural networks: combinations of networks
- G06T2207/20081 — Indexing scheme for image analysis: training; learning
- Y02T10/40 — Engine management systems
Abstract
The invention discloses a semantic segmentation method based on a pyramid dilated-convolution network, comprising the following steps: acquiring a medical image data set containing ground-truth segmentation results, and applying preprocessing operations such as data augmentation to the data set; obtaining shallow image features from the preprocessed image through residual recursive convolution modules and pooling layers; obtaining deep image features through a network formed by connecting a pyramid pooling module and a dilated convolution module in parallel; decoding the deep image features through deconvolution layers, skip connections and residual recursive convolution modules; inputting the decoding result into a softmax layer to obtain the category of each pixel; training the pyramid dilated-convolution network by establishing a loss function and determining the network parameters from training samples; and inputting the test image into the trained network to obtain the semantic segmentation result of the image. Combining dilated convolution with pyramid pooling effectively extracts multi-scale semantic information and detail information and improves the segmentation quality of the network.
Description
Technical Field
The invention relates to the technical field of computer vision, and in particular to a semantic segmentation method based on a pyramid dilated-convolution network (rendered in the title as a "pyramid void convolutional network"; the terms "void", "hole" and "cavity" convolution in this translation all denote dilated, i.e. atrous, convolution).
Background
In recent years, with the rapid development of deep learning, its application in medical image analysis has become increasingly broad. Semantic segmentation in particular plays a major role in application scenarios such as treatment planning, disease diagnosis and pathological research. For medical images, accurately identifying the type of every object in an image requires specialist domain knowledge and is time-consuming even for expert practitioners. Research on semantic segmentation makes it possible to segment an input medical image automatically and accurately, helping doctors make more precise judgments and design better treatment plans.
Traditional semantic segmentation algorithms include watershed-based, clustering-based and statistical-feature-based methods, but with the development of deep learning, semantic segmentation based on CNN models has become mainstream. In particular, the proposal of the FCN (fully convolutional network) opened the door for the development of semantic segmentation, and many researchers have since proposed improved segmentation models based on the FCN. The U-Net model, which retains good performance even when the training set is small, is therefore widely used in medical image semantic segmentation.
In the encoder of the U-Net model, downsampling is performed by max pooling; pooling enlarges the receptive field, so deeper semantic information can be obtained. However, pooling also reduces the resolution of the feature map, causing a loss of detail. Although U-Net recovers multi-scale detail through skip connections, boundary position information is still lost and the spatial discrimination ability of the model is reduced.
In the course of making the present invention, the inventors found that dilated convolution is widely used because it can enlarge the receptive field without reducing the resolution of the feature map. Meanwhile, to further improve the U-Net model, techniques such as attention mechanisms, pyramid pooling modules, recursive convolution, residual connections and dense connections have been combined with it.
Disclosure of Invention
The invention aims to remedy the above defects in the prior art by providing a semantic segmentation method based on a pyramid dilated-convolution network, which extracts features at different scales using several residual recursive convolution modules, a dilated convolution module and a pyramid pooling module, and then restores the size of the feature map using multi-layer upsampling and skip connections.
The technical purpose of the invention is realized by the following technical scheme:
A semantic segmentation method based on a pyramid dilated-convolution network. The network comprises a first residual recursive convolution module, a second residual recursive convolution module, pooling layers, a pyramid pooling module, a dilated convolution module, deconvolution layers, a third residual recursive convolution module, a fourth residual recursive convolution module and a softmax prediction layer, connected as follows: the first residual recursive convolution module is connected in series with a pooling layer, the second residual recursive convolution module and a second pooling layer; the pyramid pooling module and the dilated convolution module are connected in parallel after the second pooling layer; these are followed in series by a deconvolution layer, the third residual recursive convolution module, another deconvolution layer, the fourth residual recursive convolution module and the softmax prediction layer. The semantic segmentation method comprises the following steps:
S1. Acquire a medical image data set containing ground-truth segmentation results, and apply preprocessing operations to the data set for data augmentation.
S2. Pass the preprocessed image sequentially through the first residual recursive convolution module, a pooling layer, the second residual recursive convolution module and a second pooling layer, extracting semantic information of the image at multiple scales and obtaining the shallow image features F11, F12, F21 and F22 respectively.
S3. Pass the image feature F22 through the network formed by the pyramid pooling module and the dilated convolution module in parallel: F22 passes through the pyramid pooling module to yield the image feature F3, and through the dilated convolution module to yield the image feature F4. Concatenate F3 and F4 channel-wise and apply a convolution layer with a 1×1 kernel to obtain the deep image feature F5, thereby further extracting deep semantic information.
S4. Pass the image feature F5 through a deconvolution layer, then concatenate it channel-wise with the shallow image feature F21 delivered through a skip connection to obtain the image feature F61; then pass F61 through the third residual recursive convolution module to obtain the image feature F62. A skip connection transmits a shallow feature directly and concatenates it channel-wise with the output of the deconvolution layer; using skip connections keeps more detail of the original image in the output features, so the boundary of the predicted segmentation is smoother.
S5. Pass the image feature F62 through a deconvolution layer, then concatenate it channel-wise with the shallow image feature F11 delivered through a skip connection to obtain the image feature F71; then pass F71 through the fourth residual recursive convolution module to obtain the image feature F72.
S6. Input the image feature F72 into the softmax prediction layer to obtain the category of each pixel in the original input image.
S7. Train the pyramid dilated-convolution network: establish a loss function and determine the network parameters from training samples.
S8. Input the test image to be segmented into the trained pyramid dilated-convolution network to obtain the semantic segmentation result of the image.
Further, the preprocessing operation in step S1 includes rotation, slicing, normalization, and adaptive histogram equalization.
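The preprocessing steps above can be sketched as follows. This is an illustrative NumPy sketch under stated assumptions: the patch size, the random rotation, and the use of plain histogram equalization (a simpler stand-in for the adaptive variant, which would typically require a CLAHE implementation such as OpenCV's) are our choices, not values fixed by the patent.

```python
import numpy as np

def preprocess(image, patch=48, rng=None):
    """Illustrative preprocessing: rotation, normalization, equalization, slicing."""
    if rng is None:
        rng = np.random.default_rng(0)
    # Random 90-degree rotation (a simple form of augmentation).
    img = np.rot90(image, k=int(rng.integers(0, 4)))
    # Z-score normalization.
    img = (img - img.mean()) / (img.std() + 1e-8)
    # Plain histogram equalization via rank transform (stand-in for CLAHE).
    ranks = np.argsort(np.argsort(img.ravel()))
    img = (ranks / (img.size - 1)).reshape(img.shape)
    # Slice into non-overlapping square patches.
    h, w = img.shape
    patches = [img[i:i + patch, j:j + patch]
               for i in range(0, h - patch + 1, patch)
               for j in range(0, w - patch + 1, patch)]
    return np.stack(patches)
```

For a 96×96 input with 48×48 patches this yields four patches with values equalized into [0, 1].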
Furthermore, the first, second, third and fourth residual recursive convolution modules share the same structure: each passes its input through two recursive convolution layers connected in series and then adds the input back in a residual manner to obtain the output. A recursive convolution layer is connected as conv, ReLU, Add, conv, ReLU in sequence, where conv is a convolution layer with a 3×3 kernel and Add is a pixel-wise addition with the input. Compared with ordinary convolution layers, residual connections help train deeper networks, while recursive convolution better extracts the semantic information contained in the image.
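The residual recursive convolution module just described can be sketched in PyTorch. This is a minimal sketch, not the patented implementation: the channel count is kept constant (in the real encoder/decoder it would change per stage), and the class names are our own.

```python
import torch
import torch.nn as nn

class RecursiveConv(nn.Module):
    """One recursive convolution layer: conv -> ReLU -> Add(input) -> conv -> ReLU."""
    def __init__(self, ch):
        super().__init__()
        self.conv1 = nn.Conv2d(ch, ch, 3, padding=1)
        self.conv2 = nn.Conv2d(ch, ch, 3, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        y = self.relu(self.conv1(x))
        y = self.relu(self.conv2(y + x))  # Add: pixel-wise addition with the input
        return y

class ResidualRecursiveBlock(nn.Module):
    """Two recursive conv layers in series, plus a residual connection from the input."""
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(RecursiveConv(ch), RecursiveConv(ch))

    def forward(self, x):
        return self.body(x) + x  # residual addition of the block input
```

Both additions require matching shapes, which the fixed channel count and `padding=1` guarantee.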
Further, the pyramid pooling module in step S3 comprises four adaptive average pooling layers with different pooling sizes, used to capture the image feature F22 obtained in step S2 at multiple scales. The four pooling layers use pooling sizes N, N/2, N/3 and N/6 respectively, where N denotes the resolution of F22, so their outputs form 1×1, 2×2, 3×3 and 6×6 grids. The differently sized features from the pooling layers each pass through a convolution layer with a 1×1 kernel and are then upsampled by transposed convolution to the same size as F22, giving the features F31, F32, F33 and F34. The upsampled result of each scale is concatenated with the input feature F22, and the concatenated features pass through a convolution layer with a 3×3 kernel to obtain the feature F3, i.e. F3 = Conv(Concatenate(F22, F31, F32, F33, F34)), where Concatenate is the aggregation operation and Conv is a 3×3 convolution. Pooling at multiple scales better captures both the detail and the deeper semantic information contained in the image.
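A PyTorch sketch of this pyramid pooling module follows. Assumptions are flagged in comments: pooling windows of N, N/2, N/3, N/6 over an N×N map correspond to 1×1, 2×2, 3×3 and 6×6 output grids; bilinear upsampling stands in for the transposed convolution; and the output channel count of the 3×3 fusion is our choice.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PyramidPooling(nn.Module):
    """Four adaptive average poolings (1x1, 2x2, 3x3, 6x6 grids), each followed
    by a 1x1 conv, upsampled back to the input resolution, concatenated with the
    input, and fused by a 3x3 conv: F3 = Conv(Concat(F22, F31, F32, F33, F34))."""
    def __init__(self, ch):
        super().__init__()
        self.reduce = nn.ModuleList(nn.Conv2d(ch, ch, 1) for _ in range(4))
        self.fuse = nn.Conv2d(5 * ch, ch, 3, padding=1)  # output channels: assumed

    def forward(self, x):
        n = x.shape[-1]  # N: input resolution (assumed square)
        feats = [x]
        for bins, conv in zip((1, 2, 3, 6), self.reduce):
            p = F.adaptive_avg_pool2d(x, bins)
            # Bilinear upsampling as a simpler stand-in for transposed convolution.
            feats.append(F.interpolate(conv(p), size=(n, n), mode="bilinear",
                                       align_corners=False))
        return self.fuse(torch.cat(feats, dim=1))
```

The concatenation gives 5·C channels, which the fusion convolution reduces back to C.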
Further, the dilated convolution module in step S3 is formed by connecting in series three dilated convolution units with different dilation rates; the rates of the three units are 1, 2 and 4, and the kernels are all 3×3. With input feature F22, the features obtained from the three units are F41, F42 and F43 respectively. The units are coupled in a dense manner: the input of each dilated convolution unit is added to that unit's output to form its result. After the dilated convolution module, a feature F4 with the same resolution as F22 is obtained: F4 = Add(F22, F41, F42, F43), where Add is a pixel-wise addition. Using dilated convolution instead of ordinary convolution plus pooling acquires deeper semantic information by enlarging the receptive field, while avoiding the loss of detail that the resolution reduction of pooling would cause.
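The dilated-convolution branch can be sketched in PyTorch as below. The ReLU after each unit is our assumption (the patent text does not name the activation); `padding=d` with dilation `d` preserves the resolution for a 3×3 kernel.

```python
import torch
import torch.nn as nn

class DilatedConvModule(nn.Module):
    """Three 3x3 dilated conv units (dilation 1, 2, 4) in series; each unit's
    input is added to its output, and F4 = Add(F22, F41, F42, F43)."""
    def __init__(self, ch):
        super().__init__()
        self.units = nn.ModuleList(
            nn.Conv2d(ch, ch, 3, padding=d, dilation=d) for d in (1, 2, 4))
        self.relu = nn.ReLU(inplace=True)  # activation choice is an assumption

    def forward(self, x):
        outs, h = [], x
        for unit in self.units:
            h = self.relu(unit(h)) + h  # each unit adds its input to its output
            outs.append(h)
        # Pixel-wise sum of the input and all three unit outputs.
        return x + outs[0] + outs[1] + outs[2]
```

Because every unit keeps the spatial size, F4 has the same resolution as F22, as the text requires.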
Further, the deconvolution layers in steps S4 and S5 are implemented as transposed convolutions.
Further, in step S7, the established pyramid dilated-convolution network is trained end-to-end; the training strategy adopts the stochastic gradient descent algorithm, and the loss function uses categorical cross-entropy (categorical_crossentropy):

l_c = -(1/M) · Σ_{f_s ∈ F_s} Σ_{k=1}^{K} y_{f_s}^k · log(p_{f_s}^k)

where l_c denotes the categorical cross-entropy loss of the segmented feature map F_s, f_s denotes a voxel of the feature map F_s, M is the number of voxels of F_s, K is the number of classes, y_{f_s}^k indicates whether voxel f_s belongs to class k, and p_{f_s}^k denotes the predicted probability that voxel f_s belongs to class k.
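A minimal NumPy sketch of this categorical cross-entropy (unweighted; the embodiment later adds per-class weights), with voxels flattened into rows:

```python
import numpy as np

def categorical_cross_entropy(probs, labels):
    """probs: (M, K) softmax outputs per voxel; labels: (M,) integer class ids.
    Returns the mean cross-entropy over the M voxels: l_c above."""
    m, k = probs.shape
    onehot = np.eye(k)[labels]  # y_{f_s}^k: one-hot class indicators
    return float(-(onehot * np.log(probs + 1e-12)).sum() / m)
```

For two voxels predicted with probabilities 0.9 and 0.8 on their true classes, the loss is -(log 0.9 + log 0.8)/2.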
Compared with the prior art, the invention has the following advantages and effects:
(1) The method adopts a dilated convolution module to extract deep semantic information; compared with conventional convolution and pooling, dilated convolution enlarges the receptive field without reducing resolution. The module contains three dilated convolution layers with different dilation rates, coupled in a dense manner, so semantic information can be acquired at multiple scales.
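The receptive-field gain claimed here can be checked with a short computation (3×3 kernels, stride 1, dilation rates 1, 2, 4 as in the module; the formula is the standard one for stacked convolutions):

```python
def stacked_receptive_field(kernel=3, dilations=(1, 2, 4)):
    """Receptive field of serially stacked dilated convolutions with stride 1:
    each layer with dilation d enlarges the field by d * (kernel - 1)."""
    rf = 1
    for d in dilations:
        rf += d * (kernel - 1)
    return rf

# Three dilated 3x3 layers (d = 1, 2, 4) see a 15x15 window, versus 7x7 for
# three ordinary 3x3 layers -- with no loss of feature-map resolution.
```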
(2) The invention additionally uses a pyramid spatial pooling module to extract information at several scales, effectively capturing both the deep semantic information and the shallow detail information contained in the image.
(3) The invention uses residual recursive convolution in place of ordinary convolution, which helps train a deeper network structure and obtain feature representations better suited to the segmentation task.
(4) The residual recursive convolution, dilated convolution and pyramid pooling modules together form an algorithm that can be trained end to end; compared with a two-stage algorithm, it has fewer parameters and is easier to train.
Drawings
FIG. 1 is a flow chart of the semantic segmentation method based on a pyramid dilated-convolution network disclosed by the invention;
FIG. 2(a) is a schematic diagram of a residual recursive convolution module in an embodiment of the present invention, and FIG. 2(b) is a schematic diagram of the recursive convolution unit used in FIG. 2(a);
FIG. 3 is a schematic diagram of the spatial pyramid pooling module in an embodiment of the present invention;
FIG. 4 is a schematic diagram of the dilated convolution module in an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Embodiment. As shown in FIG. 1, this embodiment provides a semantic segmentation method based on a pyramid dilated-convolution network, comprising the following steps:
S1. Acquire a medical image data set containing ground-truth segmentation results, and apply data augmentation and other preprocessing to it. Since most medical image data sets are small and have low contrast, the images are first rotated, sliced, normalized and processed with adaptive histogram equalization.
S2. The preprocessed image passes sequentially through the first residual recursive convolution module, a pooling layer, the second residual recursive convolution module and a second pooling layer, extracting semantic information at multiple scales and yielding the shallow image features F11, F12, F21 and F22 respectively. Specifically: as shown in FIG. 2(a), the residual recursive convolution module passes the input through two cascaded recursive convolution layers and then adds the input back in a residual manner to obtain the output; as shown in FIG. 2(b), a recursive convolution unit is connected in the order conv, ReLU, Add, conv, ReLU, where conv is a convolution layer with a 3×3 kernel and Add is a pixel-wise addition with the input. The pooling layers are max pooling with stride 2.
S3. The image feature F22 passes through the network formed by the pyramid pooling module and the dilated convolution module in parallel: F22 passes through the pyramid pooling module to obtain the feature F3 and through the dilated convolution module to obtain the feature F4. F3 and F4 are then concatenated channel-wise and passed through a convolution layer with a 1×1 kernel to obtain the deep image feature F5, thereby further extracting deep semantic information. Specifically:
As shown in FIG. 3, the pyramid pooling module comprises four adaptive average pooling layers (avgpool in FIG. 3) with different pooling sizes, used to capture the feature F22 obtained in step S2 at multiple scales. The four pooling layers use pooling sizes N, N/2, N/3 and N/6 respectively, where N denotes the resolution of F22. The differently sized features from the pooling layers each pass through a convolution layer with a 1×1 kernel (conv 1×1 in FIG. 3) and are then upsampled by transposed convolution (up-conv in FIG. 3) to the same size as F22, giving the features F31, F32, F33 and F34. The upsampled result of each scale is concatenated with the input feature F22, and the concatenated features pass through a convolution layer with a 3×3 kernel to obtain the feature F3, i.e. F3 = Conv(Concatenate(F22, F31, F32, F33, F34)), where Concatenate is the aggregation operation and Conv is a 3×3 convolution.
As shown in FIG. 4, the dilated convolution module is formed by connecting in series three dilated convolution units with different dilation rates; the rates of the three units are 1, 2 and 4, and the kernels are all 3×3. With input feature F22, the features obtained from the three units are F41, F42 and F43 respectively. The units are coupled in a dense manner: the input of each unit is added to that unit's output to form its result. After the dilated convolution module, a feature F4 with the same resolution as F22 is obtained: F4 = Add(F22, F41, F42, F43), where Add is a pixel-wise addition.
In this embodiment, channel-wise aggregation means concatenation along the channel dimension: if the feature F3 has C1 channels and the feature F4 has C2 channels, the aggregated feature has C1 + C2 channels.
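The channel-wise aggregation rule above can be shown in two lines of NumPy; the channel counts and spatial size here are illustrative, not values from the patent.

```python
import numpy as np

# Channel-wise aggregation (concatenation): C1 + C2 output channels.
f3 = np.zeros((1, 64, 28, 28))  # C1 = 64 channels (illustrative)
f4 = np.zeros((1, 32, 28, 28))  # C2 = 32 channels (illustrative)
f = np.concatenate([f3, f4], axis=1)  # aggregate along the channel dimension
```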
S4. The image feature F5 passes through a deconvolution layer and is then concatenated channel-wise with the shallow image feature F21 delivered through a skip connection, giving the feature F61; F61 then passes through the third residual recursive convolution module to obtain the feature F62. Specifically: the deconvolution layer is a transposed convolution; the skip connection transmits the shallow feature directly and concatenates it channel-wise with the output of the deconvolution layer, the channel-wise aggregation being as described in step S3.
S5. The image feature F62 passes through a deconvolution layer and is then concatenated channel-wise with the shallow image feature F11 delivered through a skip connection, giving the feature F71; F71 then passes through the fourth residual recursive convolution module to obtain the feature F72. Specifically: the deconvolution layer is a transposed convolution; the skip connection transmits the shallow feature directly and concatenates it channel-wise with the output of the deconvolution layer, the channel-wise aggregation being as described in step S3.
S6. The image feature F72 is input into the softmax prediction layer to obtain the category to which each pixel in the original input image belongs.
S7. Train the pyramid dilated-convolution network: establish a loss function and determine the network parameters from training samples, specifically the learning rate, weight decay, momentum term and training strategy. The established network is trained end-to-end with stochastic gradient descent; the initial learning rate is set to 0.001, the weight decay to 10⁻⁴, and a momentum term of 0.9 is added. The loss function is a weighted categorical cross-entropy; it differs from the original cross-entropy loss in that the loss for voxels of the k-th class carries a weight v_k inversely proportional to the frequency of class k:

l_c = -(1/M) · Σ_{f_s ∈ F_s} Σ_{k=1}^{K} v_k · y_{f_s}^k · log(p_{f_s}^k)

where l_c denotes the categorical cross-entropy loss of the segmented feature map F_s, f_s denotes a voxel of F_s, M is the number of voxels of F_s, K is the number of classes, y_{f_s}^k indicates whether voxel f_s belongs to class k, and p_{f_s}^k denotes the predicted probability that voxel f_s belongs to class k.
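A NumPy sketch of this weighted loss, together with the stated SGD hyper-parameters. The normalization of the inverse-frequency weights (rescaled so they average to 1) is our assumption; the patent only states that v_k is inversely proportional to the class frequency.

```python
import numpy as np

def weighted_categorical_cross_entropy(probs, labels):
    """probs: (M, K) softmax outputs; labels: (M,) integer class ids.
    Class weight v_k is inversely proportional to the frequency of class k
    among the voxels (the rescaling so weights average to 1 is our choice)."""
    m, k = probs.shape
    freq = np.bincount(labels, minlength=k) / m
    v = 1.0 / np.maximum(freq, 1e-12)
    v = v / v.sum() * k  # rescale so the weights average to 1
    onehot = np.eye(k)[labels]
    return float(-(v * onehot * np.log(probs + 1e-12)).sum() / m)

# SGD hyper-parameters as stated in the embodiment.
SGD_CONFIG = {"lr": 1e-3, "weight_decay": 1e-4, "momentum": 0.9}
```

With perfectly balanced classes every v_k rescales to 1, and the loss reduces to the unweighted categorical cross-entropy.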
S8. Input the test image to be segmented into the trained pyramid dilated-convolution network to obtain the semantic segmentation result of the image.
In summary, the semantic segmentation method based on a pyramid dilated-convolution network disclosed in this embodiment builds and trains a pyramid dilated-convolution network, establishes a loss function, and determines the network parameters from training samples; the test image is then input into the trained network to obtain its semantic segmentation result. Combining dilated convolution with pyramid pooling effectively extracts multi-scale semantic and detail information and improves the segmentation quality of the network.
The above embodiments are preferred embodiments of the present invention, but the present invention is not limited to the above embodiments, and any other changes, modifications, substitutions, combinations, and simplifications which do not depart from the spirit and principle of the present invention should be construed as equivalents thereof, and all such changes, modifications, substitutions, combinations, and simplifications are intended to be included in the scope of the present invention.
Claims (7)
1. The semantic segmentation method based on the pyramid hole convolution network is characterized in that the pyramid hole convolution network comprises a first residual error recursive convolution module, a second residual error recursive convolution module, a pooling layer, a pyramid pooling module, a hole convolution module, an deconvolution layer, a third residual error recursive convolution module, a fourth residual error recursive convolution module and a softmax prediction layer, and the structural connection mode is as follows: the pyramid pooling module and the cavity convolution module are connected in parallel and then connected in series with the pooling layer, and then sequentially connected in series with the deconvolution layer, the third residual recursive convolution module, the deconvolution layer, the fourth residual recursive convolution module and the softmax prediction layer; the semantic segmentation method comprises the following steps:
S1, acquiring a medical image data set containing real segmentation results, and preprocessing the data set to achieve data augmentation;
S2, passing the preprocessed image sequentially through a first residual recursive convolution module, a pooling layer, a second residual recursive convolution module and a pooling layer to extract the semantic information of the image at multiple scales, obtaining the shallow image features F11, F12, F21 and F22 respectively;
S3, feeding the image feature F22 into a network in which a pyramid pooling module and a hole convolution module operate in parallel, wherein F22 passes through the pyramid pooling module to obtain the image feature F3, and through the hole convolution module to obtain the image feature F4; performing a channel-by-channel aggregation of F3 and F4 and passing the result through a convolution layer with a 1×1 kernel to obtain the deep image feature F5, thereby further extracting deep semantic information;
S4, passing the image feature F5 through a deconvolution layer and then performing a channel-by-channel aggregation with the shallow image feature F21 delivered through a skip connection to obtain the image feature F61; then passing F61 through a third residual recursive convolution module to obtain the image feature F62, wherein the skip connection directly forwards the shallow feature for channel-by-channel aggregation with the output of the deconvolution layer;
S5, passing the image feature F62 through a deconvolution layer and then performing a channel-by-channel aggregation with the shallow image feature F11 delivered through a skip connection to obtain the image feature F71; then passing F71 through a fourth residual recursive convolution module to obtain the image feature F72;
S6, inputting the image feature F72 into a softmax prediction layer to obtain the category of each pixel of the original input image;
S7, training the pyramid hole convolution network: establishing a loss function and determining the network parameters from the training samples;
S8, inputting the test image to be segmented into the trained pyramid hole convolution network to obtain the semantic segmentation result of the image.
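As a sanity check on the resolutions implied by steps S2-S6, the following sketch (not code from the patent; it assumes each pooling layer halves the spatial size and each deconvolution layer doubles it, and it ignores channel counts entirely) traces a square input through the network:

```python
# Illustrative resolution trace of steps S2-S6; the claims fix only the
# halving (pooling) and doubling (deconvolution) of spatial size.
def resolution_trace(h, w):
    t = {}
    t["F11"] = (h, w)            # after the first residual recursive conv module
    t["F12"] = (h // 2, w // 2)  # after the first pooling layer
    t["F21"] = (h // 2, w // 2)  # after the second residual recursive conv module
    t["F22"] = (h // 4, w // 4)  # after the second pooling layer
    t["F5"]  = t["F22"]          # pyramid pooling / hole conv keep the resolution
    t["F61"] = t["F21"]          # deconv doubles F5, aggregated with F21 (S4)
    t["F72"] = t["F11"]          # deconv doubles F62, aggregated with F11 (S5)
    return t

trace = resolution_trace(256, 256)
assert trace["F61"] == (128, 128)
assert trace["F72"] == (256, 256)  # softmax output matches the input size (S6)
```

The skip connections are consistent only because the deconvolution outputs land exactly on the resolutions of F21 and F11, which is why the claims pair them as they do.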
2. The semantic segmentation method based on the pyramid hole convolution network of claim 1, wherein the preprocessing in step S1 comprises rotation, slicing, normalization and adaptive histogram equalization.
3. The semantic segmentation method based on the pyramid hole convolution network of claim 1, wherein the first, second, third and fourth residual recursive convolution modules have the same structure; in each residual recursive convolution module, the input first passes through two recursive convolution layers connected in series, and the result is then added to the input in a residual manner to obtain the output; each recursive convolution layer is connected in the order conv, ReLU, Add, conv, ReLU, where conv is a convolution layer with a 3×3 kernel and Add denotes pixel-by-pixel addition with the layer input.
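The conv, ReLU, Add, conv, ReLU ordering of claim 3 can be sketched for a single channel as follows. This is a minimal NumPy illustration with hand-picked kernels, not the patent's implementation (which operates on multi-channel feature maps with learned weights):

```python
import numpy as np

def conv3x3(x, k):
    """'same'-padded 3x3 convolution on a 2D single-channel map."""
    p = np.pad(x, 1)
    out = np.zeros_like(x, dtype=float)
    for i in range(3):
        for j in range(3):
            out += k[i, j] * p[i:i + x.shape[0], j:j + x.shape[1]]
    return out

def relu(x):
    return np.maximum(x, 0.0)

def recursive_conv_layer(x, k1, k2):
    # conv -> ReLU -> Add (with the layer input) -> conv -> ReLU, per claim 3
    y = relu(conv3x3(x, k1))
    y = y + x                       # pixel-by-pixel addition with the input
    return relu(conv3x3(y, k2))

def residual_recursive_module(x, kernels):
    # two recursive conv layers in series, then a residual add of the module input
    y = recursive_conv_layer(x, kernels[0], kernels[1])
    y = recursive_conv_layer(y, kernels[2], kernels[3])
    return y + x

# With a zero first kernel and an identity second kernel, each recursive
# layer passes a non-negative input through unchanged, so the module
# output is input + input:
x = np.ones((4, 4))
kz = np.zeros((3, 3))
ki = np.zeros((3, 3)); ki[1, 1] = 1.0
out = residual_recursive_module(x, [kz, ki, kz, ki])
assert np.allclose(out, 2 * x)
```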
4. The semantic segmentation method based on the pyramid hole convolution network of claim 1, wherein the pyramid pooling module in step S3 comprises four adaptive average pooling layers with different pooling sizes, which extract the image feature F22 obtained in step S2 at multiple scales; the four pooling layers adopt pooling sizes of N, N/2, N/3 and N/6 respectively, where N denotes the resolution of F22; the image features of different sizes produced by the pooling layers each pass through a convolution layer with a 1×1 kernel and are then restored by transposed convolution to the image features F31, F32, F33 and F34, which have the same size as F22; the upsampled result of each scale is then aggregated with the input feature F22, and the aggregated features pass through a convolution layer with a 3×3 kernel to obtain the image feature F3, i.e. F3 = Conv(Concatenate(F22, F31, F32, F33, F34)), where Concatenate is the aggregation operation and Conv is the 3×3 convolution operation.
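A single-channel sketch of the pooling arithmetic in claim 4, assuming N is divisible by 6 and using nearest-neighbour upsampling as a stand-in for the transposed convolution (the 1×1 and 3×3 convolutions are omitted). Note that pooling sizes N, N/2, N/3 and N/6 yield output grids of 1×1, 2×2, 3×3 and 6×6:

```python
import numpy as np

def adaptive_avg_pool(x, out):
    """Average-pool a square N x N map down to an out x out grid."""
    n = x.shape[0]
    return x.reshape(out, n // out, out, n // out).mean(axis=(1, 3))

def upsample_nearest(x, n):
    """Nearest-neighbour upsampling back to n x n (stand-in for the
    transposed convolution of claim 4)."""
    r = n // x.shape[0]
    return np.repeat(np.repeat(x, r, axis=0), r, axis=1)

def pyramid_pool(f22):
    n = f22.shape[0]                 # N: resolution of F22
    scales = [1, 2, 3, 6]            # grids from pooling sizes N, N/2, N/3, N/6
    branches = [upsample_nearest(adaptive_avg_pool(f22, s), n) for s in scales]
    # channel-by-channel aggregation of F22 with F31..F34; the final 3x3
    # convolution of claim 4 is omitted from this sketch
    return np.stack([f22] + branches, axis=0)

f3 = pyramid_pool(np.arange(144, dtype=float).reshape(12, 12))
assert f3.shape == (5, 12, 12)  # F22 plus four pooled-and-upsampled branches
```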
5. The semantic segmentation method based on the pyramid hole convolution network of claim 1, wherein in step S3 the hole convolution module is formed by connecting three hole convolution units with different dilation factors in series; the dilation factors of the three units are 1, 2 and 4 respectively, and all hole convolution kernels are 3×3; after the image feature F22 is input, the image features obtained from the three hole convolution units are F41, F42 and F43 respectively; the hole convolution units are densely connected, i.e. the input of each hole convolution unit is added to its output to form the output; after the hole convolution module, an image feature F4 with the same resolution as F22 is obtained, F4 = Add(F22, F41, F42, F43), where Add is the pixel-by-pixel addition operation.
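One consequence of stacking the three units in series is an enlarged receptive field at unchanged resolution: with a 3×3 kernel, each unit adds (3−1)·dilation pixels, so dilation factors 1, 2 and 4 yield a 15×15 field. A small sketch of that arithmetic (illustrative, not code from the patent):

```python
def receptive_field(kernel, dilations):
    """Receptive field of same-padded hole (dilated) convolutions stacked
    in series: each unit widens the field by (kernel - 1) * dilation."""
    rf = 1
    for d in dilations:
        rf += (kernel - 1) * d
    return rf

assert receptive_field(3, [1]) == 3        # a plain 3x3 convolution
assert receptive_field(3, [1, 2, 4]) == 15  # the module of claim 5
```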
6. The semantic segmentation method based on the pyramid hole convolution network of claim 1, wherein the deconvolution layers in steps S4 and S5 are transposed convolutions.
7. The semantic segmentation method based on the pyramid hole convolution network of claim 1, wherein in step S7 the established pyramid hole convolution network is trained end to end, a stochastic gradient descent algorithm is adopted as the training strategy, and the loss function is the categorical cross-entropy, given by

l_c = -(1/M) Σ_{f_s ∈ F_s} Σ_{k=1}^{K} y_{f_s,k} log(p_{f_s,k})

wherein l_c represents the categorical cross-entropy loss of the segmented feature map F_s, f_s denotes a voxel of the feature map F_s, M is the number of voxels in F_s, K is the number of classes, y_{f_s,k} indicates whether voxel f_s belongs to class k, and p_{f_s,k} represents the predicted probability that voxel f_s belongs to class k.
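A minimal NumPy sketch of the categorical cross-entropy above, assuming one-hot labels and a mean reduction over the M voxels (the eps guard is a numerical-safety detail added here, not part of the claim):

```python
import numpy as np

def categorical_cross_entropy(y, p, eps=1e-12):
    """l_c = -(1/M) * sum_m sum_k y[m, k] * log(p[m, k]),
    with y one-hot labels (M voxels x K classes) and p predicted
    class probabilities of the same shape."""
    m = y.shape[0]
    return -np.sum(y * np.log(p + eps)) / m

y = np.array([[1.0, 0.0], [0.0, 1.0]])          # two voxels, two classes
p_perfect = np.array([[1.0, 0.0], [0.0, 1.0]])
assert abs(categorical_cross_entropy(y, p_perfect)) < 1e-9  # perfect prediction
# a uniform prediction over two classes costs log 2 ~ 0.693 per voxel
assert categorical_cross_entropy(y, np.full((2, 2), 0.5)) > 0.69
```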
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010108637.8A CN111369563B (en) | 2020-02-21 | 2020-02-21 | Semantic segmentation method based on pyramid void convolutional network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111369563A CN111369563A (en) | 2020-07-03 |
CN111369563B true CN111369563B (en) | 2023-04-07 |
Family
ID=71208108
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010108637.8A Active CN111369563B (en) | 2020-02-21 | 2020-02-21 | Semantic segmentation method based on pyramid void convolutional network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111369563B (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109145920A (en) * | 2018-08-21 | 2019-01-04 | 电子科技大学 | A kind of image, semantic dividing method based on deep neural network |
CN110232394A (en) * | 2018-03-06 | 2019-09-13 | 华南理工大学 | A kind of multi-scale image semantic segmentation method |
Non-Patent Citations (1)
Title |
---|
Edge-based image interpolation approach for video sensor network; Jinglun Shi et al.; 2011 8th International Conference on Information, Communications & Signal Processing; 2012-04-03; pp. 1-2 *
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||