
CN111369563B - Semantic segmentation method based on pyramid dilated convolutional network - Google Patents

Info

Publication number: CN111369563B
Authority: CN (China)
Prior art keywords: convolution, image, pyramid, module, layer
Legal status: Active (an assumption, not a legal conclusion)
Application number: CN202010108637.8A
Other languages: Chinese (zh)
Other versions: CN111369563A
Inventors: 史景伦, 张宇, 傅钎栓, 李显惠, 林阳城
Current and original assignees: Guangzhou Menghui Robot Co Ltd; South China University of Technology (SCUT)
Application filed by Guangzhou Menghui Robot Co Ltd and South China University of Technology (SCUT); priority to CN202010108637.8A. Published as CN111369563A; granted and published as CN111369563B. Legal status: Active.

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/10 - Segmentation; Edge detection
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/20 - Special algorithmic details
    • G06T2207/20081 - Training; Learning
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T - CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 - Road transport of goods or passengers
    • Y02T10/10 - Internal combustion engine [ICE] based vehicles
    • Y02T10/40 - Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a semantic segmentation method based on a pyramid dilated convolution network, comprising the following steps: acquire a medical image data set containing ground-truth segmentation results and apply preprocessing such as data augmentation; obtain shallow image features of the preprocessed image through residual recursive convolution modules and pooling layers; obtain deep image features through a network formed by a pyramid pooling module and a dilated convolution module connected in parallel; decode the deep image features through deconvolution layers, skip connections, and residual recursive convolution modules; input the decoding result into a softmax layer to obtain the class of each pixel; train the pyramid dilated convolution network, establish a loss function, and determine the network parameters from training samples; input the test image into the trained network to obtain the semantic segmentation result of the image. The combination of dilated convolution and pyramid pooling effectively extracts multi-scale semantic information and detail information and improves the segmentation quality of the network.

Description

Semantic segmentation method based on pyramid dilated convolutional network
Technical Field
The invention relates to the technical field of computer vision, and in particular to a semantic segmentation method based on a pyramid dilated convolution network.
Background
In recent years, with the rapid development of deep learning, its applications in medical image analysis have broadened accordingly. Semantic segmentation in particular plays a large role in application scenarios such as treatment planning, disease diagnosis, and pathology research. For medical images, accurately identifying the type of each object in an image requires specialist domain knowledge and considerable time from trained professionals. Research on semantic segmentation makes it possible to segment an input medical image automatically and accurately, helping doctors make more accurate judgments and design better treatment plans.
Traditional semantic segmentation algorithms include watershed-based, clustering-based, and statistical-feature-based methods. With the development of deep learning, however, semantic segmentation methods based on CNN models have become mainstream. In particular, the proposal of the FCN (fully convolutional network) opened the door for the development of semantic segmentation, and many researchers have since proposed improved segmentation models based on the FCN. The U-Net model, notably, still performs well when the training set is small, and is therefore widely used in medical image semantic segmentation.
In the encoder of the U-Net model, downsampling is carried out by max pooling. Pooling enlarges the receptive field, so deeper semantic information can be obtained; however, pooling also reduces the resolution of the feature map, causing a loss of detail information. Although the U-Net acquires multi-scale detail information by means of skip connections, it still suffers from loss of boundary position information and reduced spatial discrimination capability.
In the course of devising the present invention, the inventors found that dilated convolution is widely used because it can enlarge the receptive field without reducing the resolution of the feature map. Meanwhile, to further improve the U-Net model, techniques such as attention mechanisms, pyramid pooling modules, recursive convolution, residual connections, and dense connections have been combined with it.
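The receptive-field claim can be checked with a little arithmetic: a kernel of size k with dilation d covers an effective extent of k + (k - 1)(d - 1) pixels, and stride-1 layers stacked in series add their extents, all without any downsampling. A minimal sketch (function names are illustrative, not from the patent):

```python
def effective_kernel_size(kernel: int, dilation: int) -> int:
    """Effective spatial extent of one dilated convolution kernel:
    k_eff = k + (k - 1) * (d - 1)."""
    return kernel + (kernel - 1) * (dilation - 1)

def stacked_receptive_field(kernel: int, dilations) -> int:
    """Receptive field of stride-1 dilated convolutions applied in series:
    each layer adds (k_eff - 1) to the field seen by one output pixel."""
    rf = 1
    for d in dilations:
        rf += effective_kernel_size(kernel, d) - 1
    return rf
```

With the dilation factors 1, 2, 4 used later in this patent, three 3×3 layers already see a 15×15 neighborhood at full resolution, whereas three plain 3×3 layers see only 7×7.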
Disclosure of Invention
The invention aims to remedy the above defects in the prior art, and provides a semantic segmentation method based on a pyramid dilated convolution network, which extracts features of different scales using several residual recursive convolution modules, a dilated convolution module, and a pyramid pooling module, and then restores the size of the feature map using multiple layers of upsampling and skip connections.
The technical purpose of the invention is realized by the following technical scheme:
A semantic segmentation method based on a pyramid dilated convolution network. The network comprises a first residual recursive convolution module, a second residual recursive convolution module, pooling layers, a pyramid pooling module, a dilated convolution module, deconvolution layers, a third residual recursive convolution module, a fourth residual recursive convolution module, and a softmax prediction layer, connected as follows: the first residual recursive convolution module is connected in series with a pooling layer, the second residual recursive convolution module, and a pooling layer in sequence; the pyramid pooling module and the dilated convolution module are connected in parallel and follow the last pooling layer in series; these are then followed in series by a deconvolution layer, the third residual recursive convolution module, a deconvolution layer, the fourth residual recursive convolution module, and the softmax prediction layer. The semantic segmentation method comprises the following steps:
S1. Acquire a medical image data set containing ground-truth segmentation results, and preprocess the data set for data augmentation.
S2. Pass the preprocessed image sequentially through the first residual recursive convolution module, a pooling layer, the second residual recursive convolution module, and a pooling layer, extracting semantic information at multiple scales and obtaining shallow image features F11, F12, F21, F22 respectively.
S3. Pass image feature F22 through the network formed by the pyramid pooling module and the dilated convolution module in parallel: F22 passes through the pyramid pooling module to give feature F3, and through the dilated convolution module to give feature F4. Aggregate features F3 and F4 channel by channel and pass the result through a convolution layer with a 1×1 kernel to obtain the deep image feature F5, thereby further extracting deep semantic information.
S4. Pass feature F5 through a deconvolution layer and aggregate it channel by channel with the shallow feature F21 delivered by the skip connection to obtain feature F61; then pass F61 through the third residual recursive convolution module to obtain feature F62. The skip connection forwards the shallow feature directly and aggregates it channel by channel with the output of the deconvolution layer; using skip connections keeps more detail of the original image in the output features, so the boundaries of the predicted segmentation are smoother.
S5. Pass feature F62 through a deconvolution layer and aggregate it channel by channel with the shallow feature F11 delivered by the skip connection to obtain feature F71; then pass F71 through the fourth residual recursive convolution module to obtain feature F72.
S6. Input feature F72 into the softmax prediction layer to obtain the class of each pixel in the original input image.
S7. Train the pyramid dilated convolution network, establish a loss function, and determine the network parameters from training samples.
S8. Input the test image to be segmented into the trained pyramid dilated convolution network to obtain the semantic segmentation result of the image.
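As an illustration only, the wiring described above can be sketched in PyTorch, with plain convolutions standing in for the residual recursive, pyramid pooling, and dilated modules. All names, channel counts, and the class count below are illustrative assumptions, not taken from the patent:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyPDCNet(nn.Module):
    """Toy sketch of the encoder/decoder wiring: two encoder stages with
    pooling, two parallel bottom branches fused by a 1x1 conv, and two
    decoder stages with transposed convs and skip connections."""

    def __init__(self, in_ch=1, base=16, num_classes=3):
        super().__init__()
        self.enc1 = nn.Conv2d(in_ch, base, 3, padding=1)        # stand-in for module 1
        self.enc2 = nn.Conv2d(base, base * 2, 3, padding=1)     # stand-in for module 2
        self.pool = nn.MaxPool2d(2)
        self.ppm = nn.Conv2d(base * 2, base * 2, 1)             # pyramid pooling branch (placeholder)
        self.dil = nn.Conv2d(base * 2, base * 2, 3, padding=2, dilation=2)  # dilated branch
        self.fuse = nn.Conv2d(base * 4, base * 2, 1)            # 1x1 conv after channel concat -> F5
        self.up1 = nn.ConvTranspose2d(base * 2, base * 2, 2, stride=2)
        self.dec1 = nn.Conv2d(base * 4, base * 2, 3, padding=1) # stand-in for module 3
        self.up2 = nn.ConvTranspose2d(base * 2, base, 2, stride=2)
        self.dec2 = nn.Conv2d(base * 2, base, 3, padding=1)     # stand-in for module 4
        self.head = nn.Conv2d(base, num_classes, 1)             # softmax prediction layer

    def forward(self, x):
        f1 = torch.relu(self.enc1(x))                 # shallow features, full resolution
        f2 = torch.relu(self.enc2(self.pool(f1)))     # shallow features, half resolution
        bottom = self.pool(f2)
        deep = self.fuse(torch.cat([self.ppm(bottom), self.dil(bottom)], dim=1))  # parallel branches
        d1 = torch.relu(self.dec1(torch.cat([self.up1(deep), f2], dim=1)))  # skip connection with f2
        d2 = torch.relu(self.dec2(torch.cat([self.up2(d1), f1], dim=1)))    # skip connection with f1
        return F.softmax(self.head(d2), dim=1)        # per-pixel class probabilities
```

The input resolution is restored exactly because each 2x pooling is undone by a stride-2 transposed convolution, with the skip connections re-injecting the matching-resolution shallow features.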
Further, the preprocessing operation in step S1 includes rotation, slicing, normalization, and adaptive histogram equalization.
Further, the first, second, third, and fourth residual recursive convolution modules have the same structure: each passes its input through two recursive convolution layers connected in series and then adds the result to the input in residual fashion to obtain the output. A recursive convolution layer is connected as conv, ReLU, Add, conv, ReLU in sequence, where conv is a convolution layer with a 3×3 kernel and Add is pixel-wise addition with the input. Compared with plain convolution layers, residual connections help train deeper networks, while recursive convolution better extracts the semantic information contained in the image.
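A minimal PyTorch sketch of this module, under the assumption (not stated explicitly above) that input and output channel counts are equal so that the pixel-wise Add is well defined:

```python
import torch
import torch.nn as nn

class RecursiveConvLayer(nn.Module):
    """conv -> ReLU -> Add(input) -> conv -> ReLU, as in the text;
    assumes equal in/out channels so the pixel-wise Add is valid."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        y = self.relu(self.conv1(x))
        y = y + x                       # Add: pixel-wise addition with the input
        return self.relu(self.conv2(y))

class ResidualRecursiveConvModule(nn.Module):
    """Two recursive conv layers in series, then a residual add with the module input."""
    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(RecursiveConvLayer(channels),
                                  RecursiveConvLayer(channels))

    def forward(self, x):
        return self.body(x) + x         # residual connection around both layers
```

Because every operation preserves spatial size and channel count, the module can be dropped in at any stage of the encoder or decoder.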
Further, the pyramid pooling module in step S3 comprises four adaptive average pooling layers with different pooling sizes, used to capture, at multiple scales, the image feature F22 obtained in step S2. The four pooling layers use pooling sizes N, N/2, N/3, and N/6 respectively, where N denotes the resolution of F22. The differently sized features produced by the pooling layers each pass through a convolution layer with a 1×1 kernel and are then upsampled by transposed convolution to features F31, F32, F33, F34 with the same size as F22. The upsampled result of each scale is then aggregated with the input feature F22, and the aggregated features pass through a convolution layer with a 3×3 kernel to give feature F3, i.e. F3 = Conv(Concatenate(F22, F31, F32, F33, F34)), where Concatenate is the aggregation operation and Conv is the 3×3 convolution. Pooling at multiple scales better captures both the detail information and the deeper semantic information contained in the image.
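A hedged PyTorch sketch of this pyramid pooling module; bilinear interpolation stands in for the transposed convolution used for upsampling, and the per-branch channel count is an assumption:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PyramidPoolingModule(nn.Module):
    """Four adaptive average pools at output sizes N, N/2, N/3, N/6, each
    followed by a 1x1 conv; the upsampled branches are concatenated with the
    input and fused by a 3x3 conv: F3 = Conv(Concat(F22, F31..F34))."""
    def __init__(self, channels: int, branch_ch: int = None):
        super().__init__()
        branch_ch = branch_ch or channels // 4      # illustrative choice
        self.convs = nn.ModuleList([nn.Conv2d(channels, branch_ch, 1) for _ in range(4)])
        self.out_conv = nn.Conv2d(channels + 4 * branch_ch, channels, 3, padding=1)

    def forward(self, x):
        n = x.shape[-1]                             # N: input resolution (assumes square, N >= 6)
        sizes = [n, n // 2, n // 3, n // 6]
        feats = [x]
        for conv, s in zip(self.convs, sizes):
            y = conv(F.adaptive_avg_pool2d(x, s))   # pool to s x s, then 1x1 conv
            feats.append(F.interpolate(y, size=(n, n), mode="bilinear",
                                       align_corners=False))  # upsample back to N x N
        return self.out_conv(torch.cat(feats, dim=1))
```

The output keeps the resolution and channel count of F22, so the module composes cleanly with the parallel dilated branch.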
Further, the dilated convolution module in step S3 is formed by connecting in series three dilated convolution units with different dilation factors; the dilation factors of the three units are 1, 2, and 4 respectively, and the kernels are all 3×3. With input feature F22, the features produced by the three units are F41, F42, F43 respectively. The units are connected in a dense manner, in which the input of each unit is added to that unit's output to serve as its output. After the dilated convolution module, a feature F4 with the same resolution as F22 is obtained: F4 = Add(F22, F41, F42, F43), where Add is a pixel-wise addition operation. Using dilated convolution instead of plain convolution and pooling acquires deeper semantic information by enlarging the receptive field, while avoiding the loss of detail caused by the resolution reduction of pooling.
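A PyTorch sketch of one plausible reading of this module; the placement of activations and the exact dense-connection wiring are interpretations, not spelled out in the text:

```python
import torch
import torch.nn as nn

class DilatedConvModule(nn.Module):
    """Three serial 3x3 dilated conv units (dilation 1, 2, 4). Each unit's
    output is its convolution plus its own input, and the module output adds
    the input and all three unit outputs: F4 = Add(F22, F41, F42, F43).
    Setting padding equal to the dilation keeps the spatial size unchanged."""
    def __init__(self, channels: int):
        super().__init__()
        self.units = nn.ModuleList(
            [nn.Conv2d(channels, channels, 3, padding=d, dilation=d) for d in (1, 2, 4)]
        )

    def forward(self, x):
        outs, h = [], x
        for unit in self.units:
            h = torch.relu(unit(h)) + h      # unit input added to unit output
            outs.append(h)
        return x + outs[0] + outs[1] + outs[2]   # pixel-wise Add over all features
```

Since no pooling occurs, the output resolution equals that of F22, matching the parallel pyramid pooling branch for the channel-wise aggregation that follows.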
Further, the deconvolution layers in steps S4 and S5 are transposed convolutions.
Further, in step S7, the established pyramid dilated convolution network is trained end to end; the training strategy adopts stochastic gradient descent, and the loss function is the categorical cross-entropy:

l_c = -(1/M) · Σ_{f_s ∈ F_s} Σ_{k=1}^{K} y_k(f_s) · log(p_k(f_s))

where l_c denotes the categorical cross-entropy loss of the segmentation feature map F_s, f_s denotes a voxel of the feature map F_s, M is the number of voxels in F_s, K is the number of classes, y_k(f_s) indicates whether voxel f_s belongs to class k, and p_k(f_s) is the predicted probability that voxel f_s belongs to class k.
Compared with the prior art, the invention has the following advantages and effects:
(1) The method adopts a dilated convolution module to extract deep semantic information; compared with conventional convolution and pooling, the dilated convolution module enlarges the receptive field without reducing the resolution. The module comprises three dilated convolution layers with different dilation factors, connected in a dense manner, so semantic information can be acquired at multiple scales.
(2) The invention additionally uses a spatial pyramid pooling module to extract the multi-scale information contained in the image, effectively acquiring both the deep semantic information and the shallow detail information.
(3) The invention uses residual recursive convolution in place of plain convolution, which helps train deeper network structures and obtain better feature representations for the segmentation task.
(4) The residual recursive convolution, dilated convolution, and pyramid pooling modules together form an algorithm that can be trained end to end; compared with a two-stage algorithm, it has fewer parameters and is easier to train.
Drawings
FIG. 1 is a flow chart of the semantic segmentation method based on a pyramid dilated convolution network disclosed by the invention;
FIG. 2 (a) is a schematic diagram of a residual recursive convolution module in an embodiment of the present invention, and FIG. 2 (b) is a schematic diagram of a recursive convolution unit used in FIG. 2 (a);
FIG. 3 is a schematic diagram of a spatial pyramid pooling module in an embodiment of the present invention;
FIG. 4 is a schematic diagram of a dilated convolution module according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Embodiment: as shown in FIG. 1, the present embodiment provides a semantic segmentation method based on a pyramid dilated convolution network, which comprises the following steps:
S1. Acquire a medical image data set containing ground-truth segmentation results, and apply data augmentation and other preprocessing to the data set. Since most medical image data sets are small and have low contrast, the images in the data set are first rotated, sliced, standardized, and subjected to adaptive histogram equalization.
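A minimal NumPy sketch of two of these steps, z-score standardization and rotation-based augmentation; slicing and adaptive histogram equalization (typically CLAHE, e.g. from OpenCV or scikit-image) are omitted to keep the example dependency-free:

```python
import numpy as np

def preprocess(img: np.ndarray) -> np.ndarray:
    """Z-score standardization: zero mean, unit variance per image.
    The small epsilon guards against constant images."""
    img = img.astype(np.float64)
    return (img - img.mean()) / (img.std() + 1e-8)

def augment_rotations(img: np.ndarray):
    """Rotation-based augmentation: 90/180/270-degree copies of one image."""
    return [np.rot90(img, k) for k in (1, 2, 3)]
```

On a real data set these functions would be applied to every slice, with the ground-truth masks rotated in lockstep with the images.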
S2. The preprocessed image passes sequentially through the first residual recursive convolution module, a pooling layer, the second residual recursive convolution module, and a pooling layer, extracting semantic information at multiple scales and yielding shallow image features F11, F12, F21, F22 respectively. Specifically: as shown in FIG. 2 (a), the residual recursive convolution module passes the input through two cascaded recursive convolution layers and then adds the result to the input in residual fashion to obtain the output; as shown in FIG. 2 (b), a recursive convolution unit is connected as conv, ReLU, Add, conv, ReLU in sequence, where conv is a convolution layer with a 3×3 kernel and Add is pixel-wise addition with the input; the pooling layers are max pooling with stride 2.
S3. Image feature F22 passes through the network formed by the pyramid pooling module and the dilated convolution module in parallel: F22 passes through the pyramid pooling module to give feature F3, and through the dilated convolution module to give feature F4. Features F3 and F4 are then aggregated channel by channel and passed through a convolution layer with a 1×1 kernel to obtain the deep image feature F5, further extracting deep semantic information. Specifically:
As shown in FIG. 3, the pyramid pooling module comprises four adaptive average pooling layers (avgpool in FIG. 3) with different pooling sizes, used to capture the image feature F22 obtained in step S2 at multiple scales. The four pooling layers use pooling sizes N, N/2, N/3, and N/6 respectively, where N denotes the resolution of F22. The differently sized features produced by the pooling layers each pass through a convolution layer with a 1×1 kernel (conv 1×1 in FIG. 3) and are then upsampled by transposed convolution (up-conv in FIG. 3) to features F31, F32, F33, F34 with the same size as F22. The upsampled result of each scale is aggregated with the input feature F22, and the aggregated features pass through a convolution layer with a 3×3 kernel to give feature F3, i.e. F3 = Conv(Concatenate(F22, F31, F32, F33, F34)), where Concatenate is the aggregation operation and Conv is the 3×3 convolution.
As shown in FIG. 4, the dilated convolution module is formed by connecting in series three dilated convolution units with dilation factors 1, 2, and 4 respectively; the kernels are all 3×3. With input feature F22, the features produced by the three units are F41, F42, F43 respectively. The units are connected in a dense manner, in which the input of each unit is added to that unit's output to serve as its output. After the dilated convolution module, a feature F4 with the same resolution as F22 is obtained: F4 = Add(F22, F41, F42, F43), where Add is a pixel-wise addition operation.
In this embodiment, channel-by-channel aggregation means concatenation in the channel dimension: if image feature F3 has C1 channels and image feature F4 has C2 channels, the aggregated image feature has C1 + C2 channels.
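The channel-dimension aggregation can be illustrated in a few lines of NumPy; the channels-first layout and the channel counts are illustrative:

```python
import numpy as np

# Channel-wise aggregation: features with C1 and C2 channels concatenate
# along the channel axis into C1 + C2 channels (channels-first layout).
f3 = np.zeros((4, 8, 8))   # C1 = 4 channels at 8x8 resolution
f4 = np.zeros((6, 8, 8))   # C2 = 6 channels at the same resolution
fused = np.concatenate([f3, f4], axis=0)
assert fused.shape == (10, 8, 8)   # C1 + C2 = 10 channels
```

The spatial sizes must match exactly, which is why the decoder upsamples before aggregating with the skip-connection features.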
S4. Feature F5 passes through a deconvolution layer and is then aggregated channel by channel with the shallow feature F21 delivered by the skip connection, giving feature F61; F61 then passes through the third residual recursive convolution module to give F62. Specifically: the deconvolution layer is a transposed convolution; the skip connection forwards the shallow feature directly and aggregates it channel by channel with the output of the deconvolution layer, the aggregation being as described in step S3.
S5. Feature F62 passes through a deconvolution layer and is then aggregated channel by channel with the shallow feature F11 delivered by the skip connection, giving feature F71; F71 then passes through the fourth residual recursive convolution module to give F72. Specifically: the deconvolution layer is a transposed convolution; the skip connection forwards the shallow feature directly and aggregates it channel by channel with the output of the deconvolution layer, the aggregation being as described in step S3.
S6. Feature F72 is input into the softmax prediction layer to obtain the class to which each pixel in the original input image belongs.
S7. Train the pyramid dilated convolution network, establish the loss function, and determine the network parameters, specifically the learning rate, weight decay, momentum term, and training strategy, from the training samples. The established network is trained end to end; the training strategy adopts stochastic gradient descent, with the initial learning rate set to 0.001, the weight decay set to 10^-4, and a momentum term of 0.9. The loss function is the categorical cross-entropy, which differs from the plain cross-entropy loss in that the loss of voxels of class k is scaled by a corresponding weight v_k, whose size is inversely proportional to the number of voxels belonging to class k; the formula is:
l_c = -(1/M) · Σ_{f_s ∈ F_s} Σ_{k=1}^{K} v_k · y_k(f_s) · log(p_k(f_s))

where l_c denotes the categorical cross-entropy loss of the segmentation feature map F_s, f_s denotes a voxel of the feature map F_s, M is the number of voxels in F_s, K is the number of classes, y_k(f_s) indicates whether voxel f_s belongs to class k, and p_k(f_s) is the predicted probability that voxel f_s belongs to class k.
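A NumPy sketch of this weighted categorical cross-entropy; the dense (M, K) layout and the epsilon guard inside the logarithm are implementation assumptions:

```python
import numpy as np

def weighted_categorical_crossentropy(y, p, v, eps=1e-12):
    """l_c = -(1/M) * sum over voxels f_s and classes k of
    v_k * y[f_s, k] * log(p[f_s, k]).

    y: one-hot ground-truth labels, shape (M, K)
    p: predicted class probabilities, shape (M, K)
    v: per-class loss weights, shape (K,)
    eps guards log(0) for numerically zero probabilities."""
    return -np.mean(np.sum(v * y * np.log(p + eps), axis=1))
```

With all weights equal to 1 this reduces to the plain categorical cross-entropy of the description section; raising v_k for a rare class increases the penalty for misclassifying its voxels.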
S8. Input the test image to be segmented into the trained pyramid dilated convolution network to obtain the semantic segmentation result of the image.
In summary, the semantic segmentation method based on a pyramid dilated convolution network disclosed in this embodiment builds and trains a pyramid dilated convolution network, establishes a loss function, and determines the network parameters from training samples; the test image is then input into the trained network to obtain the semantic segmentation result of the image. The combination of dilated convolution and pyramid pooling effectively extracts multi-scale semantic information and detail information, and improves the segmentation quality of the network.
The above embodiments are preferred embodiments of the present invention, but the present invention is not limited to the above embodiments, and any other changes, modifications, substitutions, combinations, and simplifications which do not depart from the spirit and principle of the present invention should be construed as equivalents thereof, and all such changes, modifications, substitutions, combinations, and simplifications are intended to be included in the scope of the present invention.

Claims (7)

1. A semantic segmentation method based on a pyramid dilated convolution network, characterized in that the pyramid dilated convolution network comprises a first residual recursive convolution module, a second residual recursive convolution module, pooling layers, a pyramid pooling module, a dilated convolution module, deconvolution layers, a third residual recursive convolution module, a fourth residual recursive convolution module, and a softmax prediction layer, connected as follows: the pyramid pooling module and the dilated convolution module are connected in parallel and then in series after the pooling layer, followed in series by a deconvolution layer, the third residual recursive convolution module, a deconvolution layer, the fourth residual recursive convolution module, and the softmax prediction layer; the semantic segmentation method comprises the following steps:
S1, acquiring a medical image data set containing ground-truth segmentation results, and preprocessing the data set for data augmentation;
S2, passing the preprocessed image sequentially through the first residual recursive convolution module, a pooling layer, the second residual recursive convolution module, and a pooling layer, extracting semantic information of the image at multiple scales and obtaining shallow image features F11, F12, F21, F22 respectively;
S3, passing image feature F22 through the network formed by the pyramid pooling module and the dilated convolution module in parallel, wherein F22 passes through the pyramid pooling module to give feature F3 and through the dilated convolution module to give feature F4; aggregating features F3 and F4 channel by channel and passing the result through a convolution layer with a 1×1 kernel to obtain the deep image feature F5, thereby further extracting deep semantic information;
S4, passing feature F5 through a deconvolution layer and aggregating it channel by channel with the shallow feature F21 delivered by the skip connection to obtain feature F61; then passing F61 through the third residual recursive convolution module to obtain feature F62, wherein the skip connection forwards the shallow feature directly and aggregates it channel by channel with the output of the deconvolution layer;
S5, passing feature F62 through a deconvolution layer and aggregating it channel by channel with the shallow feature F11 delivered by the skip connection to obtain feature F71; then passing F71 through the fourth residual recursive convolution module to obtain feature F72;
S6, inputting feature F72 into the softmax prediction layer to obtain the class of each pixel in the original input image;
S7, training the pyramid dilated convolution network, establishing a loss function, and determining network parameters through training samples;
S8, inputting the test image to be segmented into the trained pyramid dilated convolution network to obtain the semantic segmentation result of the image.
2. The semantic segmentation method based on the pyramid dilated convolution network of claim 1, wherein the preprocessing in step S1 includes rotation, slicing, normalization, and adaptive histogram equalization.
3. The semantic segmentation method based on the pyramid dilated convolution network of claim 1, characterized in that the first, second, third, and fourth residual recursive convolution modules have the same structure: each passes its input through two recursive convolution layers connected in series and then adds the result to the input in residual fashion to obtain the output; a recursive convolution layer is connected as conv, ReLU, Add, conv, ReLU in sequence, where conv is a convolution layer with a 3×3 kernel and Add is pixel-wise addition with the input.
4. The semantic segmentation method based on the pyramid void convolutional network of claim 1, wherein the pyramid pooling module in step S3 comprises four adaptive average pooling layers with different pooling sizes, used to pool the image feature F22 obtained in step S2 at multiple scales; the four pooling layers adopt pooling sizes of N, N/2, N/3 and N/6 respectively, where N denotes the resolution of the image feature F22; the differently sized image features produced by the pooling layers each pass through a convolutional layer with a 1×1 kernel and are then transposed-convolved to image features F31, F32, F33, F34 of the same size as F22; the upsampled result of each scale is then aggregated with the input image feature F22, and the aggregated features pass through a convolutional layer with a 3×3 kernel to obtain the image feature F3, i.e. F3 = Conv(Concatenate(F22, F31, F32, F33, F34)), where Concatenate is the aggregation operation and Conv is the 3×3 convolution.
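The multi-scale pooling path of claim 4 might be sketched as below (single-channel NumPy sketch; the per-branch 1×1 convolutions and the final 3×3 convolution are omitted, nearest-neighbour upsampling stands in for the transposed convolution, and N is assumed divisible by 6):

```python
import numpy as np

def adaptive_avg_pool(x, out_size):
    """Average-pool a square map into out_size x out_size bins."""
    n = x.shape[0]
    idx = np.linspace(0, n, out_size + 1).astype(int)
    return np.array([[x[idx[i]:idx[i + 1], idx[j]:idx[j + 1]].mean()
                      for j in range(out_size)] for i in range(out_size)])

def pyramid_pool(F22):
    """Pool F22 at sizes N, N/2, N/3, N/6, upsample each branch back to
    N x N, and aggregate the branches with the input (the Concatenate
    step of claim 4)."""
    n = F22.shape[0]
    branches = []
    for s in (n, n // 2, n // 3, n // 6):
        p = adaptive_avg_pool(F22, s)
        branches.append(np.kron(p, np.ones((n // s, n // s))))  # upsample
    return np.stack([F22] + branches)  # channel-by-channel aggregation
```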
5. The semantic segmentation method based on the pyramid void convolutional network of claim 1, wherein in step S3 the dilated convolution module is formed by connecting three dilated convolution units with different dilation factors in series; the dilation factors of the three units are 1, 2 and 4 respectively, and the dilated convolution kernels are all 3×3; after the image feature F22 is input, the image features obtained from the three dilated convolution units are F41, F42 and F43 respectively; the dilated convolution units are connected in a dense manner, whereby the input of each dilated convolution unit is added to its output to form that unit's output; after the dilated convolution module, an image feature F4 with the same resolution as the image feature F22 is obtained, F4 = Add(F22, F41, F42, F43), where Add is a pixel-wise addition operation.
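The serial, densely connected dilated units of claim 5 might look like this (illustrative single-channel NumPy sketch; the kernel weights are placeholders, not learned parameters):

```python
import numpy as np

def dilated_conv3x3(x, w, rate):
    """3x3 convolution with dilation `rate` and 'same' zero padding."""
    h, wd = x.shape
    p = np.pad(x, rate)
    out = np.zeros_like(x)
    for i in range(h):
        for j in range(wd):
            # Sample the 3x3 taps `rate` pixels apart, centred on (i, j).
            patch = p[i:i + 2 * rate + 1:rate, j:j + 2 * rate + 1:rate]
            out[i, j] = np.sum(patch * w)
    return out

def dilated_module(F22, ws):
    """Three serial dilated units (rates 1, 2, 4) with dense connections:
    each unit adds its input to its output, and the module output is
    Add(F22, F41, F42, F43) as in claim 5."""
    feats, x = [], F22
    for w, rate in zip(ws, (1, 2, 4)):
        x = x + dilated_conv3x3(x, w, rate)  # dense connection: input + output
        feats.append(x)
    F41, F42, F43 = feats
    return F22 + F41 + F42 + F43             # pixel-wise Add
```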
6. The semantic segmentation method based on the pyramid void convolutional network of claim 1, wherein the deconvolution layers in steps S4 and S5 are transposed convolutions.
7. The semantic segmentation method based on the pyramid void convolutional network of claim 1, wherein in step S7 the established pyramid void convolutional network is trained end to end, a stochastic gradient descent algorithm is adopted as the training strategy, and the loss function is the categorical cross-entropy (categorical_crossentropy):

l_c = -(1/M) · Σ_{f_s ∈ F_s} Σ_{k=1}^{K} y_{f_s}^k · log(p_{f_s}^k)

where l_c denotes the categorical cross-entropy loss of the segmented feature map F_s, f_s denotes a voxel of the feature map F_s, M is the number of voxels of the feature map F_s, K is the number of classes, y_{f_s}^k indicates whether the voxel f_s belongs to class k, and p_{f_s}^k denotes the probability that the voxel f_s belongs to class k.
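The loss of claim 7 is the standard categorical cross-entropy averaged over the M voxels. A NumPy sketch (integer labels are one-hot encoded to play the role of the indicator y, and a small epsilon guards the logarithm):

```python
import numpy as np

def categorical_cross_entropy(probs, labels):
    """l_c = -(1/M) * sum over voxels m and classes k of y[m,k] * log(p[m,k]),
    where y is the one-hot indicator and p is the softmax output."""
    m, k = probs.shape
    onehot = np.eye(k)[labels]                  # y[m, k]
    return -np.sum(onehot * np.log(probs + 1e-12)) / m
```

With the softmax outputs of step S6 flattened to shape (M, K), this reduces to the claimed formula.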
CN202010108637.8A 2020-02-21 2020-02-21 Semantic segmentation method based on pyramid void convolutional network Active CN111369563B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010108637.8A CN111369563B (en) 2020-02-21 2020-02-21 Semantic segmentation method based on pyramid void convolutional network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010108637.8A CN111369563B (en) 2020-02-21 2020-02-21 Semantic segmentation method based on pyramid void convolutional network

Publications (2)

Publication Number Publication Date
CN111369563A CN111369563A (en) 2020-07-03
CN111369563B true CN111369563B (en) 2023-04-07

Family

ID=71208108

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010108637.8A Active CN111369563B (en) 2020-02-21 2020-02-21 Semantic segmentation method based on pyramid void convolutional network

Country Status (1)

Country Link
CN (1) CN111369563B (en)

Families Citing this family (31)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111914853B (en) * 2020-07-17 2023-10-31 三峡大学 Feature extraction method for stereo matching
CN111833343A (en) * 2020-07-23 2020-10-27 北京小白世纪网络科技有限公司 Coronary artery stenosis degree estimation method system and equipment
KR20220013071A (en) * 2020-07-24 2022-02-04 에스케이하이닉스 주식회사 Device for generating a depth map
CN114140683A (en) * 2020-08-12 2022-03-04 天津大学 Aerial image target detection method, equipment and medium
CN112200006A (en) * 2020-09-15 2021-01-08 青岛邃智信息科技有限公司 Human body attribute detection and identification method under community monitoring scene
CN112132813B (en) * 2020-09-24 2022-08-05 中国医学科学院生物医学工程研究所 Skin ultrasonic image segmentation method based on improved UNet network model
CN114494266B (en) * 2020-10-26 2024-05-28 中国人民解放军空军军医大学 Cervical and peripheral multi-organ segmentation method adopting hierarchical cavity pyramid convolution
CN112348839B (en) * 2020-10-27 2024-03-15 重庆大学 Image segmentation method and system based on deep learning
CN112418228B (en) * 2020-11-02 2023-07-21 暨南大学 Image semantic segmentation method based on multi-feature fusion
CN112381097A (en) * 2020-11-16 2021-02-19 西南石油大学 Scene semantic segmentation method based on deep learning
CN112419267A (en) * 2020-11-23 2021-02-26 齐鲁工业大学 Brain glioma segmentation model and method based on deep learning
CN112330662B (en) * 2020-11-25 2022-04-12 电子科技大学 Medical image segmentation system and method based on multi-level neural network
CN112465834B (en) * 2020-11-26 2024-05-24 中科麦迪人工智能研究院(苏州)有限公司 Blood vessel segmentation method and device
CN112614107A (en) * 2020-12-23 2021-04-06 北京澎思科技有限公司 Image processing method and device, electronic equipment and storage medium
CN112733919B (en) * 2020-12-31 2022-05-20 山东师范大学 Image semantic segmentation method and system based on void convolution and multi-scale and multi-branch
CN112884824B (en) * 2021-03-12 2024-07-12 辽宁师范大学 Shoe seal height estimation method based on convolution network multi-scale feature fusion
CN113011305B (en) * 2021-03-12 2022-09-09 中国人民解放军国防科技大学 SAR image road extraction method and device based on semantic segmentation and conditional random field
CN112785480B (en) * 2021-03-15 2022-05-03 河北工业大学 Image splicing tampering detection method based on frequency domain transformation and residual error feedback module
CN113033570B (en) * 2021-03-29 2022-11-11 同济大学 Image semantic segmentation method for improving void convolution and multilevel characteristic information fusion
CN113034505B (en) * 2021-04-30 2024-02-02 杭州师范大学 Glandular cell image segmentation method and glandular cell image segmentation device based on edge perception network
CN113254891B (en) * 2021-05-17 2022-08-16 山东大学 Information hiding method, device and system based on void space pyramid
CN113191367B (en) * 2021-05-25 2022-07-29 华东师范大学 Semantic segmentation method based on dense scale dynamic network
CN113378704B (en) * 2021-06-09 2022-11-11 武汉理工大学 Multi-target detection method, equipment and storage medium
CN113592009A (en) * 2021-08-05 2021-11-02 杭州逗酷软件科技有限公司 Image semantic segmentation method and device, storage medium and electronic equipment
CN113869181B (en) * 2021-09-24 2023-05-02 电子科技大学 Unmanned aerial vehicle target detection method for selecting pooling core structure
CN114066903B (en) * 2021-11-23 2024-10-29 北京信息科技大学 Image segmentation method, system and storage medium
CN113936220B (en) * 2021-12-14 2022-03-04 深圳致星科技有限公司 Image processing method, storage medium, electronic device, and image processing apparatus
CN114612807B (en) * 2022-03-17 2023-04-07 盐城工学院 Method and device for identifying characteristics of tiny target, electronic equipment and storage medium
CN116152807B (en) * 2023-04-14 2023-09-05 广东工业大学 Industrial defect semantic segmentation method based on U-Net network and storage medium
CN116453199B (en) * 2023-05-19 2024-01-26 山东省人工智能研究院 GAN (generic object model) generation face detection method based on fake trace of complex texture region
CN117935060B (en) * 2024-03-21 2024-05-28 成都信息工程大学 Flood area detection method based on deep learning


Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110232394A (en) * 2018-03-06 2019-09-13 华南理工大学 A kind of multi-scale image semantic segmentation method
CN109145920A (en) * 2018-08-21 2019-01-04 电子科技大学 A kind of image, semantic dividing method based on deep neural network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Edge-based image interpolation approach for video sensor network;Jinglun Shi et al.;《2011 8th International Conference on Information, Communications & Signal Processing》;20120403;第1-2页 *

Also Published As

Publication number Publication date
CN111369563A (en) 2020-07-03

Similar Documents

Publication Publication Date Title
CN111369563B (en) Semantic segmentation method based on pyramid void convolutional network
CN115049936B (en) High-resolution remote sensing image-oriented boundary enhanced semantic segmentation method
CN113191215B (en) Rolling bearing fault diagnosis method integrating attention mechanism and twin network structure
CN110232394B (en) Multi-scale image semantic segmentation method
CN109035149B (en) License plate image motion blur removing method based on deep learning
CN108898175B (en) Computer-aided model construction method based on deep learning gastric cancer pathological section
CN112488234B (en) End-to-end histopathology image classification method based on attention pooling
CN111986125B (en) Method for multi-target task instance segmentation
CN112381097A (en) Scene semantic segmentation method based on deep learning
CN111523521A (en) Remote sensing image classification method for double-branch fusion multi-scale attention neural network
CN109948692B (en) Computer-generated picture detection method based on multi-color space convolutional neural network and random forest
CN110321805B (en) Dynamic expression recognition method based on time sequence relation reasoning
CN110930378B (en) Emphysema image processing method and system based on low data demand
CN114612714B (en) Curriculum learning-based reference-free image quality evaluation method
CN108305253A (en) A kind of pathology full slice diagnostic method based on more multiplying power deep learnings
CN111553297A (en) Method and system for diagnosing production fault of polyester filament based on 2D-CNN and DBN
CN111523483B (en) Chinese meal dish image recognition method and device
CN109034370A (en) Convolutional neural network simplification method based on feature mapping pruning
CN111402138A (en) Image super-resolution reconstruction method of supervised convolutional neural network based on multi-scale feature extraction fusion
CN111415323A (en) Image detection method and device and neural network training method and device
CN112766283A (en) Two-phase flow pattern identification method based on multi-scale convolution network
CN115661459A (en) 2D mean teacher model using difference information
CN115908142A (en) Contact net tiny part damage testing method based on visual recognition
CN112132207A (en) Target detection neural network construction method based on multi-branch feature mapping
CN118015611A (en) Vegetable plant target detection method and device based on YOLOv8

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant