
CN117934510A - Colon polyp image segmentation method based on shape perception and feature enhancement - Google Patents


Info

Publication number
CN117934510A
CN117934510A (application CN202410109987.4A; granted as CN117934510B)
Authority
CN
China
Prior art keywords: segmentation, map, feature, attention, enhancement
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202410109987.4A
Other languages
Chinese (zh)
Other versions
CN117934510B (en)
Inventor
余靖
齐露露
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yanshan University
Original Assignee
Yanshan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yanshan University filed Critical Yanshan University
Priority to CN202410109987.4A
Publication of CN117934510A
Application granted
Publication of CN117934510B
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G06T 7/11: Region-based segmentation
    • G06N 3/0455: Auto-encoder networks; encoder-decoder networks
    • G06N 3/0464: Convolutional networks [CNN, ConvNet]
    • G06N 3/09: Supervised learning
    • G06T 7/0012: Biomedical image inspection
    • G06T 2207/20081: Training; learning
    • G06T 2207/20084: Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Radiology & Medical Imaging (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a colon polyp image segmentation method based on shape perception and feature enhancement, comprising the following steps: acquiring colon polyp images and dividing them to obtain a training set and a test set, and performing image-enhancement preprocessing on both; constructing an initial segmentation model based on a PVTv2 backbone feature-extraction network, a shape perception module, a feature enhancement module, a parallel partial decoder and a multi-modal attention module; training the initial segmentation model on the preprocessed training set and verifying it on the preprocessed test set to obtain a segmentation model with adjusted weight parameters; and segmenting the image to be detected with the segmentation model to obtain a colon polyp image. The invention addresses the under-segmentation and over-segmentation of polyps caused by their varied size, shape and color and by the unclear boundary between polyps and the mucosa, realizes automatic segmentation of polyp images, and has good application prospects.

Description

Colon Polyp Image Segmentation Method Based on Shape Perception and Feature Enhancement

Technical Field

The present invention relates to the field of deep-learning-based medical image processing, and in particular to a colon polyp image segmentation method based on shape perception and feature enhancement.

Background

Polyp segmentation in colonoscopy images is a complex challenge for several reasons: (1) colon polyps vary greatly in shape, size, color and number; (2) polyps and the background are very similar in appearance and color, so the boundary between a polyp and the background is usually hard to distinguish.

To overcome these difficulties, considerable effort has been devoted to developing techniques for the polyp segmentation task. They fall mainly into two categories: (1) methods based on handcrafted features and (2) methods based on deep learning.

Early traditional methods segmented polyps with handcrafted features, relying mainly on color, shape, texture and edge cues and training classifiers to distinguish polyps from the surrounding normal mucosa. These methods can hardly capture the global context of a polyp and are not robust to complex scenes, so they are inefficient and inaccurate. The main reason is that polyps vary greatly in shape, size and color, while the capacity of handcrafted feature extraction is quite limited. Such methods therefore remain far from clinical application.

Compared with traditional methods, deep-learning-based colon polyp segmentation saves time and improves accuracy, so more and more work applies convolutional neural networks (CNNs) to the task. Encoder-decoder networks such as UNet, UNet++, ResUNet and ResUNet++ have shown excellent performance in the semantic segmentation of medical images: the encoder extracts feature representations, the decoder generates segmentation masks, and skip connections pass the rich, complex feature information of the encoder to the decoder sub-network, which both promotes the propagation of high-level semantic features and lets the network retain low-level detail. To further improve performance, networks designed specifically for polyp segmentation, such as PraNet, ACSNet and MSNet, introduced ideas such as reverse attention and multi-scale subtraction, with good results. Although CNNs have made great progress in polyp segmentation, a large gap to clinical application remains. These models predict most simple polyps well, but their predictions for complex polyps are unsatisfactory, leading to coarse segmentation results and inaccurate boundaries. Moreover, because convolution is a local operation, a network can only obtain a sufficient receptive field by stacking convolutional layers; its global modeling capability is limited, and a CNN cannot model long-range dependencies.

Summary of the Invention

To solve the above problems in the prior art, the present invention provides the following technical solution:

A colon polyp image segmentation method based on shape perception and feature enhancement comprises the following steps:

acquiring colon polyp images and dividing them to obtain a training set and a test set, and performing image-enhancement preprocessing on the training set and the test set;

constructing an initial segmentation model based on a PVTv2 backbone feature-extraction network, a shape perception module, a feature enhancement module, a parallel partial decoder and a multi-modal attention module;

training the initial segmentation model on the preprocessed training set and verifying it on the preprocessed test set to obtain a segmentation model with adjusted weight parameters;

segmenting the image to be detected with the segmentation model to obtain a colon polyp image.

Preferably, the colon polyp images are acquired by extracting endoscopic images of colon polyps from Kvasir-SEG, CVC-ClinicDB, CVC-ColonDB, CVC-T and ETIS.

Preferably, the image-enhancement preprocessing comprises: adjusting the resolution of the images in the colon polyp image set to 352×352 and performing data-augmentation operations, including horizontal flipping, vertical flipping, 90-degree rotation and random cropping to 224×224.
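As a rough illustration, the preprocessing described above can be sketched with NumPy alone; the nearest-neighbour resize and the helper names (`resize_nn`, `augment`) are choices of this sketch, not part of the patent:

```python
import numpy as np

def resize_nn(img, size=(352, 352)):
    # Nearest-neighbour resize to (H, W); a real pipeline would likely use bilinear.
    h, w = img.shape[:2]
    rows = (np.arange(size[0]) * h / size[0]).astype(int)
    cols = (np.arange(size[1]) * w / size[1]).astype(int)
    return img[rows][:, cols]

def augment(img, rng):
    # Random horizontal flip, vertical flip, 90-degree rotation, then a 224x224 crop.
    if rng.random() < 0.5:
        img = img[:, ::-1]          # horizontal flip
    if rng.random() < 0.5:
        img = img[::-1, :]          # vertical flip
    if rng.random() < 0.5:
        img = np.rot90(img)         # 90-degree rotation
    y = int(rng.integers(0, img.shape[0] - 224 + 1))
    x = int(rng.integers(0, img.shape[1] - 224 + 1))
    return img[y:y + 224, x:x + 224]

rng = np.random.default_rng(0)
image = rng.random((480, 640, 3))   # stand-in endoscopic frame
resized = resize_nn(image)          # (352, 352, 3)
patch = augment(resized, rng)       # (224, 224, 3)
```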

Preferably, training the initial segmentation model on the preprocessed training set comprises: the PVTv2 backbone extraction network extracts multi-scale features of the colon polyp images in the training set and generates four levels of features E1, E2, E3 and E4; features E1, E2 and E3 are input into the shape perception module to generate a segmentation map S1; features E2, E3 and E4 are input into the feature enhancement module to obtain enhanced features F2, F3 and F4; and F2, F3 and F4 are input into the parallel partial decoder to generate an initial segmentation map S5;

F4 and the initial segmentation map S5 are input into the multi-modal attention module to generate feature map M4, and M4 is added to S5 to generate segmentation map S4; F3 and S4 are input into the multi-modal attention module to generate feature map M3, and M3 is added to S4 to generate segmentation map S3; F2 and S3 are input into the multi-modal attention module to generate feature map M2, and segmentation map S1 is added to M2 and S3 to generate segmentation map S2; finally, S2 is upsampled to the size of the input polyp image and a Sigmoid operation is applied to obtain the final segmentation map.
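The decoding cascade above can be sketched with placeholder maps standing in for the module outputs; the feature strides 4/8/16/32 assumed for the four PVTv2 levels, and the nearest-neighbour resizing, are assumptions of this sketch:

```python
import numpy as np

def resize_map(x, size):
    # Nearest-neighbour resizing of a 2-D map to `size` (up or down).
    rows = (np.arange(size[0]) * x.shape[0] / size[0]).astype(int)
    cols = (np.arange(size[1]) * x.shape[1] / size[1]).astype(int)
    return x[rows][:, cols]

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
H = W = 352
S1 = rng.standard_normal((H // 4, W // 4))    # from the shape perception module
S5 = rng.standard_normal((H // 32, W // 32))  # initial map from the partial decoder
M4 = rng.standard_normal((H // 32, W // 32))  # stand-in for MMAM(F4, S5)
S4 = M4 + S5
M3 = rng.standard_normal((H // 16, W // 16))  # stand-in for MMAM(F3, S4)
S3 = M3 + resize_map(S4, M3.shape)
M2 = rng.standard_normal((H // 8, W // 8))    # stand-in for MMAM(F2, S3)
S2 = resize_map(S1, M2.shape) + M2 + resize_map(S3, M2.shape)

# Upsample to the input size and apply Sigmoid to get the final probability map.
final = sigmoid(resize_map(S2, (H, W)))
```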

Preferably, the segmentation map S1 is generated as follows: features E1, E2 and E3 are each passed through a 3×3 convolutional layer, and feature subtraction is performed between adjacent features in sequence to obtain feature differences of different scales between adjacent encoder levels; channel attention is then applied to the differences to highlight important channel features, yielding E1_2 and E2_3; a 3×3 convolution, feature subtraction and channel attention applied to E1_2 and E2_3 yield E1_3; E1_2 and E1_3 are added to obtain E1_2_3; finally, the result of a 3×3 convolution on the feature map E2_3 is channel-concatenated with E1_2_3 and passed through a convolutional layer to obtain complementary enhanced features, from which spatial attention produces the segmentation map S1.
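A toy NumPy sketch of this subtraction-and-attention flow; the simple sigmoid-of-mean forms used for channel and spatial attention below are assumptions of the sketch, not the patent's exact operators, and the 3×3 convolutions are folded into the assumed common feature shape:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(x):
    # Reweight channels by a sigmoid of their global average (a common CA form).
    return x * sigmoid(x.mean(axis=(1, 2), keepdims=True))

def spatial_attention(x):
    # Reweight positions by a sigmoid of the channel-wise mean.
    return x * sigmoid(x.mean(axis=0, keepdims=True))

rng = np.random.default_rng(0)
C, H, W = 8, 44, 44
# E1, E2, E3 are assumed already brought to a common (C, H, W) shape
# by the 3x3 convolutions that precede the subtractions.
E1, E2, E3 = (rng.standard_normal((C, H, W)) for _ in range(3))

E1_2 = channel_attention(E1 - E2)        # difference of adjacent levels
E2_3 = channel_attention(E2 - E3)
E1_3 = channel_attention(E1_2 - E2_3)    # difference of the differences
E1_2_3 = E1_2 + E1_3
fused = np.concatenate([E2_3, E1_2_3], axis=0)   # channel concatenation
S1 = spatial_attention(fused).mean(axis=0)       # collapse to a single map
```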

Preferably, the feature enhancement module comprises a first branch, a second branch, a third branch, a fourth branch and a residual branch;

the first branch comprises a 1×1 convolutional layer and a cross-semantic attention module connected in sequence;

the second branch comprises, in sequence, a 1×1 convolutional layer, a 1×3 convolutional layer, a 3×1 convolutional layer, a dilated convolution with dilation rate 3 and a cross-semantic attention module;

the third branch comprises, in sequence, a 1×1 convolutional layer, a 1×5 convolutional layer, a 5×1 convolutional layer, a dilated convolution with dilation rate 5 and a cross-semantic attention module;

the fourth branch comprises, in sequence, a 1×1 convolutional layer, a 1×7 convolutional layer, a 7×1 convolutional layer, a dilated convolution with dilation rate 7 and a cross-semantic attention module;

the residual branch comprises a 1×1 convolutional layer and a non-local operation connected in sequence;

the outputs of the first, second, third and fourth branches are concatenated, passed through a 3×3 convolutional layer, and channel-concatenated with the residual branch to form the output of the feature enhancement module.
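The branch topology can be sketched as follows. Only the 1×1 projections are actually computed (as per-pixel channel matmuls); the 1×k/k×1 convolutions, dilated convolutions, cross-semantic attention and non-local operation are identity placeholders in this sketch, and the channel counts are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
C_in, C_b, H, W = 16, 4, 22, 22
x = rng.standard_normal((C_in, H, W))

def conv1x1(x, c_out):
    # A 1x1 convolution is a per-pixel linear projection over channels.
    w = rng.standard_normal((c_out, x.shape[0])) * 0.1
    return np.einsum('oc,chw->ohw', w, x)

def branch(x, k):
    # One FEM branch: 1x1 conv, then 1xk/kx1 convs, a dilated conv with
    # rate k, and cross-semantic attention. Everything after the 1x1
    # projection is an identity placeholder in this topology sketch.
    return conv1x1(x, C_b)

# First branch (1x1 conv + CSA) plus the k = 3, 5, 7 branches.
branches = [conv1x1(x, C_b)] + [branch(x, k) for k in (3, 5, 7)]
merged = conv1x1(np.concatenate(branches, axis=0), C_b)  # stand-in for the 3x3 conv
residual = conv1x1(x, C_b)   # residual branch (plus a non-local op in the patent)
out = np.concatenate([merged, residual], axis=0)         # channel concatenation
```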

Preferably, the multi-modal attention module generates a feature map as follows:

the foreground attention Sforeground, the background attention Sbackground and the boundary attention Sboundary of the previous-level segmentation map Si+1, i = 2, 3, 4, are each computed from pred ∈ R^(H×W×1), the previous-level segmentation map;

the foreground attention, background attention and boundary attention are each multiplied with the feature Fi to obtain three result branches; each branch undergoes 1×1 and 3×3 convolution operations and is added to the segmentation map Si+1; the three branches are then channel-concatenated and passed through squeeze-and-excitation and a 3×3 convolution; finally the result is added to the feature Fi to generate the feature map Mi, where i = 2, 3, 4.
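A hedged sketch of the three attention maps and their fusion. The specific definitions below (foreground = sigmoid of the map, background = its complement, boundary = high where the prediction is near 0.5) are common choices in the literature and an assumption of this sketch, and the per-branch convolutions, the addition of Si+1 and the squeeze-excitation step are omitted:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
C, H, W = 8, 22, 22
pred = rng.standard_normal((H, W))     # previous-level segmentation map S_{i+1}
F_i = rng.standard_normal((C, H, W))   # enhanced feature from the FEM

p = sigmoid(pred)
S_foreground = p                       # attend to likely polyp pixels
S_background = 1.0 - p                 # attend to likely background pixels
S_boundary = 1.0 - np.abs(2.0 * p - 1.0)   # high where p is uncertain (near 0.5)

# Each attention map reweights F_i; the three reweighted branches are then
# fused and added back to F_i (residual fusion) to give the feature map M_i.
branches = [F_i * s[None] for s in (S_foreground, S_background, S_boundary)]
M_i = sum(branches) + F_i
```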

Preferably, deep supervision is performed during training through a loss function of the form

L = L^w_IoU + L^w_BCE,

where L^w_IoU denotes the weighted IoU loss and L^w_BCE denotes the weighted BCE loss.

The total loss sums this loss over the segmentation maps produced at each stage:

L_total = Σ_i L(G, S_i↑),

where G denotes the ground-truth label and S_i↑ denotes the upsampled prediction map of stage i.

The present invention has the following technical effects:

1. The shape perception module SAM proposed by the present invention exploits the differences between backbone features at each level to distinguish polyps from the background, and removes background noise to extract clearer shape, boundary, texture and other detail information.

2. The feature enhancement module FEM proposed by the present invention improves the segmentation of polyps of different sizes and shapes and selects and refines features.

3. The multi-modal attention module MMAM proposed by the present invention does not focus on background information alone: its focus extends to the background, foreground and boundary regions, cross-considering the roles of these three regions in local-region segmentation and achieving a weighted balance, thereby improving the network's discrimination ability.

4. The present invention adopts deep supervision, computing a loss on the segmentation map produced at each stage, which accelerates model convergence and improves the training efficiency of the network.

Brief Description of the Drawings

To more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings required by the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention; for a person of ordinary skill in the art, other drawings can be obtained from them without creative effort.

FIG. 1 is a schematic flow chart of the colon polyp segmentation method based on shape perception and feature enhancement proposed by the present invention;

FIG. 2 is a schematic diagram of the framework structure of the method;

FIG. 3 is a schematic diagram of the shape perception module SAM of the method;

FIG. 4 is a schematic diagram of the feature enhancement module FEM of the method;

FIG. 5 is a schematic diagram of the cross-semantic attention module CSA of the method;

FIG. 6 is a schematic diagram of the parallel partial decoder PD of the method;

FIG. 7 is a schematic diagram of the multi-modal attention module MMAM of the method.

Detailed Description of the Embodiments

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. Based on these embodiments, all other embodiments obtained by a person of ordinary skill in the art without creative effort fall within the scope of protection of the present invention.

It should be noted that the steps shown in the flowcharts of the drawings may be executed in a computer system such as a set of computer-executable instructions, and that, although a logical order is shown in the flowcharts, in some cases the steps shown or described may be executed in an order different from that given here.

Embodiment 1

FIG. 1 is a schematic flow chart of a colon polyp segmentation method based on shape perception and feature enhancement according to an embodiment of the present invention. The method comprises the following steps:

Step S1: obtain public colon polyp datasets, divide them into a training set and a test set, and perform data-augmentation operations on the colon polyp data;

Step S2: build the model from a PVTv2 backbone extraction network, a shape perception module SAM, a feature enhancement module FEM, a parallel partial decoder PD and a multi-modal attention module MMAM;

Step S3: feed the augmented training set into the model for multi-scale training to improve training reliability, with the training-set size scaling ratios set to {0.75, 1.0, 1.25}; apply deep supervision with the loss function, computing the loss between the segmentation map generated at each stage and the ground-truth label; adjust the parameters until the model converges, and save the weight-parameter file of the best-performing epoch;
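A minimal sketch of the multi-scale step above. The patent only gives the ratios {0.75, 1.0, 1.25}; rounding each scaled side to a multiple of 32 (the deepest stride assumed for the PVTv2 pyramid) and the nearest-neighbour resize are assumptions of this sketch:

```python
import numpy as np

def resize_nn(img, size):
    # Nearest-neighbour resize of an (H, W, C) image to `size`.
    rows = (np.arange(size[0]) * img.shape[0] / size[0]).astype(int)
    cols = (np.arange(size[1]) * img.shape[1] / size[1]).astype(int)
    return img[rows][:, cols]

base = 352
scales = (0.75, 1.0, 1.25)
rng = np.random.default_rng(0)
batch = rng.random((base, base, 3))   # one 352x352 training image

# One scaled copy per ratio, each side rounded to a multiple of 32.
sizes = [int(round(base * s / 32) * 32) for s in scales]
for side in sizes:
    scaled = resize_nn(batch, (side, side))   # fed to the model at this scale
    assert scaled.shape[:2] == (side, side)
```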

Step S4: load the saved weight-parameter file into the model, input the test images, compute the model performance and save the segmentation result images.

Each step is described in detail below.

The specific process of step S1 is as follows:

The collected public colon polyp datasets comprise Kvasir-SEG, CVC-ClinicDB, CVC-ColonDB, ETIS and CVC-T. For the dataset split, 900 images from Kvasir-SEG and 550 images from CVC-ClinicDB (1,450 images in total) are used as the training set; the remaining 100 Kvasir-SEG images and 62 CVC-ClinicDB images, together with the images from ETIS, CVC-ColonDB and CVC-T, are used as the test set. The resolution of the colon polyp datasets is uniformly adjusted to 352×352, and the data-augmentation operations include horizontal flipping, vertical flipping, 90-degree rotation and random cropping to 224×224.

The following table gives the details of each colon polyp dataset used in this embodiment:

Dataset         Resolution              Total    Training samples    Test samples
Kvasir-SEG      332×487 ~ 1920×1072     1000     900                 100
CVC-ClinicDB    384×288                 612      550                 62
CVC-ColonDB     574×500                 380      n/a                 380
ETIS            1225×966                196      n/a                 196
CVC-T           574×500                 60       n/a                 60

The specific process of step S2 is as follows:

FIG. 2 is a schematic diagram of the framework structure of the colon polyp segmentation method based on shape perception and feature enhancement according to an embodiment of the present invention. The PVTv2 backbone extraction network extracts multi-scale features of the polyp. The shape perception module SAM uses the differences between backbone features at each level to remove noise from the colonoscopy image, extract clearer shape features and generate a segmentation map. The feature enhancement module FEM enhances and enriches the semantic information of the encoder features, captures polyps of different shapes and sizes, and selects and refines features, which are then input into the parallel partial decoder PD to generate an initial segmentation map. The designed multi-modal attention module MMAM takes the global feature map formed by the parallel partial decoder PD, or the higher-level segmentation map generated by the previous decoder stage, as a guide map to realize attention to the background, foreground and boundary, so that the network pays more attention to suspicious and complex polyp regions.

The specific steps are as follows:

The PVTv2 backbone extraction network extracts multi-scale features of the colon polyp and generates four levels of features E1, E2, E3 and E4. Low-level features contain more detail, so to capture the finer shape, texture and boundary information of the polyp, the backbone features E1, E2 and E3 are input into the shape perception module SAM to generate a segmentation map S1. E2, E3 and E4 are input into the feature enhancement module FEM to generate enhanced features F2, F3 and F4 respectively, which are input into the parallel partial decoder PD to aggregate high-level features, capture global context information and generate a coarsely localized global segmentation map S5. Next, F4 and the initial segmentation map S5 are input into the multi-modal attention module MMAM to generate feature map M4, which is added to S5 to generate segmentation map S4. In the same way, F3 and S4 are input into the MMAM to generate feature map M3, which is added to the previous-level segmentation map S4 to generate segmentation map S3. Finally, the segmentation map S1 generated by the SAM is added to feature map M2 (generated by inputting F2 and S3 into the MMAM) and segmentation map S3 to generate segmentation map S2. S2 is upsampled to the size of the input polyp image and a Sigmoid operation is applied to obtain the final probability map.

Further, the PVTv2 backbone extraction network is a PVTv2 pre-trained on ImageNet; to adapt PVTv2 to the polyp segmentation task, the final classification layer is removed, yielding the backbone features E1, E2, E3 and E4 at each level, where E1, the lowest-level feature, contains fine-grained detail and E4, the highest-level feature, contains rich semantic information.

Further, FIG. 3 is a schematic diagram of the shape perception module SAM of the method according to an embodiment of the present invention. The inputs of the SAM are the backbone features E1, E2 and E3, each of which passes through a 3×3 convolutional layer. Feature subtraction is performed between the adjacent resulting features in sequence to obtain feature differences of different scales between adjacent encoder levels; the difference feature maps then pass through channel attention CA to highlight important channel features, yielding E1_2 and E2_3. Next, a 3×3 convolution, feature subtraction and channel attention CA applied to E1_2 and E2_3 yield E1_3, and E1_2 and E1_3 are added to obtain E1_2_3. Finally, the result of a 3×3 convolution on the feature map E2_3 is channel-concatenated with E1_2_3 and convolved to generate complementary enhanced features, and spatial attention SA is applied to obtain the feature segmentation map S1 carrying detailed polyp location information.

Furthermore, FIG. 4 is a schematic diagram of the feature enhancement module FEM of the colon polyp segmentation method based on shape perception and feature enhancement according to an embodiment of the present invention. The feature enhancement module FEM comprises: a first branch, a second branch, a third branch, a fourth branch, and a residual branch.

The first branch of the feature enhancement module FEM consists of a 1×1 convolution layer and the cross-semantic attention module CSA, connected in sequence.

The second branch consists of a 1×1 convolution layer, a 1×3 convolution layer, a 3×1 convolution layer, a dilated convolution with dilation rate 3, and the cross-semantic attention module CSA, connected in sequence.

The third branch consists of a 1×1 convolution layer, a 1×5 convolution layer, a 5×1 convolution layer, a dilated convolution with dilation rate 5, and the cross-semantic attention module CSA, connected in sequence.

The fourth branch consists of a 1×1 convolution layer, a 1×7 convolution layer, a 7×1 convolution layer, a dilated convolution with dilation rate 7, and the cross-semantic attention module CSA, connected in sequence.

The residual branch consists of a 1×1 convolution layer and a non-local operation (Non-Local), connected in sequence.

The outputs of the four branches are concatenated, passed through a 3×3 convolution layer, and then channel-concatenated with the residual branch to obtain the output features.
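The multi-branch layout above can be sketched as follows. For brevity the CSA and Non-Local submodules are stubbed out with identity-like convolutions, so this shows only the factorized multi-scale branch structure (an RFB-style design); all names are illustrative:

```python
import torch
import torch.nn as nn

def factorized_branch(c, k, d):
    """1x1 -> 1xk -> kx1 -> 3x3 dilated(d); all layers preserve spatial size."""
    return nn.Sequential(
        nn.Conv2d(c, c, 1),
        nn.Conv2d(c, c, (1, k), padding=(0, k // 2)),
        nn.Conv2d(c, c, (k, 1), padding=(k // 2, 0)),
        nn.Conv2d(c, c, 3, padding=d, dilation=d))

class FEM(nn.Module):
    """Sketch of the four-branch feature enhancement block.
    The CSA (per branch) and Non-Local (residual branch) submodules
    are replaced by plain convolutions here."""
    def __init__(self, c):
        super().__init__()
        self.b1 = nn.Conv2d(c, c, 1)                  # first branch (CSA stubbed)
        self.b2 = factorized_branch(c, 3, 3)
        self.b3 = factorized_branch(c, 5, 5)
        self.b4 = factorized_branch(c, 7, 7)
        self.res = nn.Conv2d(c, c, 1)                 # residual branch (Non-Local stubbed)
        self.merge = nn.Conv2d(4 * c, c, 3, padding=1)

    def forward(self, x):
        y = torch.cat([self.b1(x), self.b2(x), self.b3(x), self.b4(x)], 1)
        y = self.merge(y)
        return torch.cat([y, self.res(x)], 1)         # channel-concat with residual
```

Note that the channel-concatenation with the residual branch doubles the channel count of the output relative to the input.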

Furthermore, FIG. 5 is a schematic diagram of the cross-semantic attention module CSA of the colon polyp segmentation method based on shape perception and feature enhancement according to an embodiment of the present invention. The input features of the CSA module are split along the channel dimension into Eic and Eis. Eic is used to extract one-dimensional global channel information Mic: global average pooling is applied to Eic, the resulting 2D tensor is reshaped to 1D, a one-dimensional convolution is applied, and the resulting vector is reshaped back to 2D and passed through a Sigmoid. Eis is used to extract two-dimensional spatial information Mis by applying a 3×3 convolution and a Sigmoid to Eis. The two parts Mic and Mis are then combined interactively through two branches: the first branch performs a matrix multiplication of Mis with Eic; the second branch applies a 3×3 convolution to Eis and then performs a matrix multiplication with Mic. Finally, the two branches are concatenated along the channel dimension.
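A sketch of this cross-gating scheme is shown below. The 1-D channel convolution follows the ECA-style pattern the text describes; the "matrix multiplication" of the figure is implemented here as a broadcast multiply, which is an interpretation on our part, and all names are illustrative:

```python
import torch
import torch.nn as nn

class CSA(nn.Module):
    """Sketch of cross-semantic attention: split channels in half, build a
    1-D channel gate (M_ic) from one half and a 2-D spatial gate (M_is)
    from the other, cross-apply the gates, and re-concatenate."""
    def __init__(self, c):
        super().__init__()
        h = c // 2
        self.conv1d = nn.Conv1d(1, 1, kernel_size=3, padding=1)  # channel-wise 1-D conv
        self.spatial = nn.Conv2d(h, h, 3, padding=1)
        self.conv_is = nn.Conv2d(h, h, 3, padding=1)

    def forward(self, x):
        e_ic, e_is = x.chunk(2, dim=1)
        # channel gate M_ic: GAP -> reshape to 1D -> 1-D conv -> reshape -> sigmoid
        g = e_ic.mean(dim=(2, 3), keepdim=True)                  # (B, h, 1, 1)
        g = self.conv1d(g.squeeze(-1).transpose(1, 2))           # (B, 1, h)
        m_ic = torch.sigmoid(g.transpose(1, 2).unsqueeze(-1))    # (B, h, 1, 1)
        # spatial gate M_is: 3x3 conv -> sigmoid
        m_is = torch.sigmoid(self.spatial(e_is))
        out1 = m_is * e_ic                 # spatial gate applied to the channel half
        out2 = self.conv_is(e_is) * m_ic   # channel gate applied to the spatial half
        return torch.cat([out1, out2], dim=1)
```
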

Furthermore, FIG. 6 is a schematic diagram of the parallel partial decoder PD of the colon polyp segmentation method based on shape perception and feature enhancement according to an embodiment of the present invention. The inputs to the parallel partial decoder PD are the outputs F2, F3, and F4 of the feature enhancement modules. In one branch, F4 is upsampled 2× to match the size of F3, the upsampled F4 is multiplied with F3, and the product is channel-concatenated with the upsampled F4 to obtain f3_4. In the other branch, F3 upsampled 2× and F4 upsampled 4× are multiplied with F2, and the result is channel-concatenated with f3_4 (after f3_4 is upsampled and passed through a 3×3 convolution) to obtain f2_3_4. Two 3×3 convolutions and one 1×1 convolution are then applied to f2_3_4, and the resulting feature map is upsampled to the input image size to obtain the initial segmentation map S5.
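The aggregation above can be sketched as follows, assuming F2, F3, F4 share a common channel width c and sit at strides 8, 16, 32 of the input (so the final map is upsampled 8× back to input size); the layer names are illustrative:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def up(x, s):
    """Bilinear upsampling by integer factor s."""
    return F.interpolate(x, scale_factor=s, mode='bilinear', align_corners=False)

class PartialDecoder(nn.Module):
    """Sketch of the parallel partial decoder aggregating F2, F3, F4 into S5."""
    def __init__(self, c):
        super().__init__()
        self.conv34 = nn.Conv2d(2 * c, c, 3, padding=1)   # 3x3 conv on upsampled f_{3_4}
        self.conv234 = nn.Conv2d(2 * c, c, 3, padding=1)
        self.out1 = nn.Conv2d(c, c, 3, padding=1)
        self.out2 = nn.Conv2d(c, c, 3, padding=1)
        self.head = nn.Conv2d(c, 1, 1)

    def forward(self, f2, f3, f4):
        # branch 1: f_{3_4} = concat(up2(F4) * F3, up2(F4))
        f34 = torch.cat([up(f4, 2) * f3, up(f4, 2)], dim=1)
        # branch 2: up2(F3) * up4(F4) * F2, concatenated with conv(up2(f_{3_4}))
        f234 = torch.cat([up(f3, 2) * up(f4, 4) * f2,
                          self.conv34(up(f34, 2))], dim=1)
        s5 = self.head(self.out2(self.out1(self.conv234(f234))))
        return up(s5, 8)   # back to input resolution (F2 assumed at stride 8)
```
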

Furthermore, FIG. 7 is a schematic diagram of the multi-modal attention module MMAM of the colon polyp segmentation method based on shape perception and feature enhancement according to an embodiment of the present invention. The multi-modal attention module MMAM computes the foreground attention, background attention, and boundary attention of the previous-stage segmentation map Si+1, i = 2, 3, 4, according to the following formulas:

Sforeground = pred,  Sbackground = 1 − pred,

where pred ∈ R^(H×W×1) denotes the previous-stage segmentation map, Sforeground denotes the foreground attention, Sbackground denotes the background attention, and Sboundary denotes the boundary attention.

These three attention maps are each multiplied with the feature Fi; the resulting branches are passed through 1×1 and 3×3 convolution operations respectively and then added to the segmentation map Si+1. The three branches are channel-concatenated and passed through a squeeze-and-excitation (SE) operation and a 3×3 convolution, and finally added to the feature Fi to generate the feature map Mi, where i = 2, 3, 4.
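A sketch of this attention fusion follows. The foreground and background formulas are taken from the text; the boundary formula is lost from this copy, so the weighting used below (peaking where pred ≈ 0.5) is our assumption, and all layer names are illustrative:

```python
import torch
import torch.nn as nn

class MMAM(nn.Module):
    """Sketch of the multi-modal attention module: foreground / background /
    boundary attention derived from the previous-stage prediction S_{i+1}."""
    def __init__(self, c):
        super().__init__()
        self.proj = nn.ModuleList(
            nn.Sequential(nn.Conv2d(c, c, 1), nn.Conv2d(c, c, 3, padding=1))
            for _ in range(3))
        self.se = nn.Sequential(                       # squeeze-and-excitation gate
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(3 * c, 3 * c // 4, 1), nn.ReLU(inplace=True),
            nn.Conv2d(3 * c // 4, 3 * c, 1), nn.Sigmoid())
        self.fuse = nn.Conv2d(3 * c, c, 3, padding=1)

    def forward(self, f, s_prev):
        pred = torch.sigmoid(s_prev)                   # (B, 1, H, W)
        fg = pred                                      # S_foreground = pred
        bg = 1 - pred                                  # S_background = 1 - pred
        bd = 1 - torch.abs(pred - 0.5) / 0.5           # assumed boundary weighting
        # each attention gates F_i, is projected, and re-adds S_{i+1}
        outs = [p(f * a) + s_prev for p, a in zip(self.proj, (fg, bg, bd))]
        y = torch.cat(outs, dim=1)
        y = y * self.se(y)
        return self.fuse(y) + f                        # feature map M_i
```
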

The specific process of step S3 is as follows:

The preprocessed training set is fed into the colon polyp segmentation model based on shape perception and feature enhancement for training. The initial learning rate is 0.0001 and the weight decay is likewise set to 0.0001; the AdamW optimization algorithm is used with a batch size of 8. Deep supervision is applied with a loss function that measures the discrepancy between the segmentation map generated at each stage and the ground-truth label, where the loss function is defined as:

L = L^w_IoU + L^w_BCE,

where L^w_IoU denotes the weighted IoU loss and L^w_BCE denotes the weighted BCE loss.

The total loss is:

L_total = Σ_i L(G, S_i^up),

where G denotes the ground-truth label and S_i^up denotes the upsampled segmentation map of stage i.
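The weighted IoU + weighted BCE combination described above can be sketched with the widely used boundary-aware "structure loss" formulation (as in PraNet-style models), which this patent's loss appears to mirror; the pixel weighting via average pooling is that formulation's convention, stated here as an assumption:

```python
import torch
import torch.nn.functional as F

def structure_loss(pred, mask):
    """Weighted IoU + weighted BCE on logits `pred` vs binary `mask`.
    Pixels near the mask boundary get higher weight (PraNet-style)."""
    weit = 1 + 5 * torch.abs(F.avg_pool2d(mask, 31, stride=1, padding=15) - mask)
    wbce = F.binary_cross_entropy_with_logits(pred, mask, reduction='none')
    wbce = (weit * wbce).sum(dim=(2, 3)) / weit.sum(dim=(2, 3))
    p = torch.sigmoid(pred)
    inter = ((p * mask) * weit).sum(dim=(2, 3))
    union = ((p + mask) * weit).sum(dim=(2, 3))
    wiou = 1 - (inter + 1) / (union - inter + 1)
    return (wbce + wiou).mean()

def total_loss(stage_preds, gt):
    """Deep supervision: sum the loss over all upsampled stage outputs."""
    return sum(structure_loss(p, gt) for p in stage_preds)
```
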

Finally, training continues until the model converges, and the weight parameter file of the best-performing epoch is saved.

The specific process of step S4 is as follows:

The test images are fed into the trained model, the performance of the model is evaluated using the saved weight parameters, and the segmentation maps are output and saved.

To verify the effectiveness of the present invention against the prior art, comparative experiments were conducted on the test datasets against the UNet, UNet++, SFA, PraNet, ACSNet, EU-Net, SANet, and MSNet methods; the experimental results are shown in the table below.

The comparison shows that, relative to the eight existing classic methods above, the present invention exhibits strong learning and generalization ability and can segment polyp images accurately.

The above shows and describes the basic principles, main features, and advantages of the present invention. Those skilled in the art should understand that the present invention is not limited to the above embodiments; the embodiments and descriptions above merely illustrate its principles. Without departing from the spirit and scope of the present invention, various changes and improvements may be made, all of which fall within the scope of the claimed invention. The scope of protection of the present invention is defined by the appended claims and their equivalents.

Claims (8)

1. A method for segmenting a colon polyp image based on shape perception and feature enhancement, comprising the steps of:
obtaining and dividing colon polyp images to obtain a training set and a testing set, and carrying out image enhancement preprocessing operation on the training set and the testing set;
Constructing an initial segmentation model based on a PVTv2 trunk extraction network, a shape perception module, a feature enhancement module, a parallel partial decoder and a multi-modal attention module;
Training the initial segmentation model through the training set after the pretreatment operation, and verifying through the test set after the pretreatment operation to obtain a segmentation model with weight parameters adjusted;
And performing image segmentation on the image to be detected through the segmentation model to obtain a colon polyp image.
2. The method for segmentation of a colon polyp image based on shape perception and feature enhancement as set forth in claim 1,
The method for acquiring the colon polyp image comprises the following steps: endoscopic images of colon polyps were extracted from Kvasir-SEG, CVC-ClinicDB, CVC-ColonDB, CVC-T, and ETIS.
3. The method for segmentation of a colon polyp image based on shape perception and feature enhancement as set forth in claim 1,
The image enhancement preprocessing method comprises the following steps: the resolution of the image in the set of colon polyp images is adjusted to 352 x 352 and a data enhancement operation is performed including horizontal flip, vertical flip, 90 degree rotation and random cropping of 224 x 224 size.
4. The method for segmentation of a colon polyp image based on shape perception and feature enhancement as set forth in claim 1,
The method for training the initial segmentation model through the training set after the preprocessing operation comprises the following steps: the PVTv2 trunk extraction network extracts multi-scale features of a colon polyp image in the training set to generate 4 layers of features E1, E2, E3, E4; the features E1, E2, E3 are input into the shape perception module to generate a segmentation map S1; the features E2, E3, E4 are respectively input into feature enhancement modules to obtain enhancement features F2, F3, F4; and the features F2, F3, F4 are input into the parallel partial decoder to generate an initial segmentation map S5;
The F4 and the initial segmentation map S5 are input into the multi-modal attention module to generate a feature map M4, and the feature map M4 and the initial segmentation map S5 are added to generate a segmentation map S4; the F3 and the S4 are input into the multi-modal attention module to generate a feature map M3, and the feature map M3 and the segmentation map S4 are added to generate a segmentation map S3; the F2 and the segmentation map S3 are input into the multi-modal attention module to generate a feature map M2, and the segmentation map S1, the feature map M2 and the segmentation map S3 are added to generate a segmentation map S2; the segmentation map S2 is upsampled to the same size as the input polyp image and a Sigmoid operation is performed to obtain the final segmentation map.
5. The method for segmentation of a colon polyp image based on shape perception and feature enhancement as set forth in claim 4,
The method for generating the segmentation map S1 includes: inputting the features E1, E2, E3 into 3×3 convolution layers respectively to obtain convolution features; sequentially performing feature subtraction between adjacent features to obtain feature differences of different scales at adjacent levels in the encoder; highlighting important channel features through channel attention based on the feature differences to obtain E1_2 and E2_3; performing 3×3 convolution, feature subtraction and channel attention on E1_2 and E2_3 to obtain E1_3; adding E1_2 and E1_3 to obtain E1_2_3; channel-splicing the result of a 3×3 convolution on the feature map E2_3 with E1_2_3 and applying a convolution layer to obtain complementary enhancement features; and applying spatial attention to the complementary enhancement features to obtain the segmentation map S1.
6. The method for segmentation of a colon polyp image based on shape perception and feature enhancement as set forth in claim 4,
The feature enhancement module comprises a first branch, a second branch, a third branch, a fourth branch and a residual branch;
the first branch comprises a 1×1 convolution layer and a cross-semantic attention module which are connected in sequence;
the second branch comprises a 1×1 convolution layer, a 1×3 convolution layer, a 3×1 convolution layer, a dilated convolution with dilation rate 3 and a cross-semantic attention module which are connected in sequence;
the third branch comprises a 1×1 convolution layer, a 1×5 convolution layer, a 5×1 convolution layer, a dilated convolution with dilation rate 5 and a cross-semantic attention module which are connected in sequence;
the fourth branch comprises a 1×1 convolution layer, a 1×7 convolution layer, a 7×1 convolution layer, a dilated convolution with dilation rate 7 and a cross-semantic attention module which are connected in sequence;
the residual branch comprises a 1×1 convolution layer and a non-local operation which are connected in sequence;
and the outputs of the first, second, third and fourth branches are spliced, passed through a 3×3 convolution layer, and channel-spliced with the residual branch to obtain the output of the feature enhancement module.
7. The method for segmentation of a colon polyp image based on shape perception and feature enhancement as set forth in claim 4,
The process of generating a feature map by the multi-modal attention module includes:
the foreground attention, background attention and boundary attention of the previous-stage segmentation map Si+1, i = 2, 3, 4, are calculated respectively according to the following formulas:
Sforeground = pred,  Sbackground = 1 − pred,
wherein pred ∈ R^(H×W×1) represents the previous-stage segmentation map, Sforeground represents the foreground attention, Sbackground represents the background attention, and Sboundary represents the boundary attention;
the foreground attention, the background attention and the boundary attention are multiplied by a feature Fi respectively to obtain three result branches; the result branches are subjected to 1×1 and 3×3 convolution operations respectively and then added to the segmentation map Si+1; the three result branches are channel-connected and subjected to squeeze-excitation and 3×3 convolution operations; and the result is finally added to the feature Fi to generate a feature map Mi, wherein i = 2, 3, 4.
8. The method for segmentation of a colon polyp image based on shape perception and feature enhancement as set forth in claim 1,
Deep supervision is carried out through a loss function in the training process, wherein the loss function is expressed as:
L = L^w_IoU + L^w_BCE,
wherein L^w_IoU represents the weighted IoU loss and L^w_BCE represents the weighted BCE loss;
the total loss is expressed as:
L_total = Σ_i L(G, S_i^up),
wherein G represents the ground-truth label and S_i^up represents the upsampled prediction map.
CN202410109987.4A 2024-01-26 2024-01-26 Colon polyp image segmentation method based on shape perception and feature enhancement Active CN117934510B (en)


Publications (2)

Publication Number Publication Date
CN117934510A true CN117934510A (en) 2024-04-26
CN117934510B CN117934510B (en) 2024-12-24
