CN113362223B

CN113362223B - Image Super-Resolution Reconstruction Method Based on Attention Mechanism and Two-Channel Network

Info

Publication number: CN113362223B
Application number: CN202110573693.3A
Authority: CN
Inventors: 张旭; 何涛; 夏英
Original assignee: Chongqing University of Post and Telecommunications
Current assignee: Shenzhen Hidion Intelligent Technology Co ltd
Priority date: 2021-05-25
Filing date: 2021-05-25
Publication date: 2022-06-24
Anticipated expiration: 2041-05-25
Also published as: CN113362223A

Abstract

The invention belongs to the field of artificial intelligence, deep learning and image processing, and particularly relates to an attention mechanism and two-channel network-based image super-resolution reconstruction method, which comprises the following steps: acquiring an image to be detected in real time, and preprocessing the image to be detected; inputting the preprocessed image into a trained image super-resolution reconstruction model to obtain a high-definition reconstruction image; evaluating the reconstructed image by adopting the peak signal-to-noise ratio and the structural similarity, and marking the high-definition reconstructed image according to an evaluation result; the basis of the image super-resolution reconstruction model is a convolutional neural network; the invention uses a dual-channel network, one network uses an improved residual error structure to extract valuable high-frequency characteristics, namely high-grade characteristics, and the other network uses an improved VGG network, so that the sizes of input and output images are consistent, abundant low-frequency characteristics are extracted, and finally, the characteristics are fused, so that the reconstructed images are clearer.

Description

Image super-resolution reconstruction method based on attention mechanism and two-channel network

技术领域technical field

本发明属于人工智能、深度学习和图像处理领域，具体涉及一种基于注意力机制和双通道网络的图像超分辨率重建方法。The invention belongs to the fields of artificial intelligence, deep learning and image processing, in particular to an image super-resolution reconstruction method based on an attention mechanism and a dual-channel network.

背景技术Background technique

图像超分辨率重建技术就是利用一组低质量、低分辨率图像(或运动序列)来产生高质量、高分辨率图像。图像超分辨率重建应用于各种计算机视觉任务中，包括监控成像，医学成像，物体识别等。在现实生活中，受限于图像采集设备成本、视频图像传输带宽，抑或是成像模态本身的技术瓶颈，我们并不是每一次都有条件获得边缘锐化，无块状模糊的大尺寸高清图像。在这种需求背景下，超分辨率重建技术应运而生，传统的图像超分辨率(SR)重建问题被定义为从其低分辨率(LR)观察中恢复高分辨率(HR)图像。使用图像超分辨率重建技术可以提高图像的识别能力和识别精度，可以实现目标物的专注分析，从而可以获取感兴趣区域更高空间分辨率的图像，而不必直接采用数据量巨大的高空间分辨率图像的配置。而基于学习的图像超分辨率重建技术是近些年的热门方向，借助深度学习中端到端映射的卷积神经网络技术，通过学习低分辨与高分辨图像之间的映射关系估计低分辨图像中丢失的高频细节，以获得边缘清晰、纹理细节丰富的高质量图像。近年来，随着注意力机制的提出，越来越多的模型开始尝试使用注意力机制提升效果，在图像超分辨率重建中，将注意力机制嵌入到模型中，提升结果的精度。Image super-resolution reconstruction technology uses a set of low-quality, low-resolution images (or motion sequences) to generate high-quality, high-resolution images. Image super-resolution reconstruction is applied in various computer vision tasks, including surveillance imaging, medical imaging, object recognition, etc. In real life, limited by the cost of image acquisition equipment, video image transmission bandwidth, or the technical bottleneck of the imaging modality itself, we do not always have the conditions to obtain large-scale high-definition images with sharp edges and no blocky blur. . In the context of this demand, super-resolution reconstruction techniques emerge as the times require, and the traditional image super-resolution (SR) reconstruction problem is defined as recovering a high-resolution (HR) image from its low-resolution (LR) observations. The use of image super-resolution reconstruction technology can improve the recognition ability and accuracy of images, and can achieve focused analysis of the target object, so that higher spatial resolution images of the region of interest can be obtained without directly using high spatial resolution with huge amounts of data. rate image configuration. Learning-based image super-resolution reconstruction technology is a popular direction in recent years. With the help of the convolutional neural network technology of end-to-end mapping in deep learning, low-resolution images are estimated by learning the mapping relationship between low-resolution and high-resolution images. High-frequency details that are lost in the mid-range to obtain high-quality images with sharp edges and rich texture details. In recent years, with the introduction of the attention mechanism, more and more models have begun to try to use the attention mechanism to improve the effect. In the image super-resolution reconstruction, the attention mechanism is embedded into the model to improve the accuracy of the results.

传统图像重建方法中，常采用CNN卷积神经网络计算，在计算过程中会逐步的丢失许多平滑的信息，而图像超分辨率重建的主要目的就是获取高精度的高清图片，任何信息的丢失都会影响最终的重建精度；且CNN卷积神经网络的计算采用局部感受野的方式，在计算时，受限于卷积核尺寸的大小，所获取的信息是图片的局部信息，无法获取整张图片的全局信息，而图片中的像素点之间是有关联的，较远的长距离依赖的信息也是图像重建的一个重要依赖信息。In traditional image reconstruction methods, CNN convolutional neural network is often used for calculation, and a lot of smooth information will be gradually lost in the calculation process. The main purpose of image super-resolution reconstruction is to obtain high-precision high-definition pictures, any loss of information will It affects the final reconstruction accuracy; and the calculation of the CNN convolutional neural network adopts the method of local receptive field. During the calculation, limited by the size of the convolution kernel, the obtained information is the local information of the picture, and the entire picture cannot be obtained. The global information of the image, and the pixels in the picture are related, and the long-distance dependent information is also an important dependent information for image reconstruction.

发明内容SUMMARY OF THE INVENTION

为解决以上现有技术存在的问题，本发明提出了一种基于注意力机制和双通道网络的图像超分辨率重建方法，该方法包括：实时获取待检测图像，对待检测图像进行预处理；将预处理后的图像输入到训练好的图像超分辨率重建模型中，得到高清重建图；采用峰值信噪比和结构相似性对重建图进行评价，根据评价结果对高清重建图进行标记；图像超分辨率重建模型基础为卷积神经网络；In order to solve the above problems in the prior art, the present invention proposes an image super-resolution reconstruction method based on an attention mechanism and a dual-channel network. The method includes: acquiring an image to be detected in real time, and preprocessing the image to be detected; The preprocessed image is input into the trained image super-resolution reconstruction model to obtain a high-definition reconstructed image; the peak signal-to-noise ratio and structural similarity are used to evaluate the reconstructed image, and the high-definition reconstructed image is marked according to the evaluation results; The resolution reconstruction model is based on a convolutional neural network;

对图像超分辨率重建模型进行训练的过程包括：The process of training an image super-resolution reconstruction model includes:

S1：获取原始高清图片数据集，采用双三次插值降解模型对数据集中的图片进行缩放；S1: Obtain the original high-definition picture data set, and use the bicubic interpolation degradation model to scale the pictures in the data set;

S2：对缩放后的数据集进行预处理，得到训练数据集；S2: Preprocess the scaled data set to obtain a training data set;

S3：将训练数据集中的每个图像数据分别输入到图像超分辨率重建模型中的浅层特征通道和深层特征通道中进行特征提取；S3: Input each image data in the training dataset into the shallow feature channel and the deep feature channel in the image super-resolution reconstruction model for feature extraction;

S4：采用第一卷积层提取输入图像的初始特征；将初始特征输入到信息级联模块中，聚合卷积层的层级特征信息；S4: Use the first convolution layer to extract the initial features of the input image; input the initial features into the information cascade module to aggregate the hierarchical feature information of the convolution layer;

S5：将信息级联模块聚合的层级特征信息输入到改进的残差模块中，得到通道上的关联性和全局空间上的依赖信息；S5: Input the hierarchical feature information aggregated by the information cascade module into the improved residual module to obtain the correlation on the channel and the dependency information on the global space;

S6：采用非局部空洞卷积对依赖信息进行全局特征提取，得到最终的深层特征图；S6: Use non-local hole convolution to perform global feature extraction on dependency information to obtain the final deep feature map;

S7：采用第二卷积层提取输入图像的初始特征；将初始特征输入到改进的VGG网络中，提取图像的浅层特征，得到浅层特征图；S7: use the second convolution layer to extract the initial features of the input image; input the initial features into the improved VGG network, extract the shallow features of the image, and obtain a shallow feature map;

S8：将深层特征图和浅层特征图进行融合，对融合特征图进行上采样，得到高清重建图；S8: fuse the deep feature map and the shallow feature map, and upsample the fused feature map to obtain a high-definition reconstruction map;

S9：利用损失函数约束高清重建图像与原始高清图像之间的差异，不断调整模型的参数，直到模型收敛，完成模型的训练。S9: Use the loss function to constrain the difference between the high-definition reconstructed image and the original high-definition image, and continuously adjust the parameters of the model until the model converges, and the training of the model is completed.

优选的，采用双三次插值降解模型对数据集中的图片进行缩放的倍数为2倍、3倍、4倍和8倍。Preferably, the bicubic interpolation degradation model is used to zoom the pictures in the data set by 2 times, 3 times, 4 times and 8 times.

优选的，双三次插值降解模型的公式为：Preferably, the formula of the bicubic interpolation degradation model is:

I^LR＝H_dnI^HR+nI ^LR =H _dn I ^HR +n

优选的，对缩放后的数据集进行预处理的过程包括对图像进行增强处理，包括对图像进行平移，水平和竖直方向的翻转处理；将增强后的数据进行分割为不同的小图像块，将分割后的图像进行集合，得到训练数据集。Preferably, the process of preprocessing the scaled data set includes performing enhancement processing on the image, including performing translation processing on the image and flipping the image in horizontal and vertical directions; dividing the enhanced data into different small image blocks, Assemble the segmented images to obtain a training data set.

优选的，信息级联模块包括堆叠10次特征聚合结构；特征聚合结构包括至少三层卷积神经网络、特征通道合并层、通道注意力层和通道数变换层，各个层卷积神经网络依次连接，且除最后一层卷神经网络外的各个层卷积神经网络的输出端支路连接特征通道合并层，特征通道合并层、通道注意力层和通道数变换层依次连接，构成信息级联模块；该模块处理图像数据的过程包括：首先使用各层卷积神经网络依次对输入图像进行特征信息的提取，之后将每层卷积提取的特征信息在特征通道合并层上合并，使用通道注意力机制对合并的信息进行重要性区分，最后将通道数降为输入通道数大小，重复上述步骤10次，得到聚合卷积层的层级特征信息。Preferably, the information cascading module includes a feature aggregation structure stacked 10 times; the feature aggregation structure includes at least three layers of convolutional neural networks, a feature channel merging layer, a channel attention layer and a channel number transformation layer, and each layer of the convolutional neural network is connected in turn. , and the output branch of each layer of convolutional neural network except the last layer of convolutional neural network is connected to the feature channel merging layer, the feature channel merging layer, the channel attention layer and the channel number transformation layer are connected in turn to form an information cascade module ; The process of processing image data by this module includes: firstly using each layer of convolutional neural network to extract feature information of the input image in turn, and then merging the feature information extracted by each layer of convolution on the feature channel merging layer, using channel attention The mechanism distinguishes the importance of the merged information, and finally reduces the number of channels to the number of input channels, and repeats the above steps 10 times to obtain the hierarchical feature information of the aggregated convolutional layer.

优选的，改进的残差模块包括：残差网络结构、通道注意力机制层和空间注意力机制层，残差网络结构包括卷积神经网络层、非线性激活层和卷积神经网络层；该模块处理图像数据的过程包括：将层级特征信息输入到残差网络结构中提取特征信息，将提取到的特征信息使用通道注意力机制获取通道上的关联性，再往下传递，使用空间注意力机制获取全局空间上的依赖性。Preferably, the improved residual module includes: a residual network structure, a channel attention mechanism layer and a spatial attention mechanism layer, and the residual network structure includes a convolutional neural network layer, a nonlinear activation layer and a convolutional neural network layer; the The process of the module processing image data includes: inputting the hierarchical feature information into the residual network structure to extract feature information, using the extracted feature information to use the channel attention mechanism to obtain the correlation on the channel, and then passing it down, using spatial attention. Mechanism to obtain dependencies on the global space.

优选的，非局部空洞卷积块包括：四层并行的膨胀参数分别为1、2、4、6的空洞卷积层和三层普通卷积神经网络层；该模块处理图像数据的过程包括：首先采用四种不同膨胀参数的空洞卷积和两种普通卷积神经网络分别对改进的残差网络输入的依赖信息进行特征信息提取；然后将四种空洞卷积获取的特征信息在特征通道上进行融合，普通卷积神经网络提取的特征信息则按像素矩阵的值进行融合；最后将这两种融合的特征信息相加，获取全局的特征信息Preferably, the non-local atrous convolution block includes: four parallel atrous convolutional layers with dilation parameters of 1, 2, 4, and 6 and three ordinary convolutional neural network layers; the process of processing image data by this module includes: Firstly, four kinds of atrous convolutions with different dilation parameters and two kinds of ordinary convolutional neural networks are used to extract feature information from the dependency information input by the improved residual network; For fusion, the feature information extracted by the ordinary convolutional neural network is fused according to the value of the pixel matrix; finally, the two kinds of fused feature information are added to obtain the global feature information

优选的，改进的VGG网络结构包括：10层普通卷积层和3层池化层，将各个池化层嵌入到普通卷积层中，得到VGG网络结构；该模块处理图像数据的过程包括：首先使用2层卷积和一层池化提取64个通道特征信息，之后使用使用2层卷积和一层池化提取128个通道特征信息，然后使用3层卷积和一层池化提取512个通道特征信息，最后再使用3层卷积提取512通道的信息后还原为64通道；其中，池化层使用padding保持特征尺度不变。Preferably, the improved VGG network structure includes: 10 layers of ordinary convolution layers and 3 layers of pooling layers, and each pooling layer is embedded in the ordinary convolution layer to obtain the VGG network structure; the process of processing image data by this module includes: First use 2 layers of convolution and one layer of pooling to extract 64 channel feature information, then use 2 layers of convolution and one layer of pooling to extract 128 channel feature information, and then use 3 layers of convolution and one layer of pooling to extract 512 channel feature information channel feature information, and finally use 3 layers of convolution to extract the information of 512 channels and restore it to 64 channels; among them, the pooling layer uses padding to keep the feature scale unchanged.

优选的，图像超分辨率重建模型的损失函数表达式为：Preferably, the loss function expression of the image super-resolution reconstruction model is:

优选的，采用峰值信噪比和结构相似性对重建图进行评价的公式为：Preferably, the formula for evaluating the reconstructed image using peak signal-to-noise ratio and structural similarity is:

本发明的优点：Advantages of the present invention:

1.本发明使用了双通道网络，一路网络使用改进的残差结构提取了有价值的高频特征-即高级特征，一路网络使用改进的VGG(对vgg卷积层和池化层的参数进行了微调，保证输入输出的图像尺度大小一致，同时丢弃了最后的全连接层)提取了丰富的低频特征，在最后进行特征融合。1. The present invention uses a dual-channel network, one network uses an improved residual structure to extract valuable high-frequency features-that is, high-level features, and one network uses an improved VGG (parameters of the vgg convolutional layer and pooling layer are used. fine-tuning to ensure that the input and output image scales are consistent, while discarding the last fully connected layer) to extract rich low-frequency features, and feature fusion at the end.

2.本发明在模型的特定位置(头尾各2个信息级联模块)使用了稠密连接的方式，将每一个卷积层的信息都进行了聚合达到了充分利用卷积层信息的目的，并且在最后使用了通道注意力机制将结合在一起的信息进行了通道权重的计算，而不是单纯的降通道。2. The present invention uses a dense connection method in a specific position of the model (two information cascade modules at the head and tail), and aggregates the information of each convolutional layer to achieve the purpose of making full use of the information of the convolutional layer, And at the end, the channel attention mechanism is used to calculate the channel weights of the combined information, instead of simply descending the channel.

3.本发明使用了空间注意力机制，在已有的通道注意力机制后加入空间注意力，使得全局信息的提取更为充分，特征的利用更为全面。同时在上采样之前，使用了非局部空洞卷积，将之前的结果信息又进行了一次全局依赖的特征提取，使输出结果联系更为紧密，特征信息更为丰富。3. The present invention uses a spatial attention mechanism, adding spatial attention after the existing channel attention mechanism, so that the extraction of global information is more sufficient, and the utilization of features is more comprehensive. At the same time, before upsampling, non-local hole convolution is used, and the previous result information is extracted with a global dependency, so that the output results are more closely related and the feature information is richer.

附图说明Description of drawings

图1为本发明的图像超分辨率重建模型的总体结构图；Fig. 1 is the overall structure diagram of the image super-resolution reconstruction model of the present invention;

图2为本发明的信息级联结构图；Fig. 2 is the information cascade structure diagram of the present invention;

图3为本发明的残差结构图；3 is a residual structure diagram of the present invention;

图4为本发明的通道注意力和空间注意力结构图；Fig. 4 is the channel attention and spatial attention structure diagram of the present invention;

图5为本发明的非局部空洞卷积图。FIG. 5 is a non-local hole convolution diagram of the present invention.

具体实施方式Detailed ways

下面将结合本发明实施例中的附图，对本发明实施例中的技术方案进行清楚、完整地描述，显然，所描述的实施例仅仅是本发明一部分实施例，而不是全部的实施例。基于本发明中的实施例，本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其他实施例，都属于本发明保护的范围。The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only a part of the embodiments of the present invention, but not all of the embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative efforts shall fall within the protection scope of the present invention.

一种基于注意力机制和双通道网络的图像超分辨率重建方法，该方法包括：实时获取待检测图像，对待检测图像进行预处理；将预处理后的图像输入到训练好的图像超分辨率重建模型中，得到高清重建图；采用峰值信噪比和结构相似性对重建图进行评价，根据评价结果对高清重建图进行标记；图像超分辨率重建模型基础为卷积神经网络。An image super-resolution reconstruction method based on an attention mechanism and a dual-channel network, the method comprises: acquiring an image to be detected in real time, preprocessing the to-be-detected image; inputting the preprocessed image into the trained image super-resolution In the reconstruction model, a high-definition reconstruction image is obtained; the peak signal-to-noise ratio and structural similarity are used to evaluate the reconstructed image, and the high-definition reconstructed image is marked according to the evaluation results; the image super-resolution reconstruction model is based on a convolutional neural network.

一种图像超分辨率重建模型结构，如图1所示，该结构包括深层特征通道、浅层特征通道、上采样层以及第三卷积层；所述深层特征通道包括第一卷积层、信息级联模块、改进的残差模块以及非局部空洞卷积块；输入的图像经过第一卷积层处理后依次通过信息级联模块、改进的残差模块和非局部空洞卷积块中进行处理，得到深层特征图；所述浅层特征通道包括第二卷积层和改进的VGG网络，输入的图像经过第二卷积层的处理后通过改进的VGG网络处理，得到浅层特征图；将深层特征图和浅层特征图进行融合，并采用上采样层对融合图像进行上采样，并采用第三卷积层对上采样图像进行卷积操作，得到高清重建图。An image super-resolution reconstruction model structure, as shown in Figure 1, the structure includes a deep feature channel, a shallow feature channel, an upsampling layer and a third convolution layer; the deep feature channel includes a first convolution layer, The information cascade module, the improved residual module and the non-local hole convolution block; the input image is processed by the first convolution layer through the information cascade module, the improved residual module and the non-local hole convolution block in turn. processing to obtain a deep feature map; the shallow feature channel includes a second convolution layer and an improved VGG network, and the input image is processed by the second convolution layer and processed by the improved VGG network to obtain a shallow feature map; The deep feature map and the shallow feature map are fused, and the upsampling layer is used to upsample the fused image, and the third convolutional layer is used to convolve the upsampled image to obtain a high-definition reconstructed image.

可选的，在深层特征通道中包含n个信息级联模块和m个改进的残差模块，其中所有的信息级联模块串联，得到信息级联模块组；所有的改进的残差模块串联，得到改进的残差模块组。Optionally, the deep feature channel includes n information cascade modules and m improved residual modules, wherein all the information cascade modules are connected in series to obtain an information cascade module group; all the improved residual modules are connected in series, An improved set of residual modules.

优选的，在深层特征通道中包含2n信息级联模块，将其中的n信息级联模块进行串联，得到第一信息级联模块组，将其余的n个信息级联模块进行串联，得到第二信息级联模块组；所述第一信息级联模块组和所述第二信息级联模块组分别设置在改进的残差模块组的输入端和输出端。Preferably, 2n information cascade modules are included in the deep feature channel, n information cascade modules are connected in series to obtain a first information cascade module group, and the remaining n information cascade modules are connected in series to obtain a second information cascade module group. an information cascading module group; the first information cascading module group and the second information cascading module group are respectively arranged at the input end and the output end of the improved residual module group.

数据集采用的是DIV2K数据集，其中八百张高清(HR)图片，以及和其对应的经过降解模型(双三次插值降解)的低分变率(LR)图片作为训练集，使用五张图片作为验证集。使用Set5，Set14，Urban100，Manga109，BSD100五个数据集作为测试集，这几个测试数据集的特点就是纹理信息非常丰富，降解后的低分辨率图片会丢失大部分的纹理信息，非常考验图像超分辨率重建的精度。评价指标为传统的PSNR和SSIM，其中，PSNR表示峰值信噪比，SSIM表示结构相似性。The data set is the DIV2K data set, in which 800 high-definition (HR) pictures and their corresponding low-resolution (LR) pictures that have undergone degradation model (bicubic interpolation degradation) are used as the training set, and five pictures are used. as a validation set. Use Set5, Set14, Urban100, Manga109, BSD100 five datasets as test sets. The characteristics of these test datasets are that the texture information is very rich, and the degraded low-resolution images will lose most of the texture information, which is a very test of the image. Accuracy of super-resolution reconstructions. The evaluation indicators are traditional PSNR and SSIM, where PSNR represents peak signal-to-noise ratio and SSIM represents structural similarity.

训练集中所有数据在神经网络中都进行了一次正向传播和一次反向传播被称为一轮，每一轮都会对模型的参数进行更新，最大轮数被设置为1000轮。我们设置每200轮迭代更新一次学习率，在训练模型的1000轮迭代过程中，在测试数据集上取得最好效果的模型及其参数被保存起来。All the data in the training set has undergone one forward propagation and one backpropagation in the neural network, which is called one round. Each round will update the parameters of the model, and the maximum number of rounds is set to 1000 rounds. We set the learning rate to be updated every 200 epochs, and during the 1000 epochs of training the model, the model and its parameters that performed best on the test dataset were saved.

在原始高清图片数据集中采用双三次插值降解模型对数据集中的图片进行缩放的倍数为2倍、3倍、4倍和8倍。降解模型的公式为：In the original high-definition picture data set, the bicubic interpolation degradation model is used to scale the pictures in the data set by 2 times, 3 times, 4 times and 8 times. The formula for the degradation model is:

I^LR＝H_dnI^HR+nI ^LR =H _dn I ^HR +n

其中，I^LR表示低分辨率图像，H_dn表示降解模型，I^HR表示原始高分辨率图像，n表示额外的噪音。where I ^LR represents the low-resolution image, H _dn represents the degradation model, I ^HR represents the original high-resolution image, and n represents the additional noise.

对缩放后的数据集进行预处理的过程包括对图像进行增强处理，包括对图像进行平移，水平和竖直方向的翻转处理；将增强后的数据进行分割为不同的小图像块，将分割后的图像进行集合，得到训练数据集。The process of preprocessing the scaled data set includes enhancing the image, including translating the image, flipping the image horizontally and vertically; dividing the enhanced data into different small image blocks, and dividing the The images are collected to obtain the training data set.

如图2所示，信息级联模块的结构包括：堆叠10次以下结构-依次是三层卷积神经网络、特征通道合并层、通道注意力层和通道数变换层。该模块处理图像数据的过程包括：首先使用各层卷积神经网络依次对输入图像进行特征信息的提取，之后将每层卷积提取的特征信息在特征通道合并层上合并，使用通道注意力机制对合并的信息进行重要性区分，最后将通道数降为输入通道数大小，重复上述步骤10次，得到聚合卷积层的层级特征信息。As shown in Figure 2, the structure of the information cascade module includes: stacking the structure below 10 times - followed by a three-layer convolutional neural network, a feature channel merging layer, a channel attention layer, and a channel number transformation layer. The process of processing image data by this module includes: firstly extracting feature information of the input image by using each layer of convolutional neural network in turn, then combining the feature information extracted by each layer of convolution on the feature channel combining layer, using the channel attention mechanism The importance of the merged information is distinguished, and finally the number of channels is reduced to the number of input channels, and the above steps are repeated 10 times to obtain the hierarchical feature information of the aggregated convolutional layer.

使用信息级联模块进行图像信息的聚合，充分的保留每层卷积层的信息，因为图像刚刚进入卷积神经网络，低频信息充足且丰富，但是随着网络层数的加深，已经更注重于更抽象的特征，许多边缘纹理信息和平滑信息被逐渐丢失，所以此时使用信息级联模块能够很好地捕获更多的低频信息融合到模型中。Use the information cascade module to aggregate image information, fully retain the information of each convolutional layer, because the image has just entered the convolutional neural network, the low-frequency information is sufficient and abundant, but with the deepening of the network layer, more attention has been paid to it. For more abstract features, many edge texture information and smoothing information are gradually lost, so using the information cascade module at this time can well capture more low-frequency information and fuse it into the model.

F_IC＝H_IC(I^LR)F _IC = H _IC (I ^LR )

其中，I^LR表示低分辨率输入图像，H_IC表示级联模块的卷积操作，F_IC表示卷积计算的的结果。Among them, I ^LR represents the low-resolution input image, H _IC represents the convolution operation of the cascade module, and F _IC represents the result of the convolution calculation.

如图3所示，改进的残差模块的结构包括：残差网络结构、通道注意力机制层和空间注意力机制层，残差网络结构包括卷积神经网络层、非线性激活层和卷积神经网络层；该模块处理图像数据的过程包括：将层级特征信息输入到残差网络结构中提取特征信息，将提取到的特征信息使用通道注意力机制获取通道上的关联性，再往下传递，使用空间注意力机制获取全局空间上的依赖性。As shown in Figure 3, the structure of the improved residual module includes: residual network structure, channel attention mechanism layer and spatial attention mechanism layer. The residual network structure includes convolutional neural network layer, nonlinear activation layer and convolutional layer. Neural network layer; the process of processing image data by this module includes: inputting hierarchical feature information into the residual network structure to extract feature information, using the extracted feature information to use the channel attention mechanism to obtain the correlation on the channel, and then passing it down , using the spatial attention mechanism to obtain dependencies on the global space.

将级联模块的输出当做改进的残差模块的输入，在每一个ResNetBlock后都接入通道注意力机制和空间注意力机制，同时捕获通道上的关联性和全局空间上的依赖信息，并融入到卷积神经网络中，丰富特征信息，为深层网络的训练提供稳定性。The output of the cascade module is used as the input of the improved residual module. After each ResNetBlock, the channel attention mechanism and the spatial attention mechanism are connected, and the correlation information on the channel and the dependency information on the global space are captured at the same time, and integrated into In the convolutional neural network, the feature information is enriched to provide stability for the training of the deep network.

F_RBC＝H_RBC(F_IC)F _RBC = H _RBC (F _IC )

F_CA＝H_CA(F_RBC)F _CA = H _CA (F _RBC )

F_SA＝H_SA(F_CA)F _SA = H _SA (F _CA )

其中，H_RBC表示具有输入源连接的残差块结构卷积操作，即将输入信息与经过残差块的输出进行融合。F_RBC表示经过残差块的输出特征信息，其内可表示为[f₁，f₂，f₃…f_n，]分别代表每个卷积核计算出来的通道特征。之后对每个通道特征使用通道注意力机制，获取每个通道的权重值与原输入数据乘积，即H_CA表示通道注意力的卷积操作，F_CA表示经过通道注意力后的特征信息。然后，使用空间注意力机制对输出特性信息计算全局依赖信息，与原输入数据融合，即H_SA表示空间注意力机制的卷积操作，F_SA表示经过空间注意力后的特征信息。Among them, H _RBC represents the residual block structure convolution operation with input source connection, that is, the input information is fused with the output of the residual block. F _RBC represents the output feature information of the residual block, which can be expressed as [ _f ₁ , f ₂ , f ₃ . Then, the channel attention mechanism is used for each channel feature to obtain the product of the weight value of each channel and the original input data, that is, H _CA represents the convolution operation of channel attention, and F _CA represents the feature information after channel attention. Then, the spatial attention mechanism is used to calculate the global dependency information for the output feature information, which is fused with the original input data, that is, _HSA represents the convolution operation of the spatial attention mechanism, and _FSA represents the feature information after spatial attention.

如图4所示，所述通道注意力结构包括：该结构组成依次为，一个全局平局池化、一个1x1卷积层、一个非线性激活层、一个1x1卷积层。该模块处理图像数据的过程包括：首先通过全局平均池化获取每个通道的权重表示符号，然后使用1x1卷积层降低通道数，之后使用非线性激活层引入非线性信息，再使用1x1卷积层将通道数变换回来，最后乘回原输入特征信息获取特征通道上的关联性。空间注意力结构包括：该结构组成依次为，一个1x1卷积层、一个softmax激活层、一个1x1卷积层、一个非线性激活层。该模块处理图像数据的过程包括：首先将输入特征信息CxHxW经过1x1卷积层转换为HWx1x1的全局特征图，然后使用softmax函数对该全局特征图进行归一化约束，之后乘回原输入信息，最后经过一层1x1卷积层和一层非线性激活层获取全局空间上的依赖信息。As shown in Figure 4, the channel attention structure includes: the structure is composed of a global draw pooling, a 1x1 convolutional layer, a nonlinear activation layer, and a 1x1 convolutional layer in order. The process of processing image data by this module includes: first obtain the weight representation symbol of each channel through global average pooling, then use a 1x1 convolution layer to reduce the number of channels, then use a nonlinear activation layer to introduce nonlinear information, and then use a 1x1 convolution layer The layer transforms the number of channels back, and finally multiplies back the original input feature information to obtain the correlation on the feature channels. The spatial attention structure includes: the structure consists of a 1x1 convolutional layer, a softmax activation layer, a 1x1 convolutional layer, and a nonlinear activation layer. The process of processing image data by this module includes: firstly converting the input feature information CxHxW into a global feature map of HWx1x1 through a 1x1 convolutional layer, then using the softmax function to normalize the global feature map, and then multiplying it back to the original input information, Finally, the dependency information in the global space is obtained through a 1x1 convolutional layer and a non-linear activation layer.

如图5所示，非局部空洞卷积块的结构包括：该结构组成为，四层并行的膨胀参数分别为1、2、4、6的空洞卷积层，以及三层普通卷积神经网络层。该模块处理图像数据的过程包括：首先特征信息会同时使用四种不同膨胀参数的空洞卷积和两种普通卷积神经网络分别提取特征信息，然后一方面将四种空洞卷积获取的特征信息在特征通道上进行融合，另一方面普通卷积神经网络提取的特征信息则按像像素矩阵的值进行融合，最后将这两种融合的特征信息相加，获取全局的特征信息。As shown in Figure 5, the structure of the non-local atrous convolution block includes: the structure consists of four parallel atrous convolution layers with dilation parameters of 1, 2, 4, and 6, and a three-layer ordinary convolutional neural network Floor. The process of processing image data by this module includes: first, the feature information will use four kinds of dilated convolutions with different dilation parameters and two kinds of ordinary convolutional neural networks to extract the feature information respectively, and then on the one hand, the feature information obtained by the four kinds of dilated convolution will be used. Fusion is performed on the feature channel. On the other hand, the feature information extracted by the ordinary convolutional neural network is fused according to the value of the pixel matrix, and finally the two fused feature information is added to obtain the global feature information.

改进的VGG网络结构包括：10层普通卷积层和3层池化层，将各个池化层嵌入到普通卷积层中，得到VGG网络结构；该模块处理图像数据的过程包括：首先使用2层卷积和一层池化提取64个通道特征信息，之后使用使用2层卷积和一层池化提取128个通道特征信息，然后使用3层卷积和一层池化提取512个通道特征信息，最后再使用3层卷积提取512通道的信息后还原为64通道；其中，池化层使用padding保持特征尺度不变。The improved VGG network structure includes: 10 layers of ordinary convolution layers and 3 layers of pooling layers, and each pooling layer is embedded into the ordinary convolution layer to obtain the VGG network structure; the process of processing image data by this module includes: first use 2 Layer convolution and one layer pooling to extract 64 channel feature information, then use 2 layers of convolution and one layer of pooling to extract 128 channel feature information, and then use 3 layers of convolution and one layer of pooling to extract 512 channel features Finally, 3 layers of convolution are used to extract the information of 512 channels and restore it to 64 channels; among them, the pooling layer uses padding to keep the feature scale unchanged.

使用非局部空洞卷积进行全局特征的提取，并将提取完的特征信息进行上采样，扩大为我们需要的尺寸输出结果。空洞卷积通过设置膨胀率可以在不增加参数的情况下扩大感受野，将其嵌入至非局部卷积中，可以显著的降低计算量，同时也可以从不同尺度上获取全局信息，特征的提取更为全面。Use non-local hole convolution to extract global features, and upsample the extracted feature information to expand the output result to the size we need. The atrous convolution can expand the receptive field without increasing the parameters by setting the expansion rate. Embedding it into the non-local convolution can significantly reduce the amount of calculation. At the same time, it can also obtain global information from different scales and extract features. more comprehensive.

F_NLHC＝H_NLHC(F_SA)F _NLHC = H _NLHC (F _SA )

其中，H_NLHC表示非局部空洞卷积的卷积操作，F_NLHC表示经过非局部空洞卷积后获取的特征信息。我们将最后的特征信息经过上采样之后，输出为对应的高清重建图像，即重建图像的公式为：Among them, H _NLHC represents the convolution operation of non-local hole convolution, and F _NLHC represents the feature information obtained after non-local hole convolution. After upsampling the final feature information, we output the corresponding high-definition reconstructed image, that is, the formula for the reconstructed image is:

F_Up＝H_Up(F_NLHC)F _Up = H _Up (F _NLHC )

其中H_Up表示上采样的卷积操作，F_Up表示上采样的输出特征。where H _Up represents the up-sampled convolution operation, and F _Up represents the up-sampled output feature.

图像超分辨率重建模型的损失函数表达式为：The loss function expression of the image super-resolution reconstruction model is:

其中，θ表示模型的参数量，C_HR表示超分辨率计算方程，

和

分别表示第i张低分辨率图像和第i张对应的高分辨率图像，N表示数据集中图像的数量，HR表示高分辨率，LR表示低分辨率。Among them, θ represents the parameter quantity of the model, C _HR represents the super-resolution calculation equation,

and

represent the ith low-resolution image and the ith corresponding high-resolution image, respectively, N represents the number of images in the dataset, HR represents high resolution, and LR represents low resolution.

超分辨率计算方程的表达式为：The expression of the super-resolution calculation equation is:

C_HR＝F_UP(F_NLHC(F_SA(F_CA(F_RBC(F_IC(I^LR))))))C _HR = F _UP (F _NLHC (F _SA (F _CA (F _RBC (F _IC (I ^LR )))))))

其中，F_UP表示上采样后的输出信息，F_NLHC表示非局部空洞卷积提取的信息，F_SA表示空间注意力机制提取的信息，F_CA表示通道注意力机制提取的信息，F_RBC表示残差块提取的信息，F_IC表示级联模块输出的信息。Among them, F _UP represents the output information after up-sampling, F _NLHC represents the information extracted by the non-local hole convolution, F _SA represents the information extracted by the spatial attention mechanism, F _CA represents the information extracted by the channel attention mechanism, and F _RBC represents the residual information. The information extracted by the difference block, F _IC represents the information output by the cascade module.

采用峰值信噪比(PSNR)和结构相似性(SSIM)作为结果评价指标：Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity (SSIM) were used as results evaluation metrics:

其中，MSE表示均方误差，MAX表示像素值中的最大值，μ_X和μ_Y表示图像X、图像Y的像素的均值，σ_X和σ_Y表示图像X、图像Y的像素的标准值，σ_XY表示图像X和图像Y的协方差。Among them, MSE represents the mean square error, MAX represents the maximum value of the pixel values, μ _X and μ _Y represent the average value of the pixels of image X and image Y, σ _X and σ _Y represent the standard value of the pixels of image X and image Y, _σXY represents the covariance of image X and image Y.

以上所举实施例，对本发明的目的、技术方案和优点进行了进一步的详细说明，所应理解的是，以上所举实施例仅为本发明的优选实施方式而已，并不用以限制本发明，凡在本发明的精神和原则之内对本发明所作的任何修改、等同替换、改进等，均应包含在本发明的保护范围之内。The above-mentioned embodiments further describe the purpose, technical solutions and advantages of the present invention in detail. It should be understood that the above-mentioned embodiments are only preferred embodiments of the present invention, and are not intended to limit the present invention. Any modification, equivalent replacement, improvement, etc. made to the present invention within the spirit and principle of the present invention shall be included within the protection scope of the present invention.

Claims

1. an image super-resolution reconstruction method based on attention mechanism and dual-channel network, is characterized in that, comprises: obtains the image to be detected in real time, and carries out preprocessing to the image to be detected; In the image super-resolution reconstruction model, a high-definition reconstructed image is obtained; the peak signal-to-noise ratio and structural similarity are used to evaluate the reconstructed image, and the high-definition reconstructed image is marked according to the evaluation results;

The process of training an image super-resolution reconstruction model includes:

S1: Obtain the original high-definition picture data set, and use the bicubic interpolation degradation model to scale the pictures in the data set;

S2: Preprocess the scaled data set to obtain a training data set;

S3: Input each image data in the training dataset into the shallow feature channel and the deep feature channel in the image super-resolution reconstruction model for feature extraction;

S4: Use the first convolution layer to extract the initial features of the input image; input the initial features into the information cascade module to aggregate the hierarchical feature information of the convolution layer;

S5: Input the hierarchical feature information aggregated by the information cascade module into the improved residual module to obtain the correlation on the channel and the dependency information on the global space;

S6: Use the non-local hole convolution block to extract global features from the dependency information to obtain the final deep feature map;

S7: use the second convolution layer to extract the initial features of the input image; input the initial features into the improved VGG network, extract the shallow features of the image, and obtain a shallow feature map;

S8: fuse the deep feature map and the shallow feature map, and upsample the fused feature map to obtain a high-definition reconstruction map;

S9: Use the loss function to constrain the difference between the high-definition reconstructed image and the original high-definition image, and continuously adjust the parameters of the model until the model converges, and the training of the model is completed.

2. a kind of image super-resolution reconstruction method based on attention mechanism and dual-channel network according to claim 1, is characterized in that, the multiple that adopts bicubic interpolation degradation model to zoom the picture in the data set is 2 times, 3x, 4x and 8x.

3. a kind of image super-resolution reconstruction method based on attention mechanism and dual-channel network according to claim 1, is characterized in that, the formula of bicubic interpolation degradation model is:

I ^LR =H _dn I ^HR +n

where I ^LR represents the low-resolution image, H _dn represents the degradation model, I ^HR represents the original high-resolution image, and n represents the additional noise.

4. An image super-resolution reconstruction method based on an attention mechanism and a dual-channel network according to claim 1, wherein the process of preprocessing the scaled data set comprises performing enhancement processing on the image, comprising: Translate the image and flip it in horizontal and vertical directions; divide the enhanced data into different small image blocks, and collect the divided images to obtain a training data set.

5. An image super-resolution reconstruction method based on an attention mechanism and a dual-channel network according to claim 1, wherein the information cascade module comprises a feature aggregation structure stacked 10 times; the feature aggregation structure comprises at least three layers Convolutional neural network, feature channel merging layer, channel attention layer and channel number transformation layer, each layer of convolutional neural network is connected in turn, and the output branch of each layer of convolutional neural network except the last layer of convolutional neural network The feature channel merging layer is connected, and the feature channel merging layer, the channel attention layer and the channel number transformation layer are connected in turn to form an information cascade module; the process of processing image data by this module includes: first, use each layer of convolutional neural network to sequentially analyze the input image. Extract feature information, then combine the feature information extracted by each layer of convolution on the feature channel combining layer, use the channel attention mechanism to distinguish the importance of the combined information, and finally reduce the number of channels to the number of input channels, repeat The above steps are repeated 10 times to obtain the hierarchical feature information of the aggregated convolutional layer.

6. An image super-resolution reconstruction method based on an attention mechanism and a dual-channel network according to claim 1, wherein the improved residual module comprises: a residual network structure, a channel attention mechanism layer and a space Attention mechanism layer, residual network structure includes convolutional neural network layer, nonlinear activation layer and convolutional neural network layer; the process of processing image data by this module includes: inputting hierarchical feature information into the residual network structure to extract feature information , the extracted feature information uses the channel attention mechanism to obtain the correlation on the channel, and then passes it down, and uses the spatial attention mechanism to obtain the dependence on the global space.

7. An image super-resolution reconstruction method based on an attention mechanism and a dual-channel network according to claim 1, wherein the non-local hole convolution block comprises: four parallel dilation parameters are 1 and 2, respectively. , 4 and 6 atrous convolutional layers and three layers of ordinary convolutional neural network layers; the process of processing image data by this module includes: first, atrous convolution with four different dilation parameters and two ordinary convolutional neural networks are used to improve the Then, the feature information obtained by the four kinds of hole convolutions is fused on the feature channel, and the feature information extracted by the ordinary convolutional neural network is fused according to the value of the pixel matrix; finally The two kinds of fused feature information are added to obtain the global feature information.

8. An image super-resolution reconstruction method based on an attention mechanism and a dual-channel network according to claim 1, wherein the improved VGG network structure comprises: 10 layers of ordinary convolution layers and 3 layers of pooling layers , Embed each pooling layer into the ordinary convolutional layer to obtain the VGG network structure; the process of processing image data by this module includes: firstly using 2 layers of convolution and one layer of pooling to extract 64 channel feature information, and then using 2 layers Convolution and one-layer pooling extract 128 channel feature information, then use 3-layer convolution and one-layer pooling to extract 512 channel feature information, and finally use 3-layer convolution to extract 512-channel information and restore it to 64 channels; Among them, the pooling layer uses padding to keep the feature scale unchanged.

9. a kind of image super-resolution reconstruction method based on attention mechanism and dual-channel network according to claim 1, is characterized in that, the loss function expression of image super-resolution reconstruction model is:

Among them, θ represents the parameter quantity of the model, C _HR represents the super-resolution calculation equation,

and

10. A kind of image super-resolution reconstruction method based on attention mechanism and dual-channel network according to claim 1, is characterized in that, adopts peak signal-to-noise ratio and structural similarity to evaluate the formula of reconstructed image:

where PSNR represents the peak signal-to-noise ratio, MSE represents the mean square error, MAX represents the maximum value among the pixel values, SSIM represents the structural similarity, μX and μY represent the mean of the pixels of image _X and image _Y , respectively, _σX and σ _Y represents the standard value of the pixels of image X and image Y, respectively, and σ _XY represents the covariance of image X and image Y.