CN117474764B - High-resolution reconstruction method for remote sensing image under complex degradation model - Google Patents
- Publication number
- CN117474764B (Application CN202311819893.8A)
- Authority
- CN
- China
- Prior art keywords
- remote sensing
- resolution
- reconstruction
- model
- image
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformations in the plane of the image
- G06T3/40—Scaling of whole images or parts thereof, e.g. expanding or contracting
- G06T3/4053—Scaling of whole images or parts thereof, e.g. expanding or contracting based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/09—Supervised learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/774—Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/86—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using syntactic or structural representations of the image or video pattern, e.g. symbolic string recognition; using graph matching
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The invention relates to a high-resolution reconstruction method for remote sensing images under a complex degradation model, and belongs to the field of high-resolution reconstruction. The method introduces the MAE self-supervised model into the field of high-resolution reconstruction: in the first stage, the MAE learns prior information of the remote sensing images; in the second stage, a reconstruction network is proposed that uses the prior information learned by the MAE to complete the high-resolution reconstruction of low-resolution remote sensing images. An edge attention module is also proposed within the reconstruction network; it extracts gradient information from the feature map and assigns larger learning weights to edge positions with larger gradients, so that the reconstruction network focuses on edges. The superiority of the reconstruction model in the high-resolution reconstruction task is demonstrated on a public dataset: the model obtains good objective index results under most degradation models, and the visually reconstructed images have clear edges and richer detail information. The method has broad application prospects in the field of high-resolution reconstruction.
Description
Technical Field
The invention belongs to the field of high-resolution reconstruction, and particularly relates to a high-resolution reconstruction method for a remote sensing image under a complex degradation model.
Background
Remote sensing images contain abundant ground-object targets over large scales and wide ranges, carry rich detail and perception information, and support effective scene perception and environment analysis. Remote sensing images are therefore widely applied in many fields:
In the field of disaster monitoring, remote sensing images allow long-term, large-scale monitoring of an area, effectively preventing disasters and reducing the property loss and casualties they cause. After a disaster occurs, remote sensing images can also provide disaster-area information continuously and in real time, so that rescuers understand the progress and severity of the disaster and can be guided in relief activities.
In the field of resource exploration, the spatial information in remote sensing images can help determine various geological structures, from which the locations of coal resources can be inferred. In addition, because different substances absorb and reflect light differently, the spectral information in remote sensing images can be used to explore for resources such as natural gas and petroleum.
In the field of land monitoring, timely awareness of changes in land area, utilization type and other information is important for land management, rational use of land resources and protection of the farmland red line. Comprehensive use of the spatio-temporal information in remote sensing images enables dynamic monitoring of land and meets the requirements of modern land management. A remote sensing image from the public AID Dataset is shown in Fig. 1.
However, during the imaging of a remote sensing image, limitations of the imaging equipment may leave the acquired image with low spatial resolution and blurred edge details of ground features. At the same time, data transmission is limited by network bandwidth and transmission time, so image compression is often required and image information is lost. The finally obtained remote sensing image therefore lacks part of its high-frequency information, has low spatial resolution, and is affected by noise and blur. Such images struggle to meet the demands of practical applications. Improving remote sensing image quality through hardware places strict requirements on the equipment and is costly and difficult, so researchers aim to solve the problem at low cost through software. Single Remote Sensing Image Super Resolution (SRSISR) refers to the process of taking a Low Resolution (LR) remote sensing image as input and reconstructing a High Resolution (HR) remote sensing image from it with an image processing algorithm model. SRSISR technology can restore the high-frequency information of the remote sensing image, improve the spatial resolution, eliminate noise and blur, and improve image quality, so that the remote sensing image can be better applied to real scenes.
Conventional SRSISR techniques fall into three categories: (1) interpolation-based algorithms; (2) modeling-based algorithms; (3) shallow-learning-based algorithms. Interpolation-based algorithms estimate the unknown pixel value from the known pixel values around the pixel to be inserted; they are simple to implement and fast, but interpolation cannot recover the high-frequency information missing from the degraded image, so the reconstructed image is smooth and blurred. Modeling-based algorithms reconstruct a high-resolution image by combining prior knowledge with a mathematical model, and can be divided into frequency-domain methods, spatial-domain methods, and combined frequency-spatial-domain methods. These algorithms use image priors to constrain the reconstruction process and relieve, to some extent, the blurring caused by interpolation, but they are unsuitable for high-resolution reconstruction with larger amplification factors. Shallow-learning-based algorithms collect a large number of high/low-resolution image pairs to build a learning library and learn the mapping between the image pairs with a learning model; representative algorithms include neighborhood embedding, manifold learning and sparse representation. These algorithms developed out of traditional machine learning methods and require hand-designed features, so the reconstruction results depend on the quality of the designed features.
Deep learning has achieved great development in many fields in recent years. In 2016, Dong Chao et al. applied deep learning to the field of single-image high-resolution reconstruction and proposed the Super-Resolution Convolutional Neural Network (SRCNN) model. SRCNN comprises a three-layer network structure and performs the high-resolution reconstruction task end to end. On objective indexes, SRCNN obtains results superior to traditional algorithms, indicating the feasibility and superiority of deep learning in single-image super-resolution. Li Jiang, building on the Super-Resolution Generative Adversarial Network (SRGAN) model, introduced dense residual blocks and a receptive-field module into the generator network for feature extraction; the dense residual block fuses the residual block and the dense block, relieving the vanishing-gradient problem while strengthening feature propagation. Lei et al. proposed the Local-Global Combined Network (LGCNet) model. LGCNet first extracts image features with L convolutional layers; shallow convolutional layers, with small receptive fields, focus on local information, while deep convolutional layers, with large receptive fields, focus on global information. LGCNet then fuses the outputs of the shallow and deep convolutions through a multi-branch structure, combining the local and global information of the remote sensing image to better guide its high-resolution reconstruction. These models obtain the LR remote sensing image from an existing HR remote sensing image by a fixed downsampling scheme (in most cases bicubic downsampling), and then perform supervised learning under this fixed degradation model with paired HR-LR images. Although such a degradation process is simple to implement, the degradation of remote sensing images in real scenes is complex: besides reduced spatial resolution, the image is affected by blur and noise. The performance of the above models is therefore limited in realistically complex scenes.
Disclosure of Invention
In order to solve the problem of image information loss in the transmission process of a remote sensing image, the invention provides a high-resolution reconstruction method for the remote sensing image under a complex degradation model, which comprises the following specific steps:
s1: acquiring a sample image dataset, wherein the dataset comprises M images of different scenes, randomly selecting the images, and dividing the dataset into a training set and a testing set;
s2: designing a degradation model, adding noise after blurring processing and downsampling of a high-resolution remote sensing image, and generating a final low-resolution remote sensing image;
s3: training a self-encoder model with a mask, and learning priori information of the low-resolution remote sensing image;
s4: building a new reconstruction network; the reconstruction network and the self-encoder model with the mask are trained simultaneously, and the prior information learned by the self-encoder model with the mask is utilized to reconstruct the low-resolution remote sensing image with high resolution; the reconstructed network structure is divided into three parts as a whole, and the three parts are as follows:
s41: the first part carries out shallow feature extraction; the shallow feature extraction module extracts multi-scale features using convolution layers with kernel sizes 3, 5 and 7 respectively, then concatenates the three extracted features along the channel dimension, reduces the number of channels with a convolution layer of kernel size 1 × 1, and fuses the multi-scale features; the multi-scale features F_0 initially extracted from the remote sensing image by the shallow feature extraction module are represented by the following formula (2):

F_0 = H_{SFE}(I_{LR})    (2)

wherein H_{SFE} denotes the mapping function of the shallow feature extraction, I_{LR} denotes the input low-resolution remote sensing image, and F_0 denotes the extracted multi-scale features;
s42: the second part is used for deep feature extraction and consists of residual branches and feature branches;
the feature branch directly transmits the shallow features extracted from the first part to the rear of the deep feature extraction network;
the residual branch uses the structural design of a UNet model, and a convolution layer in the middle of the branch divides the branch into a front part and a rear part; the front part and the rear part each cascade r basic blocks whose structures correspond to each other; each basic block of the front part consists, in order, of a multi-scale receptive field attention module, a residual fusion block and a prior module, and each basic block of the rear part consists, in order, of a convolution layer, the multi-scale receptive field attention module, the residual fusion block and the prior module; the multi-scale receptive field attention module and the residual fusion block are responsible for the feature-learning sub-problem, and the prior module is responsible for the prior-learning sub-problem;
the input of each front-part basic block comes from the prior information and the output of the preceding basic block; the output of the r-th front-part basic block is expressed as formula (3):

F_r = H_r(F_{r-1})    (3)

wherein H_r denotes the mapping function of the r-th basic block of the front part, F_r denotes its output, and F_{r-1} denotes the output of the preceding basic block; the rear part likewise cascades several basic blocks; the input of each rear-part basic block comes not only from the prior information and the output of the preceding basic block, but also from the output of the corresponding front-part basic block, and these are concatenated along the channel dimension and fused with a convolution layer; the output of the rear-part basic block corresponding to the r-th front-part basic block is expressed as formula (4):

F_{n-r+1} = H_{n-r+1}(Concat(F_{n-r}, F_r))    (4)

wherein F_{n-r+1} denotes the output of the rear-part basic block, H_{n-r+1} denotes its mapping function, Concat denotes the channel-dimension concatenation operation, and F_r denotes the output of the corresponding front-part basic block; n is the total number of modules in the network, n = r (the first r basic blocks) + 1 (the convolution block of the middle part) + r (the last r basic blocks);
s43: the third part is an up-sampling module, the up-sampling mode adopts sub-pixel convolution, the generation of artifacts in the reconstruction process is avoided, and the image containing abundant detail information is reconstructed.
The invention provides a high-resolution reconstruction model based on two-stage training: in the first stage the prior information of the remote sensing image is learned by the MAE self-supervised model, and in the second stage the proposed reconstruction network completes the high-resolution reconstruction of the remote sensing image under the guidance of that prior information. An edge attention module is also designed so that the edges of the reconstructed image are clearer. On public datasets, the designed reconstruction model obtains better results than other algorithm models when facing complex degradation models.
Drawings
Fig. 1 is a remote sensing image in the prior art.
Fig. 2 is a diagram of the overall framework of the present invention.
Fig. 3 is a diagram of a reconstructed network structure.
Fig. 4 is a block diagram of the SFE module.
Fig. 5 is a block diagram of the ISAB module.
Fig. 6 is a degradation model flow diagram.
Fig. 7 shows visual reconstruction results with bicubic downsampling as the degradation model.
Fig. 8 shows visual reconstruction results under a degradation model with blur and noise.
Description of the embodiments
The self-supervised model MAE (Masked Auto-Encoders) was proposed at the end of 2021 and is mainly used for image restoration tasks. The image features learned by MAE can serve downstream visual tasks such as detection and classification. The invention creatively introduces MAE into the field of high-resolution reconstruction and provides a high-resolution reconstruction model based on two-stage training; the overall framework of the model is shown in Fig. 2. The following is a specific example:
s1: the public data set AID Dataset is downloaded. AID Dataset was published by university of science and technology and university of Wuhan in 2017, and contains 10000 sample images collected from Google Earth, and scenes including 30 types of scenes, including airports, lands, baseball fields, beach, bridges, centers, churches, and the like, each of which has about 200-420 images in size of 600X 600 pixels. The training set randomly selects 100 images from 30 types of scenes of the AID Dataset, the total number of images is 3000, and the test set randomly selects 15 images from the rest images of the same scene, the total number of images is 450.
S2: A degradation model is designed: the high-resolution remote sensing image is blurred and downsampled, and noise is then added to generate the final Low-Resolution (LR) remote sensing image, as shown in Fig. 6.
The degradation model designed by this algorithm targets two application scenarios. In the first, the equipment photographing the remote sensing image is of high quality: the captured high-resolution image is essentially unaffected by blur, but in order to transmit the image back to the ground it must be downsampled and lossily compressed to reduce the transmitted file size, and noise is introduced during compression and transmission. In this case the degradation model simplifies to downsampling the high-resolution remote sensing image and then adding noise to obtain the low-resolution image. In the second application scenario, the equipment photographing the remote sensing image is of poorer quality, and the captured high-resolution image is affected by blur. For this scenario, three blur modes are introduced to improve the generalization of the model: isotropic Gaussian blur, anisotropic Gaussian blur and motion blur. JPEG compression noise is also added to the image to simulate the image degradation process.
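For illustration, the following is a minimal PyTorch sketch of the degradation pipeline described above. The kernel size, blur variance and noise level are assumed values chosen for illustration only; the patent fixes none of them except the JPEG quality of 95, and the JPEG step is only indicated by a comment here.

```python
import torch
import torch.nn.functional as F

def isotropic_gaussian_kernel(size: int = 21, sigma: float = 2.0) -> torch.Tensor:
    """Isotropic Gaussian blur kernel of shape (1, 1, size, size)."""
    ax = torch.arange(size, dtype=torch.float32) - (size - 1) / 2
    xx, yy = torch.meshgrid(ax, ax, indexing="ij")
    k = torch.exp(-(xx ** 2 + yy ** 2) / (2 * sigma ** 2))
    return (k / k.sum()).view(1, 1, size, size)

def degrade(hr: torch.Tensor, scale: int = 4, sigma: float = 2.0,
            noise_std: float = 10 / 255, blur: bool = True) -> torch.Tensor:
    """HR tensor (B, C, H, W) in [0, 1] -> degraded LR tensor."""
    x = hr
    if blur:  # second scenario: lower-quality imaging equipment introduces blur
        k = isotropic_gaussian_kernel(sigma=sigma).repeat(x.size(1), 1, 1, 1)
        x = F.conv2d(F.pad(x, [10, 10, 10, 10], mode="reflect"), k, groups=x.size(1))
    # bicubic downsampling, performed before transmission to reduce file size
    x = F.interpolate(x, scale_factor=1 / scale, mode="bicubic", align_corners=False)
    # additive white Gaussian noise from compression and transmission
    x = (x + noise_std * torch.randn_like(x)).clamp(0.0, 1.0)
    # JPEG compression at quality 95 would follow here (e.g. torchvision.io.encode_jpeg)
    return x
```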
S3: Stage-one MAE model training is performed: the LR image is input into the MAE model so that the encoder learns prior information of the remote sensing image, i.e. the features of the input low-resolution image. The MAE is trained separately to learn this prior information; its loss function computes the Euclidean distance L_2 between the generated mask pixels and the original mask pixels, represented by the following formula (1):

L_2 = \sqrt{\sum_{i=1}^{n} (\hat{y}_i - y_i)^2}    (1)

wherein \hat{y}_i denotes a generated mask pixel, y_i denotes the corresponding original mask pixel, and n is the total number of pixels of the image; the input image of this method is 224 × 224, so n = 224 × 224. The Euclidean distance is used in mathematics to compute the distance between two points; in computer vision tasks, it computes the per-pixel distance between two images and so measures their difference: the smaller L_2, the more alike the two images.
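Rendered directly as code, formula (1) might look like the following sketch. The variable names are assumptions, and real MAE implementations typically compute the loss per masked patch rather than over a flattened image.

```python
import torch

def mae_loss(pred: torch.Tensor, target: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """Euclidean distance between generated and original mask pixels, eq. (1).

    pred, target: (B, n) flattened pixel values, n = 224 * 224
    mask:         (B, n), 1.0 at masked positions, 0.0 elsewhere
    """
    diff = (pred - target) * mask                     # compare only the masked pixels
    return torch.sqrt((diff ** 2).sum(dim=1)).mean()  # per-image L2, averaged over the batch
```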
S4: building a new reconstruction network; the reconstruction network and the MAE are trained simultaneously, and the high-resolution reconstruction is carried out on the low-resolution remote sensing image by using the prior information learned by the MAE. In order to solve the problem that the edge of a reconstructed image is not clear, an edge attention module is innovatively provided in a reconstruction network, and the edge position is given more learning weight by extracting gradient information. The second-stage reconstruction network structure is shown in fig. 3, and the second-stage reconstruction network model is divided into three parts as a whole:
s41: the first part performs Shallow Feature Extraction (SFE), which focuses on low-level visual features such as texture, color and shape. The SFE module extracts multi-scale features using convolution layers with kernel sizes 3, 5 and 7 respectively; different kernel sizes correspond to receptive fields of different sizes, so the extracted image features cover different scales, which facilitates extracting the features of differently sized targets in the remote sensing image. The three extracted features are then concatenated along the channel dimension, and a convolution layer with kernel size 1 × 1 reduces the number of channels and fuses the multi-scale features. The multi-scale features preliminarily extracted by the SFE module are represented by the following formula (2):

F_0 = H_{SFE}(I_{LR})    (2)

wherein H_{SFE} denotes the mapping function of the shallow feature extraction, I_{LR} denotes the input low-resolution remote sensing image, and F_0 denotes the extracted features;
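As a sketch of the SFE module described above (the 3 input channels and 64 feature channels are assumptions; the patent does not specify channel counts):

```python
import torch
import torch.nn as nn

class SFE(nn.Module):
    def __init__(self, in_ch: int = 3, feat_ch: int = 64):
        super().__init__()
        # same-padding keeps H x W so the three scales can be concatenated
        self.conv3 = nn.Conv2d(in_ch, feat_ch, 3, padding=1)
        self.conv5 = nn.Conv2d(in_ch, feat_ch, 5, padding=2)
        self.conv7 = nn.Conv2d(in_ch, feat_ch, 7, padding=3)
        self.fuse = nn.Conv2d(3 * feat_ch, feat_ch, 1)  # 1x1 conv: channel reduction + fusion

    def forward(self, i_lr: torch.Tensor) -> torch.Tensor:
        f = torch.cat([self.conv3(i_lr), self.conv5(i_lr), self.conv7(i_lr)], dim=1)
        return self.fuse(f)  # F_0 = H_SFE(I_LR), formula (2)
```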
s42: the second part performs deep feature extraction and consists of a residual branch and a feature branch. Multi-level extraction of the prior information and the shallow features is realized through the deep learning model, so that higher-level semantic features can be learned, giving stronger expressive and generalization capability. The feature branch passes the shallow features extracted in the first part directly to the rear of the network, letting more information flow to the deep part of the network so that it converges faster during training; the residual branch then only needs to learn the residual (difference) between input and output, which accelerates convergence and relieves the vanishing-gradient problem;
The residual branch follows the structural design of the UNet model: a convolution layer in the middle of the branch divides it into a front part and a rear part. The front and rear parts each cascade r basic blocks whose structures correspond to each other, and each basic block consists, in order, of a Multi-Scale Receptive Field Attention Block (MRFAB), a Residual Fusing Block (RFB) and a Prior Block (PB). MRFAB and RFB are responsible for the feature-learning sub-problem, and PB is responsible for the prior-learning sub-problem;
in order to better extract the spatial information and edge information of the image, an Improved Spatial Attention Block (ISAB) is designed within the MRFAB;
The ISAB has two branches, as shown in Fig. 5. The first branch passes through a spatial attention module (SAB): it applies mean pooling (AvgPool) and maximum pooling (MaxPool) over the channel dimension at each spatial position, so that the input feature map of size (B, C, H, W) becomes (B, 1, H, W); the pooled results are concatenated along the channel dimension, and a convolution layer followed by a Sigmoid activation function then turns each element of the matrix into a probability value between 0 and 1, which is multiplied with the spatial-channel weight. The larger the product, the more useful the information at that spatial position, so useful information is retained and useless information is suppressed.

The second branch adopts an edge attention module: the input feature map is convolved with edge-extracting Sobel operators to obtain its gradient information in the x and y directions respectively; taking the square root of the sum of their squares gives the gradient matrix of the feature map. The gradient matrix passes through a convolution layer and a Sigmoid activation function to become a weight matrix, yielding the weight parameters of the edge positions. Denoting the spatial-channel weight parameter by α and the edge-position weight parameter by β, the ISAB outputs the weighted sum of the spatial attention and the edge attention, so that both the spatial positions and the edge positions of the feature map receive attention and detail information is retained; α and β are set experimentally.
The input of each front-part basic block comes from the prior information and the output of the preceding basic block; the output of the r-th front-part basic block is expressed as formula (3):

F_r = H_r(F_{r-1})    (3)

wherein H_r denotes the mapping function of the r-th basic block of the front part, F_r denotes its output, and F_{r-1} denotes the output of the preceding basic block. The rear part likewise cascades several basic blocks, each of which structurally adds one convolution layer compared with the front-part basic blocks. The inputs of the rear-part basic blocks come not only from the prior information and the output of the preceding basic block, but also from the output of the corresponding front-part basic block; these are concatenated along the channel dimension (Concat) and fused with a convolution layer. In this way the features learned by the shallow network are fully utilized and feature loss is reduced. The output of the rear-part basic block corresponding to the r-th front-part basic block is expressed as formula (4):

F_{n-r+1} = H_{n-r+1}(Concat(F_{n-r}, F_r))    (4)

wherein F_{n-r+1} denotes the output of the rear-part basic block, H_{n-r+1} denotes its mapping function, Concat denotes the channel-dimension concatenation operation, and F_r denotes the output of the corresponding front-part basic block; here n is the total number of modules in the network, n = r (the first r basic blocks) + 1 (the convolution block of the middle part) + r (the last r basic blocks).
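The rear-part fusion of formula (4) can be sketched as follows. Abstracting the MRFAB/RFB/PB sub-modules into a placeholder `body`, and fusing with a 1×1 convolution, are assumptions about how the pieces compose.

```python
import torch
import torch.nn as nn

class RearBasicBlock(nn.Module):
    def __init__(self, channels: int, body: nn.Module = None):
        super().__init__()
        self.fuse = nn.Conv2d(2 * channels, channels, 1)          # the extra conv layer
        self.body = body if body is not None else nn.Identity()   # MRFAB -> RFB -> PB

    def forward(self, prev: torch.Tensor, front_skip: torch.Tensor) -> torch.Tensor:
        # Concat the preceding block's output with the mirrored front block's output,
        # then fuse along the channel dimension, as in formula (4)
        x = self.fuse(torch.cat([prev, front_skip], dim=1))
        return self.body(x)
```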
S43: the third part is an up-sampling module, so that the spatial resolution is improved. The up-sampling mode adopts sub-pixel convolution, so that the generation of artifacts in the reconstruction process is avoided, and an image containing abundant detail information is reconstructed.
The loss function of the up-sampling module adopts the L1 loss, which computes the mean absolute difference between the reconstructed image and the real high-resolution image and makes the training process of the model more stable; it is represented by the following formula (5):

L_1 = \frac{1}{N} \sum_{i=1}^{N} |I_{SR}^{(i)} - I_{HR}^{(i)}|    (5)

wherein I_{SR} denotes the reconstructed image and I_{HR} denotes the true high-resolution image. The total loss function is expressed as formula (6):

L_{total} = L_2 + L_1    (6).
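A sketch of the stage-two losses as code. Treating the total loss of formula (6) as the unweighted sum of the MAE loss and the L1 loss is an assumption; the patent's formula image may include a weighting.

```python
import torch

def l1_loss(sr: torch.Tensor, hr: torch.Tensor) -> torch.Tensor:
    """Mean absolute difference between reconstruction and ground truth, formula (5)."""
    return (sr - hr).abs().mean()

def euclidean_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Euclidean distance over masked pixels, formula (1)."""
    return torch.sqrt(((pred - target) ** 2).sum())

def total_loss(sr, hr, mask_pred, mask_target) -> torch.Tensor:
    """Total loss of formula (6), taken here as the sum of the two stage losses."""
    return euclidean_loss(mask_pred, mask_target) + l1_loss(sr, hr)
```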
The invention adopts the Peak Signal-to-Noise Ratio (PSNR), the Structural Similarity Index Measure (SSIM) and the Learned Perceptual Image Patch Similarity (LPIPS) as evaluation indexes of model performance. PSNR evaluates the quality of a reconstructed digital signal: the larger the PSNR, the better the quality of the reconstructed image. SSIM measures the similarity of two digital images: the larger the SSIM, the higher the similarity between the reconstructed image and the real image. LPIPS measures the perceptual similarity between two images: the smaller the LPIPS, the greater the similarity between the reconstructed image and the real image. Together, the three evaluation indexes objectively and comprehensively evaluate the reconstruction performance of the model from the perspectives of pixel distance, structural similarity and perceptual similarity.
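PSNR can be computed directly as below (a sketch for images with values in [0, 1]); SSIM and LPIPS are typically taken from existing implementations such as scikit-image and the lpips package rather than re-implemented.

```python
import torch

def psnr(sr: torch.Tensor, hr: torch.Tensor) -> torch.Tensor:
    """Peak Signal-to-Noise Ratio in dB; peak value 1.0 for images in [0, 1]."""
    mse = ((sr - hr) ** 2).mean()
    return 10 * torch.log10(1.0 / mse)  # larger PSNR = better reconstruction quality
```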
Four algorithm models, Bicubic, EDSR, RCAN and DASR, are selected and trained on the same dataset as the model provided by the invention, and the test results are compared. Bicubic is a commonly used interpolation algorithm that can improve the spatial resolution of an image. EDSR is a high-resolution reconstruction model proposed at the 2017 CVPR conference; it is a residual structure overall, its main branch connects several residual blocks in series, and its upsampling adopts sub-pixel convolution, with performance exceeding the state-of-the-art algorithms of the time. RCAN is a high-resolution reconstruction model proposed in 2018 with an architecture similar to EDSR; it proposes a residual attention block that connects a channel attention module in series on the residual branch of the residual block to focus on important features, improving model performance. DASR is a high-resolution reconstruction model proposed at the 2021 CVPR conference to address complex degradation processes.
In addition, nine degradation models with specific parameters are selected for testing. Bicubic denotes the degradation model that applies bicubic downsampling only; the remaining degradation models add further processing on top of bicubic downsampling. ISO denotes isotropic Gaussian blur, ANI denotes anisotropic Gaussian blur, Motion denotes motion blur, σ denotes the blur kernel variance (its subscript indicates the size), and δ denotes the Gaussian white noise variance (its subscript indicates the size). The reconstruction results of the five algorithm models are compared under each degradation model. All nine degradation models apply JPEG compression with compression quality 95 by default. Tables 1 and 2 compare the reconstruction results with the other algorithms at amplification factors 2 and 4 in terms of the evaluation indexes PSNR/SSIM/LPIPS:
Table 1 Reconstruction results at amplification factor ×2
Table 2 Reconstruction results at amplification factor ×4
In terms of the objective indexes, the deep-learning-based reconstruction models obtain results superior to the Bicubic interpolation algorithm under all degradation models, demonstrating the superiority of deep-learning-based reconstruction.
When the amplification factor is 2, the EDSR and RCAN models trained with bicubic downsampling as the degradation model obtain results superior to the other models under the three degradation models with relatively weak degradation: bicubic downsampling only, the isotropic (ISO) blur kernel, and the anisotropic (ANI) blur kernel. This shows that EDSR and RCAN handle high-resolution reconstruction well when the amplification factor is small and the degradation is relatively weak. However, when the degradation is stronger and the degradation model more complex, including motion blur, a larger blur kernel variance and added noise, the performance of the EDSR and RCAN models degrades significantly: for example, when the isotropic Gaussian blur degradation model is strengthened, the PSNR index of EDSR drops by 3.09 dB and that of RCAN by 3.17 dB, while DASR drops by 2.65 dB and the model provided by the invention drops by only 1.70 dB. The reconstruction results of EDSR and RCAN are then also worse than those of the DASR model and the model proposed by the invention. This demonstrates that the high-resolution reconstruction model based on two-stage training copes better with degradation models of high degradation degree and complexity: the proposed model not only obtains competitive results on degradation models with weak degradation, but also obtains results superior to the other algorithms on degradation models with strong degradation and high complexity. When the amplification factor is 4, more detail information needs to be reconstructed and the reconstruction task is harder.
In conclusion, the proposed high-resolution reconstruction model obtains competitive results when the amplification factor is small and the degradation relatively weak, and obtains the best results when the amplification factor is large and the degradation relatively strong. This shows that the proposed model can reconstruct abundant detail information and is robust to complex degradation models.
From the viewpoint of visual perception, Fig. 7 shows the reconstruction results of each model at amplification factor 4 with bicubic downsampling as the degradation model. The image reconstructed by the Bicubic algorithm is smooth and blurred, with few reconstructed details. The images reconstructed by the EDSR and RCAN models contain more noise, while the DASR model and the proposed reconstruction model give better visual effects. Compared with the DASR model, however, the image reconstructed by the proposed model has clearer edges and more reconstructed detail.
Fig. 8 shows the reconstruction results of each model at amplification factor 4 under a degradation model that includes blur and noise. When the degradation model introduces noise, the Bicubic algorithm and the EDSR and RCAN models have difficulty eliminating the noise in the low-resolution image, and the noise in their reconstructed images is significant. Both DASR and the proposed model remove noise well, but the images reconstructed by DASR are smoother and lose some detail information. The images reconstructed by the proposed model still keep clear edges and recover more image features.
Claims (4)
1. A high-resolution reconstruction method for a remote sensing image under a complex degradation model is characterized by comprising the following steps:
s1: acquiring a sample image dataset, wherein the dataset comprises M images of different scenes, randomly selecting the images, and dividing the dataset into a training set and a testing set;
s2: designing a degradation model, adding noise after blurring processing and downsampling of a high-resolution remote sensing image, and generating a final low-resolution remote sensing image;
s3: training a self-encoder model with a mask, and learning priori information of the low-resolution remote sensing image;
s4: building a new reconstruction network; the reconstruction network and the self-encoder model with the mask are trained simultaneously, and the prior information learned by the self-encoder model with the mask is utilized to reconstruct the low-resolution remote sensing image with high resolution; the reconstructed network structure is divided into three parts as a whole, and the three parts are as follows:
s41: the first part carries out shallow feature extraction; the shallow feature extraction module extracts multi-scale features using convolution layers with kernel sizes 3, 5 and 7 respectively, then concatenates the three extracted features along the channel dimension, reduces the number of channels with a convolution layer of kernel size 1 × 1, and fuses the multi-scale features; the multi-scale features F_0 initially extracted from the remote sensing image by the shallow feature extraction module are represented by the following formula (2):

F_0 = H_{SFE}(I_{LR})    (2)

wherein H_{SFE} denotes the mapping function of the shallow feature extraction, I_{LR} denotes the input low-resolution remote sensing image, and F_0 denotes the extracted multi-scale features;
s42: the second part is used for deep feature extraction and consists of residual branches and feature branches;
the feature branch directly transmits the shallow features extracted from the first part to the rear of the deep feature extraction network;
the residual branch uses the structural design of a UNet model, and a convolution layer in the middle of the branch divides the branch into a front part and a rear part; the front part and the rear part each cascade r basic blocks whose structures correspond to each other; each basic block of the front part consists, in order, of a multi-scale receptive field attention module, a residual fusion block and a prior module, and each basic block of the rear part consists, in order, of a convolution layer, the multi-scale receptive field attention module, the residual fusion block and the prior module; the multi-scale receptive field attention module and the residual fusion block are responsible for the feature-learning sub-problem, and the prior module is responsible for the prior-learning sub-problem;
the input of each front-part basic block comes from the prior information and the output of the preceding basic block; the output of the r-th front-part basic block is expressed as formula (3):

F_r = H_r(F_{r-1})    (3)

wherein H_r denotes the mapping function of the r-th basic block of the front part, F_r denotes its output, and F_{r-1} denotes the output of the preceding basic block; the rear part likewise cascades several basic blocks; the input of each rear-part basic block comes not only from the prior information and the output of the preceding basic block, but also from the output of the corresponding front-part basic block, and these are concatenated along the channel dimension and fused with a convolution layer; the output of the rear-part basic block corresponding to the r-th front-part basic block is expressed as formula (4):

F_{n-r+1} = H_{n-r+1}(Concat(F_{n-r}, F_r))    (4)

wherein F_{n-r+1} denotes the output of the rear-part basic block, H_{n-r+1} denotes its mapping function, Concat denotes the channel-dimension concatenation operation, and F_r denotes the output of the corresponding front-part basic block; n is the total number of modules in the network, n = r (the first r basic blocks) + 1 (the convolution block of the middle part) + r (the last r basic blocks);
s43: the third part is an up-sampling module, the up-sampling mode adopts sub-pixel convolution, the generation of artifacts in the reconstruction process is avoided, and the image containing abundant detail information is reconstructed.
2. The high-resolution reconstruction method for remote sensing images under a complex degradation model according to claim 1, wherein the loss function of the masked self-encoder model uses the Euclidean distance L_2, represented by the following formula (1):

L_2 = \sqrt{\sum_{i=1}^{n} (\hat{y}_i - y_i)^2}    (1)

wherein \hat{y}_i denotes a generated mask pixel, y_i denotes the corresponding original mask pixel, and n is the total number of pixels of the image.
3. The high-resolution reconstruction method for remote sensing images under a complex degradation model according to claim 2, wherein the loss function of the up-sampling module uses the L1 loss, which computes the mean absolute difference between the reconstructed image and the real high-resolution image so that the training process of the model is more stable, expressed as formula (5):

L_1 = \frac{1}{N} \sum_{i=1}^{N} |I_{SR}^{(i)} - I_{HR}^{(i)}|    (5)

wherein I_{SR} denotes the reconstructed image and I_{HR} denotes the true high-resolution image; the total loss function is expressed as formula (6):

L_{total} = L_2 + L_1    (6).
4. a high resolution reconstruction method for a remote sensing image under a complex degradation model according to claim 3, wherein an improved attention module is designed in the multiscale receptive field attention module;
the improved attention module is provided with two branches, wherein the first branch passes through the spatial attention module, which applies mean pooling and maximum pooling to the input feature map over the channel dimension, so that the input feature map of size (B, C, H, W) becomes size (B, 1, H, W); the pooled results are concatenated along the channel dimension, and then a convolution layer and a Sigmoid activation function turn each element of the matrix into a probability value between 0 and 1, which is combined with the spatial-channel weight parameter α; the larger the multiplied result, the more useful the information at that spatial position;
the second branch adopts an edge attention module, which convolves the input feature map with edge-extracting Sobel operators to extract its gradient information in the x and y directions respectively, and then takes the square root of the sum of their squares to obtain the gradient matrix of the feature map; the gradient matrix passes through a convolution layer and a Sigmoid activation function to become a weight matrix, yielding the weight parameters of the edge positions, wherein the weight parameter of the spatial channels is α and the weight parameter of the edge positions is β; the improved attention module performs a weighted summation of the spatial attention and the edge attention and outputs the result.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311819893.8A CN117474764B (en) | 2023-12-27 | 2023-12-27 | High-resolution reconstruction method for remote sensing image under complex degradation model |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311819893.8A CN117474764B (en) | 2023-12-27 | 2023-12-27 | High-resolution reconstruction method for remote sensing image under complex degradation model |
Publications (2)
Publication Number | Publication Date |
---|---|
CN117474764A CN117474764A (en) | 2024-01-30 |
CN117474764B true CN117474764B (en) | 2024-04-16 |
Family
ID=89626084
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311819893.8A Active CN117474764B (en) | 2023-12-27 | 2023-12-27 | High-resolution reconstruction method for remote sensing image under complex degradation model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117474764B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118397130B (en) * | 2024-07-01 | 2024-08-30 | 山东第一医科大学附属肿瘤医院(山东省肿瘤防治研究院、山东省肿瘤医院) | CT image processing method for tumor radiotherapy effect |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110111251A (en) * | 2019-04-22 | 2019-08-09 | 电子科技大学 | A kind of combination depth supervision encodes certainly and perceives the image super-resolution rebuilding method of iterative backprojection |
CN115272066A (en) * | 2022-06-23 | 2022-11-01 | 昆明理工大学 | Image super-resolution reconstruction method based on detail information asymptotic restoration |
CN116385264A (en) * | 2023-03-30 | 2023-07-04 | 浙江大学 | Super-resolution remote sensing data reconstruction method |
CN116434347A (en) * | 2023-06-12 | 2023-07-14 | 中山大学 | Skeleton sequence identification method and system based on mask pattern self-encoder |
CN116664397A (en) * | 2023-04-19 | 2023-08-29 | 太原理工大学 | TransSR-Net structured image super-resolution reconstruction method |
-
2023
- 2023-12-27 CN CN202311819893.8A patent/CN117474764B/en active Active
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110111251A (en) * | 2019-04-22 | 2019-08-09 | 电子科技大学 | A kind of combination depth supervision encodes certainly and perceives the image super-resolution rebuilding method of iterative backprojection |
CN115272066A (en) * | 2022-06-23 | 2022-11-01 | 昆明理工大学 | Image super-resolution reconstruction method based on detail information asymptotic restoration |
CN116385264A (en) * | 2023-03-30 | 2023-07-04 | 浙江大学 | Super-resolution remote sensing data reconstruction method |
CN116664397A (en) * | 2023-04-19 | 2023-08-29 | 太原理工大学 | TransSR-Net structured image super-resolution reconstruction method |
CN116434347A (en) * | 2023-06-12 | 2023-07-14 | 中山大学 | Skeleton sequence identification method and system based on mask pattern self-encoder |
Non-Patent Citations (3)
Title |
---|
Image super-resolution reconstruction based on cross-learning of multi-path residual networks; Guo Fengfeng; Ma Lu; Journal of Panzhihua University; 2020-03-15 (02); 48-54 *
Super-resolution reconstruction algorithm based on a progressive feature enhancement network; Yang Yong; Wu Zheng; Zhang Dongyang; Liu Jiaxiang; Journal of Signal Processing; 2020-09-16 (09); 1598-1606 *
Image super-resolution reconstruction method combining perceptual edge constraints and a multi-scale fusion network; Ouyang Ning; Wei Yu; Lin Leping; Journal of Computer Applications; 2020-04-30 (10); 3041-3047 *
Also Published As
Publication number | Publication date |
---|---|
CN117474764A (en) | 2024-01-30 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110570353B (en) | Super-resolution reconstruction method for generating single image of countermeasure network by dense connection | |
CN112507997B (en) | Face super-resolution system based on multi-scale convolution and receptive field feature fusion | |
CN111105352B (en) | Super-resolution image reconstruction method, system, computer equipment and storage medium | |
Huang et al. | Deep hyperspectral image fusion network with iterative spatio-spectral regularization | |
CN109741256A (en) | Image super-resolution rebuilding method based on rarefaction representation and deep learning | |
CN110232653A (en) | The quick light-duty intensive residual error network of super-resolution rebuilding | |
CN111062872A (en) | Image super-resolution reconstruction method and system based on edge detection | |
CN111598778B (en) | Super-resolution reconstruction method for insulator image | |
CN111080567A (en) | Remote sensing image fusion method and system based on multi-scale dynamic convolution neural network | |
CN109509160A (en) | Hierarchical remote sensing image fusion method utilizing layer-by-layer iteration super-resolution | |
CN115147271A (en) | Multi-view information attention interaction network for light field super-resolution | |
CN110223234A (en) | Depth residual error network image super resolution ratio reconstruction method based on cascade shrinkage expansion | |
Yu et al. | E-DBPN: Enhanced deep back-projection networks for remote sensing scene image superresolution | |
Yang et al. | Image super-resolution based on deep neural network of multiple attention mechanism | |
CN112001843A (en) | Infrared image super-resolution reconstruction method based on deep learning | |
CN116029902A (en) | Knowledge distillation-based unsupervised real world image super-resolution method | |
CN117474764B (en) | High-resolution reconstruction method for remote sensing image under complex degradation model | |
CN117114984A (en) | Remote sensing image super-resolution reconstruction method based on generation countermeasure network | |
CN116091492B (en) | Image change pixel level detection method and system | |
Yang et al. | An effective and comprehensive image super resolution algorithm combined with a novel convolutional neural network and wavelet transform | |
Li | Image super-resolution using attention based densenet with residual deconvolution | |
CN115578262A (en) | Polarization image super-resolution reconstruction method based on AFAN model | |
CN117576483B (en) | Multisource data fusion ground object classification method based on multiscale convolution self-encoder | |
CN112508786B (en) | Satellite image-oriented arbitrary-scale super-resolution reconstruction method and system | |
Xu et al. | Fast and accurate image super-resolution using a combined loss |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||