Background
In image acquisition processes such as video surveillance, mobile phone photography, medical imaging, remote sensing imaging, and video capture, high-resolution images generally cannot be acquired due to limitations of the imaging mechanism, imaging environment, or imaging equipment. A high-resolution image provides more detailed information and benefits analysis and understanding tasks such as image interpretation. Improving the resolution of acquired images through better hardware raises problems of cost and manufacturing process. Under the premise that the hardware cannot be changed, image super-resolution methods break through the limitation of the sensor's inherent sampling frequency by means of image processing, extending the spatial resolution or cut-off frequency of a low-resolution image and thereby improving its spatial resolution. Single-image super-resolution reconstructs a high-resolution image from a single low-resolution image alone, recovering the high-frequency components lost in the downsampling process and improving the effective resolution of the image.
The process by which an imaging device acquires an image can be regarded as a degradation from a high-resolution image to a low-resolution image, and image super-resolution reconstruction is the inverse of this degradation, i.e., the process of reconstructing a high-resolution image from a low-resolution observation. This is an ill-posed inverse problem whose solution is not unique, so the space of feasible solutions must be constrained by image priors. Traditional priors generally model the statistical properties of natural images mathematically, whereas depth priors refer to prior information learned or represented by a deep convolutional neural network. Complex image priors are difficult to model explicitly with mathematical expressions; a deep convolutional neural network can implicitly learn the mapping relationship in the data and can flexibly serve as a submodule of a regularization method. Depth priors can be divided into explicitly and implicitly modeled image priors. One class of methods trains the network on a data set to learn latent prior information of images; the other realizes the prior constraint through the network structure itself, without supervised training.
Image super-resolution methods can be divided into non-blind and blind methods according to whether the blur kernel is known. When the blur kernel is known, the method is a non-blind image super-resolution method; when the blur kernel is unknown, the method is a blind image super-resolution method. Non-blind methods typically assume the blur kernel to be a Gaussian function or a bicubic interpolation function. Since blur kernels in real scenes are complex, the reconstruction quality cannot be guaranteed when the real blur kernel differs substantially from the assumed one. Blind image super-resolution methods estimate the real blur kernel of the degradation process from the latent information in the low-resolution image and then reconstruct the high-resolution image.
Although deep learning research on image super-resolution has made remarkable progress, most existing deep-learning-based methods are non-blind, and work on blind image super-resolution remains limited. Deep-learning-based image super-resolution methods divide into supervised and unsupervised learning according to whether a training data set is required. Supervised methods construct paired high-resolution and low-resolution image data sets through a predefined degradation process and train the network on those data sets. Self-supervised learning is a common form of unsupervised learning in image super-resolution: the low-resolution image itself is used as the supervision signal and additional information is mined from the image to reconstruct it, without ground-truth high-resolution images or blur kernels to train the network. DRN requires a paired simulated image data set and an unpaired real image data set for training, and comprises two network structures, DRNS with fewer parameters and DRNL with more parameters. SRGAN introduces a content loss function into the generative adversarial network, where the content loss is defined as the Euclidean distance between the features of the high-resolution reconstructed image and the features of the ground-truth image; the content loss reconstructs the high-frequency content of the image and improves the visual quality of the reconstruction. DRN and SRGAN are supervised non-blind image super-resolution methods. ZSSR uses an assumed or estimated blur kernel to construct an image pyramid of low-resolution images according to image self-similarity, and trains the network parameters on the high/low-resolution pairs formed by the multi-level low-resolution images and their downsampled versions. Although ZSSR introduces blur kernel information into the reconstruction process, it does not estimate the blur kernel, so strictly speaking ZSSR is not a blind image super-resolution method.
Deep-learning-based blind image super-resolution methods can be divided into independent solving and joint modeling according to how the blur kernel is solved. In independently solved methods, blur kernel estimation and high-resolution image estimation are two separate stages: errors in the blur kernel estimation stage propagate into the subsequent high-resolution image estimation, and the image estimation result cannot correct errors in the kernel estimate. Joint modeling estimates the blur kernel and the high-resolution image simultaneously, expressing the two estimation problems as a single optimization problem with two decision variables and correcting each variable against the other through alternating solving. The KernelGAN model proposed by Bell-Kligler et al. uses a generative adversarial network: a deep linear network serves as the generator that models the image degradation process and generates a downsampled version of the low-resolution image, a discriminator judges the cross-scale self-similarity between the low-resolution image and the downsampled image, and the blur kernel is estimated by maximizing that cross-scale self-similarity. KernelGAN is a self-supervised blind image super-resolution method that solves the blur kernel independently; existing jointly modeled blind image super-resolution methods are few.
The invention discloses a blind image super-resolution method based on depth prior self-supervised learning, which combines a network model with a mathematical model and estimates the blur kernel and the high-resolution image simultaneously in a jointly modeled manner. The method estimates the high-resolution image with a deep convolutional neural network, DIP-Net; introduces a non-local attention module so that complementary constraints among similar image blocks provide the additional information required for image reconstruction; estimates the blur kernel by solving the blur kernel optimization subproblem in closed form; and alternately and iteratively estimates the blur kernel and the high-resolution image. The depth prior imposes a smoothness constraint on the image; the convolutional neural network possesses translation invariance and thus implicitly exploits image self-similarity, while the non-local attention module exploits that self-similarity explicitly. The invention uses the low-resolution image as the self-supervision signal, requires neither ground-truth high-resolution images nor ground-truth blur kernels, and has no training process. The disclosed method can accurately estimate the blur kernel and effectively reconstruct the edge and detail information of the image.
Disclosure of Invention
In view of this, the embodiment of the present invention provides a depth prior-based blind image super-resolution method, so as to implement blind super-resolution reconstruction of a low-resolution image.
In order to achieve the above object, an embodiment of the present invention provides the following solutions:
a blind image super-resolution method based on depth prior is characterized by comprising the following 5 steps:
step 1, constructing an image generation network model
The method uses DIP-Net as the image generation network to realize the mapping x = f(z; θ) from a random vector z to a high-resolution image x, uses the network to suppress noise, and implicitly models the smoothness prior constraint term. The DIP-Net input z is a random vector uniformly distributed on the interval (0, 1), i.e., z ~ U(0, 1). A set of network parameters is randomly initialized, the degraded image is used as the self-supervision signal, and the parameters are updated by gradient descent until the loss function converges.
According to the invention, non-local constraint is introduced into DIP-Net, and additional information contained in similar image blocks is explicitly obtained through mutual constraint among the similar image blocks. Non-local operations are defined as follows:
y_i = (1/C(x)) Σ_{∀j} ρ(x_i, x_j) g(x_j)

where x_i is the feature to be processed, y_i is the output feature, x_j is a neighborhood feature of x_i, ρ(x_i, x_j) is the similarity function measuring the correlation between x_i and x_j, g(·) is a feature extraction function, and C(x) is a normalization parameter. The similarity function generally takes one of four forms: Gaussian function, embedded Gaussian function, inner product, and concatenation.
On the basis of the non-local operation, the non-local attention module is defined as:

z_i = W_z y_i + x_i

The non-local attention module uses the residual connection "+ x_i" so that it can be embedded into any pre-trained network without affecting that network's task; that is, when W_z = 0 the network keeps its original structure.
The image generation network DIP-Net used by the invention has a U-shaped encoding-decoding structure comprising five groups of downsampling and upsampling convolution structures, with a non-local attention module added to the features of the third, fourth, and fifth downsampling groups. Each group of convolution operations fuses the downsampling-layer features with the upsampling-layer features of the same dimension through cross-layer connections, and the number of cross-layer channels is fixed at 16. The network input z is a random vector following a uniform distribution, with the same size as the high-resolution image; the number of channels is generally set to 8 or 16, and is set to 8 in the invention.
Step 2, initializing network parameters
Randomly initialize the parameters θ_0 of the image generation network, i.e., obtain an initial estimate of the high-resolution image x_0 = f(z; θ_0). Set the learning rate η, the downsampling factor a, the blur kernel size s, the network input random vector z, the blur kernel regularization parameter λ_h, and the maximum number of iterations K. Since a blind super-resolution method does not know the true size of the blur kernel, the kernel size must be estimated or preset when reconstructing the high-resolution image.
Step 3, estimating the blur kernel
Fix the network parameters θ_{k-1} and estimate the blur kernel h_k by:

h_k = F⁻¹( ( conj(F(x_{k-1})) ⊙_a F(y↑_a) ) / ( avg_a( F(x_{k-1}) ⊙ conj(F(x_{k-1})) ) + λ_h ) )

where F(·) denotes the Fourier transform, F⁻¹(·) the inverse Fourier transform, conj(·) the complex conjugate of the Fourier transform, ⊙_a the element-wise product over a×a image blocks, avg_a(·) the averaging operation over a×a image blocks, and ↑_a a-fold upsampling with zero padding.
Step 4, estimating high-resolution image
Step 4.1 calculate loss function:
Fix the current blur kernel estimate h_k and, given θ_{k-1}, update θ_k. The updated blur kernel h_k and the estimated high-resolution image generate a downsampled image through a variable-step (strided) convolution operation, and the loss function is computed from the low-resolution image and the downsampled image:

L(θ) = ||(h_k * f(z; θ))↓_a − y||²

The above equation is the mean square error loss function; other continuously differentiable functions may also be used as the loss function of the network.
Step 4.2, updating the image generation network parameters:
Compute the gradient of the loss function with respect to the network parameters and update θ_k by the gradient descent step shown below:

θ_k = θ_{k-1} − η∇_θL(θ_{k-1})

where η denotes the learning rate. The invention updates the parameters using the Adam gradient descent algorithm.
Step 4.3, generating a high-resolution image: generate the high-resolution image with the updated image generation network, x_k = f(z; θ_k).
Step 5, judging convergence and outputting the blur kernel and high-resolution image estimates
Steps 3 and 4 complete one iteration of solving the objective function, yielding the blur kernel estimate h_k and updating the high-resolution image estimate from x_{k-1} to x_k. If the maximum number of iterations is reached or the iteration has converged, stop iterating and output the final blur kernel and high-resolution image estimates; otherwise set k = k + 1 and repeat steps 3 and 4.
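For concreteness, the alternating procedure of steps 2 to 5 can be outlined in PyTorch-style code. This is a minimal sketch under stated assumptions, not the exact implementation: `DIPNet` and `estimate_kernel` are illustrative stand-ins for the image generation network and the closed-form kernel solution described later in this document, and the defaults mirror the preferred values above.

```python
import torch
import torch.nn.functional as F

def blind_sr(y, a=2, kernel_size=11, K=3000, lr=1e-3, lambda_h=2e-5):
    """Alternating estimation of blur kernel and HR image (steps 2-5)."""
    net = DIPNet(in_channels=8)                   # image generation network f(z; theta)
    z = torch.rand(1, 8, y.shape[-2] * a, y.shape[-1] * a)  # z ~ U(0, 1), HR size
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    x = net(z)                                    # initial estimate x_0 = f(z; theta_0)
    h = None
    for k in range(K):
        # Step 3: closed-form blur kernel estimate with theta fixed
        with torch.no_grad():
            h = estimate_kernel(x.detach(), y, a, kernel_size, lambda_h)
        # Step 4: update theta with h fixed; blur + downsample = strided convolution
        opt.zero_grad()
        x = net(z)
        c = x.shape[1]
        y_hat = F.conv2d(x, h.expand(c, 1, -1, -1), stride=a,
                         groups=c, padding=kernel_size // 2)
        loss = F.mse_loss(y_hat, y)               # self-supervision against y itself
        loss.backward()
        opt.step()
    # Step 5: output final kernel and HR image estimates
    return h, net(z).detach()
```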
Preferably, the similarity function of the non-local attention module of the algorithm takes the inner product form:

ρ(x_i, x_j) = θ(x_i)^T φ(x_j)

where θ(·) and φ(·) are feature extraction functions, and C(x) = N is used as the normalization parameter, with N denoting the total number of features in x, which simplifies the gradient computation. Since the size of the input features varies, using the number of features N of x as the normalization parameter is more appropriate.
Preferably, the number of iterations of the algorithm is 3000.
Preferably, the learning rate η has an initial value of 0.001 and decays to 0.5 times its previous value every 500 iterations.
Preferably, the blur kernel regularization parameter λ_h has an initial value of 2×10⁻⁵ and increases to 1.2 times its previous value every 1000 iterations.
The invention discloses a blind image super-resolution method based on depth prior self-supervised learning, which reconstructs a high-resolution image end to end. The method combines a network model with a mathematical model and estimates the blur kernel and the high-resolution image simultaneously in a jointly modeled manner: it estimates the high-resolution image with the deep convolutional neural network DIP-Net, introduces a non-local attention module so that complementary constraints among similar image blocks provide the additional information required for image reconstruction, estimates the blur kernel by solving the blur kernel optimization subproblem in closed form, and alternately and iteratively estimates the blur kernel and the high-resolution image. The invention uses the low-resolution image as the self-supervision signal and requires no training process on a data set. The disclosed method can accurately estimate the blur kernel and effectively reconstruct the edge and detail information of the image.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In the image super-resolution problem, the degradation model between the low-resolution image and the original high-resolution image is usually expressed in the form of a convolution as follows:
y = (h * x)↓_a + n (1)

where y is the low-resolution image, h is the blur kernel, x is the high-resolution image, a is the downsampling factor, n is additive noise, and * is the two-dimensional convolution operation. Under this convolution model, blind image super-resolution studies how to estimate the blur kernel h and the high-resolution image x simultaneously from the low-resolution image y, as shown in FIG. 1.
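For illustration, the degradation of equation (1) can be simulated directly; a minimal sketch, assuming one shared blur kernel across channels and an arbitrary noise level:

```python
import torch
import torch.nn.functional as F

def degrade(x, h, a, sigma_n=0.0):
    """Simulate equation (1): y = (h * x) downsampled by factor a, plus noise n.

    x: high-resolution image (1, C, H, W); h: blur kernel (1, 1, s, s).
    """
    c = x.shape[1]
    weight = h.expand(c, 1, -1, -1)                 # same kernel for every channel
    blurred = F.conv2d(x, weight, padding=h.shape[-1] // 2, groups=c)  # h * x
    y = blurred[..., ::a, ::a]                      # a-fold downsampling
    return y + sigma_n * torch.randn_like(y)        # additive noise n
```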
The invention reconstructs the high-resolution image with an image generation network and introduces a regularization constraint term on the blur kernel h as the blur kernel prior. The objective function can be expressed as:

min_{θ,h} ||(h * f(z; θ))↓_a − y||² + λ_h ||h||² (2)

where h is the blur kernel, f(z; θ) is the image generation network with input z and parameters θ, and λ_h is a regularization parameter. The first term of the objective is the data fidelity term, ensuring that the reconstructed image conforms to the degradation model; the second term is the regularization constraint on the blur kernel h, constraining its smoothness.
Fig. 2 shows the overall structure of the depth-prior-based blind image super-resolution method disclosed by the invention; the optimization problem of equation (2) is solved by an alternating iteration method that combines a network model with a mathematical model. The image is estimated with DIP-Net, and the process of computing the loss function and updating the network parameters θ is the image update step; the blur kernel is estimated from the optimality condition of the blur kernel minimization subproblem. The high-resolution image and the blur kernel are updated alternately and iteratively, and the final network output x* = f(z; θ*) is the reconstructed high-resolution image.
The embodiment of the invention discloses a depth prior-based blind image super-resolution method, which is used for realizing blind super-resolution reconstruction of a low-resolution image. Referring to fig. 3, the above method includes at least the following 5 steps.
Step 1, constructing an image generation network model
The image generation network DIP-Net implements the mapping x = f(z; θ) from a random vector z to a sharp image x. DIP-Net was originally aimed at image denoising: the network takes the noisy image as the supervision signal and updates the network parameters by computing a loss function, and in fitting the noisy image from a random vector it preferentially generates a noise-free sharp image. DIP-Net extends effectively to a variety of image inverse problems, and the image super-resolution problem can generally be modeled as the following optimization problem:

min_x ||(h * x)↓_a − y||² + λ_x R(x) (3)

where y is the low-resolution image, h is the blur kernel, x is the high-resolution image, ↓_a is the downsampling operation, R(x) is a smoothness function, and λ_x is the regularization coefficient. The first term of the objective is the data fidelity term, ensuring that the reconstructed image conforms to the degradation model; the second term is the smoothness prior constraint term, suppressing noise during reconstruction.
DIP-Net suppresses noise through the image generation network f(z; θ), which is equivalent to implicitly establishing the smoothness prior constraint term R(x) of equation (3). The network loss function is:

L(θ) = ||(h * f(z; θ))↓_a − y||² (4)

where θ are the network parameters, z is the network input, and f(z; θ) is the reconstructed image. The DIP-Net input z is a random vector uniformly distributed on the interval (0, 1), i.e., z ~ U(0, 1). A set of network parameters is randomly initialized, the degraded image is used as the self-supervision signal, and the parameters are updated by gradient descent until the loss function converges. DIP-Net is essentially a regularization method: the network parameters must be estimated for each image, and the parameter update process is in fact the solution process of the optimization problem. DIP-Net uses the degraded image itself as the supervision signal, requires no training process or training data set, and can be regarded as self-supervised learning.
According to the invention, non-local constraints are introduced into DIP-Net, and the additional information contained in similar image blocks is obtained explicitly through the mutual constraints among them. Natural images contain redundant similar structures, embodied as similar image blocks: about 80% of the image blocks in an image have 9 or more similar image blocks within the same image. The large number of similar image blocks embedded in a natural image can provide additional information for image super-resolution. For any image block, multiple similar blocks exist in the whole image or within a search window of a certain range. Non-local regularized super-resolution methods exploit the complementary information of similar image blocks in the form of a regularization constraint by constructing a non-local constraint term. For a target image block x_i, L similar image blocks are searched, the target is estimated from these similar blocks, and the difference between the target value and the estimate forms the non-local constraint term.
Non-local operations are defined as follows:
y_i = (1/C(x)) Σ_{∀j} ρ(x_i, x_j) g(x_j) (5)

where x_i is the feature to be processed, y_i is the output feature, x_j is a neighborhood feature of x_i, ρ(x_i, x_j) is the similarity function measuring the correlation between x_i and x_j, g(·) is a feature extraction function, and C(x) is a normalization parameter.
On the basis of the non-local operation, the non-local attention module is defined as:

z_i = W_z y_i + x_i (6)

The non-local attention module uses the residual connection "+ x_i" so that it can be embedded into any pre-trained network without affecting that network's task; that is, when W_z = 0 the network keeps its original structure.
The similarity function generally has four forms, namely a gaussian function, an embedded gaussian function, an inner product and a cascade, and the four forms of similarity functions are described below.
1. Gaussian function
The similarity function in Gaussian form is defined, as in non-local means and bilateral filtering, as

ρ(x_i, x_j) = exp(x_i^T x_j) (7)

The inner product can generally be replaced by the Euclidean distance, but the inner product is easier to implement in a deep convolutional neural network. The normalization parameter is defined as C(x) = Σ_{∀j} ρ(x_i, x_j).
2. Embedded gaussian function
A simple variant of the Gaussian function computes the similarity in an embedding space, i.e.,

ρ(x_i, x_j) = exp(θ(x_i)^T φ(x_j)) (8)

where θ(·) and φ(·) are feature extraction functions, and the normalization parameter is defined as in the Gaussian function.
3. Inner product form
The similarity function can also be defined directly in inner product form, i.e.,

ρ(x_i, x_j) = θ(x_i)^T φ(x_j) (9)

where C(x) = N is used as the normalization parameter, with N denoting the total number of features in x, which simplifies the gradient computation. Since the size of the input features varies, using the number of features N of x as the normalization parameter is more appropriate.
4. Form of cascade
Concatenation can also be used as the similarity function, i.e.,

ρ(x_i, x_j) = ReLU(w_p^T [θ(x_i), φ(x_j)]) (10)

where [·, ·] denotes concatenation and w_p is a weight vector that converts the concatenated vector into a scalar; C(x) is likewise set to N.
The similarity function of this embodiment adopts the inner product form; similarity functions of the other forms likewise fall within the protection scope of the invention. FIG. 4 shows the network structure of the non-local attention module used in this embodiment, where W_θ, W_φ, and W_g are convolution matrices and θ(·), φ(·), and g(·) are each implemented by a 1×1 convolution. In the figure, ⊗ denotes matrix multiplication, ⊕ denotes element-wise addition, and 1/N denotes scaling the features after the convolution operation by 1/N. The non-local attention module places no requirement on the size of the input features and can be flexibly embedded at any position of a deep convolutional neural network; it constructs constraints through the similarity between the feature to be processed and neighborhood features at different positions, and uses the deep convolutional neural network to obtain the additional information contained in image self-similarity to reconstruct the high-frequency details of the high-resolution image.
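A minimal PyTorch sketch of such a module follows, using the inner-product similarity of equation (9) and 1×1 convolutions for θ(·), φ(·), and g(·). The halved embedding width is an assumption (the figure does not fix it), and W_z is zero-initialized so that the module initially leaves the host network unchanged:

```python
import torch
import torch.nn as nn

class NonLocalAttention(nn.Module):
    """Non-local attention with inner-product similarity and C(x) = N."""
    def __init__(self, channels, embed=None):
        super().__init__()
        embed = embed or max(channels // 2, 1)      # embedding width (assumed)
        self.theta = nn.Conv2d(channels, embed, 1)  # theta(.)
        self.phi = nn.Conv2d(channels, embed, 1)    # phi(.)
        self.g = nn.Conv2d(channels, embed, 1)      # g(.)
        self.w_z = nn.Conv2d(embed, channels, 1)    # W_z
        nn.init.zeros_(self.w_z.weight)             # W_z = 0: module starts as identity
        nn.init.zeros_(self.w_z.bias)

    def forward(self, x):
        b, c, hgt, wid = x.shape
        n = hgt * wid                                # number of features N
        t = self.theta(x).flatten(2)                 # (B, C', N)
        p = self.phi(x).flatten(2)                   # (B, C', N)
        g = self.g(x).flatten(2)                     # (B, C', N)
        attn = torch.bmm(t.transpose(1, 2), p) / n   # rho(x_i, x_j) / N
        y = torch.bmm(g, attn.transpose(1, 2))       # y_i = sum_j rho(x_i, x_j) g(x_j) / N
        y = y.view(b, -1, hgt, wid)
        return self.w_z(y) + x                       # residual connection "+ x_i"
```

Because the output keeps the input shape, the module can be dropped after any convolution group, as in the generator sketch below.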
The image generation network DIP-Net used by the invention has a U-shaped encoding-decoding structure, shown in detail in FIG. 5, comprising five groups of downsampling and upsampling convolution structures, with a non-local attention module added to the features of the third, fourth, and fifth downsampling groups. Each group of convolution operations fuses the downsampling-layer features with the upsampling-layer features of the same dimension through cross-layer connections, and the number of cross-layer channels is fixed at 16. The network input z is a random vector following a uniform distribution, with the same size as the high-resolution image; the number of channels is generally set to 8 or 16, and is set to 8 in the invention. The input vector in the figure has size 8×271×271; the feature map is reduced from 271×271 to 9×9 through the convolution groups and restored to the size of the input random vector through the five upsampling groups.
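For orientation, a compressed sketch of such a U-shaped generator is given below, reusing the `NonLocalAttention` module sketched above. The five down/up groups, the 16-channel cross-layer connections, and the non-local modules on the third to fifth groups follow the description; the internal width, activations, and bilinear upsampling are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def conv_block(cin, cout, stride=1):
    return nn.Sequential(nn.Conv2d(cin, cout, 3, stride=stride, padding=1),
                         nn.BatchNorm2d(cout),
                         nn.LeakyReLU(0.2, inplace=True))

class DIPNet(nn.Module):
    """Sketch of the U-shaped generator: five down/up groups, 16-channel
    cross-layer (skip) connections, non-local attention on groups 3-5."""
    def __init__(self, in_channels=8, width=128, skip=16, out_channels=3):
        super().__init__()
        self.downs, self.skips, self.attn = nn.ModuleList(), nn.ModuleList(), nn.ModuleList()
        cin = in_channels
        for i in range(5):
            self.downs.append(conv_block(cin, width, stride=2))    # downsampling group
            self.attn.append(NonLocalAttention(width) if i >= 2 else nn.Identity())
            self.skips.append(nn.Conv2d(width, skip, 1))           # 16-channel skip
            cin = width
        self.ups = nn.ModuleList(conv_block(width + skip, width) for _ in range(5))
        self.out = nn.Conv2d(width, out_channels, 1)

    def forward(self, z):
        size0 = z.shape[-2:]                        # z has the high-resolution size
        feats, x = [], z
        for down, attn, sk in zip(self.downs, self.attn, self.skips):
            x = attn(down(x))
            feats.append(sk(x))
        for up, f in zip(self.ups, reversed(feats)):   # decode, fusing the skips
            x = F.interpolate(x, size=f.shape[-2:], mode='bilinear', align_corners=False)
            x = up(torch.cat([x, f], dim=1))
        x = F.interpolate(x, size=size0, mode='bilinear', align_corners=False)
        return torch.sigmoid(self.out(x))           # image in [0, 1]
```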
Step 2, initializing network parameters
Randomly initialize the parameters θ_0 of the image generation network, i.e., obtain an initial estimate of the high-resolution image x_0 = f(z; θ_0). Set the learning rate η, the downsampling factor a, the blur kernel size s, the network input random vector z, the blur kernel regularization parameter λ_h, and the maximum number of iterations K. Since a blind super-resolution method does not know the true size of the blur kernel, the kernel size must be estimated or preset when reconstructing the high-resolution image. The invention solves for the network parameters θ and the blur kernel h in equation (2) in an alternating manner: first fix the network parameters θ_{k-1} of the estimated high-resolution image and solve for the blur kernel h_k, then fix the blur kernel estimate h_k and solve for the network parameters θ_k of the estimated high-resolution image, until convergence or the maximum number of iterations is reached.
Step 3, estimating the blur kernel
Fix the network parameters θ_{k-1}; the optimization problem for estimating the blur kernel h_k can then be expressed as:

h_k = argmin_h ||(h * f(z; θ_{k-1}))↓_a − y||² + λ_h ||h||² (11)

where f(z; θ_{k-1}) is the network output of the (k−1)-th iteration. To simplify the mathematical expression, let x_{k-1} = f(z; θ_{k-1}); equation (11) can be written as:

h_k = argmin_h ||(h * x_{k-1})↓_a − y||² + λ_h ||h||² (12)

Converting equation (12) into matrix-vector form:

h_k = argmin_h ||D X_{k-1} h − y||² + λ_h ||h||² (13)

where D is the downsampling matrix and X_{k-1} is the block circulant matrix corresponding to the high-resolution image x_{k-1}. Equation (13) is a quadratic function of h and has the closed-form solution:

h_k = (X_{k-1}^H D^T D X_{k-1} + λ_h I)⁻¹ X_{k-1}^H D^T y (14)
equation (14) requires computation of large-scale matrix inversions, and the present invention employs solving a closed solution in the frequency domain. Is provided with
Having a diagonal element of X
k-1Fourier coefficients of the first column, denoted Λ as a block diagonal matrix, i.e.
Each sub-block
Also a diagonal matrix, denoted Γ as
Derived from equation (14):
wherein F is a Fourier transform matrix, FHIs an inverse fourier transform matrix. The frequency domain solution of the blur kernel can be written according to equation (15):
in the formula (I), the compound is shown in the specification,
which represents the fourier transform of the signal,
which represents the inverse of the fourier transform,
complex conjugate representing Fourier transform, etc
aRepresenting the elemental dot product operation on a x a image blocks,
carrying out average processing operation on a multiplied by a image blocks, ° c
aIndicating a-fold upsampling with zero padding.
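A sketch of equation (16) with `torch.fft` follows; it relies on the fact that the spectrum of a zero-padded, a-fold upsampled image is an a×a tiling of the low-resolution spectrum. The channel averaging and the final crop, clamping, and normalization of the kernel support are practical assumptions not spelled out above.

```python
import torch

def estimate_kernel(x, y, a, s, lambda_h):
    """Frequency-domain closed-form kernel estimate, a sketch of equation (16).

    x: current HR estimate (1, C, a*M, a*N); y: LR image (1, C, M, N).
    """
    xg = x.mean(dim=1)[0]                           # channel-averaged, (a*M, a*N)
    yg = y.mean(dim=1)[0]                           # (M, N)
    M, N = yg.shape
    X = torch.fft.fft2(xg)                          # F(x_{k-1})
    Y_up = torch.fft.fft2(yg).repeat(a, a)          # F(y upsampled with zero padding)
    num = torch.conj(X) * Y_up                      # conj(F(x)) .* F(y up)
    P = (X * torch.conj(X)).real                    # |F(x)|^2
    P_avg = P.reshape(a, M, a, N).mean(dim=(0, 2))  # average over a x a aliased blocks
    H = num / (P_avg.repeat(a, a) + lambda_h)       # equation (16) in the frequency domain
    h = torch.fft.ifft2(H).real
    h = torch.roll(h, shifts=(s // 2, s // 2), dims=(0, 1))[:s, :s]  # crop kernel support
    h = torch.clamp(h, min=0)                       # non-negative kernel (assumption)
    return (h / h.sum()).view(1, 1, s, s)           # unit-sum normalization (assumption)
```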
Step 4, estimating high-resolution image
Step 4.1 calculate loss function:
Fix the current blur kernel estimate h_k and, given θ_{k-1}, update θ_k; the objective function then simplifies to:

θ_k = argmin_θ ||(h_k * f(z; θ))↓_a − y||² (17)

The updated blur kernel h_k and the estimated high-resolution image generate a downsampled image through a variable-step (strided) convolution operation. The loss function is computed from the low-resolution image and the downsampled image; the objective in equation (17) is the loss function of the network:

L(θ) = ||(h_k * f(z; θ))↓_a − y||² (18)

Equation (18) is the mean square error loss function, but other continuously differentiable functions may also be used as the loss function of the network.
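The point that blurring followed by a-fold downsampling equals a single variable-step (strided) convolution, which makes equation (18) cheap to evaluate, can be checked in a few lines (shapes are illustrative; PyTorch's conv2d computes cross-correlation, which is immaterial here since the same convention is used on both sides):

```python
import torch
import torch.nn.functional as F

x = torch.rand(1, 1, 64, 64)            # stand-in for the network output f(z; theta)
h = torch.rand(1, 1, 11, 11)
h = h / h.sum()                         # normalized blur kernel
a = 2
full = F.conv2d(x, h, padding=5)[..., ::a, ::a]   # (h * x) followed by downsampling
strided = F.conv2d(x, h, padding=5, stride=a)     # single variable-step convolution
assert torch.allclose(full, strided)
y = torch.rand_like(strided)            # stand-in low-resolution observation
loss = F.mse_loss(strided, y)           # mean square error loss of equation (18)
```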
Step 4.2, updating the image generation network parameters:
Compute the gradient of the loss function of equation (18) with respect to the network parameters and update θ_k by gradient descent:

θ_k = θ_{k-1} − η∇_θL(θ_{k-1}) (19)

where η denotes the learning rate. To simplify the mathematical expressions, let the gradient g_{k-1} = ∇_θL(θ_{k-1}). The invention updates the network parameters with the Adam gradient descent method; the Adam algorithm uses the momentum v_k together with the second-order momentum s_k of the RMSProp algorithm. Initialize v_0 = s_0 = 0. Given a hyperparameter 0 ≤ β_1 < 1, the momentum v_k of the k-th iteration is expressed as the exponentially weighted moving average of the gradient g_{k-1}:

v_k = β_1 v_{k-1} + (1 − β_1) g_{k-1} (20)

Given a hyperparameter 0 ≤ β_2 < 1, s_k is expressed as the exponentially weighted moving average of the squared gradient term g_{k-1} ⊙ g_{k-1}:

s_k = β_2 s_{k-1} + (1 − β_2) g_{k-1} ⊙ g_{k-1} (21)

where ⊙ denotes element-wise multiplication. Since v_0 and s_0 are both initialized to zero, at the k-th iteration the momentum v_k can be expanded as the weighted sum of the gradients of all previous iterations:

v_k = (1 − β_1) Σ_{i=1}^{k} β_1^{k−i} g_{i−1} (22)

The sum of the gradient weights of the previous iterations is:

(1 − β_1) Σ_{i=1}^{k} β_1^{k−i} = 1 − β_1^k (23)

When k is small, this sum of gradient weights is small. To eliminate this effect, at the k-th iteration v_k is divided by 1 − β_1^k so that the past gradient weights sum to 1; this is called bias correction. In the Adam algorithm, both v_k and s_k are bias-corrected:

v'_k = v_k / (1 − β_1^k) (24)

s'_k = s_k / (1 − β_2^k) (25)

The Adam algorithm uses the bias-corrected variables v'_k and s'_k and the learning rate η to compute the adjusted gradient g'_{k-1}:

g'_{k-1} = η v'_k / (√(s'_k) + ε) (26)

where η is the learning rate (in Adam, each element of the variable effectively has its own learning rate) and ε is a small constant that avoids a zero denominator in equation (26). In the k-th iteration, g'_{k-1} is used to update the parameters of the network:

θ_k = θ_{k-1} − g'_{k-1} (27)
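These are exactly the updates that `torch.optim.Adam` performs internally; a minimal sketch for a single parameter tensor, with the common defaults β_1 = 0.9, β_2 = 0.999, ε = 10⁻⁸ assumed (the text does not state them):

```python
import torch

def adam_step(theta, grad, state, k, eta, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a single parameter tensor, equations (20)-(27)."""
    v, s = state                                     # v_{k-1}, s_{k-1}; zeros when k = 1
    v = beta1 * v + (1 - beta1) * grad               # momentum, eq. (20)
    s = beta2 * s + (1 - beta2) * grad * grad        # second-order momentum, eq. (21)
    v_hat = v / (1 - beta1 ** k)                     # bias correction, eq. (24)
    s_hat = s / (1 - beta2 ** k)                     # bias correction, eq. (25)
    g_adj = eta * v_hat / (torch.sqrt(s_hat) + eps)  # adjusted gradient, eq. (26)
    return theta - g_adj, (v, s)                     # parameter update, eq. (27)
```

In the invention's step 4.2 such an update is applied to every parameter tensor of the image generation network.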
Step 4.3, generating a high-resolution image: generate the high-resolution image with the image generation network using the updated parameters, x_k = f(z; θ_k).
Step 5, judging convergence and outputting the blur kernel and high-resolution image estimates
Steps 3 and 4 complete one iteration of solving the objective function, yielding the blur kernel estimate h_k and updating the high-resolution image estimate from x_{k-1} to x_k. If the maximum number of iterations is reached or the iteration has converged, stop iterating and output the final blur kernel and high-resolution image estimates; otherwise set k = k + 1 and repeat steps 3 and 4.
Preferably, the number of iterations of the algorithm is set to 3000; the learning rate η has an initial value of 0.001 and decays to 0.5 times its previous value every 500 iterations; the blur kernel regularization parameter λ_h has an initial value of 2×10⁻⁵ and increases to 1.2 times its previous value every 1000 iterations; and the blur kernel sizes for 2× and 4× super-resolution are set to 11×11 and 23×23, respectively.
Due to GPU memory limitations, the invention constructs a simulated low-resolution data set, DIVRK, by screening from the DIV2K data set 20 images of different classes, including animals, sculptures, aerial photographs, buildings, plants, text, and people, with sizes from 816×816 to 904×904, for evaluating blind image super-resolution algorithms. Following the construction of the DIV2KRK data set, each image is convolved with a different, randomly generated anisotropic Gaussian blur kernel and downsampled, producing low-resolution images with downsampling factors of 2 and 4 respectively. The anisotropic Gaussian blur kernel is generated as follows: randomly set the variances in the horizontal and vertical directions λ_1, λ_2 ~ U(0.35, 10), randomly rotate by an angle θ ~ U(−π, π), and normalize, where U(a, b) denotes the uniform distribution on the interval [a, b]. When the downsampling factor is 2, the blur kernel size is 11×11; when the downsampling factor is 4, the blur kernel size is 23×23.
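A sketch of this blur kernel generation procedure; parameterizing the covariance as a rotation of diag(λ_1, λ_2) is the natural reading of the description above:

```python
import numpy as np

def random_anisotropic_gaussian_kernel(size):
    """Random anisotropic Gaussian kernel: lambda1, lambda2 ~ U(0.35, 10),
    rotation theta ~ U(-pi, pi), then normalization to unit sum."""
    lam1, lam2 = np.random.uniform(0.35, 10, size=2)
    theta = np.random.uniform(-np.pi, np.pi)
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s], [s, c]])
    sigma = R @ np.diag([lam1, lam2]) @ R.T          # rotated covariance matrix
    inv_sigma = np.linalg.inv(sigma)
    r = size // 2
    yy, xx = np.mgrid[-r:r + 1, -r:r + 1]
    coords = np.stack([xx, yy], axis=-1)             # (size, size, 2) grid offsets
    expo = np.einsum('...i,ij,...j->...', coords, inv_sigma, coords)
    k = np.exp(-0.5 * expo)
    return k / k.sum()                               # normalized blur kernel
```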
Peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) are used as quantitative evaluation indices. PSNR measures the average pixel error between the reconstructed image and the ground-truth image; SSIM measures the structural similarity between the reconstructed image and the ground-truth image, with results in [0, 1]. Higher values of both indices indicate better reconstruction quality. SRGAN and DRN are widely recognized non-blind image super-resolution methods, and KernelGAN+ZSSR denotes blind image super-resolution realized by combining the blur kernel estimated by KernelGAN with ZSSR. Fig. 6 lists the average PSNR and SSIM of the image super-resolution algorithms on the DIVRK data set. As the results show, for both 2× and 4× super-resolution reconstruction the average PSNR and SSIM of the invention are the highest, and the quantitative experimental results demonstrate that the invention achieves better image reconstruction quality.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.