Background
In image acquisition processes such as video surveillance, mobile phone photography, medical imaging, remote sensing imaging, and video capture, high-resolution images generally cannot be acquired due to limitations of the imaging mechanism, imaging environment, or imaging equipment. A high-resolution image provides more detailed information and benefits analysis and understanding tasks such as image interpretation. Improving the resolution of acquired images through better hardware raises problems of cost and manufacturing process. Under the premise that the hardware cannot be changed, image super-resolution methods break through the limitation of the sensor's inherent sampling frequency by means of image processing, extending the spatial resolution or cut-off frequency of a low-resolution image and thereby improving its spatial resolution. Single-image super-resolution reconstructs a high-resolution image from a single low-resolution image alone, recovering the high-frequency components lost in the downsampling process and improving the effective resolution of the image.
The process by which an imaging device acquires an image can be regarded as a degradation from a high-resolution image to a low-resolution image, and image super-resolution reconstruction is the inverse of this degradation, i.e., the process of reconstructing a high-resolution image from a low-resolution observation. This is an ill-posed inverse problem whose solution is not unique, so the space of feasible solutions must be constrained by image priors. Traditional priors generally model the statistical properties of natural images mathematically, whereas depth priors refer to prior information learned or represented by a deep convolutional neural network. Complex image priors are difficult to model explicitly with mathematical expressions; a deep convolutional neural network can implicitly learn the mapping relationship in the data and can flexibly serve as a submodule of a regularization method. Depth priors can be divided into explicitly and implicitly modeled image priors. One class of methods trains the network on a data set to learn latent prior information of images; the other realizes the prior constraint through the network structure itself, without supervised training.
Image super-resolution methods can be divided into non-blind and blind methods according to whether the blur kernel is known. When the blur kernel is known, the method is a non-blind image super-resolution method; when the blur kernel is unknown, the method is a blind image super-resolution method. Non-blind methods typically assume the blur kernel to be a Gaussian function or a bicubic interpolation function. Since blur kernels in real scenes are complex, the reconstruction quality cannot be guaranteed when the real blur kernel differs substantially from the assumed one. Blind image super-resolution methods estimate the real blur kernel of the degradation process from the latent information in the low-resolution image and then reconstruct the high-resolution image.
Although deep learning research on image super-resolution has made remarkable progress, most existing deep-learning-based methods are non-blind, and work on blind image super-resolution remains limited. Deep-learning-based image super-resolution methods divide into supervised and unsupervised learning according to whether a training data set is required. Supervised methods construct paired high-resolution and low-resolution image data sets through a predefined degradation process and train the network on those data sets. Self-supervised learning is a common form of unsupervised learning in image super-resolution: the low-resolution image itself is used as the supervision signal and additional information is mined from the image to reconstruct it, without ground-truth high-resolution images or blur kernels to train the network. DRN requires a paired simulated image data set and an unpaired real image data set for training, and comprises two network structures, DRNS with fewer parameters and DRNL with more parameters. SRGAN introduces a content loss function into the generative adversarial network, where the content loss is defined as the Euclidean distance between the features of the high-resolution reconstructed image and the features of the ground-truth image; the content loss reconstructs the high-frequency content of the image and improves the visual quality of the reconstruction. DRN and SRGAN are supervised non-blind image super-resolution methods. ZSSR uses an assumed or estimated blur kernel to construct an image pyramid of low-resolution images according to image self-similarity, and trains the network parameters on the high/low-resolution pairs formed by the multi-level low-resolution images and their downsampled versions. Although ZSSR introduces blur kernel information into the reconstruction process, it does not estimate the blur kernel, so strictly speaking ZSSR is not a blind image super-resolution method.
Deep-learning-based blind image super-resolution methods can be divided into independent solving and joint modeling according to how the blur kernel is solved. In independently solved methods, blur kernel estimation and high-resolution image estimation are two separate stages: errors in the blur kernel estimation stage propagate into the subsequent high-resolution image estimation, and the image estimation result cannot correct errors in the kernel estimate. Joint modeling estimates the blur kernel and the high-resolution image simultaneously, expressing the two estimation problems as a single optimization problem with two decision variables and correcting each variable against the other through alternating solving. The KernelGAN model proposed by Bell-Kligler et al. uses a generative adversarial network: a deep linear network serves as the generator that models the image degradation process and generates a downsampled version of the low-resolution image, a discriminator judges the cross-scale self-similarity between the low-resolution image and the downsampled image, and the blur kernel is estimated by maximizing that cross-scale self-similarity. KernelGAN is a self-supervised blind image super-resolution method that solves the blur kernel independently; existing jointly modeled blind image super-resolution methods are few.
The invention discloses a blind image super-resolution method based on depth prior self-supervised learning, which combines a network model with a mathematical model and estimates the blur kernel and the high-resolution image simultaneously in a jointly modeled manner. The method estimates the high-resolution image with a deep convolutional neural network, DIP-Net; introduces a non-local attention module so that complementary constraints among similar image blocks provide the additional information required for image reconstruction; estimates the blur kernel by solving the blur kernel optimization subproblem in closed form; and alternately and iteratively estimates the blur kernel and the high-resolution image. The depth prior imposes a smoothness constraint on the image; the convolutional neural network possesses translation invariance and thus implicitly exploits image self-similarity, while the non-local attention module exploits that self-similarity explicitly. The invention uses the low-resolution image as the self-supervision signal, requires neither ground-truth high-resolution images nor ground-truth blur kernels, and has no training process. The disclosed method can accurately estimate the blur kernel and effectively reconstruct the edge and detail information of the image.
Disclosure of Invention
In view of this, the embodiment of the present invention provides a depth prior-based blind image super-resolution method, so as to implement blind super-resolution reconstruction of a low-resolution image.
In order to achieve the above object, an embodiment of the present invention provides the following solutions:
a blind image super-resolution method based on depth prior is characterized by comprising the following 5 steps:
step 1, constructing an image generation network model
The method uses DIP-Net as the image generation network to realize the mapping x = f(z; θ) from a random vector z to a high-resolution image x, uses the network to suppress noise, and implicitly models the smoothness prior constraint term. The DIP-Net input z is a random vector uniformly distributed on the interval (0, 1), i.e., z ~ U(0, 1). A set of network parameters is randomly initialized, the degraded image is used as the self-supervision signal, and the parameters are updated by gradient descent until the loss function converges.
According to the invention, non-local constraint is introduced into DIP-Net, and additional information contained in similar image blocks is explicitly obtained through mutual constraint among the similar image blocks. Non-local operations are defined as follows:
y_i = (1/C(x)) Σ_{∀j} ρ(x_i, x_j) g(x_j)

where x_i is the feature to be processed, y_i is the output feature, x_j is a neighborhood feature of x_i, ρ(x_i, x_j) is the similarity function measuring the correlation between x_i and x_j, g(·) is a feature extraction function, and C(x) is a normalization parameter. The similarity function generally takes one of four forms: Gaussian function, embedded Gaussian function, inner product, and concatenation.
On the basis of the non-local operation, the non-local attention module is defined as:

z_i = W_z y_i + x_i

The non-local attention module uses the residual connection "+ x_i" so that it can be embedded into any pre-trained network without affecting that network's task; that is, when W_z = 0 the network keeps its original structure.
The image generation network DIP-Net used by the invention has a U-shaped encoding-decoding structure comprising five groups of downsampling and upsampling convolution structures, with a non-local attention module added to the features of the third, fourth, and fifth downsampling groups. Each group of convolution operations fuses the downsampling-layer features with the upsampling-layer features of the same dimension through cross-layer connections, and the number of cross-layer channels is fixed at 16. The network input z is a random vector following a uniform distribution, with the same size as the high-resolution image; the number of channels is generally set to 8 or 16, and is set to 8 in the invention.
Step 2, initializing network parameters
Randomly initialize the parameters θ_0 of the image generation network, i.e., obtain an initial estimate of the high-resolution image x_0 = f(z; θ_0). Set the learning rate η, the downsampling factor a, the blur kernel size s, the network input random vector z, the blur kernel regularization parameter λ_h, and the maximum number of iterations K. Since a blind super-resolution method does not know the true size of the blur kernel, the kernel size must be estimated or preset when reconstructing the high-resolution image.
Step 3, estimating the blur kernel
Fix the network parameters θ_{k-1} and estimate the blur kernel h_k by:

h_k = F⁻¹( ( conj(F(x_{k-1})) ⊙_a F(y↑_a) ) / ( avg_a( F(x_{k-1}) ⊙ conj(F(x_{k-1})) ) + λ_h ) )

where F(·) denotes the Fourier transform, F⁻¹(·) the inverse Fourier transform, conj(·) the complex conjugate of the Fourier transform, ⊙_a the element-wise product over a×a image blocks, avg_a(·) the averaging operation over a×a image blocks, and ↑_a a-fold upsampling with zero padding.
Step 4, estimating high-resolution image
Step 4.1 calculate loss function:
Fix the current blur kernel estimate h_k and, given θ_{k-1}, update θ_k. The updated blur kernel h_k and the estimated high-resolution image generate a downsampled image through a variable-step (strided) convolution operation, and the loss function is computed from the low-resolution image and the downsampled image:

L(θ) = ||(h_k * f(z; θ))↓_a − y||²

The above equation is the mean square error loss function; other continuously differentiable functions may also be used as the loss function of the network.
Step 4.2, updating the image generation network parameters:
Compute the gradient of the loss function with respect to the network parameters and update θ_k by the gradient descent step shown below:

θ_k = θ_{k-1} − η∇_θL(θ_{k-1})

where η denotes the learning rate. The invention updates the parameters using the Adam gradient descent algorithm.
Step 4.3, generating a high-resolution image: generate the high-resolution image with the updated image generation network, x_k = f(z; θ_k).
Step 5, judging convergence and outputting the blur kernel and high-resolution image estimates
Steps 3 and 4 complete one iteration of solving the objective function, yielding the blur kernel estimate h_k and updating the high-resolution image estimate from x_{k-1} to x_k. If the maximum number of iterations is reached or the iteration has converged, stop iterating and output the final blur kernel and high-resolution image estimates; otherwise set k = k + 1 and repeat steps 3 and 4.
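For concreteness, the alternating procedure of steps 2 to 5 can be outlined in PyTorch-style code. This is a minimal sketch under stated assumptions, not the exact implementation: `DIPNet` and `estimate_kernel` are illustrative stand-ins for the image generation network and the closed-form kernel solution described later in this document, and the defaults mirror the preferred values above.

```python
import torch
import torch.nn.functional as F

def blind_sr(y, a=2, kernel_size=11, K=3000, lr=1e-3, lambda_h=2e-5):
    """Alternating estimation of blur kernel and HR image (steps 2-5)."""
    net = DIPNet(in_channels=8)                   # image generation network f(z; theta)
    z = torch.rand(1, 8, y.shape[-2] * a, y.shape[-1] * a)  # z ~ U(0, 1), HR size
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    x = net(z)                                    # initial estimate x_0 = f(z; theta_0)
    h = None
    for k in range(K):
        # Step 3: closed-form blur kernel estimate with theta fixed
        with torch.no_grad():
            h = estimate_kernel(x.detach(), y, a, kernel_size, lambda_h)
        # Step 4: update theta with h fixed; blur + downsample = strided convolution
        opt.zero_grad()
        x = net(z)
        c = x.shape[1]
        y_hat = F.conv2d(x, h.expand(c, 1, -1, -1), stride=a,
                         groups=c, padding=kernel_size // 2)
        loss = F.mse_loss(y_hat, y)               # self-supervision against y itself
        loss.backward()
        opt.step()
    # Step 5: output final kernel and HR image estimates
    return h, net(z).detach()
```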
Preferably, the similarity function of the non-local attention module of the algorithm takes the inner product form:

ρ(x_i, x_j) = θ(x_i)^T φ(x_j)

where θ(·) and φ(·) are feature extraction functions, and C(x) = N is used as the normalization parameter, with N denoting the total number of features in x, which simplifies the gradient computation. Since the size of the input features varies, using the number of features N of x as the normalization parameter is more appropriate.
Preferably, the number of iterations of the algorithm is 3000.
Preferably, the learning rate η has an initial value of 0.001 and decays to 0.5 times its previous value every 500 iterations.
Preferably, the blur kernel regularization parameter λ_h has an initial value of 2×10⁻⁵ and increases to 1.2 times its previous value every 1000 iterations.
The invention discloses a blind image super-resolution method based on depth prior self-supervised learning, which reconstructs a high-resolution image end to end. The method combines a network model with a mathematical model and estimates the blur kernel and the high-resolution image simultaneously in a jointly modeled manner: it estimates the high-resolution image with the deep convolutional neural network DIP-Net, introduces a non-local attention module so that complementary constraints among similar image blocks provide the additional information required for image reconstruction, estimates the blur kernel by solving the blur kernel optimization subproblem in closed form, and alternately and iteratively estimates the blur kernel and the high-resolution image. The invention uses the low-resolution image as the self-supervision signal and requires no training process on a data set. The disclosed method can accurately estimate the blur kernel and effectively reconstruct the edge and detail information of the image.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In the image super-resolution problem, the degradation model between the low-resolution image and the original high-resolution image is usually expressed in the form of a convolution as follows:
y = (h * x)↓_a + n (1)

where y is the low-resolution image, h is the blur kernel, x is the high-resolution image, a is the downsampling factor, n is additive noise, and * is the two-dimensional convolution operation. Under this convolution model, blind image super-resolution studies how to estimate the blur kernel h and the high-resolution image x simultaneously from the low-resolution image y, as shown in FIG. 1.
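For illustration, the degradation of equation (1) can be simulated directly; a minimal sketch, assuming one shared blur kernel across channels and an arbitrary noise level:

```python
import torch
import torch.nn.functional as F

def degrade(x, h, a, sigma_n=0.0):
    """Simulate equation (1): y = (h * x) downsampled by factor a, plus noise n.

    x: high-resolution image (1, C, H, W); h: blur kernel (1, 1, s, s).
    """
    c = x.shape[1]
    weight = h.expand(c, 1, -1, -1)                 # same kernel for every channel
    blurred = F.conv2d(x, weight, padding=h.shape[-1] // 2, groups=c)  # h * x
    y = blurred[..., ::a, ::a]                      # a-fold downsampling
    return y + sigma_n * torch.randn_like(y)        # additive noise n
```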
The invention reconstructs the high-resolution image with an image generation network and introduces a regularization constraint term on the blur kernel h as the blur kernel prior. The objective function can be expressed as:

min_{θ,h} ||(h * f(z; θ))↓_a − y||² + λ_h ||h||² (2)

where h is the blur kernel, f(z; θ) is the image generation network with input z and parameters θ, and λ_h is a regularization parameter. The first term of the objective is the data fidelity term, ensuring that the reconstructed image conforms to the degradation model; the second term is the regularization constraint on the blur kernel h, constraining its smoothness.
Fig. 2 shows the overall structure of the depth-prior-based blind image super-resolution method disclosed by the invention; the optimization problem of equation (2) is solved by an alternating iteration method that combines a network model with a mathematical model. The image is estimated with DIP-Net, and the process of computing the loss function and updating the network parameters θ is the image update step; the blur kernel is estimated from the optimality condition of the blur kernel minimization subproblem. The high-resolution image and the blur kernel are updated alternately and iteratively, and the final network output x* = f(z; θ*) is the reconstructed high-resolution image.
The embodiment of the invention discloses a depth prior-based blind image super-resolution method, which is used for realizing blind super-resolution reconstruction of a low-resolution image. Referring to fig. 3, the above method includes at least the following 5 steps.
Step 1, constructing an image generation network model
The image generation network DIP-Net implements the mapping x = f(z; θ) from a random vector z to a sharp image x. DIP-Net was originally aimed at image denoising: the network takes the noisy image as the supervision signal and updates the network parameters by computing a loss function, and in fitting the noisy image from a random vector it preferentially generates a noise-free sharp image. DIP-Net extends effectively to a variety of image inverse problems, and the image super-resolution problem can generally be modeled as the following optimization problem:

min_x ||(h * x)↓_a − y||² + λ_x R(x) (3)

where y is the low-resolution image, h is the blur kernel, x is the high-resolution image, ↓_a is the downsampling operation, R(x) is a smoothness function, and λ_x is the regularization coefficient. The first term of the objective is the data fidelity term, ensuring that the reconstructed image conforms to the degradation model; the second term is the smoothness prior constraint term, suppressing noise during reconstruction.
DIP-Net suppresses noise through the image generation network f(z; θ), which is equivalent to implicitly establishing the smoothness prior constraint term R(x) of equation (3). The network loss function is:

L(θ) = ||(h * f(z; θ))↓_a − y||² (4)

where θ are the network parameters, z is the network input, and f(z; θ) is the reconstructed image. The DIP-Net input z is a random vector uniformly distributed on the interval (0, 1), i.e., z ~ U(0, 1). A set of network parameters is randomly initialized, the degraded image is used as the self-supervision signal, and the parameters are updated by gradient descent until the loss function converges. DIP-Net is essentially a regularization method: the network parameters must be estimated for each image, and the parameter update process is in fact the solution process of the optimization problem. DIP-Net uses the degraded image itself as the supervision signal, requires no training process or training data set, and can be regarded as self-supervised learning.
According to the invention, non-local constraints are introduced into DIP-Net, and the additional information contained in similar image blocks is obtained explicitly through the mutual constraints among them. Natural images contain redundant similar structures, embodied as similar image blocks: about 80% of the image blocks in an image have 9 or more similar image blocks within the same image. The large number of similar image blocks embedded in a natural image can provide additional information for image super-resolution. For any image block, multiple similar blocks exist in the whole image or within a search window of a certain range. Non-local regularized super-resolution methods exploit the complementary information of similar image blocks in the form of a regularization constraint by constructing a non-local constraint term. For a target image block x_i, L similar image blocks are searched, the target is estimated from these similar blocks, and the difference between the target value and the estimate forms the non-local constraint term.
Non-local operations are defined as follows:
y_i = (1/C(x)) Σ_{∀j} ρ(x_i, x_j) g(x_j) (5)

where x_i is the feature to be processed, y_i is the output feature, x_j is a neighborhood feature of x_i, ρ(x_i, x_j) is the similarity function measuring the correlation between x_i and x_j, g(·) is a feature extraction function, and C(x) is a normalization parameter.
On the basis of the non-local operation, the non-local attention module is defined as:

z_i = W_z y_i + x_i (6)

The non-local attention module uses the residual connection "+ x_i" so that it can be embedded into any pre-trained network without affecting that network's task; that is, when W_z = 0 the network keeps its original structure.
The similarity function generally has four forms, namely a gaussian function, an embedded gaussian function, an inner product and a cascade, and the four forms of similarity functions are described below.
1. Gaussian function
The similarity function in Gaussian form is defined, as in non-local means and bilateral filtering, as

ρ(x_i, x_j) = exp(x_i^T x_j) (7)

The inner product can generally be replaced by the Euclidean distance, but the inner product is easier to implement in a deep convolutional neural network. The normalization parameter is defined as C(x) = Σ_{∀j} ρ(x_i, x_j).
2. Embedded gaussian function
A simple variant of the Gaussian function computes the similarity in an embedding space, i.e.,

ρ(x_i, x_j) = exp(θ(x_i)^T φ(x_j)) (8)

where θ(·) and φ(·) are feature extraction functions, and the normalization parameter is defined as in the Gaussian function.
3. Inner product form
The similarity function can also be defined directly in inner product form, i.e.,

ρ(x_i, x_j) = θ(x_i)^T φ(x_j) (9)

where C(x) = N is used as the normalization parameter, with N denoting the total number of features in x, which simplifies the gradient computation. Since the size of the input features varies, using the number of features N of x as the normalization parameter is more appropriate.
4. Form of cascade
Concatenation can also be used as the similarity function, i.e.,

ρ(x_i, x_j) = ReLU(w_p^T [θ(x_i), φ(x_j)]) (10)

where [·, ·] denotes concatenation and w_p is a weight vector that converts the concatenated vector into a scalar; C(x) is likewise set to N.
The similarity function of this embodiment adopts the inner product form; similarity functions of the other forms likewise fall within the protection scope of the invention. FIG. 4 shows the network structure of the non-local attention module used in this embodiment, where W_θ, W_φ, and W_g are convolution matrices and θ(·), φ(·), and g(·) are each implemented by a 1×1 convolution. In the figure, ⊗ denotes matrix multiplication, ⊕ denotes element-wise addition, and 1/N denotes scaling the features after the convolution operation by 1/N. The non-local attention module places no requirement on the size of the input features and can be flexibly embedded at any position of a deep convolutional neural network; it constructs constraints through the similarity between the feature to be processed and neighborhood features at different positions, and uses the deep convolutional neural network to obtain the additional information contained in image self-similarity to reconstruct the high-frequency details of the high-resolution image.
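A minimal PyTorch sketch of such a module follows, using the inner-product similarity of equation (9) and 1×1 convolutions for θ(·), φ(·), and g(·). The halved embedding width is an assumption (the figure does not fix it), and W_z is zero-initialized so that the module initially leaves the host network unchanged:

```python
import torch
import torch.nn as nn

class NonLocalAttention(nn.Module):
    """Non-local attention with inner-product similarity and C(x) = N."""
    def __init__(self, channels, embed=None):
        super().__init__()
        embed = embed or max(channels // 2, 1)      # embedding width (assumed)
        self.theta = nn.Conv2d(channels, embed, 1)  # theta(.)
        self.phi = nn.Conv2d(channels, embed, 1)    # phi(.)
        self.g = nn.Conv2d(channels, embed, 1)      # g(.)
        self.w_z = nn.Conv2d(embed, channels, 1)    # W_z
        nn.init.zeros_(self.w_z.weight)             # W_z = 0: module starts as identity
        nn.init.zeros_(self.w_z.bias)

    def forward(self, x):
        b, c, hgt, wid = x.shape
        n = hgt * wid                                # number of features N
        t = self.theta(x).flatten(2)                 # (B, C', N)
        p = self.phi(x).flatten(2)                   # (B, C', N)
        g = self.g(x).flatten(2)                     # (B, C', N)
        attn = torch.bmm(t.transpose(1, 2), p) / n   # rho(x_i, x_j) / N
        y = torch.bmm(g, attn.transpose(1, 2))       # y_i = sum_j rho(x_i, x_j) g(x_j) / N
        y = y.view(b, -1, hgt, wid)
        return self.w_z(y) + x                       # residual connection "+ x_i"
```

Because the output keeps the input shape, the module can be dropped after any convolution group, as in the generator sketch below.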
The image generation network DIP-Net used by the invention has a U-shaped encoding-decoding structure, shown in detail in FIG. 5, comprising five groups of downsampling and upsampling convolution structures, with a non-local attention module added to the features of the third, fourth, and fifth downsampling groups. Each group of convolution operations fuses the downsampling-layer features with the upsampling-layer features of the same dimension through cross-layer connections, and the number of cross-layer channels is fixed at 16. The network input z is a random vector following a uniform distribution, with the same size as the high-resolution image; the number of channels is generally set to 8 or 16, and is set to 8 in the invention. The input vector in the figure has size 8×271×271; the feature map is reduced from 271×271 to 9×9 through the convolution groups and restored to the size of the input random vector through the five upsampling groups.
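For orientation, a compressed sketch of such a U-shaped generator is given below, reusing the `NonLocalAttention` module sketched above. The five down/up groups, the 16-channel cross-layer connections, and the non-local modules on the third to fifth groups follow the description; the internal width, activations, and bilinear upsampling are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def conv_block(cin, cout, stride=1):
    return nn.Sequential(nn.Conv2d(cin, cout, 3, stride=stride, padding=1),
                         nn.BatchNorm2d(cout),
                         nn.LeakyReLU(0.2, inplace=True))

class DIPNet(nn.Module):
    """Sketch of the U-shaped generator: five down/up groups, 16-channel
    cross-layer (skip) connections, non-local attention on groups 3-5."""
    def __init__(self, in_channels=8, width=128, skip=16, out_channels=3):
        super().__init__()
        self.downs, self.skips, self.attn = nn.ModuleList(), nn.ModuleList(), nn.ModuleList()
        cin = in_channels
        for i in range(5):
            self.downs.append(conv_block(cin, width, stride=2))    # downsampling group
            self.attn.append(NonLocalAttention(width) if i >= 2 else nn.Identity())
            self.skips.append(nn.Conv2d(width, skip, 1))           # 16-channel skip
            cin = width
        self.ups = nn.ModuleList(conv_block(width + skip, width) for _ in range(5))
        self.out = nn.Conv2d(width, out_channels, 1)

    def forward(self, z):
        size0 = z.shape[-2:]                        # z has the high-resolution size
        feats, x = [], z
        for down, attn, sk in zip(self.downs, self.attn, self.skips):
            x = attn(down(x))
            feats.append(sk(x))
        for up, f in zip(self.ups, reversed(feats)):   # decode, fusing the skips
            x = F.interpolate(x, size=f.shape[-2:], mode='bilinear', align_corners=False)
            x = up(torch.cat([x, f], dim=1))
        x = F.interpolate(x, size=size0, mode='bilinear', align_corners=False)
        return torch.sigmoid(self.out(x))           # image in [0, 1]
```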
Step 2, initializing network parameters
Randomly initialize the parameters θ_0 of the image generation network, i.e., obtain an initial estimate of the high-resolution image x_0 = f(z; θ_0). Set the learning rate η, the downsampling factor a, the blur kernel size s, the network input random vector z, the blur kernel regularization parameter λ_h, and the maximum number of iterations K. Since a blind super-resolution method does not know the true size of the blur kernel, the kernel size must be estimated or preset when reconstructing the high-resolution image. The invention solves for the network parameters θ and the blur kernel h in equation (2) in an alternating manner: first fix the network parameters θ_{k-1} of the estimated high-resolution image and solve for the blur kernel h_k, then fix the blur kernel estimate h_k and solve for the network parameters θ_k of the estimated high-resolution image, until convergence or the maximum number of iterations is reached.
Step 3, estimating the blur kernel
Fix the network parameters θ_{k-1}; the optimization problem for estimating the blur kernel h_k can then be expressed as:

h_k = argmin_h ||(h * f(z; θ_{k-1}))↓_a − y||² + λ_h ||h||² (11)

where f(z; θ_{k-1}) is the network output of the (k−1)-th iteration. To simplify the mathematical expression, let x_{k-1} = f(z; θ_{k-1}); equation (11) can be written as:

h_k = argmin_h ||(h * x_{k-1})↓_a − y||² + λ_h ||h||² (12)

Converting equation (12) into matrix-vector form:

h_k = argmin_h ||D X_{k-1} h − y||² + λ_h ||h||² (13)

where D is the downsampling matrix and X_{k-1} is the block circulant matrix corresponding to the high-resolution image x_{k-1}. Equation (13) is a quadratic function of h and has the closed-form solution:

h_k = (X_{k-1}^H D^T D X_{k-1} + λ_h I)⁻¹ X_{k-1}^H D^T y (14)
equation (14) requires computation of large-scale matrix inversions, and the present invention employs solving a closed solution in the frequency domain. Is provided with
Having a diagonal element of X
k-1Fourier coefficients of the first column, denoted Λ as a block diagonal matrix, i.e.
Each sub-block
Also a diagonal matrix, denoted Γ as
Derived from equation (14):
wherein F is a Fourier transform matrix, FHIs an inverse fourier transform matrix. The frequency domain solution of the blur kernel can be written according to equation (15):
in the formula (I), the compound is shown in the specification,
which represents the fourier transform of the signal,
which represents the inverse of the fourier transform,
complex conjugate representing Fourier transform, etc
aRepresenting the elemental dot product operation on a x a image blocks,
carrying out average processing operation on a multiplied by a image blocks, ° c
aIndicating a-fold upsampling with zero padding.
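A sketch of equation (16) with `torch.fft` follows; it relies on the fact that the spectrum of a zero-padded, a-fold upsampled image is an a×a tiling of the low-resolution spectrum. The channel averaging and the final crop, clamping, and normalization of the kernel support are practical assumptions not spelled out above.

```python
import torch

def estimate_kernel(x, y, a, s, lambda_h):
    """Frequency-domain closed-form kernel estimate, a sketch of equation (16).

    x: current HR estimate (1, C, a*M, a*N); y: LR image (1, C, M, N).
    """
    xg = x.mean(dim=1)[0]                           # channel-averaged, (a*M, a*N)
    yg = y.mean(dim=1)[0]                           # (M, N)
    M, N = yg.shape
    X = torch.fft.fft2(xg)                          # F(x_{k-1})
    Y_up = torch.fft.fft2(yg).repeat(a, a)          # F(y upsampled with zero padding)
    num = torch.conj(X) * Y_up                      # conj(F(x)) .* F(y up)
    P = (X * torch.conj(X)).real                    # |F(x)|^2
    P_avg = P.reshape(a, M, a, N).mean(dim=(0, 2))  # average over a x a aliased blocks
    H = num / (P_avg.repeat(a, a) + lambda_h)       # equation (16) in the frequency domain
    h = torch.fft.ifft2(H).real
    h = torch.roll(h, shifts=(s // 2, s // 2), dims=(0, 1))[:s, :s]  # crop kernel support
    h = torch.clamp(h, min=0)                       # non-negative kernel (assumption)
    return (h / h.sum()).view(1, 1, s, s)           # unit-sum normalization (assumption)
```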
Step 4, estimating high-resolution image
Step 4.1 calculate loss function:
Fix the current blur kernel estimate h_k and, given θ_{k-1}, update θ_k; the objective function then simplifies to:

θ_k = argmin_θ ||(h_k * f(z; θ))↓_a − y||² (17)

The updated blur kernel h_k and the estimated high-resolution image generate a downsampled image through a variable-step (strided) convolution operation. The loss function is computed from the low-resolution image and the downsampled image; the objective in equation (17) is the loss function of the network:

L(θ) = ||(h_k * f(z; θ))↓_a − y||² (18)

Equation (18) is the mean square error loss function, but other continuously differentiable functions may also be used as the loss function of the network.
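The point that blurring followed by a-fold downsampling equals a single variable-step (strided) convolution, which makes equation (18) cheap to evaluate, can be checked in a few lines (shapes are illustrative; PyTorch's conv2d computes cross-correlation, which is immaterial here since the same convention is used on both sides):

```python
import torch
import torch.nn.functional as F

x = torch.rand(1, 1, 64, 64)            # stand-in for the network output f(z; theta)
h = torch.rand(1, 1, 11, 11)
h = h / h.sum()                         # normalized blur kernel
a = 2
full = F.conv2d(x, h, padding=5)[..., ::a, ::a]   # (h * x) followed by downsampling
strided = F.conv2d(x, h, padding=5, stride=a)     # single variable-step convolution
assert torch.allclose(full, strided)
y = torch.rand_like(strided)            # stand-in low-resolution observation
loss = F.mse_loss(strided, y)           # mean square error loss of equation (18)
```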
Step 4.2, updating the image generation network parameters:
Compute the gradient of the loss function of equation (18) with respect to the network parameters and update θ_k by gradient descent:

θ_k = θ_{k-1} − η∇_θL(θ_{k-1}) (19)

where η denotes the learning rate. To simplify the mathematical expressions, let the gradient g_{k-1} = ∇_θL(θ_{k-1}). The invention updates the network parameters with the Adam gradient descent method; the Adam algorithm uses the momentum v_k together with the second-order momentum s_k of the RMSProp algorithm. Initialize v_0 = s_0 = 0. Given a hyperparameter 0 ≤ β_1 < 1, the momentum v_k of the k-th iteration is expressed as the exponentially weighted moving average of the gradient g_{k-1}:

v_k = β_1 v_{k-1} + (1 − β_1) g_{k-1} (20)

Given a hyperparameter 0 ≤ β_2 < 1, s_k is expressed as the exponentially weighted moving average of the squared gradient term g_{k-1} ⊙ g_{k-1}:

s_k = β_2 s_{k-1} + (1 − β_2) g_{k-1} ⊙ g_{k-1} (21)

where ⊙ denotes element-wise multiplication. Since v_0 and s_0 are both initialized to zero, at the k-th iteration the momentum v_k can be expanded as the weighted sum of the gradients of all previous iterations:

v_k = (1 − β_1) Σ_{i=1}^{k} β_1^{k−i} g_{i−1} (22)

The sum of the gradient weights of the previous iterations is:

(1 − β_1) Σ_{i=1}^{k} β_1^{k−i} = 1 − β_1^k (23)

When k is small, this sum of gradient weights is small. To eliminate this effect, at the k-th iteration v_k is divided by 1 − β_1^k so that the past gradient weights sum to 1; this is called bias correction. In the Adam algorithm, both v_k and s_k are bias-corrected:

v'_k = v_k / (1 − β_1^k) (24)

s'_k = s_k / (1 − β_2^k) (25)

The Adam algorithm uses the bias-corrected variables v'_k and s'_k and the learning rate η to compute the adjusted gradient g'_{k-1}:

g'_{k-1} = η v'_k / (√(s'_k) + ε) (26)

where η is the learning rate (in Adam, each element of the variable effectively has its own learning rate) and ε is a small constant that avoids a zero denominator in equation (26). In the k-th iteration, g'_{k-1} is used to update the parameters of the network:

θ_k = θ_{k-1} − g'_{k-1} (27)
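These are exactly the updates that `torch.optim.Adam` performs internally; a minimal sketch for a single parameter tensor, with the common defaults β_1 = 0.9, β_2 = 0.999, ε = 10⁻⁸ assumed (the text does not state them):

```python
import torch

def adam_step(theta, grad, state, k, eta, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a single parameter tensor, equations (20)-(27)."""
    v, s = state                                     # v_{k-1}, s_{k-1}; zeros when k = 1
    v = beta1 * v + (1 - beta1) * grad               # momentum, eq. (20)
    s = beta2 * s + (1 - beta2) * grad * grad        # second-order momentum, eq. (21)
    v_hat = v / (1 - beta1 ** k)                     # bias correction, eq. (24)
    s_hat = s / (1 - beta2 ** k)                     # bias correction, eq. (25)
    g_adj = eta * v_hat / (torch.sqrt(s_hat) + eps)  # adjusted gradient, eq. (26)
    return theta - g_adj, (v, s)                     # parameter update, eq. (27)
```

In the invention's step 4.2 such an update is applied to every parameter tensor of the image generation network.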
Step 4.3, generating a high-resolution image: generate the high-resolution image with the image generation network using the updated parameters, x_k = f(z; θ_k).
Step 5, judging convergence and outputting the blur kernel and high-resolution image estimates
Steps 3 and 4 complete one iteration of solving the objective function, yielding the blur kernel estimate h_k and updating the high-resolution image estimate from x_{k-1} to x_k. If the maximum number of iterations is reached or the iteration has converged, stop iterating and output the final blur kernel and high-resolution image estimates; otherwise set k = k + 1 and repeat steps 3 and 4.
Preferably, the number of iterations of the algorithm is set to 3000; the learning rate η has an initial value of 0.001 and decays to 0.5 times its previous value every 500 iterations; the blur kernel regularization parameter λ_h has an initial value of 2×10⁻⁵ and increases to 1.2 times its previous value every 1000 iterations; and the blur kernel sizes for 2× and 4× super-resolution are set to 11×11 and 23×23, respectively.
Due to GPU memory limitations, the invention constructs a simulated low-resolution data set, DIVRK, by screening from the DIV2K data set 20 images of different classes, including animals, sculptures, aerial photographs, buildings, plants, text, and people, with sizes from 816×816 to 904×904, for evaluating blind image super-resolution algorithms. Following the construction of the DIV2KRK data set, each image is convolved with a different, randomly generated anisotropic Gaussian blur kernel and downsampled, producing low-resolution images with downsampling factors of 2 and 4 respectively. The anisotropic Gaussian blur kernel is generated as follows: randomly set the variances in the horizontal and vertical directions λ_1, λ_2 ~ U(0.35, 10), randomly rotate by an angle θ ~ U(−π, π), and normalize, where U(a, b) denotes the uniform distribution on the interval [a, b]. When the downsampling factor is 2, the blur kernel size is 11×11; when the downsampling factor is 4, the blur kernel size is 23×23.
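A sketch of this blur kernel generation procedure; parameterizing the covariance as a rotation of diag(λ_1, λ_2) is the natural reading of the description above:

```python
import numpy as np

def random_anisotropic_gaussian_kernel(size):
    """Random anisotropic Gaussian kernel: lambda1, lambda2 ~ U(0.35, 10),
    rotation theta ~ U(-pi, pi), then normalization to unit sum."""
    lam1, lam2 = np.random.uniform(0.35, 10, size=2)
    theta = np.random.uniform(-np.pi, np.pi)
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s], [s, c]])
    sigma = R @ np.diag([lam1, lam2]) @ R.T          # rotated covariance matrix
    inv_sigma = np.linalg.inv(sigma)
    r = size // 2
    yy, xx = np.mgrid[-r:r + 1, -r:r + 1]
    coords = np.stack([xx, yy], axis=-1)             # (size, size, 2) grid offsets
    expo = np.einsum('...i,ij,...j->...', coords, inv_sigma, coords)
    k = np.exp(-0.5 * expo)
    return k / k.sum()                               # normalized blur kernel
```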
Peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) are used as quantitative evaluation indices. PSNR measures the average pixel error between the reconstructed image and the ground-truth image; SSIM measures the structural similarity between the reconstructed image and the ground-truth image, with results in [0, 1]. Higher values of both indices indicate better reconstruction quality. SRGAN and DRN are widely recognized non-blind image super-resolution methods, and KernelGAN+ZSSR denotes blind image super-resolution realized by combining the blur kernel estimated by KernelGAN with ZSSR. Fig. 6 lists the average PSNR and SSIM of the image super-resolution algorithms on the DIVRK data set. As the results show, for both 2× and 4× super-resolution reconstruction the average PSNR and SSIM of the invention are the highest, and the quantitative experimental results demonstrate that the invention achieves better image reconstruction quality.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.