CN117522694A - Diffusion model-based image super-resolution reconstruction method and system - Google Patents
- Publication number
- CN117522694A (application number CN202311584004.4A)
- Authority
- CN
- China
- Prior art keywords
- image
- resolution
- super
- potential
- model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformations in the plane of the image
- G06T3/40—Scaling of whole images or parts thereof, e.g. expanding or contracting
- G06T3/4053—Scaling of whole images or parts thereof, e.g. expanding or contracting based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
- G06N3/0455—Auto-encoder networks; Encoder-decoder networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0475—Generative networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformations in the plane of the image
- G06T3/40—Scaling of whole images or parts thereof, e.g. expanding or contracting
- G06T3/4046—Scaling of whole images or parts thereof, e.g. expanding or contracting using neural networks
Abstract
The invention discloses an image super-resolution reconstruction method based on a diffusion model and relates to the technical field of computer vision. The method comprises: acquiring a paired dataset containing high-resolution images and their corresponding low-resolution images; iteratively training a latent model on the paired dataset, then, with the latent model fixed, adding a kernel-based attention module to the noise prediction network and training a diffusion model in the latent space; and inputting a low-quality image into the trained latent diffusion model to obtain the corresponding super-resolution image. The invention builds the dataset from the degradation characteristics that turn high-quality images into low-quality images, improves the noise prediction network of the diffusion model, and trains iteratively in the latent space, so that training is faster and the results are better.
Description
Technical Field
The invention relates to the technical field of computer vision, in particular to an image super-resolution reconstruction method and system based on a diffusion model.
Background
Image super-resolution is an active research topic in deep learning. It aims to reconstruct a higher-quality super-resolution (SR) image by inferring the high-frequency information missing from a low-resolution (LR) image. With the development of deep learning, super-resolution methods based on convolutional neural networks have made great progress.
However, existing super-resolution networks still generate insufficiently rich detail and cannot capture high-frequency information effectively. In recent years, super-resolution methods based on generative adversarial networks (GANs) have emerged: a discriminator is introduced to judge the generated high-resolution image, forcing the generator to produce more realistic details and thereby improving the super-resolution result.
Meanwhile, diffusion models, as a class of generative models, have also shown strong capabilities in image synthesis. They implicitly learn the data distribution and can be used for conditional image generation. Introducing a diffusion model into the super-resolution task can overcome the lack of detail in existing methods and generate higher-quality super-resolution images.
Disclosure of Invention
The invention is proposed in view of the insufficient detail and missing high-frequency information of existing super-resolution techniques when handling image super-resolution tasks.
The problem to be solved by the present invention is therefore the insufficient quality of super-resolution results.
To solve the above technical problem, the invention provides the following technical solutions:
In a first aspect, an embodiment of the present invention provides a diffusion model-based image super-resolution reconstruction method, which includes: acquiring a paired dataset containing high-resolution images and their corresponding low-resolution images; iteratively training a latent model on the paired dataset, then, with the latent model fixed, adding a kernel-based attention module to the noise prediction network and training a diffusion model in the latent space; and inputting a low-quality image into the trained latent diffusion model to obtain the corresponding super-resolution image.
In a preferred embodiment of the diffusion model-based image super-resolution reconstruction method, iteratively training the latent model on the paired dataset comprises the following steps: constructing a paired dataset from high-resolution (HR) images and the corresponding low-resolution (LR) images; training the latent model on the paired dataset, with the encoder mapping each input image to its latent representation during training; converting the latent representation into a high-resolution reconstructed image with the decoder; optimizing the parameters of the latent model according to the gradients of the loss function, and stopping training when a preset convergence condition or the maximum number of iterations is reached; generating latent representations of the HR and LR images with the encoder of the trained latent model; and introducing a kernel-based attention module into the noise prediction network of the diffusion model, and training according to the difference between the output of the noise prediction network and the real image.
In a preferred embodiment of the diffusion model-based image super-resolution reconstruction method, adding the kernel-based attention module to the noise prediction network, with the latent model fixed, comprises the following steps: encoding spatial information by adaptively fusing learnable kernel bases, thereby capturing spatial patterns in the image; predicting the fusion coefficients F for each position with a lightweight convolutional branch; computing the fused kernel weights for each spatial position from the predicted fusion coefficients F and the learned kernel bases; applying a convolution to the input feature map X to obtain the feature map $X_e$ on which the fused kernel weights are applied as an adaptive convolution, and computing the output feature map $X_0[i,j]$ at position (i, j) by grouped convolution. The total loss function of the latent model is computed from the reconstruction loss, the representation loss, and the consistency loss.
In a preferred embodiment of the diffusion model-based image super-resolution reconstruction method, the fused kernel weights are given by:
$$M[i,j] = \sum_{t} F[i,j,t]\, W[t]$$
where F[i, j, t] is the fusion coefficient of the t-th convolution kernel at position (i, j), and W[t] is the t-th learnable kernel basis.
The output feature map $X_0[i,j]$ is given by:
$$X_0[i,j] = \mathrm{GroupConv}(X_e[i,j],\, M[i,j])$$
where GroupConv(·) represents the grouped convolution operation, $X_0$ represents the output feature map, $X_e$ represents the feature map to be adaptively convolved, and M[i, j] represents the fused convolution kernel weights.
In a preferred embodiment of the diffusion model-based image super-resolution reconstruction method, the total loss function is given by:
$$L = L_{rec}(\mathrm{LQ}, \mathrm{LQ\_real}) + L_{rep}(\mathrm{GT}, \mathrm{GT\_fake}) + 0.001 \cdot L_{reg}(\mathrm{LQ}, \mathrm{LQ\_l})$$
where L represents the total loss function, $L_{rec}$ the reconstruction loss, $L_{rep}$ the representation loss, and $L_{reg}$ the consistency loss; LQ represents the low-resolution image, LQ_real the decoded low-resolution image, GT the high-resolution image, GT_fake the image generated from the low-resolution hidden features and the high-resolution latent representation, and LQ_l the latent representation of the low-resolution image.
The reconstruction loss $L_{rec}$ is given by:
$$L_{rec}(\mathrm{LQ}, \mathrm{LQ\_real}) = \sum |\mathrm{LQ} - \mathrm{LQ\_real}|$$
$$\mathrm{LQ\_real} = \mathrm{LQ\_l} + \mathrm{LQ\_h}$$
where $L_{rec}$(LQ, LQ_real) represents the loss between the low-resolution image and the decoded low-resolution image, LQ represents the low-resolution image, LQ_real represents the image generated from the low-resolution hidden features and the low-resolution latent representation, LQ_l represents the latent representation of the low-resolution image, and LQ_h represents the hidden features of the low-resolution image.
The representation loss $L_{rep}$ is given by:
$$L_{rep}(\mathrm{GT}, \mathrm{GT\_fake}) = \sum |\mathrm{GT} - \mathrm{GT\_fake}|$$
$$\mathrm{GT\_fake} = \mathrm{GT\_l} + \mathrm{LQ\_h}$$
where $L_{rep}$(GT, GT_fake) represents the loss between the high-resolution image and the image generated from the low-resolution hidden features and the high-resolution latent representation, GT represents the high-resolution image, GT_fake represents that generated image, GT_l represents the latent representation of the high-resolution image, and LQ_h represents the hidden features of the low-resolution image.
The consistency loss $L_{reg}$(LQ, LQ_l) is given by:
$$L_{reg}(\mathrm{LQ}, \mathrm{LQ\_l}) = \sum |\mathrm{LQ}_\mu - \mathrm{LQ\_l}_\mu| + |\mathrm{LQ}_\sigma - \mathrm{LQ\_l}_\sigma|$$
where $\mathrm{LQ}_\mu$ and $\mathrm{LQ\_l}_\mu$ represent the means of the low-resolution image and of its latent representation, and $\mathrm{LQ}_\sigma$ and $\mathrm{LQ\_l}_\sigma$ represent the variances of the low-resolution image and of its latent representation.
In a preferred embodiment of the diffusion model-based image super-resolution reconstruction method, the diffusion model comprises a forward process and a reverse process. The forward process is given by:
$$dx = \theta_t(\mu - x)\,dt + \sigma(t)\,d\omega$$
where dω represents Gaussian noise, $\theta_t$ represents a hyperparameter, σ(t) represents the time-dependent parameter controlling the Gaussian noise level, dt represents a short time interval, μ represents the low-resolution image, x represents the generated image, and dx represents the change in the generated image within dt.
The reverse process is given by:
$$dx = \left[\theta_t(\mu - x) - \sigma(t)^2\,\nabla_x \log p_t(x)\right]dt + \sigma(t)\,d\omega$$
where $\nabla_x \log p_t(x)$ represents the score function, dω represents Gaussian noise, $\theta_t$ represents a hyperparameter, σ(t) represents the time-dependent parameter controlling the Gaussian noise level, dt represents a short time interval, μ represents the low-resolution image, x represents the generated image, and dx represents the change in the generated image within dt.
In a preferred embodiment of the diffusion model-based image super-resolution reconstruction method, after the super-resolution image is generated, adaptive color normalization is applied to the result according to the following formula:
$$\mathrm{AdaIN}(x, y) = \sigma(y)\,\frac{x - \mu(x)}{\sigma(x)} + \mu(y)$$
where x represents the generated super-resolution image, y represents the input low-resolution image, AdaIN represents the adaptive color normalization, μ(y) and σ(y) represent the mean and standard deviation of the low-resolution image, and μ(x) and σ(x) represent the mean and standard deviation of the generated super-resolution image.
In a second aspect, an embodiment of the present invention provides an image super-resolution reconstruction system based on a diffusion model, which includes: a data reading module, configured to read the picture data in the dataset and preprocess it before network training starts, the preprocessing including resizing, random cropping, random horizontal flipping, and normalization; a training module, configured to train a latent model that compresses the HR and LR pictures in the dataset, to feed the latent representations produced by the latent model into the diffusion model for training, and to compute the loss between the predicted noise and the actual noise during training and iteratively optimize the model; and an image generation module, configured to input the low-quality image to be super-resolved into the latent diffusion model after network training is finished, so as to obtain a super-resolution image whose content corresponds to the original image.
In a third aspect, embodiments of the present invention provide a computer apparatus comprising a memory and a processor, the memory storing a computer program, wherein: the computer program instructions, when executed by a processor, implement the steps of the diffusion model-based image super-resolution reconstruction method according to the first aspect of the present invention.
In a fourth aspect, embodiments of the present invention provide a computer-readable storage medium having a computer program stored thereon, wherein: the computer program instructions, when executed by a processor, implement the steps of the diffusion model-based image super-resolution reconstruction method according to the first aspect of the present invention.
The beneficial effects of the invention are as follows. The invention builds the dataset from the degradation characteristics that turn high-quality images into low-quality images, and introduces a kernel-based attention module and an EAC module into the noise prediction network, which improves the network's ability to predict image noise accurately. The invention compresses images with a U-Net-based latent model and trains iteratively in the latent space, which greatly reduces the cost of training the diffusion model. Compared with conventional networks, the generative network constructed by the invention is better at generating the high-frequency information of an image, is better suited to super-resolution reconstruction, and has practical significance. The method can synthesize super-resolution images of better perceptual quality quickly, effectively, and reliably, and opens up new possibilities for extending super-resolution application scenarios and improving visual quality.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art. Wherein:
Fig. 1 is a flowchart of the image super-resolution reconstruction method based on a diffusion model.
Fig. 2 shows the latent model, the diffusion model, and the overall architecture of the image super-resolution reconstruction method based on a diffusion model.
Fig. 3 shows the KBBlock used in the diffusion model of the image super-resolution reconstruction method based on a diffusion model.
Detailed Description
In order that the above-recited objects, features and advantages of the present invention will become more readily apparent, a more particular description of the invention will be rendered by reference to specific embodiments thereof which are illustrated in the appended drawings.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, but the present invention may be practiced in other ways other than those described herein, and persons skilled in the art will readily appreciate that the present invention is not limited to the specific embodiments disclosed below.
Further, reference herein to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic can be included in at least one implementation of the invention. The appearances of the phrase "in one embodiment" in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments.
Example 1
Referring to Fig. 1, a first embodiment of the present invention provides a diffusion model-based image super-resolution reconstruction method, which includes the following steps.
S1: A paired dataset containing high-resolution images and their corresponding low-resolution images is acquired.
Specifically, the dataset is selected according to the task: for natural-scene super-resolution, datasets such as DIV2K and Flickr2K may be used; for face super-resolution, FFHQ or CelebA may be used. The images are downsampled with bicubic interpolation to obtain the corresponding high-resolution and low-resolution image pairs.
S2: The latent model is iteratively trained on the paired dataset; then, with the latent model fixed, a kernel-based attention module is added to the noise prediction network and the diffusion model is trained in the latent space.
Specifically, both the latent model and the diffusion model adopt a U-Net architecture. The encoder and decoder of the latent model each have four levels, each consisting of several ResBlocks and a downsampling/upsampling layer. The noise prediction network of the diffusion model comprises an encoder, a middle layer, and a decoder; its encoder and decoder also have four levels, each consisting of several KBBlocks and a downsampling/upsampling layer, and the middle layer consists of several KBBlocks.
Further, the diffusion model is trained in the latent space. The training process includes the following: training the latent model on a dataset of high-resolution (HR) and low-resolution (LR) images so that it can effectively combine the high-resolution latent representation with the low-resolution hidden features and thus generate a high-quality high-resolution reconstruction; generating the latent representations of the HR and LR images (latent_HR and latent_LR) with the encoder of the trained latent model; and introducing a kernel-based attention module into the noise prediction network of the diffusion model and training the noise prediction network on latent_HR and latent_LR. The trained noise prediction network enables the model to generate an image similar to the HR image during the diffusion process.
Preferably, the transforms.Resize function is used to resize the picture, as follows:
img=Resize(H,W)
where Resize denotes the function that resizes the picture, img denotes the resized input image, and H and W denote the height and width of the output image, respectively.
Preferably, the input image is randomly cropped by slicing during training, as follows:
img=img_in[rnd_h:rnd_h+size,rnd_w:rnd_w+size,:]
where rnd_h and rnd_w represent the randomly chosen starting pixel coordinates, rnd_h:rnd_h+size selects rows in the height direction, rnd_w:rnd_w+size selects columns in the width direction, the trailing : selects all channels, size represents the side length of the desired crop, img_in represents the input picture, and img represents the final output picture.
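The slicing rule above can be sketched in plain Python. This is a minimal illustration, not the patent's implementation: it assumes the image is stored as nested lists in height × width × channel order, and the function name `random_crop` and the `rng` parameter are hypothetical.

```python
import random


def random_crop(img_in, size, rng=random):
    """Crop a size x size window from an H x W x C image stored as nested lists."""
    height, width = len(img_in), len(img_in[0])
    if size > height or size > width:
        raise ValueError("crop size exceeds image dimensions")
    # Random top-left corner, chosen so the window stays inside the image.
    rnd_h = rng.randint(0, height - size)
    rnd_w = rng.randint(0, width - size)
    # Same selection as img_in[rnd_h:rnd_h+size, rnd_w:rnd_w+size, :] on an array.
    return [row[rnd_w:rnd_w + size] for row in img_in[rnd_h:rnd_h + size]]


# Each pixel stores its own (row, column) so the crop position is easy to inspect.
image = [[[r, c, 0] for c in range(8)] for r in range(6)]
patch = random_crop(image, 4, random.Random(0))
```

For paired HR/LR training data, the same offsets (scaled by the upsampling factor) would normally be applied to both images so that the crops stay aligned; the source does not spell this out.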
Preferably, the input picture is randomly flipped horizontally using a transform, as follows:
img=Flip(P)
where img denotes the flipped image, P denotes the probability with which the horizontal flip is applied, and Flip denotes the random horizontal flipping transform.
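A random horizontal flip with probability P can be sketched as follows. The nested-list image layout and the function name `random_horizontal_flip` are assumptions made for illustration; the source does not name the exact library call it uses.

```python
import random


def random_horizontal_flip(img, p=0.5, rng=random):
    """Mirror an H x W x C nested-list image left-to-right with probability p."""
    if rng.random() < p:
        # Reversing each row swaps the left and right halves of the image.
        return [row[::-1] for row in img]
    return img


image = [[[1], [2], [3]],
         [[4], [5], [6]]]
always_flipped = random_horizontal_flip(image, p=1.0)
never_flipped = random_horizontal_flip(image, p=0.0)
```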
Further, the skip connections of the latent model between the encoder and the decoder work as follows: in the encoder, the features produced each time the input passes through a residual block are extracted and appended to a list of hidden features; at the start of each decoding stage the corresponding encoder features are attached through a concat operation, and at the end of each decoding stage the corresponding encoder features are attached again. Through these multiple skip connections the decoder combines encoder features from different semantic levels, so that detail information can be recovered effectively.
Preferably, the latent model performs feature normalization processing on the input data in each basic block, and the specific formula is as follows:
out=x*(scale+1)+shift
wherein scale represents a scaling parameter, shift represents a displacement parameter, both parameters are obtained during the training process, x represents an input, and out represents an output.
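The scale-and-shift rule is easy to check numerically. The sketch below applies it per channel to a short feature vector; the function name `modulate` and the sample values are illustrative assumptions. Because of the `+ 1`, a zero scale and a zero shift leave the input unchanged.

```python
def modulate(x, scale, shift):
    """Apply out = x * (scale + 1) + shift element-wise over channels."""
    if not (len(x) == len(scale) == len(shift)):
        raise ValueError("x, scale and shift must have the same length")
    return [xi * (s + 1.0) + b for xi, s, b in zip(x, scale, shift)]


features = [0.5, -1.0, 2.0]
# scale = shift = 0 is the identity mapping.
identity = modulate(features, [0.0, 0.0, 0.0], [0.0, 0.0, 0.0])
# Non-zero parameters rescale and offset each channel independently.
adjusted = modulate(features, [1.0, 0.0, -0.5], [0.0, 0.25, 1.0])
```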
Preferably, the latent model uses the Swish activation function, whose formula is as follows:
$$f(x) = x \cdot \mathrm{sigmoid}(\beta x) = \frac{x}{1 + e^{-\beta x}}$$
where f(x) represents the Swish activation function, x represents the input vector from the preceding network layer, and β is a parameter that controls the smoothness of the activation function.
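As a quick numerical illustration, the sketch below assumes the standard Swish definition f(x) = x · sigmoid(βx), with β as the smoothness parameter. The helper names are hypothetical, and the sigmoid is written in a numerically stable form so that large negative inputs do not overflow.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def swish(x, beta=1.0):
    """Swish activation: x * sigmoid(beta * x)."""
    return x * sigmoid(beta * x)


# A larger beta sharpens the curve, so the activation behaves more like ReLU.
smooth = [swish(v) for v in (-2.0, 0.0, 2.0)]
sharp = [swish(v, beta=50.0) for v in (-2.0, 0.0, 2.0)]
```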
Further, in the noise prediction network, SimpleGate is used in place of an activation function, with the following formula:
$$\mathrm{SimpleGate}(x) = x_1 \cdot x_2, \qquad x_1, x_2 = \mathrm{chunk}(x)$$
where chunk represents a function that splits the input feature map into two halves along the channel dimension, and $x_1$ and $x_2$ are the two resulting feature maps.
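SimpleGate can be sketched on a tiny feature map. The layout (a list of channels, each a flat list of values) and the function name `simple_gate` are assumptions for illustration; in a deep-learning framework the split would be a chunk operation along the channel axis.

```python
def simple_gate(x):
    """Split the channels into two halves and multiply them element-wise.

    x is a list of C channels, each a flat list of values; C must be even.
    """
    channels = len(x)
    if channels % 2 != 0:
        raise ValueError("SimpleGate needs an even number of channels")
    half = channels // 2
    x1, x2 = x[:half], x[half:]  # chunk(x) along the channel dimension
    return [[a * b for a, b in zip(c1, c2)] for c1, c2 in zip(x1, x2)]


feature_map = [[1, 2], [3, 4], [5, 6], [7, 8]]  # 4 channels, 2 values each
gated = simple_gate(feature_map)                # 2 channels remain
```

The output has half as many channels as the input, which is why layers feeding a SimpleGate typically produce twice the number of channels needed afterwards.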
Further, a kernel-based attention block (KBBlock) is used in the noise prediction network. The specific process comprises the following steps:
Kernel Basis Attention (KBA): the KBA module encodes spatial information by adaptively fusing learnable kernel bases. Given an input feature map X, KBA learns a set of kernel bases W that is shared across all spatial locations and images in order to capture common spatial patterns. W contains N grouped convolution kernels with C channels and kernel size K; the number of groups is set to C/4 to balance performance and efficiency. KBA then adaptively fuses these bases at each location to encode spatial information.
Fusion Coefficients Prediction: a lightweight convolutional branch predicts the fusion coefficients F for each location. The branch contains two layers: a 3×3 grouped convolution layer that reduces the number of channels to N, with a group size of N; and a SimpleGate activation followed by a 3×3 convolution layer.
Kernel Basis Fusion: the inputs are the predicted fusion coefficients F and the learned kernel bases W. For each spatial position (i, j), the fused kernel weights M[i, j] are obtained as a linear combination of the kernel bases, with the following formula:
$$M[i,j] = \sum_{t} F[i,j,t]\, W[t]$$
wherein F [ i, j, t ] is the fusion coefficient of the t-th kernel at the position (i, j), and W [ t ] is the t-th learnable kernel.
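The fusion step is a per-position linear combination of the shared bases. The sketch below is a minimal illustration with hypothetical names: each basis W[t] is flattened to a list of weights, and F holds N coefficients for every spatial position.

```python
def fuse_kernels(F, W):
    """Return M with M[i][j] = sum over t of F[i][j][t] * W[t].

    F: H x W_img x N fusion coefficients (nested lists).
    W: N kernel bases, each flattened to a list of weights of equal length.
    """
    n_bases, kernel_len = len(W), len(W[0])
    fused = []
    for row in F:
        fused_row = []
        for coeffs in row:
            if len(coeffs) != n_bases:
                raise ValueError("one coefficient per kernel basis is required")
            fused_row.append([
                sum(coeffs[t] * W[t][k] for t in range(n_bases))
                for k in range(kernel_len)
            ])
        fused.append(fused_row)
    return fused


bases = [[1.0, 0.0], [0.0, 2.0]]        # N = 2 bases, 2 weights each
coefficients = [[[1.0, 0.0], [0.5, 0.5]]]  # a 1 x 2 grid of positions
M = fuse_kernels(coefficients, bases)
```

Position (0, 0) selects the first basis outright, while position (0, 1) blends both bases equally, so every location ends up with its own kernel even though only N bases are stored.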
Further, the input feature map X is transformed by a 1×1 convolution to obtain the feature map $X_e$, on which the fused kernel weights M[i, j] are applied as an adaptive convolution, and the output feature map $X_0[i,j]$ at position (i, j) is computed by grouped convolution, with the following formula:
$$X_0[i,j] = \mathrm{GroupConv}(X_e[i,j],\, M[i,j])$$
where GroupConv(·) represents the grouped convolution operation, $X_0$ represents the output feature map, $X_e$ represents the feature map to be adaptively convolved, and M[i, j] represents the fused kernel weights.
Further, the total loss function of the latent model is computed from the reconstruction loss, the representation loss, and the consistency loss, with the following formula:
$$L = L_{rec}(\mathrm{LQ}, \mathrm{LQ\_real}) + L_{rep}(\mathrm{GT}, \mathrm{GT\_fake}) + 0.001 \cdot L_{reg}(\mathrm{LQ}, \mathrm{LQ\_l})$$
where L represents the total loss function, $L_{rec}$ the reconstruction loss, $L_{rep}$ the representation loss, and $L_{reg}$ the consistency loss; LQ represents the low-resolution image, LQ_real the decoded low-resolution image, GT the high-resolution image, GT_fake the image generated from the low-resolution hidden features and the high-resolution latent representation, and LQ_l the latent representation of the low-resolution image.
Specifically, the reconstruction loss $L_{rec}$ and the representation loss $L_{rep}$ are given by:
$$L_{rec}(\mathrm{LQ}, \mathrm{LQ\_real}) = \sum |\mathrm{LQ} - \mathrm{LQ\_real}|$$
$$\mathrm{LQ\_real} = \mathrm{LQ\_l} + \mathrm{LQ\_h}$$
where $L_{rec}$(LQ, LQ_real) represents the loss between the low-resolution image and the decoded low-resolution image, LQ represents the low-resolution image, LQ_real represents the image generated from the low-resolution hidden features and the low-resolution latent representation, LQ_l represents the latent representation of the low-resolution image, and LQ_h represents the hidden features of the low-resolution image.
$$L_{rep}(\mathrm{GT}, \mathrm{GT\_fake}) = \sum |\mathrm{GT} - \mathrm{GT\_fake}|$$
$$\mathrm{GT\_fake} = \mathrm{GT\_l} + \mathrm{LQ\_h}$$
where $L_{rep}$(GT, GT_fake) represents the loss between the high-resolution image and the image generated from the low-resolution hidden features and the high-resolution latent representation, GT represents the high-resolution image, GT_fake represents that generated image, GT_l represents the latent representation of the high-resolution image, and LQ_h represents the hidden features of the low-resolution image.
Specifically, the consistency loss $L_{reg}$(LQ, LQ_l) is given by:
$$L_{reg}(\mathrm{LQ}, \mathrm{LQ\_l}) = \sum |\mathrm{LQ}_\mu - \mathrm{LQ\_l}_\mu| + |\mathrm{LQ}_\sigma - \mathrm{LQ\_l}_\sigma|$$
where $\mathrm{LQ}_\mu$ and $\mathrm{LQ\_l}_\mu$ represent the means of the low-resolution image and of its latent representation, and $\mathrm{LQ}_\sigma$ and $\mathrm{LQ\_l}_\sigma$ represent the variances of the low-resolution image and of its latent representation.
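The three terms of the latent-model loss can be combined as in the sketch below. It is an illustration under stated assumptions: each image or latent is flattened to a list of values, the statistics are taken over a single sample rather than summed over a batch, and all function names are hypothetical. The weight 0.001 on the consistency term is the value given in the text.

```python
from statistics import fmean, pvariance


def l1_sum(a, b):
    """Sum of absolute differences between two equally sized flat lists."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same number of elements")
    return sum(abs(x - y) for x, y in zip(a, b))


def consistency_loss(lq, lq_l):
    """|mean difference| + |variance difference| between image and latent."""
    return (abs(fmean(lq) - fmean(lq_l))
            + abs(pvariance(lq) - pvariance(lq_l)))


def total_loss(lq, lq_real, gt, gt_fake, lq_l, reg_weight=0.001):
    """L = L_rec(LQ, LQ_real) + L_rep(GT, GT_fake) + reg_weight * L_reg(LQ, LQ_l)."""
    return (l1_sum(lq, lq_real)
            + l1_sum(gt, gt_fake)
            + reg_weight * consistency_loss(lq, lq_l))


loss = total_loss(
    lq=[0.0, 2.0], lq_real=[0.0, 1.0],                # L_rec = 1
    gt=[1.0, 1.0, 1.0, 1.0], gt_fake=[1.0, 0.0, 1.0, 1.0],  # L_rep = 1
    lq_l=[1.0, 1.0],                                  # same mean, variance differs by 1
)
```

The small weight keeps the consistency term from dominating: it only nudges the latent statistics toward those of the low-resolution image while the two L1 terms drive reconstruction quality.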
Further, the loss of the noise prediction network is given by:
$$L = \sum |\delta - \delta_t|$$
where δ represents the noise actually added and $\delta_t$ represents the predicted noise.
Preferably, the diffusion model uses a mean-reverting SDE, which comprises a forward process and a reverse process. The forward process is given by:
$$dx = \theta_t(\mu - x)\,dt + \sigma(t)\,d\omega$$
where dω represents Gaussian noise, $\theta_t$ represents a hyperparameter, σ(t) represents the time-dependent parameter controlling the Gaussian noise level, dt represents a short time interval, μ represents the low-resolution image, x represents the generated image, and dx represents the change in the generated image within dt.
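The forward process can be simulated with a simple Euler-Maruyama discretisation, in which each step adds the drift θ(μ − x)Δt and a Gaussian increment scaled by σ√Δt. The sketch below is an illustration only: the schedules for θ and σ are passed in as hypothetical callables, and the step count and time horizon are arbitrary choices, not values from the source.

```python
import math
import random


def forward_sde(x0, mu, theta, sigma, n_steps=1000, t_end=1.0, rng=random):
    """Euler-Maruyama simulation of dx = theta(t) (mu - x) dt + sigma(t) dw.

    x0 and mu are flat lists of pixel (or latent) values of equal length.
    """
    dt = t_end / n_steps
    x = list(x0)
    for k in range(n_steps):
        t = k * dt
        drift = theta(t)
        diffusion = sigma(t) * math.sqrt(dt)
        # Pull every value toward mu, then add a fresh Gaussian increment.
        x = [xi + drift * (mi - xi) * dt + diffusion * rng.gauss(0.0, 1.0)
             for xi, mi in zip(x, mu)]
    return x


# Without noise the state decays toward mu, which shows the mean-reverting drift.
decayed = forward_sde([1.0, -2.0], [0.0, 0.0],
                      theta=lambda t: 5.0, sigma=lambda t: 0.0,
                      n_steps=2000, t_end=2.0)
# With noise the state ends up as a noisy version of mu.
noisy = forward_sde([1.0], [0.5], theta=lambda t: 2.0, sigma=lambda t: 0.3,
                    n_steps=100, rng=random.Random(0))
```

This is why the low-resolution image μ acts as the end point of the forward process: the high-quality input is gradually degraded into μ plus noise, and the reverse process starts from there.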
Specifically, the specific formula of the reverse process is as follows:
where ∇_x log p_t(x) represents the score function, which can be approximated by a formula and predicted using the noise prediction network, dω represents Gaussian noise, θ_t represents a hyperparameter, σ(t) represents the time-dependent parameter that controls the fluctuation of the Gaussian noise, dt represents a short time interval, μ represents the low-resolution image, x represents the distribution of the generated image, and dx represents the change in the generated image within dt.
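For reference, the reverse-time SDE that corresponds to the forward process dx=θ_t(μ-x)dt+σ(t)dω under the standard time-reversal result for diffusion processes can be written as below. This is a reconstruction from the symbols defined in this paragraph, offered as an assumption about the form intended here; the symbol for the reverse-time noise, written with a hat, is notation introduced for this sketch.

```latex
dx = \left[\theta_t(\mu - x) - \sigma(t)^2\,\nabla_x \log p_t(x)\right]dt + \sigma(t)\,d\hat{\omega}
```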
It should be noted that the forward process is a process of continuously adding noise to the image, and the reverse process is a process of continuously reducing noise.
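The forward process can be simulated with a simple Euler-Maruyama discretization, sketched below. The constant values of θ and σ, the step count, and the function name are assumptions chosen for illustration; the document only states that θ_t is a hyperparameter and that σ(t) varies with time.

```python
import numpy as np

def forward_sde(x0, mu, theta=2.0, sigma=0.1, steps=100, T=1.0, seed=0):
    """Euler-Maruyama simulation of dx = theta*(mu - x)*dt + sigma*d_omega.

    theta and sigma are held constant here (an assumption); x drifts toward
    mu while Gaussian noise is added at every step.
    """
    rng = np.random.default_rng(seed)
    dt = T / steps
    x = np.array(x0, dtype=np.float64)
    mu = np.asarray(mu, dtype=np.float64)
    for _ in range(steps):
        # Increment of the Wiener process over dt: N(0, dt) per element.
        d_omega = rng.standard_normal(x.shape) * np.sqrt(dt)
        x = x + theta * (mu - x) * dt + sigma * d_omega
    return x
```

With sigma set to 0 the update reduces to a deterministic drift toward μ, which makes the mean-reverting behaviour easy to check.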
S3: inputting the low-quality image into the trained latent diffusion model to obtain the corresponding super-resolution generated image.
Further, adaptive color normalization is applied to the generated result, and its specific formula is as follows:
where x represents the generated super-resolution image, y represents the input low-resolution image, AdaIN represents the adaptive color normalization, μ(y) and σ(y) represent the mean and variance of the low-resolution image, and μ(x) and σ(x) represent the mean and variance of the generated super-resolution image.
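A minimal NumPy sketch of this color normalization is given below. It assumes the standard AdaIN form AdaIN(x,y)=σ(y)*(x-μ(x))/σ(x)+μ(y) with per-channel statistics; the function name, the channel-last layout, and the eps guard against division by zero are assumptions made for this sketch, and σ is computed as a standard deviation, as in standard AdaIN.

```python
import numpy as np

def adain_color_fix(x, y, eps=1e-8):
    """Match the per-channel mean and spread of x to those of y.

    x: generated super-resolution image, shape (H, W, C)
    y: input low-resolution image, shape (h, w, C)
    """
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    mu_x = x.mean(axis=(0, 1), keepdims=True)
    sd_x = x.std(axis=(0, 1), keepdims=True)
    mu_y = y.mean(axis=(0, 1), keepdims=True)
    sd_y = y.std(axis=(0, 1), keepdims=True)
    # Whiten x channel-wise, then re-colour it with the statistics of y.
    return sd_y * (x - mu_x) / (sd_x + eps) + mu_y
```

Because only channel statistics are transferred, the two images may have different spatial sizes.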
Preferably, a progressive patch aggregation sampling algorithm is employed for larger images: the image is divided into a plurality of overlapping patches, each patch is sampled separately, a weight map is generated for each patch using a centered Gaussian kernel, and the overlapping pixels are blended according to these weight maps.
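The overlap-and-blend step described above can be sketched as follows. This is an illustration under assumptions rather than the algorithm of the invention: the patch size, stride, Gaussian width, and all function names are hypothetical, the image is single-channel and at least one patch in size, and the per-patch sampler `sample_patch` is assumed to return an array of the same size as its input.

```python
import numpy as np

def gaussian_weight_map(size, sigma_frac=0.25):
    """2-D Gaussian centred on the patch, so patch borders get low weight."""
    ax = np.arange(size) - (size - 1) / 2.0
    g = np.exp(-(ax ** 2) / (2.0 * (sigma_frac * size) ** 2))
    return np.outer(g, g)

def patch_starts(length, patch, stride):
    """Start offsets of overlapping patches covering [0, length)."""
    starts = list(range(0, length - patch + 1, stride))
    if starts[-1] + patch < length:
        starts.append(length - patch)  # last patch sits flush with the border
    return starts

def aggregate_patches(image, sample_patch, patch=64, stride=48):
    """Sample every overlapping patch and blend results with Gaussian weights."""
    image = np.asarray(image, dtype=np.float64)
    h, w = image.shape
    acc = np.zeros((h, w))
    weight_sum = np.zeros((h, w))
    wmap = gaussian_weight_map(patch)
    for top in patch_starts(h, patch, stride):
        for left in patch_starts(w, patch, stride):
            tile = sample_patch(image[top:top + patch, left:left + patch])
            acc[top:top + patch, left:left + patch] += wmap * tile
            weight_sum[top:top + patch, left:left + patch] += wmap
    # Normalise by the accumulated weights so overlaps are weighted averages.
    return acc / weight_sum
```

Passing an identity function as `sample_patch` returns the input unchanged, which is a convenient check that the weights are normalised correctly.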
Further, this embodiment also provides an image super-resolution reconstruction system based on a diffusion model, comprising a data reading module, a training module, and an image generation module. The data reading module is used for reading the picture data in the data set and performing preprocessing operations before network training starts, the preprocessing operations comprising picture resizing, random cropping, random horizontal flipping, and normalization. The training module is used for training a compressed-sensing latent model on the HR and LR pictures in the data set, compressing images with the latent model to generate the latent representations that are input to the diffusion model for training, computing the loss between the predicted noise and the actual noise during training, and iteratively optimizing the model. The image generation module is used for inputting the low-quality image to be super-resolved into the latent diffusion model after network training is finished, so as to obtain a super-resolution image corresponding to the original image content.
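The random cropping, random horizontal flipping, and normalization performed by the data reading module can be sketched as follows. The function name, the uint8 channel-last arrays, the LR picture being exactly `scale` times smaller than the HR picture, and normalization to [0, 1] are all assumptions of this sketch; picture resizing is omitted.

```python
import numpy as np

def preprocess_pair(hr, lr, scale, lr_patch, rng):
    """Aligned random crop, random horizontal flip, and scaling to [0, 1].

    hr: HR picture, shape (H*scale, W*scale, C), dtype uint8
    lr: LR picture, shape (H, W, C), dtype uint8
    """
    h, w = lr.shape[:2]
    top = int(rng.integers(0, h - lr_patch + 1))
    left = int(rng.integers(0, w - lr_patch + 1))
    lr_c = lr[top:top + lr_patch, left:left + lr_patch]
    # Crop the same region from the HR picture, scaled by the SR factor.
    hr_c = hr[top * scale:(top + lr_patch) * scale,
              left * scale:(left + lr_patch) * scale]
    if rng.random() < 0.5:
        # Flip both pictures together so the pair stays aligned.
        lr_c = lr_c[:, ::-1]
        hr_c = hr_c[:, ::-1]
    return hr_c.astype(np.float32) / 255.0, lr_c.astype(np.float32) / 255.0
```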
This embodiment also provides a computer device suitable for carrying out the diffusion model-based image super-resolution reconstruction method, comprising a memory and a processor; the memory is used for storing computer-executable instructions, and the processor is used for executing the computer-executable instructions to implement the diffusion model-based image super-resolution reconstruction method proposed in this embodiment.
The computer device may be a terminal comprising a processor, a memory, a communication interface, a display screen and input means connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The communication interface of the computer device is used for carrying out wired or wireless communication with an external terminal, and the wireless mode can be realized through WIFI, an operator network, NFC (near field communication) or other technologies. The display screen of the computer equipment can be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer equipment can be a touch layer covered on the display screen, can also be keys, a track ball or a touch pad arranged on the shell of the computer equipment, and can also be an external keyboard, a touch pad or a mouse and the like.
This embodiment also provides a storage medium having stored thereon a computer program which, when executed by a processor, performs the following steps: acquiring paired data sets containing both high-resolution images and the corresponding low-resolution images; iteratively training a latent model on the paired data sets, adding a kernel-based attention module to the noise prediction network, freezing the latent model, and training a diffusion model in the latent space; and inputting a low-quality image into the trained latent diffusion model to obtain the corresponding super-resolution generated image.
In summary, the invention constructs a data set based on the degradation characteristics from high-quality to low-quality images, improves the noise prediction network of the diffusion model, and carries out iterative training in the latent space, so that training is faster and the results are better; compared with a conventional network, the generative network constructed by the invention is better suited to super-resolution image reconstruction and has practical significance; the method can synthesize more natural super-resolution images quickly, effectively, and reliably, improve the realism and visual quality of the generated images, and broaden the range of applications and application scenarios.
Example 2
Referring to FIGS. 1 to 3, a second embodiment of the present invention builds on the first embodiment; to verify its beneficial effects, scientific demonstration is performed through economic benefit calculation and simulation experiments.
Specifically, the training data set consists of DIV2K (800 pictures), Flickr2K (2650 pictures), and OST (10324 pictures), for a total of 13774 pictures. The training data set is processed by downsampling each picture with bicubic interpolation according to the super-resolution factor to obtain the low-resolution picture, which yields a complete training data set of paired high/low-resolution images. The evaluation data set is DIV2K, which contains 100 pairs of high/low-resolution images. The test data sets are Set5, Set14, BSD100, Urban100, and Manga109.
Further, the latent model is built on U-Net: the encoder and the decoder each have 4 layers, each layer comprises several residual blocks and an up/down-sampling layer, and the latent model uses the skip connection mechanism of U-Net to ensure the consistency of image reconstruction. The noise prediction network of the diffusion model uses the U-Net of Refusion as its backbone, with the following improvements to the network structure: image features are extracted using the KBblock shown in fig. 3 instead of NAFblock; a kernel-based attention module is added to increase the sensitivity of the network to spatial information; and the EAC module replaces the SCA module so that channel attention is learned more effectively.
Further, the latent model is trained on the constructed training data set with the learning rate set to 3e-5, the optimizer set to Lion, the patch_size set to 512, and 300000 training iterations. The loss function is the L1 loss, and the network parameters are iteratively updated according to it to obtain the trained latent model. The encoder of the trained latent model is then used to generate the latent representations of the high/low-resolution images, which are input into the diffusion model for training with the learning rate set to 3e-5, the optimizer set to Lion, the patch_size set to 512, and 800000 training iterations. An L1 loss function is again adopted, and the network parameters are iteratively updated according to it to obtain the trained diffusion model.
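The two training stages described above can be captured in a small configuration sketch. The `StageConfig` class and its field names are hypothetical and exist only to keep the reported hyperparameters in one place; the values themselves are taken from this paragraph.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class StageConfig:
    """Hyperparameters of one training stage (field names are illustrative)."""
    name: str
    learning_rate: float
    optimizer: str
    patch_size: int
    iterations: int
    loss: str

# Stage 1: train the latent model on the paired HR/LR data set.
LATENT_STAGE = StageConfig("latent_model", 3e-5, "Lion", 512, 300_000, "L1")

# Stage 2: train the diffusion model on latents from the trained encoder.
DIFFUSION_STAGE = StageConfig("diffusion_model", 3e-5, "Lion", 512, 800_000, "L1")
```

Both stages share the learning rate, optimizer, and patch size, and differ only in the number of iterations.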
Further, a low-quality image to be super-resolved is input into the latent model to obtain the compressed latent representation Z_t; the latent representation is then input into the diffusion model, whose iterative diffusion process yields the super-resolved latent representation Z_0; and Z_0 is decoded by the latent model to obtain the final super-resolution image.
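The order of operations at inference time can be sketched as below. The function `super_resolve` and its callable arguments are hypothetical placeholders for the latent model's encoder and decoder and for one reverse-diffusion update; the toy stand-ins only demonstrate the call order and carry no image-processing meaning.

```python
def super_resolve(lq, encode, denoise_step, decode, num_steps):
    """Encode to Z_t, iterate the reverse diffusion down to Z_0, then decode."""
    z = encode(lq)                       # compressed latent representation Z_t
    for t in range(num_steps, 0, -1):    # t = num_steps, ..., 1
        z = denoise_step(z, t)           # one reverse-diffusion update
    return decode(z)                     # Z_0 -> final super-resolution image

# Toy stand-ins (assumptions), used only to show how the pieces are wired.
visited = []

def toy_step(z, t):
    visited.append(t)
    return z + 1.0

result = super_resolve(0.0, encode=lambda img: img * 0.5,
                       denoise_step=toy_step,
                       decode=lambda z: z * 2.0, num_steps=3)
```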
Preferably, the method improves the network structure and makes full use of the generative capability of the diffusion model, so that high-quality super-resolution images can be generated quickly; it is simple to operate, trains quickly, and produces good results, opening up new possibilities for expanding super-resolution application scenarios and improving visual quality.
It should be noted that the above embodiments are only for illustrating the technical solution of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that the technical solution of the present invention may be modified or substituted without departing from the spirit and scope of the technical solution of the present invention, which is intended to be covered in the scope of the claims of the present invention.
Claims (10)
1. An image super-resolution reconstruction method based on a diffusion model, characterized by comprising the following steps:
acquiring paired data sets simultaneously containing high-resolution images and corresponding low-resolution images;
iteratively training a latent model on the paired data sets, adding a kernel-based attention module to the noise prediction network, freezing the latent model, and training a diffusion model in the latent space;
and inputting a low-quality image into the trained latent diffusion model to obtain the corresponding super-resolution generated image.
2. The diffusion model-based image super-resolution reconstruction method as claimed in claim 1, wherein: the iterative training of the latent model on the paired data sets comprises the steps of:
constructing a paired data set using the high-resolution HR images and the corresponding low-resolution LR images;
training the latent model using the paired data sets, the encoder mapping each input image to its corresponding latent representation during training;
converting the latent representation into a high-resolution reconstructed image by a decoder;
optimizing the parameters of the latent model according to the gradients of the loss function, and stopping training according to a preset convergence condition or the maximum number of iterations;
generating the latent representations of the HR image and the LR image with the encoder of the trained latent model;
and introducing a kernel-based attention module into the noise prediction network of the diffusion model, and training according to the difference between the output of the noise prediction network and the real image.
3. The diffusion model-based image super-resolution reconstruction method as claimed in claim 1, wherein: adding the kernel-based attention module to the noise prediction network and freezing the latent model comprises the following steps:
encoding spatial information by adaptively fusing a learnable kernel basis function, and capturing a spatial mode in an image;
predicting a fusion coefficient F of each position by using a lightweight convolution branch network;
calculating a fused kernel weight for each spatial location using the predicted fusion coefficient F and the learned kernel basis function;
performing a convolution transformation on the input feature map X to obtain the feature map X_e to be adaptively convolved with the fused convolution kernel weights, and calculating the output feature map X_0[i,j] at position (i, j) through grouped convolution;
calculating the total loss function of the latent model using the reconstruction loss, the representation loss, and the consistency loss.
4. The diffusion model-based image super-resolution reconstruction method as claimed in claim 3, wherein: the specific formula of the kernel weight is as follows:
M[i,j]=∑F[i,j,t]*W[t]
wherein F[i,j,t] is the fusion coefficient of the t-th convolution kernel at position (i, j), and W[t] is the t-th learnable basis function;
the specific formula of the output feature map X_0[i,j] is as follows:
X_0[i,j]=GroupConv(X_e[i,j],M[i,j])
where GroupConv() represents the grouped convolution operation, X_0 represents the output feature map, X_e represents the feature map to be adaptively convolved, and M[i,j] represents the fused convolution kernel weights.
5. The diffusion model-based image super-resolution reconstruction method as claimed in claim 3, wherein: the specific formula of the total loss function is as follows:
L=L_rec(LQ,LQ_real)+L_rep(GT,GT_fake)+0.001*L_reg(LQ,LQ_l)
wherein L represents the total loss function, L_rec represents the reconstruction loss, L_rep represents the representation loss, L_reg represents the consistency loss, LQ represents the low-resolution image, LQ_real represents the decoded low-resolution image, GT represents the high-resolution image, GT_fake represents the image generated from the low-resolution hidden features and the high-resolution latent representation, and LQ_l represents the latent representation of the low-resolution image;
the specific formula of the reconstruction loss L_rec is as follows:
L_rec(LQ,LQ_real)=∑|LQ-LQ_real|
LQ_real=LQ_l+LQ_h
wherein L_rec(LQ, LQ_real) represents the loss between the low-resolution image and the decoded low-resolution image, LQ represents the low-resolution image, LQ_real represents the image generated from the low-resolution hidden features and the low-resolution latent representation, LQ_l represents the latent representation of the low-resolution image, and LQ_h represents the hidden features of the low-resolution image;
the specific formula of the representation loss L_rep is as follows:
L_rep(GT,GT_fake)=∑|GT-GT_fake|
GT_fake=GT_l+LQ_h
wherein L_rep(GT, GT_fake) represents the loss between the high-resolution image and the image generated from the high-resolution latent representation and the low-resolution hidden features, GT represents the high-resolution image, GT_fake represents the image generated from the low-resolution hidden features and the high-resolution latent representation, GT_l represents the latent representation of the high-resolution image, and LQ_h represents the hidden features of the low-resolution image;
the specific formula of the consistency loss L_reg(LQ, LQ_l) is as follows:
L_reg(LQ,LQ_l)=∑|LQ_μ-LQ_l_μ|+|LQ_σ-LQ_l_σ|
where LQ_μ and LQ_l_μ represent the means of the low-resolution image and of its latent representation, and LQ_σ and LQ_l_σ represent the corresponding variances.
6. The diffusion model-based image super-resolution reconstruction method as claimed in claim 1, wherein: the diffusion model includes a forward process and a reverse process,
the specific formula of the forward process is as follows:
dx=θ_t(μ-x)dt+σ(t)dω
wherein dω represents Gaussian noise, θ_t represents a hyperparameter, σ(t) represents the time-dependent parameter that controls the fluctuation of the Gaussian noise, dt represents a short time interval, μ represents the low-resolution image, x represents the distribution of the generated image, and dx represents the change in the generated image within dt;
the specific formula of the reverse process is as follows:
wherein ∇_x log p_t(x) represents the score function, dω represents Gaussian noise, θ_t represents a hyperparameter, σ(t) represents the time-dependent parameter that controls the fluctuation of the Gaussian noise, dt represents a short time interval, μ represents the low-resolution image, x represents the distribution of the generated image, and dx represents the change in the generated image within dt.
7. The diffusion model-based image super-resolution reconstruction method as claimed in claim 1, wherein: the obtaining of the corresponding super resolution generated image includes,
applying adaptive color normalization to the generated result, the specific formula of which is as follows:
where x represents the generated super-resolution image, y represents the input low-resolution image, AdaIN represents the adaptive color normalization, μ(y) and σ(y) represent the mean and variance of the low-resolution image, and μ(x) and σ(x) represent the mean and variance of the generated super-resolution image.
8. An image super-resolution reconstruction system based on a diffusion model, which is based on the diffusion model-based image super-resolution reconstruction method according to any one of claims 1 to 7, characterized by comprising:
the data reading module, used for reading the picture data in the data set and performing preprocessing operations before network training starts, wherein the preprocessing operations comprise picture resizing, random cropping, random horizontal flipping, and normalization;
the training module, used for training a compressed-sensing latent model on the HR and LR pictures in the data set, compressing images with the latent model to generate the latent representations that are input to the diffusion model for training, computing the loss between the predicted noise and the actual noise during training, and iteratively optimizing the model;
and the image generation module, used for inputting the low-quality image to be super-resolved into the latent diffusion model after network training is finished, so as to obtain a super-resolution image corresponding to the original image content.
9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that: the processor, when executing the computer program, implements the steps of the diffusion model-based image super-resolution reconstruction method according to any one of claims 1 to 7.
10. A computer-readable storage medium having stored thereon a computer program, characterized by: the computer program when executed by a processor implements the steps of the diffusion model-based image super-resolution reconstruction method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311584004.4A CN117522694A (en) | 2023-11-24 | 2023-11-24 | Diffusion model-based image super-resolution reconstruction method and system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311584004.4A CN117522694A (en) | 2023-11-24 | 2023-11-24 | Diffusion model-based image super-resolution reconstruction method and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117522694A true CN117522694A (en) | 2024-02-06 |
Family
ID=89760545
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311584004.4A Pending CN117522694A (en) | 2023-11-24 | 2023-11-24 | Diffusion model-based image super-resolution reconstruction method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117522694A (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||