CN111292264B - Image high dynamic range reconstruction method based on deep learning - Google Patents
- Publication number: CN111292264B (application CN202010072803.3A)
- Authority
- CN
- China
- Prior art keywords
- image
- network
- hdr
- ldr
- dynamic range
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/90—Dynamic range modification of images or parts thereof
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20172—Image enhancement details
- G06T2207/20208—High dynamic range [HDR] image processing
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The invention discloses an image high dynamic range reconstruction method based on deep learning, belonging to the fields of computational photography and digital image processing. The invention establishes a mapping network from a single LDR image to an HDR image using a deep-learning-based method. The method first generates, from a collected HDR data set, LDR training data, HDR sample labels with aligned luminance units, and mask images of the high-brightness regions. A neural network is then constructed and trained to obtain a network model embodying the LDR-to-HDR mapping relationship. Finally, using the trained generating network, an LDR image is input directly and the reconstructed HDR image is output. The method can effectively reconstruct the dynamic range of the real scene from a single ordinary digital image, and can be used for HDR simulation display of ordinary digital images or to provide more realistic rendering for image illumination techniques.
Description
Technical Field
The invention belongs to the field of computational photography and digital image processing, and relates to a high dynamic range reconstruction method of an image, in particular to a high dynamic range reconstruction method of an image based on deep learning.
Background
The high dynamic range imaging (High Dynamic Range Imaging, HDRI) technique is an image representation method that achieves a larger exposure range than ordinary digital images. A high dynamic range (High Dynamic Range, HDR) image provides a larger brightness variation range and more shading detail than an ordinary digital image, which allows it to present brightness information closer to the real scene. In recent years, with the continuous evolution of display devices and the growing demand for physically based rendering, high dynamic range imaging has become increasingly important in practical applications. However, current methods of directly acquiring HDR images require high expertise and are costly and time-consuming. Among methods that reconstruct HDR from a single ordinary digital image, conventional approaches can only reduce the ill-posedness of the problem by adding constraints, which makes them effective only in certain specific application scenarios. Some scholars have also done effective work based on deep learning, but that work fails to account for factors such as the inconsistency of luminance units across HDR pictures, so its reconstruction effect remains limited. The present method can effectively reconstruct the dynamic range of the real scene from a single ordinary digital image, and can be used for HDR simulation display of ordinary digital images or to provide more realistic rendering for image illumination techniques.
Disclosure of Invention
The invention aims to recover the high dynamic range image of the original scene, as faithfully as possible, from a single ordinary digital image. Here, an ordinary digital image refers to a low dynamic range (Low Dynamic Range, LDR) image stored at 8-bit color depth with 256 levels per channel, and a high dynamic range image refers to an HDR image stored in the ".EXR" or ".HDR" format, which approximates the brightness variation of the real scene.
To achieve the above purpose, the invention establishes a mapping network from an LDR image to an HDR image using a deep-learning-based method, and trains the network on training data to establish an end-to-end LDR-to-HDR mapping relationship; the overall framework is shown in fig. 1. The algorithm is divided into two parts: data preprocessing and deep neural network training. The data preprocessing part comprises three steps: generation of training sample pairs, alignment of HDR image luminance units, and generation of image highlight region masks. The neural network architecture is divided into a basic HDR reconstruction network and a training optimization network, as shown in fig. 2. The loss function comprises three terms: the scale-invariant loss of the HDR reconstructed image, the cross-entropy loss of the highlight region classification, and the generative adversarial loss.
The method specifically comprises the following contents and steps:
1. data preprocessing
1) Generating LDR training sample input
Before supervised training of the deep neural network, a training data set matching the network's input and output must be acquired. The training data set comprises a number of LDR-HDR image pairs. The HDR images may come from existing available HDR pictures and serve as the labels of the training samples for supervised training. The corresponding LDR images, which serve as the sample inputs, are generated in one of two ways: by applying a tone mapping algorithm to the HDR image, or by simulated shooting, in which the HDR image is treated as the scene and captured by a constructed virtual camera.
Generating an LDR image using a tone mapping algorithm: selecting a proper tone mapping algorithm, and directly taking the HDR image as an algorithm input to obtain a corresponding LDR image output.
Obtaining an LDR image by constructing a virtual camera: first, a range of plausible dynamic ranges is determined based on common digital single-lens reflex (SLR) cameras, and for each LDR acquisition one value is randomly selected from this range as the dynamic range of the simulated camera; the virtual camera then auto-exposes according to the input HDR image, clamps pixels whose brightness exceeds its dynamic range to the boundary value, and linearly maps the result into the low dynamic range of the LDR image; finally, the obtained image is mapped from linear space to an ordinary digital image through a randomly selected approximate camera response function, yielding the required LDR image.
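The virtual-camera procedure above can be sketched in numpy as follows. This is an illustrative sketch, not the patent's exact implementation: the dynamic-range bounds, the median-anchored auto-exposure, and the use of a random gamma as the approximate camera response function are all assumptions, since the patent does not fix these specifics.

```python
import numpy as np

def virtual_camera_ldr(hdr, rng=None, dr_range=(100.0, 1000.0)):
    """Simulate one LDR capture of an HDR scene with a virtual camera.

    hdr: float array (H, W, 3) of linear scene radiance.
    dr_range: assumed span of plausible camera dynamic ranges
              (contrast ratios); the patent does not give exact bounds.
    """
    rng = rng or np.random.default_rng()
    # 1) Randomly pick the camera's dynamic range for this capture.
    dyn_range = rng.uniform(*dr_range)
    # 2) "Auto-expose": anchor the exposure window at the median luminance.
    lum = hdr.mean(axis=2)
    mid = np.median(lum) + 1e-8
    lo, hi = mid / np.sqrt(dyn_range), mid * np.sqrt(dyn_range)
    # 3) Clamp radiance outside the camera's range to the boundary values,
    #    then map linearly into [0, 1].
    clipped = np.clip(hdr, lo, hi)
    linear = (clipped - lo) / (hi - lo)
    # 4) Apply an approximate camera response function (a random gamma
    #    stands in for the randomly selected CRF) and quantize to 8 bits.
    gamma = rng.uniform(1.8, 2.4)
    return np.round(255.0 * linear ** (1.0 / gamma)).astype(np.uint8)
```

Each call simulates one shot with freshly drawn camera parameters, so repeated calls on the same HDR image yield differently exposed LDR samples.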
2) Aligning HDR sample tags
For HDR images stored in the relative luminance domain, the luminance units are aligned before the images are used as training sample labels. Let the original HDR image be H, and let L be the LDR image converted to linear space and normalized to [0,1]; H_{l,c} and L_{l,c} denote the pixel values of each image at position l in channel c. The two are aligned as follows:
where τ is a constant in [0,1]. The aligned HDR image and the corresponding LDR image form a training sample pair for training the neural network.
3) Generating a high light mask image
After the aligned HDR image is obtained, a mask image of the high-brightness regions can be obtained by binarization. Regions whose value is 1 in the mask image represent objects or surfaces of higher brightness in the scene, such as light sources and strongly reflecting surfaces. These highlight regions tend to be clipped in the LDR image due to overexposure; the highlight mask generated from the HDR image serves as a sample label for the training-optimization part of the network, to optimize the training process of the neural network.
2. Training of neural networks
1) Neural network structure
The network structure comprises two parts, a generating network and a discriminating network; a structural schematic is shown in fig. 2. The generating network has a U-Net structure: it receives an LDR image as input and, after an encoder built on a ResNet50 model and a decoder composed of six "upsampling + convolution layer" modules, outputs an HDR image and a highlight mask image respectively. The HDR output is the reconstruction obtained from the LDR input; the highlight mask output is the network's prediction of the highlight regions in the LDR image and serves as data for optimizing network training. The discriminating network is a fully convolutional network of 4 convolution layers; it receives an HDR image and a highlight mask image as inputs and outputs a feature map representing the probability that the input HDR image is a real HDR image rather than a false one generated by the network; this feature map serves as data for training the neural network.
2) Training method of neural network
The invention trains the network in a supervised manner. Training uses the Adam optimizer to perform back-propagation optimization on the generating network and the discriminating network alternately. The generator loss comprises three terms and is defined as follows:
L_G = α1·L_hdr + α2·L_mask + α3·L_gan
The loss is controlled by three loss functions: the scale-invariant loss of the HDR reconstructed image, the cross-entropy loss of the highlight region classification, and the generative adversarial loss.
Scale-invariant loss of the HDR reconstructed image: this loss is based on the observation that HDR images in the relative luminance domain are defined only up to a scale factor; it drives the HDR image output by the network as close as possible to the HDR label. In the standard scale-invariant form,

L_hdr = (1/n)·Σ_{l,c} d_{l,c}² − (1/n²)·(Σ_{l,c} d_{l,c})²,   d_{l,c} = log(y_{l,c} + ε) − log(ŷ_{l,c} + ε)

where y represents the HDR image output by the network, ŷ the target image, subscripts l, c denote pixel position and color channel, n is the number of summed terms, and ε is a small value that prevents taking the logarithm of zero. The first term is the ordinary L2 loss in the log domain; after the second term is introduced, the loss depends only on the difference between the prediction and the sample label, and is unaffected by the absolute magnitude of either.
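A minimal numpy sketch of a scale-invariant log-domain loss of the kind described above. The classic Eigen-style form is assumed here, since the patent's formula image is not reproduced in the text; the function name and `eps` default are illustrative.

```python
import numpy as np

def scale_invariant_loss(y, y_ref, eps=1e-6):
    """Scale-invariant loss in the log domain.

    d = log(y + eps) - log(y_ref + eps)
    loss = mean(d**2) - mean(d)**2
    The first term is an ordinary L2 loss in the log domain; subtracting
    the squared mean removes any global multiplicative offset, so the
    loss depends only on relative differences, not on absolute scale.
    """
    d = np.log(y + eps) - np.log(y_ref + eps)
    return float(np.mean(d ** 2) - np.mean(d) ** 2)
```

Because a global scale becomes an additive constant in the log domain, multiplying either image (or both) by a constant leaves the loss essentially unchanged, which is exactly the invariance property the text describes.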
Cross-entropy loss of highlight region classification: this loss drives the network to detect the high-brightness regions in the picture. The highlight mask image output by the network is a per-pixel classification of the input LDR image into highlight and non-highlight regions, and should be as close as possible to the highlight mask generated in the preprocessing step. A standard binary cross-entropy is used:

L_mask = −(1/n)·Σ_l [ m̂_l·log(m_l) + (1 − m̂_l)·log(1 − m_l) ]

where m and m̂ are the network prediction and the label value, respectively, and l indexes pixels.
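The per-pixel two-class cross-entropy can be sketched as follows; a plain binary cross-entropy is assumed, and the names `m_pred` and `m_label` are illustrative.

```python
import numpy as np

def mask_bce_loss(m_pred, m_label, eps=1e-7):
    """Per-pixel binary cross-entropy between the predicted highlight
    mask m_pred (probabilities in [0, 1]) and the mask label m_label."""
    m_pred = np.clip(m_pred, eps, 1.0 - eps)   # guard against log(0)
    return -np.mean(m_label * np.log(m_pred)
                    + (1.0 - m_label) * np.log(1.0 - m_pred))
```

A perfect prediction yields a loss near zero, while a fully inverted prediction is heavily penalized, which is what pushes the network's mask output toward the preprocessed label.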
Generative adversarial loss: this loss drives the distribution of the network-predicted HDR images as close as possible to that of real HDR images, preventing the prediction from merely reducing the per-pixel difference to the label while ignoring the overall distribution. Under the WGAN formulation used here, the generator term is

L_gan = −E[ D(y) ]

where D(y) is the output of the discriminating network when the generator output y is given as input.
The loss function of the discriminating network is the standard WGAN-GP loss, which drives the discriminator to judge as accurately as possible whether the image given to it is a real HDR image:

L_D = E[ D(y) ] − E[ D(y*) ] + λ·E[ ( ‖∇_ỹ D(ỹ)‖₂ − 1 )² ],   ỹ = ε·y* + (1 − ε)·y

where y is the generated HDR image, y* a real HDR image, and ε a random number drawn uniformly from [0,1].
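The WGAN-GP critic loss combines a Wasserstein term with a gradient penalty evaluated at random interpolates between real and generated samples. The sketch below assumes the standard WGAN-GP form (the patent only names the loss) and uses a linear critic D(y) = y·w so that the gradient is known in closed form; a real discriminator network would obtain it via automatic differentiation.

```python
import numpy as np

def wgan_gp_critic_loss(critic_w, y_real, y_fake, lam=10.0, rng=None):
    """WGAN-GP critic loss with a linear critic D(y) = y . critic_w.

    y_real, y_fake: (n, d) batches of real and generated samples.
    lam: gradient-penalty weight (10 is the value commonly used).
    """
    rng = rng or np.random.default_rng()
    D = lambda y: y @ critic_w                     # linear critic score
    # Wasserstein term: the critic scores fakes low and reals high.
    wasserstein = np.mean(D(y_fake)) - np.mean(D(y_real))
    # Interpolates y~ = e*y_real + (1 - e)*y_fake, e ~ U[0, 1] per sample.
    e = rng.uniform(size=(y_real.shape[0], 1))
    y_mix = e * y_real + (1.0 - e) * y_fake
    # Gradient penalty: push the critic's gradient norm at each
    # interpolate toward 1 (for a linear critic the gradient is
    # critic_w everywhere, so y_mix only fixes where it is evaluated).
    grads = np.broadcast_to(critic_w, y_mix.shape)
    gp = lam * np.mean((np.linalg.norm(grads, axis=1) - 1.0) ** 2)
    return wasserstein + gp
```

With a unit-norm critic the penalty vanishes and the loss reduces to the plain Wasserstein estimate, which is the regime the gradient penalty is designed to enforce.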
The network is trained according to the above method; after the loss function converges, the generating network establishes a mapping from a single LDR image to an HDR image. The trained generating network can then be used to reconstruct, as faithfully as possible, the real high dynamic range image from a single ordinary digital image.
Compared with the prior art, the invention has the following advantages:
1. the invention constructs an end-to-end neural network that can reconstruct a realistic HDR image from a single picture, without manual interaction;
2. the invention aligns the HDR data and trains the network on the basis of scale invariance, achieving a better reconstruction effect;
3. the invention optimizes network training with the generated highlight mask images, obtaining a better reconstruction effect in high-brightness regions.
Drawings
FIG. 1 is a general frame diagram of the present invention;
FIG. 2 is a block diagram of a deep neural network of the present invention;
fig. 3 is a schematic view of the effect of the present invention.
Detailed Description
The technical scheme of the invention is further described below with reference to the accompanying drawings and examples.
As shown in fig. 1, the invention provides a deep learning-based image high dynamic range reconstruction method, which comprises the following steps:
step 1, preprocessing an HDR data set to construct training data of a neural network. Firstly, generating LDR data from a collected HDR data set, then using the LDR-HDR data to align the HDR data, then generating a high-light mask image by utilizing the aligned HDR image, and finally integrating the LDR data, the HDR data and the HDR data as training data of a neural network. Wherein the LDR data is data input and the aligned HDR data and high light mask image are label data.
Step 2, constructing and training the neural network to obtain a network model with the LDR-to-HDR mapping relationship. Back-propagation training is carried out on the generating network and the discriminating network in turn, according to the training strategy, until the loss function converges. The generating network part is then the network model finally used for reconstructing the high dynamic range image.
Step 3, with the generating network model obtained by the training in step 2, the LDR image to be reconstructed is input directly into the model, which outputs the reconstructed HDR image.
The method is described in detail below with reference to examples.
1. Data preprocessing
The HDR data set is assembled from existing public HDR data sets, and the collected data are cropped, scaled, and otherwise processed into a group of data of uniform size and type. From this integrated data set, LDR data are acquired both by applying a display-adaptive tone mapping (Display Adaptive Tone Mapping) algorithm and by the virtual-camera shooting approach described above. Regarding tone mapping, this embodiment selects the method according to the type of pictures expected at application time: if most pictures in the application are taken without post-processing, a tone mapping method close to the camera's own output is selected; if the pictures have been post-processed, a method approximating the post-processing effect is selected. Specifically, for each HDR image, one LDR image is obtained by applying the display-adaptive tone mapping algorithm, and one more by a single shot of a virtual camera with random parameters; that is, each HDR image yields two LDR images obtained by different methods. The generated LDR images are converted to linear space and normalized from integer values in [0,255] to fractional values in [0,1].
Each HDR image is then luminance-aligned on the basis of its LDR-HDR pair, following the alignment formula given in the Disclosure, where τ is a constant in [0,1]; here τ = 0.08 is taken. After applying the formula to each pair of LDR-HDR images, the aligned HDR image data are obtained; these serve as the sample labels when training the network, while the LDR image data serve as the sample inputs.
Finally, based on the aligned HDR data, the highlight mask image is computed by binarizing the channel mean image of the aligned HDR image: the mask value is 1 where the channel mean exceeds a constant threshold t, and 0 otherwise; here t = e^0.1 is taken. The mask image serves as another sample label during training and, together with the HDR image, supervises the learning process of the network.
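The binarization step can be sketched as follows; the threshold t = e^0.1 follows the embodiment, and the channel-mean-then-threshold rule is as described above.

```python
import numpy as np

def highlight_mask(hdr_aligned, t=float(np.exp(0.1))):
    """Binarize the channel-mean image of the aligned HDR image: pixels
    whose mean brightness exceeds the threshold t (t = e^0.1 in this
    embodiment) are marked 1 as highlight regions, all others 0."""
    mean_img = hdr_aligned.mean(axis=2)     # channel mean image
    return (mean_img > t).astype(np.float32)
```

Because the HDR data were luminance-aligned beforehand, a single fixed threshold is meaningful across the whole data set.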
2. Neural network training
The structure of the neural network is constructed as shown in fig. 2. Specifically, the ResNet50 model is initialized with weights pre-trained on the existing ImageNet classification task; each of the remaining network blocks is a convolution block consisting of a convolution layer, an instance normalization operation, and a ReLU activation function; the input of each decoder stage in the generating network is formed by concatenating the output of the previous convolution block with the output of the symmetric encoder stage; the discriminating network takes the output of the generating network as input and, after four convolution blocks, outputs a probability feature map indicating whether the input HDR image is a real image.
The generator loss is computed from the outputs of the generating network, the sample label data corresponding to its input, and the probability feature map that the discriminating network computes from the generated output:

L_G = α1·L_hdr + α2·L_mask + α3·L_gan

The three component losses, namely the scale-invariant loss of the HDR reconstructed image, the cross-entropy loss of the highlight region classification, and the generative adversarial loss, together with the WGAN-GP loss of the discriminating network, are exactly those defined in the Disclosure section above.
where α1, α2, α3 are set to 1, 0.1, and 0.02, respectively; the specific calculation formula of each loss is given in the Disclosure above. According to the calculated loss value, the network is back-propagated and its weights are updated using the Adam optimization algorithm. In addition, after each update of the generating network's weights, the loss of the discriminating network is computed and its weights are updated in turn; this update likewise uses the Adam algorithm, with the specific formula given in the Disclosure above.
According to the above training method, one or more pairs of training data are fed to the network at each step, and the whole training data set is iterated over cyclically in sequence until the loss function converges and training is complete.
3. Network model application
After network training is finished, the generating-network part and its weight parameters are extracted as the final reconstruction model for high dynamic range images. With this model, a single LDR picture as input yields an approximately real HDR picture. Fig. 3 shows an application example: the network model, i.e., the generating part of the neural network trained as described above, accepts LDR pictures of arbitrary size as input and directly outputs HDR reconstructed images at the same size.
The invention provides a deep-learning-based method for reconstructing the high dynamic range of a single ordinary digital image, which reconstructs images of general scenes effectively and realistically. The invention has wide application: it can adapt to scenes with different requirements by training on different data sets, and once trained on a given training set it can be applied any number of times.
Claims (4)
1. The image high dynamic range reconstruction method based on deep learning is characterized by comprising the following steps of:
step 1, a neural network is established based on a deep learning method, comprising a generating network from a low dynamic range image to a high dynamic range image and a discrimination network for judging whether a high dynamic range image is real;
step 2, preprocessing an HDR data set to form training data, wherein the data preprocessing is divided into three parts: generation of LDR data, alignment of HDR image luminance units, and generation of image highlight region masks; the preprocessed LDR data serve as the training input to the generating network, whose outputs are an HDR image and a highlight mask image; the aligned HDR data and the highlight mask images obtained after preprocessing serve as the sample label data for training; the discrimination network accepts an HDR image and a highlight mask image as input and outputs a feature map representing the probability that the input HDR image is a real HDR image or a false HDR image generated by the network;
step 3, the neural network is trained in a supervised learning mode based on three loss functions, with an Adam optimizer performing back-propagation optimization on the generating network and the discrimination network in turn; the three loss functions are respectively the scale-invariant loss function of the HDR reconstructed image, the cross-entropy loss function of the highlight region classification, and the generative adversarial loss function, and the overall loss function is defined as follows:
L_G = α1·L_hdr + α2·L_mask + α3·L_gan
the scale invariant loss function of the HDR reconstructed image is defined as follows:
where y represents the HDR image output by the network, ŷ the aligned HDR label image, d_{l,c} the difference between the network output and the sample label in the logarithmic domain, ε a small value preventing the logarithm of zero, and subscripts l, c denote pixel position and color channel, respectively;
a cross entropy loss function for highlight region classification, defined as follows:
the challenge loss function is generated as defined below:
where D(y) is the output of the discrimination network when the generator output y is given as input;
the network is trained according to the loss functions, and after the loss functions converge, the generating network is extracted as the final algorithm model.
2. The method according to claim 1, characterized in that: the low dynamic range image refers to a low dynamic range image stored at 8-bit color depth, 256-tone scale, and the high dynamic range image refers to a high dynamic range image stored in ". EXR" or ". HDR" format that approximates the change in the brightness of a real scene.
3. The method according to claim 1, characterized in that: the neural network described in the step 1 comprises a generating network and a judging network, wherein the generating network is of a U-Net structure, the network receives an LDR image as input, and after a decoding network is formed by a coding network formed by a ResNet50 model and a 6-layer 'up-sampling+convolution layer' module, an HDR image and a highlight mask image are respectively output; the judging network is a full convolution network formed by 4 layers of convolution layers, receives an HDR image and a high light mask image as inputs, and outputs a feature map representing the probability that the input HDR image is a real HDR image or a false HDR image generated by the network.
4. The method according to claim 1, characterized in that: the data preprocessing described in the step 2 comprises the following specific processes:
step 2.1, generation of LDR data: an LDR training sample input is generated from each HDR image both by a tone mapping algorithm and by virtual-camera shooting; a suitable tone mapping algorithm is selected and the HDR image is used directly as its input to obtain the corresponding LDR output; meanwhile, an LDR image is obtained by constructing a virtual camera: a range of plausible dynamic ranges is first determined based on common digital single-lens reflex cameras, and for each LDR acquisition one value is randomly selected from this range as the dynamic range of the simulated camera; the virtual camera then auto-exposes according to the input HDR image, clamps pixels whose brightness exceeds its dynamic range to the boundary value, and linearly maps the result into the low dynamic range of the LDR image; finally, the obtained image is mapped from linear space to an ordinary digital image through a randomly selected approximate camera response function, yielding the required LDR image;
step 2.2, alignment of HDR image luminance units: for HDR images stored in the relative luminance domain, their luminance units are aligned before they are used as training sample labels; let the original HDR image be H, let L be the LDR image converted to linear space and normalized to [0,1], and let H_{l,c}, L_{l,c} be the pixel values of each image at position l in channel c; the two are aligned as follows:
where τ is a constant in [0, 1]; the aligned HDR image and the corresponding LDR image form a training-sample pair for training the neural network;
step 2.3, generating an image highlight region mask, obtaining an aligned HDR image brightness unit, and obtaining a mask image of a highlight region in the image in a binarization mode, wherein the formula is as follows:
wherein the method comprises the steps ofFor the channel mean image of the aligned HDR image, t is a constant, and the region with the median value of 1 in the mask image represents an object or surface with higher brightness in the scene, including a light source and a strong light reflecting surface. />
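The binarization of step 2.3 amounts to thresholding the channel-mean of the aligned HDR image (a NumPy sketch; the threshold value t = 0.95 is an illustrative assumption, since the patent only states that t is a constant):

```python
import numpy as np

def highlight_mask(hdr_aligned, t=0.95):
    """Binary highlight mask from an aligned HDR image (H x W x 3).

    The per-pixel channel mean is thresholded at the constant t; pixels
    brighter than t (light sources, strong specular surfaces) get value 1.
    """
    channel_mean = hdr_aligned.mean(axis=2)     # H̄: average over the color channels
    return (channel_mean > t).astype(np.float32)
```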
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010072803.3A CN111292264B (en) | 2020-01-21 | 2020-01-21 | Image high dynamic range reconstruction method based on deep learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111292264A CN111292264A (en) | 2020-06-16 |
CN111292264B true CN111292264B (en) | 2023-04-21 |
Family
ID=71023475
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010072803.3A Active CN111292264B (en) | 2020-01-21 | 2020-01-21 | Image high dynamic range reconstruction method based on deep learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111292264B (en) |
Families Citing this family (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111784598B (en) * | 2020-06-18 | 2023-06-02 | Oppo(重庆)智能科技有限公司 | Training method of tone mapping model, tone mapping method and electronic equipment |
CN111986106B (en) * | 2020-07-30 | 2023-10-13 | 南京大学 | High-dynamic image reconstruction method based on neural network |
CN112435306A (en) * | 2020-11-20 | 2021-03-02 | 上海北昂医药科技股份有限公司 | G banding chromosome HDR image reconstruction method |
CN112738392A (en) * | 2020-12-24 | 2021-04-30 | 上海哔哩哔哩科技有限公司 | Image conversion method and system |
US12125182B2 (en) * | 2021-04-27 | 2024-10-22 | Boe Technology Group Co., Ltd. | Image processing method and image processing apparatus |
CN113344773B (en) * | 2021-06-02 | 2022-05-06 | 电子科技大学 | Single picture reconstruction HDR method based on multi-level dual feedback |
CN113379698B (en) * | 2021-06-08 | 2022-07-05 | 武汉大学 | Illumination estimation method based on step-by-step joint supervision |
WO2022266955A1 (en) * | 2021-06-24 | 2022-12-29 | Oppo广东移动通信有限公司 | Image decoding method and apparatus, image processing method and apparatus, and device |
CN113784175B (en) * | 2021-08-02 | 2023-02-28 | 中国科学院深圳先进技术研究院 | HDR video conversion method, device, equipment and computer storage medium |
CN113674231B (en) * | 2021-08-11 | 2022-06-07 | 宿迁林讯新材料有限公司 | Method and system for detecting iron scale in rolling process based on image enhancement |
WO2023039863A1 (en) * | 2021-09-17 | 2023-03-23 | 京东方科技集团股份有限公司 | Method for training image processing model, and method for generating high-dynamic-range image |
CN114820373B (en) * | 2022-04-28 | 2023-04-25 | 电子科技大学 | Single image reconstruction HDR method based on knowledge heuristic |
CN114998138B (en) * | 2022-06-01 | 2024-05-28 | 北京理工大学 | High dynamic range image artifact removal method based on attention mechanism |
CN115297254B (en) * | 2022-07-04 | 2024-03-29 | 北京航空航天大学 | Portable high dynamic imaging fusion system under high radiation condition |
CN115641333B (en) * | 2022-12-07 | 2023-03-21 | 武汉大学 | Indoor illumination estimation method and system based on spherical harmonic gauss |
CN117456313B (en) * | 2023-12-22 | 2024-03-22 | 中国科学院宁波材料技术与工程研究所 | Training method, estimation and mapping method and system of tone curve estimation network |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103413286A (en) * | 2013-08-02 | 2013-11-27 | Beijing University of Technology | United reestablishing method of high dynamic range and high-definition pictures based on learning |
CN103413285A (en) * | 2013-08-02 | 2013-11-27 | Beijing University of Technology | HDR and HR image reconstruction method based on sample prediction |
CN104969259A (en) * | 2012-11-16 | 2015-10-07 | Thomson Licensing | Processing high dynamic range images |
WO2019001701A1 (en) * | 2017-06-28 | 2019-01-03 | Huawei Technologies Co., Ltd. | Image processing apparatus and method |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20120288217A1 (en) * | 2010-01-27 | 2012-11-15 | Jiefu Zhai | High dynamic range (hdr) image synthesis with user input |
US20160286226A1 (en) * | 2015-03-24 | 2016-09-29 | Nokia Technologies Oy | Apparatus, a method and a computer program for video coding and decoding |
US10048413B2 (en) * | 2016-06-07 | 2018-08-14 | Goodrich Corporation | Imaging systems and methods |
- 2020-01-21: CN application CN202010072803.3A filed (granted as CN111292264B, status: Active)
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104969259A (en) * | 2012-11-16 | 2015-10-07 | Thomson Licensing | Processing high dynamic range images |
CN103413286A (en) * | 2013-08-02 | 2013-11-27 | Beijing University of Technology | United reestablishing method of high dynamic range and high-definition pictures based on learning |
CN103413285A (en) * | 2013-08-02 | 2013-11-27 | Beijing University of Technology | HDR and HR image reconstruction method based on sample prediction |
WO2019001701A1 (en) * | 2017-06-28 | 2019-01-03 | Huawei Technologies Co., Ltd. | Image processing apparatus and method |
Also Published As
Publication number | Publication date |
---|---|
CN111292264A (en) | 2020-06-16 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111292264B (en) | Image high dynamic range reconstruction method based on deep learning | |
Golts et al. | Unsupervised single image dehazing using dark channel prior loss | |
CN110889813A (en) | Low-light image enhancement method based on infrared information | |
CN112884758B (en) | Defect insulator sample generation method and system based on style migration method | |
CN109255758A (en) | Image enchancing method based on full 1*1 convolutional neural networks | |
CN113284061B (en) | Underwater image enhancement method based on gradient network | |
CN111652864A (en) | Casting defect image generation method for generating countermeasure network based on conditional expression | |
CN111179196B (en) | Multi-resolution depth network image highlight removing method based on divide-and-conquer | |
CN116993975A (en) | Panoramic camera semantic segmentation method based on deep learning unsupervised field adaptation | |
CN111768326A (en) | High-capacity data protection method based on GAN amplification image foreground object | |
Dwivedi et al. | Single image dehazing using extended local dark channel prior | |
CN116033279B (en) | Near infrared image colorization method, system and equipment for night monitoring camera | |
Hovhannisyan et al. | AED-Net: A single image dehazing | |
CN111275642B (en) | Low-illumination image enhancement method based on significant foreground content | |
CN116580184A (en) | YOLOv 7-based lightweight model | |
Kumar et al. | Underwater image enhancement using deep learning | |
Yin et al. | Adams-based hierarchical features fusion network for image dehazing | |
CN114862707A (en) | Multi-scale feature recovery image enhancement method and device and storage medium | |
Goncalves et al. | Guidednet: Single image dehazing using an end-to-end convolutional neural network | |
CN116433508B (en) | Gray image coloring correction method based on Swin-Unet | |
CN117593222A (en) | Low-illumination image enhancement method for progressive pixel level adjustment | |
CN107369138A (en) | Image based on higher order statistical model optimizes display methods | |
Zhou et al. | DTKD-Net: Dual-Teacher Knowledge Distillation Lightweight Network for Water-related Optics Image Enhancement | |
CN116863032A (en) | Flood disaster scene generation method based on generation countermeasure network | |
Jiang et al. | Mask‐guided image person removal with data synthesis |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||