CN111161360B - Image defogging method of end-to-end network based on Retinex theory - Google Patents
Image defogging method of end-to-end network based on Retinex theory
- Publication number
- CN111161360B (application CN201911304471.0A)
- Authority
- CN
- China
- Prior art keywords
- image
- network
- convolution
- map
- defogging
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G06T7/90 — Image analysis; determination of colour characteristics
- G06N3/045 — Neural networks; architecture; combinations of networks
- G06N3/08 — Neural networks; learning methods
- G06T5/00 — Image enhancement or restoration
- G06T5/70 — Denoising; smoothing
Abstract
The invention discloses an image defogging method using an end-to-end network based on the Retinex theory. A defogging network is first established, and the Retinex model is then embedded into it to realize end-to-end learning: feature information is extracted from the hazy image to estimate a luminance map and a reflection map, from which a clear image is recovered according to the Retinex model. The network jointly estimates the luminance map and the reflection map of the image and then restores the haze-free image with the Retinex model. Because the proposed defogging network does not depend on an atmospheric scattering model, it avoids the loss of restored image quality caused by inaccurate transmission-map estimation; moreover, the Retinex theory is a colour perception model based on human vision.
Description
Technical Field
The invention belongs to the field of computer image processing, and in particular relates to an image defogging method.
Background
Outdoor images captured in foggy or hazy weather suffer from degradation such as reduced contrast, blurred detail and colour distortion, caused by the absorption and scattering of light by suspended atmospheric particles; this seriously impairs vision systems such as outdoor video surveillance and target recognition. Image defogging is therefore particularly important in computer vision applications and digital image processing.
With the wide application of convolutional neural networks (CNNs) in computer vision tasks, deep-learning-based methods have become the mainstream of image defogging research. These methods either learn the mapping between hazy and haze-free images directly, or estimate parameters such as the transmission map and the atmospheric light value and recover the image through an imaging model. Qu et al. [1] proposed an enhanced pix2pix dehazing network (EPDN) that directly learns the mapping between hazy and haze-free images. Chen et al. [2] proposed a multi-scale adaptive dehazing network (MADN) consisting of an adaptive distillation network and a multi-scale enhancement network; the latter fuses information through a pyramid pooling module so that the restored haze-free image is more refined. Cai et al. [3] proposed an end-to-end network (DehazeNet) that estimates the transmission map and then recovers the haze-free image with the atmospheric scattering model. Li et al. [4] reformulated the atmospheric scattering model, merging the transmission map and the atmospheric light value into a single parameter K, and designed AOD-Net to estimate K and thereby recover the haze-free image. Zhang et al. [5] proposed a densely connected pyramid dehazing network (DCPDN) that realizes defogging by estimating the transmission map and the atmospheric light value with two separate sub-networks.
Although these methods achieve good defogging results, methods [1-2] lack a theoretical basis, so the restored images can be distorted. Methods [3-5] recover the haze-free image from the atmospheric scattering model, but the transmission maps estimated by the networks often contain too much detail, and most methods set the atmospheric light value to a globally uniform constant, which degrades the quality of the restored haze-free image.
[ reference ]
[1] Qu, Y., Chen, Y., Huang, J., & Xie, Y. Enhanced Pix2pix Dehazing Network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 8160-8168, 2019.
[2] Chen, Shuxin, et al. Multi-Scale Adaptive Dehazing Network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2019.
[3] Bolun Cai, Xiangmin Xu, Kui Jia, Chunmei Qing, and Dacheng Tao. DehazeNet: an end-to-end system for single image haze removal. IEEE Transactions on Image Processing, 25(11):5187-5198, 2016.
[4] Boyi Li, Xiulian Peng, Zhangyang Wang, Jizheng Xu, and Dan Feng. AOD-Net: All-in-one dehazing network. In IEEE International Conference on Computer Vision, pp. 4780-4788, 2017.
[5] He Zhang and Vishal M. Patel. Densely connected pyramid dehazing network. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
[6] E. H. Land. The Retinex theory of color vision. Scientific American, 237(6):108-129, 1977.
[7] Wang J, Lu K, Xue J, et al. Single image dehazing based on the physical model and MSRCR algorithm. IEEE Transactions on Circuits and Systems for Video Technology, 28(9):2190-2199, 2018.
[8] Li B, Ren W, Fu D, et al. Benchmarking single-image dehazing and beyond. IEEE Transactions on Image Processing, 28(1):492-505, 2018.
[9] Zhang Y, Tian Y, Kong Y, et al. Residual dense network for image super-resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2472-2481, 2018.
[10] O. Ronneberger, P. Fischer, and T. Brox. U-Net: Convolutional networks for biomedical image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 234-241. Springer, 2015.
[11] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
Disclosure of Invention
The prior-art method DehazeNet [3] designs its network structure according to prior knowledge from traditional defogging methods in order to estimate the transmission map, and thus restores the defogged image well. Inspired by this method and by the Retinex theory [6] used in traditional defogging, the invention designs an end-to-end network image defogging method based on the Retinex theory: the network jointly estimates the luminance map and the reflection map of an image and then recovers the defogged image according to the Retinex model. The proposed defogging network does not depend on an atmospheric scattering model, so the loss of image quality caused by inaccurate transmission-map estimation is avoided, and the Retinex theory is a colour perception model based on human vision. Experimental results show that the network can effectively recover clear, fog-free images.
In order to solve the above technical problems, the invention provides an image defogging method of an end-to-end network based on the Retinex theory: a defogging network is established, the Retinex model is embedded into it to realize end-to-end learning, feature information is extracted from the hazy image to estimate a luminance map and a reflection map, and the clear image is then recovered according to formula (1):
S=L×R (1)
in formula (1), S is the clear image, L is the luminance map of the clear image, R is the reflection map of the clear image, and × denotes point-wise multiplication.
The defogging network comprises a luminance map estimation sub-network, a reflection map estimation sub-network, and an image restoration part that uses the Retinex model; the specific contents are as follows:
1) Luminance map estimation subnetwork:
the luminance map estimation sub-network adopts a cascaded residual dense network structure and comprises five convolution layers and five residual dense blocks; the first and second convolution layers are cov1 and cov2, with 3×3 kernels, stride 1, padding 1 and 64 output channels, and cov2 is followed by a ReLU activation; the third convolution layer cov3 has a 3×3 kernel, stride 2 and padding 1; five residual dense blocks, denoted RDB1, RDB2, RDB3, RDB4 and RDB5, are arranged in sequence after cov3; each block has the same structure of four sequentially connected convolution layers: the first three have 3×3 kernels, stride 1, padding 1 and 16 output channels, each followed by a ReLU activation, and the last has a 1×1 kernel, stride 1, padding 0 and 16 output channels; immediately after the last of the five residual dense blocks come a deconvolution layer decov1 (3×3 kernel, stride 2, padding 1) and a convolution layer cov4 (1×1 kernel, stride 1, padding 0); the luminance map estimation sub-network thereby generates the luminance map $\hat{L}$ of the haze-free image;
2) Reflection map estimation subnetwork:
the reflection map estimation sub-network adopts a Unet structure comprising a feature extraction part and an up-sampling part; the feature extraction part has four scales: the first three scales each comprise two convolution layers (3×3 kernels, stride 1, padding 1, with 64, 128 and 256 output channels respectively) and a 2×2 max-pooling layer, and the fourth scale comprises two convolution layers with 3×3 kernels, stride 1, padding 1 and 512 output channels; the up-sampling part has three scales, and the feature map obtained by each up-sampling is stacked with the feature map of the same scale from the feature extraction part and used as the feature input at that scale; the feature map obtained by up-sampling the output of the fourth scale of the feature extraction part serves as the input of the up-sampling part; the first two scales of the up-sampling part each comprise a deconvolution layer (3×3 kernel, stride 2, padding 1, with 256 and 128 output channels respectively) and two convolution layers (3×3 kernels, stride 1, padding 1, with 256 and 128 output channels respectively); the last scale of the up-sampling part comprises three convolution layers with 3×3, 3×3 and 1×1 kernels, stride 1, padding 1, 1 and 0, and 64, 64 and 3 output channels respectively; the reflection map estimation sub-network thereby generates the reflection map $\hat{R}$ of the haze-free image;
3) Image restoration using the Retinex model:
the luminance map $\hat{L}$ and the reflection map $\hat{R}$ of the haze-free image, generated by extracting feature information from the hazy image in steps 1) and 2), are substituted into the Retinex model; that is, $\hat{L}$ and $\hat{R}$ are point-multiplied to obtain the recovered haze-free image $\hat{S}=\hat{L}\times\hat{R}$.
Training the defogging network comprises: selecting 20-50% of the outdoor clear images and corresponding hazy images from the training set of the public dataset RESIDE as the training set of outdoor hazy images; selecting 80-100% of the indoor clear images and corresponding hazy images from the training set of the public dataset RESIDE as the training set of indoor hazy images; and obtaining the luminance map L and the reflection map R of each selected outdoor and indoor clear image S by means of average analysis.
training the defogging network using a loss function that combines a smoothed L1 loss and a perceptual loss, wherein the smoothed L1 loss is expressed as:

$L_{smooth}(\hat{I},I)=\frac{1}{CHW}\sum_{c=1}^{C}\sum_{h=1}^{H}\sum_{w=1}^{W}F\left(\hat{I}_{c,h,w}-I_{c,h,w}\right)$ (2)

$F(e)=\begin{cases}0.5e^{2}, & |e|<1\\|e|-0.5, & \text{otherwise}\end{cases}$ (3)
the perceptual loss is expressed as:

$L_{per}(\hat{I},I)=\sum_{j}\frac{1}{C_{j}H_{j}W_{j}}\left\|\phi_{j}(\hat{I})-\phi_{j}(I)\right\|_{2}^{2}$ (4)
where $\hat{I}$ denotes an output of the defogging network, namely the haze-free image $\hat{S}$, the luminance map $\hat{L}$ or the reflection map $\hat{R}$, i.e. $\hat{I}\in\{\hat{S},\hat{L},\hat{R}\}$; I is the corresponding target image, namely the clear image S, the luminance map L of the clear image or the reflection map R of the clear image, i.e. $I\in\{S,L,R\}$; $\phi_{j}(\hat{I})$ and $\phi_{j}(I)$ denote the j-th VGG16 feature maps of $\hat{I}$ and I, and $C_{j}$, $H_{j}$ and $W_{j}$ denote their channel number, height and width;
the smoothed L1 loss term of the defogging network is expressed as:

$L_{S}=L_{smooth}(\hat{S},S)+L_{smooth}(\hat{L},L)+L_{smooth}(\hat{R},R)$ (5)
the perceptual loss term of the defogging network is expressed as:

$L_{P}=L_{per}(\hat{S},S)+L_{per}(\hat{L},L)+L_{per}(\hat{R},R)$ (6)
the total loss function is expressed as:
L = L_S + λL_P (7)
where λ is a weight factor, which is set to 0.4.
Drawings
FIG. 1 shows the Retinex decomposition, wherein (a) is the clear image, (b) the luminance map of the clear image, (c) the reflection map of the clear image, (e) the hazy image, (f) the luminance map of the hazy image, and (g) the reflection map of the hazy image;
FIG. 2 is a schematic flow chart of the method of the present invention;
FIG. 3 is a schematic diagram of the overall network architecture of the present invention;
FIG. 4 is a schematic diagram of a reflection map estimation sub-network structure in the present invention;
FIG. 5 is a comparison of experimental example 1 results, wherein (a) is a blurred image, (b) an AOD-Net effect graph, (c) an EPDN effect graph, and (d) a defogging effect graph of the present invention;
FIG. 6 is a comparison of the results of Experimental example 2, (a) blurred image, (b) AOD-Net effect image, (c) EPDN effect image, (d) defogging effect image of the present invention;
FIG. 7 is a comparative plot of the results of Experimental example 3, (a) blurred image, (b) AOD-Net effect plot, (c) EPDN effect plot, and (d) defogging effect plot of the present invention.
Detailed Description
The invention will now be further described with reference to the accompanying drawings and specific examples, which are in no way limiting.
In the prior art, according to document [6], an image imaging model based on Retinex theory is shown as follows:
S=L×R (1)
where S is the clear image, L is the luminance map of the clear image, R is the reflection map of the clear image, and × denotes point-wise multiplication. From equation (1), the key to recovering the haze-free image with the Retinex model is obtaining the luminance map and the reflection map of the clear image.
Following the method for obtaining the luminance map in document [7], the original image is converted from RGB space to YCrCb space and the luminance component is extracted as the luminance map of the image; the reflection map of the image can then be obtained through formula (1), as shown in FIG. 1.
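As an illustration, this decomposition can be sketched in a few lines of Python with OpenCV: the Y channel of the YCrCb conversion serves as the luminance map L, and the reflection map follows from formula (1) as R = S / L. The epsilon guard against division by zero and the file name in the usage comment are assumptions of this sketch, not part of the method as claimed.

```python
import cv2
import numpy as np

def retinex_decompose(img_bgr):
    """Split a clear image S into luminance map L (Y of YCrCb) and
    reflection map R = S / L per formula (1)."""
    s = img_bgr.astype(np.float32) / 255.0                   # clear image S in [0, 1]
    y = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2YCrCb)[..., 0]   # luminance component
    l = (y.astype(np.float32) / 255.0)[..., None]            # luminance map L, HxWx1
    r = s / np.maximum(l, 1e-4)                              # reflection map R = S / L
    return l, r

# Example usage (hypothetical file name):
# l, r = retinex_decompose(cv2.imread("clear.png"))
```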
According to the image defogging method of the end-to-end network based on the Retinex theory, firstly, a defogging network is established, then, a Retinex model is embedded into the defogging network to realize end-to-end learning, characteristic information is extracted from a foggy image to estimate a brightness map and a reflection map, and further, a clear image is restored according to the formula (1). The main purpose is to extract characteristic information from the foggy image (e) to estimate the luminance map (b) and the reflection map (c) as shown in fig. 1, and to restore the clear image (a) according to equation (1).
The invention provides a Retinex-based end-to-end network image defogging method, which comprises the following specific steps (shown in figure 2):
obtaining a training set and a test set for verification:
the invention mainly uses the common data set RESIDE [8] The training sets are ITS (Indoor Training Set) and OTS (outdoor Training Set), wherein ITS is an indoor training set and contains 1399 clear images and 13990 foggy images, OTS is an outdoor training set and contains 8477 clear imagesThe clear images and 296695 foggy images, about 23% of which, i.e. 2061 clear images and corresponding 72135 foggy images, were selected as training sets for outdoor foggy images, all images of all indoor training sets were selected, and literature [7] was adopted]Respectively acquiring brightness images and reflection images corresponding to clear images in the ITS data set and the OTS data set as training data. The test set in the common data set RESIDE is SOTS, and comprises 500 indoor foggy images and 500 outdoor foggy images, and all the test images in the RESIDE are adopted as the test set in the verification process.
Step 2, defogging a single image based on a convolutional neural network:
the network structure is divided into three parts: a luminance map estimation word network, a reflection map estimation sub-network, and an image restoration section. The luminance map estimation sub-network adopts a cascaded residual error dense network structure [9] The reflection map estimation sub-network adopts a Unet structure [10] The image restoration section restores the haze-free image according to the Retinex model. The overall structure of the network is shown in fig. 3.
1) Luminance map estimation subnetwork:
the luminance map estimation sub-network comprises five convolutional layers and five dense residual blocks. The first two convolutional layers are cov1 and cov respectively, the convolutional kernels are 3*3 in size, the step sizes are 1, the padding is set to 1, the output channel numbers are 64, and a RELU activation function follows. The convolution kernel size of the third convolution layer cov3 is 3*3, the step size is 2, and padding is set to 1. Five dense residual learning modules are then followed, denoted RDB1, RDB2, RDB3, RDB4, RDB5, respectively. Each residual error dense block has the same structure and comprises four convolution layers, the convolution kernels of the first three convolution layers are 3*3, the step length is 1, the padding is 1, the number of output channels is 16, and each convolution layer is followed by a RELU activation function. The convolution kernel size of the last convolution layer is 1*1, the step size is 1, the padding is set to 0, and the output channel number is 16. Finally, immediately following a deconvolution layer and convolution layer, noted decov1 and cov, the convolution kernel sizes are 3*3 and 1*1, respectively, the step sizes are 2 and 1, respectively, and the padding is set to 1 and 0.
2) Reflection map estimation subnetwork:
the reflection map estimation sub-network references the structure of the Unet, and comprises a feature extraction part and an up-sampling part. The feature extraction part comprises four scales, wherein the first three scales comprise two convolution layers and a maximum pooling layer, the convolution kernels of the convolution layers are 3*3, the step sizes are 1, the padding is set to be 1, the number of output channels is 64, 128 and 256 respectively, and the kernel sizes of the maximum pooling layer are 2 x 2. The fourth scale comprises two convolution layers, the convolution kernels are 3*3 in size, the step sizes are 1, the packing is 1, and the number of output channels is 512. The up-sampling part comprises three scales, and the feature images obtained by up-sampling each time are stacked with the feature images with the same scale as the feature extraction part and are used as feature input on the scale. The feature map obtained by the up-sampling operation of the last scale output of the feature extraction section serves as the input of the up-sampling section. The first two scales comprise a deconvolution layer and two convolution layers, the convolution kernels of the deconvolution layers are 3*3 in size, the step sizes are 2, the padding is 1, and the output channel numbers are 256 and 128 respectively. The convolution kernels of the two convolution layers are 3*3 in size, 1 in step size and 1 in padding, and the number of output channels is 256 and 128, respectively. The last scale consists of three convolutional layers, the convolution kernels are 3*3, 3*3 and 1*1 in size, the step sizes are 1, the padding is set to 1, 1 and 0, and the output channel numbers are 64, 64 and 3, respectively. The structure of this sub-network is shown in particular in fig. 4.
3) An image restoration section:
substituting the brightness map and the reflection map generated by the two sub-networks into the (1), and performing point multiplication operation on the brightness map and the reflection map to obtain a recovered haze-free image.
4) Loss function:
to train the network proposed by the present invention, the loss function uses a smooth L1 loss and a perceptual loss [11] Wherein the smoothed L1 loss is expressed as:
the perceived loss is expressed as:
where $\hat{I}$ denotes a network-generated image and I its corresponding target image; $\phi_{j}(\hat{I})$ and $\phi_{j}(I)$ denote their j-th VGG16 feature maps, and $C_{j}$, $H_{j}$ and $W_{j}$ denote the channel number, height and width of those feature maps.
Thus, the smooth L1 loss term of the network can be expressed as:

$L_{S}=L_{smooth}(\hat{S},S)+L_{smooth}(\hat{L},L)+L_{smooth}(\hat{R},R)$ (5)
the perceptual loss term may be expressed as:
where $\hat{S}$, $\hat{L}$ and $\hat{R}$ are the haze-free image, luminance map and reflection map generated by the network, and S, L and R are the corresponding target images.
Thus, the overall loss function can be expressed as:
L = L_S + λL_P (7)
where λ is a weight factor, set to 0.4 in the experiments.
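A hedged PyTorch sketch of this loss is given below. The choice of VGG16 layers (indices 3, 8, 15, i.e. relu1_2, relu2_2, relu3_3) and the omission of ImageNet input normalisation are assumptions; the text only specifies that VGG16 feature maps φ_j are used.

```python
import torch
import torch.nn as nn
import torchvision.models as models

class PerceptualLoss(nn.Module):
    """Squared L2 distance between VGG16 feature maps, averaged per element
    (the 1/(C_j*H_j*W_j) normalisation of equation (4))."""
    def __init__(self, layers=(3, 8, 15)):    # layer choice is an assumption
        super().__init__()
        vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).features.eval()
        self.vgg, self.layers = vgg, set(layers)
        for p in self.vgg.parameters():
            p.requires_grad_(False)

    def forward(self, pred, target):
        loss, x, y = 0.0, pred, target
        for i, layer in enumerate(self.vgg):
            x, y = layer(x), layer(y)
            if i in self.layers:
                loss = loss + torch.mean((x - y) ** 2)
            if i >= max(self.layers):          # stop after the deepest used layer
                break
        return loss

def total_loss(outputs, targets, perceptual, lam=0.4):
    """outputs/targets: dicts with keys 'S', 'L', 'R' — equations (5)-(7)."""
    smooth = nn.SmoothL1Loss()
    l_s = sum(smooth(outputs[k], targets[k]) for k in ('S', 'L', 'R'))
    l_p = sum(perceptual(outputs[k], targets[k]) for k in ('S', 'L', 'R'))
    return l_s + lam * l_p
```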
Compared with the prior art, the invention has the beneficial effects that:
in order to verify the effectiveness of the defogging method provided by the invention, the defogging method provided by the invention is compared with the existing mainstream defogging algorithm. Firstly, comparing with EPDN 1 and AOD-Net 4 algorithm, as shown in FIG. 5, FIG. 6 and FIG. 7, FIG. 5 is a comparative graph of experimental example 1 results, (a) is a blurred image, (b) AOD-Net effect graph, (c) EPDN effect graph, and (d) defogging effect graph; FIG. 6 is a comparison of the results of Experimental example 2, (a) blurred image, (b) AOD-Net effect image, (c) EPDN effect image, (d) defogging effect image of the present invention; FIG. 7 is a comparative plot of the results of Experimental example 3, (a) blurred image, (b) AOD-Net effect plot, (c) EPDN effect plot, and (d) defogging effect plot of the present invention. Experimental results show that the defogging method provided by the invention has clear and natural restored defogging images and rich detailed information, and is more in line with the visual law of human eyes.
In order to evaluate the defogging methods objectively, PSNR (peak signal-to-noise ratio, in dB) and SSIM (structural similarity) values are compared; the two indices reflect how similar the restored image is to the original image in intensity and in structure. The data in Table 1 show that the PSNR and SSIM values of the proposed image defogging method are superior to those of EPDN [1], MADN [2], DehazeNet [3], AOD-Net [4] and DCPDN [5].
TABLE 1 PSNR (dB)/SSIM comparison results
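As a reference for how these two comparison metrics are computed, a short scikit-image sketch follows; the evaluation wrapper is illustrative, not part of the patent.

```python
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate(restored, ground_truth):
    """restored, ground_truth: HxWx3 uint8 arrays of the same size."""
    psnr = peak_signal_noise_ratio(ground_truth, restored, data_range=255)
    ssim = structural_similarity(ground_truth, restored,
                                 channel_axis=-1, data_range=255)
    return psnr, ssim
```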
Although the invention has been described above with reference to the accompanying drawings, the invention is not limited to the above embodiments, which are merely illustrative and not restrictive. Those of ordinary skill in the art may make many modifications without departing from the spirit of the invention, and these all fall within the protection of the invention.
Claims (2)
1. An image defogging method of an end-to-end network based on the Retinex theory, characterized by establishing a defogging network, embedding the Retinex model into the defogging network to realize end-to-end learning, extracting feature information from a hazy image to estimate a luminance map and a reflection map, and recovering a clear image according to formula (1):
S=L×R (1)
in formula (1), S is the clear image, L is the luminance map of the clear image, R is the reflection map of the clear image, and × denotes point-wise multiplication;
the defogging network has the following structure: it comprises a luminance map estimation sub-network, a reflection map estimation sub-network, and an image restoration part that uses the Retinex model, the specific contents being as follows:
1) Luminance map estimation subnetwork:
the luminance map estimation sub-network adopts a cascaded residual dense network structure and comprises five convolution layers and five residual dense blocks;
the first and second convolution layers are cov1 and cov2, with 3×3 kernels, stride 1, padding 1 and 64 output channels, and cov2 is followed by a ReLU activation;
the third convolution layer cov3 has a 3×3 kernel, stride 2 and padding 1;
five residual dense blocks, denoted RDB1, RDB2, RDB3, RDB4 and RDB5, are arranged in sequence after cov3; each block has the same structure of four sequentially connected convolution layers: the first three have 3×3 kernels, stride 1, padding 1 and 16 output channels, each followed by a ReLU activation, and the last has a 1×1 kernel, stride 1, padding 0 and 16 output channels;
immediately after the last of the five residual dense blocks come a deconvolution layer decov1 with a 3×3 kernel, stride 2 and padding 1, and a convolution layer cov4 with a 1×1 kernel, stride 1 and padding 0;
2) Reflection map estimation subnetwork:
the reflection map estimation sub-network adopts a Unet structure and comprises a feature extraction part and an up-sampling part;
the feature extraction part has four scales: the first three scales each comprise two convolution layers (3×3 kernels, stride 1, padding 1, with 64, 128 and 256 output channels respectively) and a 2×2 max-pooling layer; the fourth scale comprises two convolution layers with 3×3 kernels, stride 1, padding 1 and 512 output channels;
the up-sampling part comprises three scales, and the feature map obtained by each up-sampling is stacked with the feature map of the same scale from the feature extraction part and used as the feature input at that scale; the feature map obtained by up-sampling the output of the fourth scale of the feature extraction part serves as the input of the up-sampling part;
the first two scales of the up-sampling part each comprise a deconvolution layer (3×3 kernel, stride 2, padding 1, with 256 and 128 output channels respectively) and two convolution layers (3×3 kernels, stride 1, padding 1, with 256 and 128 output channels respectively);
the last scale of the up-sampling part comprises three convolution layers with 3×3, 3×3 and 1×1 kernels, stride 1, padding 1, 1 and 0, and 64, 64 and 3 output channels respectively;
3) Image restoration using the Retinex model:
the luminance map $\hat{L}$ and the reflection map $\hat{R}$ of the haze-free image, generated by extracting feature information from the hazy image in steps 1) and 2), are substituted into the Retinex model; that is, $\hat{L}$ and $\hat{R}$ are point-multiplied to obtain the recovered haze-free image $\hat{S}=\hat{L}\times\hat{R}$.
2. The image defogging method of the end-to-end network based on the Retinex theory according to claim 1, wherein training the defogging network comprises:
selecting 20-50% of the outdoor clear images and corresponding hazy images from the training set of the public dataset RESIDE as the training set of outdoor hazy images; selecting 80-100% of the indoor clear images and corresponding hazy images from the training set of the public dataset RESIDE as the training set of indoor hazy images; and obtaining the luminance map L and the reflection map R of each selected outdoor and indoor clear image S by means of average analysis;
training the defogging network using a loss function that combines a smoothed L1 loss and a perceptual loss, wherein the smoothed L1 loss is expressed as:

$L_{smooth}(\hat{I},I)=\frac{1}{CHW}\sum_{c=1}^{C}\sum_{h=1}^{H}\sum_{w=1}^{W}F\left(\hat{I}_{c,h,w}-I_{c,h,w}\right)$ (2)

$F(e)=\begin{cases}0.5e^{2}, & |e|<1\\|e|-0.5, & \text{otherwise}\end{cases}$ (3)
the perceptual loss is expressed as:

$L_{per}(\hat{I},I)=\sum_{j}\frac{1}{C_{j}H_{j}W_{j}}\left\|\phi_{j}(\hat{I})-\phi_{j}(I)\right\|_{2}^{2}$ (4)
where $\hat{I}\in\{\hat{S},\hat{L},\hat{R}\}$ denotes an output of the defogging network and $I\in\{S,L,R\}$ is the corresponding target image; $\phi_{j}(\hat{I})$ and $\phi_{j}(I)$ denote the j-th VGG16 feature maps of the image $\hat{I}$ and the target image I, and $C_{j}$, $H_{j}$ and $W_{j}$ denote their channel number, height and width;
the smoothed L1 loss term of the defogging network is expressed as:

$L_{S}=L_{smooth}(\hat{S},S)+L_{smooth}(\hat{L},L)+L_{smooth}(\hat{R},R)$ (5)
the perceptual loss term of the defogging network is expressed as:

$L_{P}=L_{per}(\hat{S},S)+L_{per}(\hat{L},L)+L_{per}(\hat{R},R)$ (6)
the total loss function is expressed as:
L = L_S + λL_P (7)
where λ is a weight factor, which is set to 0.4.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911304471.0A CN111161360B (en) | 2019-12-17 | 2019-12-17 | Image defogging method of end-to-end network based on Retinex theory |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911304471.0A CN111161360B (en) | 2019-12-17 | 2019-12-17 | Image defogging method of end-to-end network based on Retinex theory |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111161360A CN111161360A (en) | 2020-05-15 |
CN111161360B true CN111161360B (en) | 2023-04-28 |
Family
ID=70557568
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911304471.0A Active CN111161360B (en) | 2019-12-17 | 2019-12-17 | Image defogging method of end-to-end network based on Retinex theory |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111161360B (en) |
Families Citing this family (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111681178B (en) * | 2020-05-22 | 2022-04-26 | 厦门大学 | Knowledge distillation-based image defogging method |
CN112102179B (en) * | 2020-08-04 | 2023-08-29 | 中国科学院沈阳自动化研究所 | Retinex-based depth network single image defogging method |
CN111915530B (en) * | 2020-08-06 | 2022-07-29 | 温州大学 | End-to-end-based haze concentration self-adaptive neural network image defogging method |
CN112258402A (en) * | 2020-09-30 | 2021-01-22 | 北京理工大学 | Dense residual generation countermeasure network capable of rapidly removing rain |
CN113034445B (en) * | 2021-03-08 | 2022-11-11 | 桂林电子科技大学 | Multi-scale connection image defogging algorithm based on UNet3+ |
CN113763261B (en) * | 2021-06-29 | 2023-12-26 | 中国科学院沈阳自动化研究所 | Real-time detection method for far small target under sea fog weather condition |
CN114862691B (en) * | 2022-03-23 | 2024-10-15 | 吉林大学 | Image defogging method, device and equipment based on neural network model |
CN116579951A (en) * | 2023-06-05 | 2023-08-11 | 海南大学 | Image defogging network and method for fusing shallow features and deep features |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20140083602A (en) * | 2012-12-26 | 2014-07-04 | 금오공과대학교 산학협력단 | Device and method for visibility enhancement using fusion of dehazing and retinex |
CN107369144A (en) * | 2017-07-12 | 2017-11-21 | 南京邮电大学 | Based on the multiple dimensioned Retinex images defogging method for improving homomorphic filtering |
CN108564549A (en) * | 2018-04-20 | 2018-09-21 | 福建帝视信息科技有限公司 | A kind of image defogging method based on multiple dimensioned dense connection network |
Non-Patent Citations (1)
Title |
---|
Zhang Hongying; Zhang Sainan; Wu Yadong; Wu Bin. Fast single-image dehazing algorithm based on human visual characteristics. Journal of Computer Applications, 2014, 34(6): 1753-1757+1761. *
Also Published As
Publication number | Publication date |
---|---|
CN111161360A (en) | 2020-05-15 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111161360B (en) | Image defogging method of end-to-end network based on Retinex theory | |
CN111062892B (en) | Single image rain removing method based on composite residual error network and deep supervision | |
CN111915530B (en) | End-to-end-based haze concentration self-adaptive neural network image defogging method | |
CN110517203B (en) | Defogging method based on reference image reconstruction | |
CN108269244B (en) | Image defogging system based on deep learning and prior constraint | |
CN108830796A (en) | Based on the empty high spectrum image super-resolution reconstructing method combined and gradient field is lost of spectrum | |
CN110211070B (en) | Low-illumination color image enhancement method based on local extreme value | |
CN110544213A (en) | Image defogging method based on global and local feature fusion | |
Shen et al. | Convolutional neural pyramid for image processing | |
CN116152120A (en) | Low-light image enhancement method and device integrating high-low frequency characteristic information | |
CN111161161B (en) | Feature fusion defogging method for maintaining color | |
CN111179196B (en) | Multi-resolution depth network image highlight removing method based on divide-and-conquer | |
WO2024178979A1 (en) | Single-image defogging method based on detail restoration | |
CN112070688A (en) | Single image defogging method for generating countermeasure network based on context guidance | |
Qian et al. | CIASM-Net: a novel convolutional neural network for dehazing image | |
Liang et al. | Learning to remove sandstorm for image enhancement | |
CN107301625B (en) | Image defogging method based on brightness fusion network | |
CN111626943B (en) | Total variation image denoising method based on first-order forward and backward algorithm | |
CN109544470A (en) | A kind of convolutional neural networks single image to the fog method of boundary constraint | |
CN112712482A (en) | Image defogging method based on linear learning model | |
CN106683055A (en) | Degradation model and group sparse representation-based foggy day image restoration method | |
CN106780398B (en) | A kind of image de-noising method based on noise prediction | |
CN116721033A (en) | Single image defogging method based on random mask convolution and attention mechanism | |
Xie et al. | DHD-Net: A novel deep-learning-based dehazing network | |
CN116862802A (en) | Single image defogging method integrated with discriminator |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |