CN108537733A

CN108537733A - Super-resolution reconstruction method based on a multipath deep convolutional neural network

Info

Publication number: CN108537733A
Application number: CN201810325131.5A
Authority: CN
Inventors: 邵文泽; 陈龙; 葛琦; 王力谦
Original assignee: Nanjing Post and Telecommunication University
Current assignee: Nanjing Post and Telecommunication University; Nanjing University of Posts and Telecommunications
Priority date: 2018-04-11
Filing date: 2018-04-11
Publication date: 2018-09-14
Anticipated expiration: 2038-04-11
Also published as: CN108537733B

Abstract

The invention discloses a super-resolution reconstruction method based on a multipath deep convolutional neural network, comprising the following steps: obtaining the total training set, preprocessing the images of the total training set, preparing the test sets, and performing image reconstruction with the convolutional layers of the convolutional neural network. The proposed multipath structure adds several branches to the original single-path network, so that image features at different scales are processed by different numbers of convolution kernels. Compared with the original method, reconstruction quality and visual effect improve without increasing the total number of parameters.

Description

Super-resolution reconstruction method based on a multipath deep convolutional neural network

Technical field:

The present invention relates to a super-resolution reconstruction method based on a multipath deep convolutional neural network, and belongs to the technical field of image processing.

Background art:

With the development of science and technology and the progress of human society, the exchange and processing of information have become increasingly important. Compared with text, sound, smell and other kinds of information, image information is intuitive, vivid and information-rich. Studies have found that visual information accounts for as much as 60% of all the information humans acquire, so images have become an important source of information in people's work and daily life, and research on images and their processing is of great significance.

Resolution is a measure of the fineness of a screen image: it is the number of pixels a display shows. The points, lines and surfaces of a screen image are all made up of individual pixels, so the more pixels that make up an image, the finer and clearer the picture, the more information the image region can convey, and the better the visual effect. This case is referred to as high resolution (High Resolution, HR); the opposite is low resolution (Low Resolution, LR). Resolution reflects how much information an image contains and how well it can express image detail.

Improving the hardware of the sampling system is one way to raise image resolution, and there are two main methods: increasing the density of the sensing units, or increasing the number of sensing units. Both methods raise the cost of the hardware, and the former also shrinks each sensing unit and introduces more noise. As a software technique, image super-resolution (Super-Resolution, SR) reconstruction can enhance the resolution of images or video within the constraints of the hardware and improve their visual effect.

The purpose of image super-resolution reconstruction is to increase the resolution of the input image, that is, to restore detail that the input image itself does not contain by enhancing the sharpness of its content, and finally to output a high-quality HR image. Super-resolution reconstruction suits situations where high-quality images are hard to obtain, such as video surveillance and remote sensing: the former often captures low-quality faces only a few dozen pixels across, while for the latter the cost of HR digital imaging equipment is often prohibitive. Driven by the limits of hardware cost and by the unremitting pursuit of higher image resolution, image SR reconstruction has attracted extensive research attention.

An image SR reconstruction algorithm recovers an HR image from a single LR image or from a sequence of LR images. The resulting HR image restores the high-frequency detail lost in the observed image, has a better visual effect, and benefits subsequent processing of the image. The goal of image SR techniques is to solve this typical ill-posed problem.

Super-resolution reconstruction algorithms generally fall into three classes: interpolation-based, reconstruction-based and learning-based. Interpolation-based methods are simple to implement and computationally efficient, but their results are overly smooth and cannot produce high-frequency detail. Reconstruction-based methods cannot handle the complex image structures found in natural images. Learning-based methods rely on a library of sample images to establish a mapping between low-resolution and high-resolution images; when a new low-resolution image is input, the high-resolution image is reconstructed according to this mapping, so the key to such methods is the learning algorithm applied to the sample library. The present invention mainly adopts a learning-based super-resolution reconstruction method.

The information disclosed in this background section is intended only to improve understanding of the general background of the invention, and should not be taken as an acknowledgement or any form of suggestion that it constitutes prior art already known to a person skilled in the art.

Summary of the invention:

The purpose of the present invention is to address the above problems by exploring a deep convolutional neural network structure suited to natural-image super-resolution. Considering both network depth and network width, it improves the quality and visual effect of image reconstruction without increasing the total number of network parameters.

To achieve the above purpose, the specific implementation steps of the present invention are as follows:

A super-resolution reconstruction method based on a multipath deep convolutional neural network, comprising the following steps:

(1) Obtaining the total training set:

The training set consists of the general super-resolution training set 91images and the 200images of the Berkeley segmentation dataset, so the original training set contains 291 natural images. Each training image is downscaled by factors of 0.9, 0.8, 0.7 and 0.6, and is rotated by 90°, 180° and 270° and mirror-flipped. After data augmentation each natural image in the original training set yields 20 images, so the total training set contains 291*20 = 5820 natural images;
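The size of the total training set can be checked with a small sketch. The text gives 20 augmented images per original but does not spell out how scales, rotations and flips combine to reach that number; the grouping below (five scales including the original, four orientations, mirror flip left out) is only an assumption chosen because it yields 20, and all names are illustrative.

```python
from itertools import product

# Assumption: the original scale (1.0) is kept alongside the four reductions.
SCALES = [1.0, 0.9, 0.8, 0.7, 0.6]
# Assumption: the unrotated image (0 degrees) is kept alongside the three rotations.
ROTATIONS = [0, 90, 180, 270]


def augmentation_plan(scales=SCALES, rotations=ROTATIONS):
    """Return every (scale, rotation) pair applied to one original image."""
    return list(product(scales, rotations))


def total_training_images(num_originals, variants_per_image):
    """Size of the training set after augmentation."""
    return num_originals * variants_per_image


plan = augmentation_plan()
total = total_training_images(291, len(plan))
print(len(plan), total)
```

Whatever the exact combination used, the stated total only depends on the product 291 * 20.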

(2) Preprocessing the images of the total training set:

The images in the total training set are downscaled in resolution by factors of 2, 3 and 4 with the BICUBIC method and cropped into image blocks of size 41*41; the batch size is set to 64, and the training set is saved in HDF5 format as the input of the convolutional neural network;
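The cropping step can be sketched as a sliding window over each image. The 41*41 block size comes from the text; the stride of 41 (non-overlapping blocks) and the list-of-rows image representation are assumptions made only for illustration.

```python
def patch_origins(height, width, patch=41, stride=41):
    """Top-left corners of all full patch x patch windows inside an image.

    stride=41 is an assumed value; the text does not state the stride.
    """
    rows = range(0, height - patch + 1, stride)
    cols = range(0, width - patch + 1, stride)
    return [(r, c) for r in rows for c in cols]


def crop(image, r, c, patch=41):
    """Cut one patch from an image stored as a list of pixel rows."""
    return [row[c:c + patch] for row in image[r:r + patch]]


# Synthetic 90 x 100 single-channel image used only to exercise the functions.
image = [[(y * 100 + x) % 256 for x in range(100)] for y in range(90)]
origins = patch_origins(len(image), len(image[0]))
patches = [crop(image, r, c) for r, c in origins]
print(len(patches))
```

Windows that would run past the image border are dropped, so every block has the full 41*41 size the network expects.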

(3) Preparing the test sets:

Four test sets are used: Set5, Set14, B100 and Urban100. Set5, Set14 and B100 are three commonly used image super-resolution datasets containing 5, 14 and 100 natural images of various sizes, respectively, and Urban100 contains 100 different urban-scene images. The test sets are preprocessed in the same way as the training set in step (2); after BICUBIC processing they are input to the network for image super-resolution reconstruction, with the batch size set to 2;

(4) Performing image reconstruction with the convolutional layers of the convolutional neural network:

The network is divided into three parts: feature extraction, feature mapping and image reconstruction. The feature extraction and image reconstruction parts are each implemented with one 3*3 convolutional layer, and the feature mapping part with 27 3*3 convolutional layers. The RGB color images in the training and test sets are first converted to the YCbCr color space and only the Y channel, that is, the luminance channel, is processed; the images are then cropped into patches of size 41*41 as input. Each convolutional layer is followed by a ReLU activation function.
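Extracting the luminance channel can be sketched as follows. The text does not say which RGB-to-YCbCr variant is used; the coefficients below are the ITU-R BT.601 studio-range ones, as used by MATLAB's `rgb2ycbcr` for inputs in [0, 1], and that choice is an assumption.

```python
def rgb_to_y(r, g, b):
    """Luma of one pixel (BT.601, studio range).

    r, g, b are floats in [0, 1]; the result lies in [16, 235].
    """
    return 16.0 + 65.481 * r + 128.553 * g + 24.966 * b


def luminance_channel(rgb_image):
    """Map an image given as rows of (r, g, b) tuples to its Y channel."""
    return [[rgb_to_y(*pixel) for pixel in row] for row in rgb_image]


tiny = [[(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)],
        [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]]
y_channel = luminance_channel(tiny)
print(y_channel)
```

Only this Y channel is passed through the network; the Cb and Cr channels are not processed by it.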

Image super-resolution means recovering a high-resolution image (High Resolution, HR) from a single low-resolution image (Low Resolution, LR) or from a sequence of low-resolution images. It is a classical problem in computer vision that adds effective information to the original low-resolution image and improves its visual effect. The present invention addresses single-image super-resolution and uses a deep convolutional neural network to explore an image super-resolution method that is fast and produces high-quality reconstructions.

The technical solution of the present invention is further limited as follows:

Further: the convolutional neural network in steps (2) and (4) is specifically as follows. It contains 29 convolutional layers in total, each with a convolution kernel size of 3*3. The number of feature maps is 64, that is, convolution generates 64 kinds of features, and padding is set to 1 to keep the size of the output feature maps unchanged. The feature extraction part contains one layer; the feature mapping part contains 27 layers; the image reconstruction part contains one layer whose number of output feature maps is 1, namely the high-resolution residual image learned by the network. The receptive field along the longest path is computed as:

RF_{n-1} = (RF_n - 1) * Stride + Kernel_size,

where RF_n is the receptive field of the n-th layer with respect to the (n-1)-th layer, Stride is the convolution stride and Kernel_size is the convolution kernel size. The receptive field is computed layer by layer from the last layer backward, in a top-down manner. The receptive field of the last layer of the network with respect to the second-to-last layer is 3, and applying the formula back to the input gives a receptive field of 41; that is, each pixel of the generated high-resolution image depends on 41*41 pixels of the input image.
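The receptive-field recursion can be checked numerically with a minimal sketch. It assumes the longest path has 20 convolutional layers, each with a 3*3 kernel and stride 1, which matches the 20th convolutional layer named as the output layer elsewhere in the text; the function name is illustrative.

```python
def receptive_field(num_layers, kernel_size=3, stride=1):
    """Apply RF_{n-1} = (RF_n - 1) * stride + kernel_size from the output back."""
    rf = 1  # one output pixel before any layer is unrolled
    for _ in range(num_layers):
        rf = (rf - 1) * stride + kernel_size
    return rf


print(receptive_field(1))   # last layer with respect to the layer before it
print(receptive_field(20))  # whole 20-layer path with respect to the input
```

With stride 1 every 3*3 layer widens the receptive field by 2, so 20 layers give 1 + 2*20 = 41, the same size as the 41*41 training patches.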

Further: the super-resolution method in step (3) is specifically as follows. A multipath structure is introduced on top of the original chain-type convolutional neural network. Starting from the second convolutional layer, every two convolutional layers are paralleled by one additional convolutional layer, whose parameters are shared with the original convolutional layer, and feature fusion is performed at the end of each parallel structure. Several such parallel structures nested together form the multipath structure, which improves the reconstruction performance of the network without increasing the number of network parameters.
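The claim that parameter sharing leaves the parameter count unchanged can be illustrated with a rough count. The sketch assumes a 20-layer main path with 64 feature maps, one input channel (Y) and one output channel, biases included, plus 9 parallel layers that reuse existing weights; these figures are pieced together from numbers given elsewhere in the text, and the function names are illustrative.

```python
def conv_params(in_ch, out_ch, k=3):
    """Weights plus biases of one k x k convolutional layer."""
    return in_ch * out_ch * k * k + out_ch


def chain_params(depth=20, width=64, in_ch=1, out_ch=1):
    """Parameter count of the plain single-path (chain) network."""
    first = conv_params(in_ch, width)
    middle = (depth - 2) * conv_params(width, width)
    last = conv_params(width, out_ch)
    return first + middle + last


def multipath_params(depth=20, width=64, shared_branches=9):
    """Parameter count once the shared-weight parallel layers are added."""
    params_per_shared_branch = 0  # a shared layer reuses an existing W and B
    return chain_params(depth, width) + shared_branches * params_per_shared_branch


conv_layers_applied = 20 + 9  # main-path layers plus parallel layers
print(chain_params(), multipath_params(), conv_layers_applied)
```

Under these assumptions the network applies 29 convolutions per forward pass while storing only the parameters of the 20-layer chain.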

Further: the first part of the convolutional neural network structure performs image-block extraction and feature representation, using the properties of the convolutional network to extract the features of the residual image.

Further: the specific process is expressed by the following formula:

F₁(X)=max (0, W₁*X+B₁),

where W₁ and B₁ denote the convolution filters and the biases, respectively. The size of W₁ is c*f₁*f₁*n₁, where c is the number of channels of the input image, f₁ is the size of a convolution filter and n₁ is the number of convolution filters. W₁ applies n₁ convolution filters to the image, each with a kernel of size c*f₁*f₁, and the output consists of n₁ feature maps. B₁ is an n₁-dimensional vector, each element of which is associated with one convolution filter.

Further: the second part of the network uses the multipath structure to map and fuse the features, according to the formulas:

F₂(X)=max (0, W₂*F₁(X)+B₂),

F₃(X)=max (0, W₃*F₂(X)+B₃),

F₂₁(X)=max (0, W₂*F₁(X)+B₂),

F₃ ^*(X)=F₃(X)+F₂₁(X),

where F₂(X) and F₃(X) are the outputs of the second and third convolutional layers and F₂₁(X) is the output of the parallel convolutional layer, whose W and B are shared with the second layer; F₂₁(X) and F₃(X) are then fused by addition to give F₃ ^*(X). As shown in Figure 2, nine such structures form the final multipath convolutional neural network, and the final output of the second part of the network is F₁₉ ^*(X).
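The data flow of one parallel structure can be sketched on a toy signal. To stay self-contained the sketch uses a one-dimensional signal and 3-tap filters with zero padding 1 instead of 64-channel 3*3 image convolutions; that simplification and all function names are assumptions for illustration, not the network described in the text.

```python
def conv1d_same(x, w, b):
    """3-tap sliding filter with zero padding 1, so the output length equals len(x)."""
    padded = [0.0] + list(x) + [0.0]
    return [sum(w[k] * padded[i + k] for k in range(3)) + b
            for i in range(len(x))]


def relu(x):
    """Element-wise max(0, x)."""
    return [max(0.0, v) for v in x]


def multipath_block(f1, w2, b2, w3, b3):
    """One parallel structure: two chained layers plus a shared-weight branch."""
    f2 = relu(conv1d_same(f1, w2, b2))    # second layer
    f3 = relu(conv1d_same(f2, w3, b3))    # third layer
    f21 = relu(conv1d_same(f1, w2, b2))   # parallel layer, shares (w2, b2)
    return [a + b for a, b in zip(f3, f21)]  # fusion by addition


identity = [0.0, 1.0, 0.0]
fused = multipath_block([1.0, 2.0, 3.0], identity, 0.0, identity, 0.0)
print(fused)
```

The branch introduces no new weights: it calls the same filter and bias as the second layer, and only the addition at the end is new.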

Further: the third part of the network performs one more convolution to reconstruct the image, analogous to the averaging step of conventional methods. The formula is as follows:

F₂₀(X)=W₂₀*F₁₉(X)+B₂₀,

where F₂₀(X) is the output of the 20th convolutional layer, with an output dimension of 1, namely the residual image learned by the network. According to the global residual learning structure, with network input X and final reconstructed high-resolution image Y:

Y=X+F₂₀(X),

Assume the training set is [formula image missing in the source text], that is, N pairs of high- and low-resolution images at S magnification factors. For each sample pair, the high-resolution residual image to be learned can be expressed as [formula image missing in the source text]. The nonlinear mapping to be learned by the network is F(X); let the parameters of F(X) be θ={W^k, b^k}, so that the loss function of the network can be expressed as: [formula image missing in the source text]

With the above convolutional neural network, the multipath structure and global residual learning are used to reconstruct the final high-resolution image.
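Global residual learning can be sketched directly from Y = X + F₂₀(X). Here X is taken to be the interpolated low-resolution input with the same size as the target, so the residual the network should learn is Y - X; the list-of-rows image representation and the function names are assumptions, and the loss function is not reproduced because its formula is not available in the text.

```python
def residual_target(x, y):
    """Residual image the network is trained to predict: Y - X, pixel by pixel."""
    return [[yv - xv for xv, yv in zip(x_row, y_row)]
            for x_row, y_row in zip(x, y)]


def reconstruct(x, predicted_residual):
    """Final high-resolution estimate: Y = X + F20(X)."""
    return [[xv + rv for xv, rv in zip(x_row, r_row)]
            for x_row, r_row in zip(x, predicted_residual)]


x = [[10, 20], [30, 40]]   # interpolated low-resolution input
y = [[12, 19], [33, 40]]   # ground-truth high-resolution image
r = residual_target(x, y)
print(r, reconstruct(x, r))
```

A perfect residual prediction reproduces the ground truth exactly, which is why the network only has to model the missing high-frequency detail rather than the whole image.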

Further: in step (4), the ReLU activation function is the rectified linear unit, ReLU(x) = max(0, x).

Beneficial effects of the present invention:

(1) Super-resolution reconstruction results.

The proposed multipath convolutional neural network adds several branches to the original single-path network, so that image features at different scales are processed by different numbers of convolution kernels. Compared with the original method, reconstruction quality and visual effect improve without increasing the total number of parameters.

(2) Data augmentation.

The present invention rotates, flips and downscales by various factors every image in the original 291images dataset, which effectively expands the dataset so that the deep neural network can be fully trained.

(3) Upscaling by different factors.

The present invention uses a multi-scale method when training the network: the original images are downscaled in resolution by factors of 2, 3 and 4, so that at test time images can likewise be upscaled by factors of 2, 3 and 4.

(4) Techniques for training deep neural networks.

When training the deep convolutional neural network, the present invention adds a global residual structure to the network to speed up convergence, and reduces the learning rate by a factor of 10 every 300,000 training iterations to ensure that the network converges to an optimal point.

The advantages of the proposed super-resolution reconstruction method based on a multipath convolutional neural network are as follows:

1. Multipath structure:

The Inception module in GoogLeNet places different convolutional layers in parallel, which widens the network and improves its adaptability to features at different scales: the same input is mapped through different transformations, and the results are all concatenated into a single output. This module effectively improves the performance of neural networks for classification tasks. Inspired by the Inception module and taking the characteristics of the image super-resolution problem into account, the present invention builds a multipath convolutional neural network structure for the image super-resolution task.

The present invention maps image features by placing different numbers of convolutional layers in parallel and fuses the features at the end of each parallel structure; the parameters of the parallel convolutional layers are shared with those of the original convolutional layers. Several such structures nested together form the multipath structure, which improves both the reconstruction quality and the visual effect of the image.

2. Residual learning:

The deep residual network is a deep convolutional network proposed in 2015; as soon as it appeared it won first place in the ImageNet image classification, detection and localization tasks. The idea of the residual network is to express layers as learning a residual function with respect to their input. Residual networks are easier to optimize and can gain accuracy from considerably increased depth. Their core contribution is to resolve the side effect of increasing depth (the degradation problem), so that network performance can be improved simply by increasing network depth. For details of the residual network, see K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In CVPR, 2016.

The SRCNN network was the first to apply convolutional neural networks to image super-resolution and achieved better reconstruction than conventional methods. However, SRCNN directly learns the mapping from the low-resolution image to the high-resolution image, so the network converges slowly, and once the network is deepened, training becomes difficult to converge. For details of SRCNN, see Dong, C., Loy, C.C., He, K., Tang, X.: Learning a deep convolutional network for image super-resolution. In: ECCV. (2014) 184–199. The residual learning structure lets the network learn the residual between input and output, which accelerates the convergence of deep convolutional neural networks and allows deeper networks with better results. The present invention introduces a global residual learning structure in which the network learns the residual between the low-resolution image and the high-resolution image, so that it converges relatively quickly even when the network depth reaches 20 layers and the number of convolution kernels reaches 29.

3. Multi-scale training:

The SRCNN network can upscale an image by only a single factor, and the network must be retrained when a different factor is needed. The present invention introduces multi-scale training: the training set is downscaled by different resolution factors, so that one and the same network can upscale image resolution by factors of 2, 3 and 4.

4. Algorithm implementation:

Among the many deep learning frameworks, the present invention uses Caffe to implement the proposed network structure. Caffe is a clear and efficient deep learning framework whose core language is C++; it supports command-line, Python and Matlab interfaces, and it can run on the CPU or use the GPU to accelerate training and testing. Because Caffe builds networks from configuration files, implementing various convolutional neural network structures in it is convenient and fast, which is why the present invention uses Caffe.

As programming software, the present invention uses VS2013 and Matlab. The processing of the training and test sets is implemented in Matlab. In the testing phase, the present invention uses the MatConvNet framework on the Matlab platform to upscale the resolution of single images, and finally performs statistics and analysis on the experimental data.

Description of the drawings:

Figure 1 is a flow diagram of the present invention;

Figure 2 is a structure diagram of the multipath neural network.

Detailed description of the embodiments:

Specific embodiments of the present invention are described in detail below, but it should be understood that the scope of protection of the present invention is not limited by the specific embodiments.

Unless otherwise expressly stated, throughout the specification and claims the term "comprise" or variations such as "comprises" or "comprising" will be understood to include the stated elements or components without excluding other elements or components.

(1) Prepare the training set. The general super-resolution training set 91images and the 200images of the Berkeley segmentation dataset are combined as the training set, so the original training set contains 291 natural images. To make full use of the training images, data augmentation is applied to the training set in the experiments: each training image is downscaled by factors of 0.9, 0.8, 0.7 and 0.6, and is rotated by 90°, 180° and 270° and mirror-flipped, so each natural image in the original training set yields 20 images after data augmentation, and the total training set contains 291*20 = 5820 natural images;

(2) Preprocess the training set images. The images in the augmented training set are downscaled in resolution by factors of 2, 3 and 4 with the BICUBIC method and cropped into image blocks of size 41*41; the batch size is set to 64, and the training set is saved in HDF5 format as the input of the convolutional neural network;

(3) Prepare the test sets. Four test sets are used: Set5, Set14, B100 and Urban100, where Urban100 contains 100 different urban-scene images. The test sets are preprocessed in the same way as the training set; after BICUBIC processing they are input to the network for image super-resolution reconstruction, with the batch size set to 2;

(4) Train the network. On the Caffe deep learning platform, the multipath convolutional neural network structure of Figure 2 is implemented through the net.prototxt configuration file, and the network training parameters are set through the solver.prototxt configuration file. The optimizer is Adam, the base learning rate is set to 10e-4, and the learning-rate policy is step with a stepsize of 300000 and a gamma of 0.1, that is, the learning rate drops to 0.1 times its previous value every 300000 training iterations. The total number of iterations is 900000, and the GPU is used to accelerate training, which takes about 30 hours. The computer configuration is an Intel Core i7-6700K, an NVIDIA GeForce GTX 980Ti with 6GB GDDR5, 32GB RAM and the Windows 10 operating system;
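The step learning-rate policy can be written out as a small function. It follows Caffe's documented rule for `lr_policy: "step"`, namely base_lr * gamma ^ floor(iter / stepsize). The base learning rate is passed in explicitly because the value written in the text, 10e-4, equals 1e-3 when read literally as a float but may have been meant as 10^-4; the function name is illustrative.

```python
def step_learning_rate(iteration, base_lr, gamma=0.1, stepsize=300000):
    """Learning rate at a given iteration under a step decay policy."""
    return base_lr * gamma ** (iteration // stepsize)


# Literal reading of the value in the text; treat it as an assumption.
BASE_LR = 10e-4

for it in (0, 299999, 300000, 600000, 899999):
    print(it, step_learning_rate(it, BASE_LR))
```

Over the 900000 training iterations the rate therefore takes three values: the base rate, one tenth of it, and one hundredth of it.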

(5) Test the network. The neural network model trained in Caffe is extracted with Matlab and imported on the MatConvNet platform, and the four test sets Set5, Set14, B100 and Urban100 are tested in turn. The image reconstruction results are saved, and the PSNR value of each reconstructed image and the mean PSNR are computed. The PSNR formula is: [formula image missing in the source text]

where f is the ground-truth image, f^* is the super-resolution reconstructed image, M is the number of pixels of f, and the unit of PSNR is dB.
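A PSNR computation consistent with the symbols f, f^* and M can be sketched as follows. Since the formula itself is not available in the text, this uses the standard definition, 10 * log10(peak^2 / MSE) with the mean squared error taken over the M pixels, and assumes 8-bit images with a peak value of 255; both are assumptions, and the images are flattened to plain lists for simplicity.

```python
import math


def psnr(reference, reconstructed, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel lists."""
    m = len(reference)  # number of pixels M
    mse = sum((a - b) ** 2 for a, b in zip(reference, reconstructed)) / m
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)


f = [100.0, 100.0, 100.0, 100.0]
f_star = [110.0, 90.0, 110.0, 90.0]
print(psnr(f, f_star))
```

Higher values indicate a reconstruction closer to the ground truth, which is how the methods are ranked on each test set.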

Table 1. Comparison of the average PSNR values of the present invention and other algorithms on the test sets, in dB; the highest values are in bold.

The results in the table show that, with PSNR as the evaluation criterion for image reconstruction quality, the algorithm of the present invention achieves the best image reconstruction on all four test sets.

The foregoing description of specific exemplary embodiments of the present invention is for the purposes of illustration and example. It is not intended to limit the invention to the precise forms disclosed, and obviously many modifications and variations are possible in light of the above teaching. The exemplary embodiments were selected and described in order to explain the specific principles of the invention and their practical application, so that those skilled in the art can realize and use various exemplary embodiments of the invention as well as various alternatives and modifications. The scope of the invention is intended to be defined by the claims and their equivalents.

Claims

1. A super-resolution reconstruction method based on a multipath deep convolutional neural network, comprising the following steps:

(1) Obtaining the total training set:

The training set consists of the general super-resolution training set 91images and the 200images of the Berkeley segmentation dataset, so the original training set contains 291 natural images. Each training image is downscaled by factors of 0.9, 0.8, 0.7 and 0.6, and is rotated by 90°, 180° and 270° and mirror-flipped. After data augmentation each natural image in the original training set yields 20 images, so the total training set contains 291*20 = 5820 natural images;

(2) Preprocessing the images of the total training set:

The images in the total training set are downscaled in resolution by factors of 2, 3 and 4 with the BICUBIC method and cropped into image blocks of size 41*41; the batch size is set to 64, and the training set is saved in HDF5 format as the input of the convolutional neural network;

(3) Preparing the test sets:

Four test sets are used: Set5, Set14, B100 and Urban100. Set5, Set14 and B100 are three commonly used image super-resolution datasets containing 5, 14 and 100 natural images of various sizes, respectively, and Urban100 contains 100 different urban-scene images. The test sets are preprocessed in the same way as the training set in step (2); after BICUBIC processing they are input to the network for image super-resolution reconstruction, with the batch size set to 2;

(4) Performing image reconstruction with the convolutional layers of the convolutional neural network:

The network is divided into three parts: feature extraction, feature mapping and image reconstruction. The feature extraction and image reconstruction parts are each implemented with one 3*3 convolutional layer, and the feature mapping part with 27 3*3 convolutional layers. The RGB color images in the training and test sets are first converted to the YCbCr color space and only the Y channel, that is, the luminance channel, is processed; the images are then cropped into patches of size 41*41 as input. Each convolutional layer is followed by a ReLU activation function.

2. The super-resolution reconstruction method based on a multipath deep convolutional neural network according to claim 1, characterized in that: the convolutional neural network in steps (2) and (4) is specifically as follows. It contains 29 convolutional layers in total, each with a convolution kernel size of 3*3. The number of feature maps is 64, that is, convolution generates 64 kinds of features, and padding is set to 1 to keep the size of the output feature maps unchanged. The feature extraction part contains one layer; the feature mapping part contains 27 layers; the image reconstruction part contains one layer whose number of output feature maps is 1, namely the high-resolution residual image learned by the network. The receptive field along the longest path is computed as:

RF_{n-1} = (RF_n - 1) * Stride + Kernel_size,

where RF_n is the receptive field of the n-th layer with respect to the (n-1)-th layer, Stride is the convolution stride and Kernel_size is the convolution kernel size. The receptive field is computed layer by layer from the last layer backward, in a top-down manner. The receptive field of the last layer of the network with respect to the second-to-last layer is 3, and applying the formula back to the input gives a receptive field of 41; that is, each pixel of the generated high-resolution image depends on 41*41 pixels of the input image.

3. The super-resolution reconstruction method based on a multipath deep convolutional neural network according to claim 1, characterized in that: the super-resolution method in step (3) is specifically as follows. A multipath structure is introduced on top of the original chain-type convolutional neural network. Starting from the second convolutional layer, every two convolutional layers are paralleled by one additional convolutional layer, whose parameters are shared with the original convolutional layer, and feature fusion is performed at the end of each parallel structure. Several such parallel structures nested together form the multipath structure, which improves the reconstruction performance of the network without increasing the number of network parameters.

4. The super-resolution reconstruction method based on a multipath deep convolutional neural network according to claim 2, characterized in that: the first part of the convolutional neural network structure performs image-block extraction and feature representation, using the properties of the convolutional network to extract the features of the residual image.

5. The super-resolution reconstruction method based on a multipath deep convolutional neural network according to claim 4, characterized in that: the specific process is expressed by the following formula:

F₁(X)=max (0, W₁*X+B₁),

where W₁ and B₁ denote the convolution filters and the biases, respectively. The size of W₁ is c*f₁*f₁*n₁, where c is the number of channels of the input image, f₁ is the size of a convolution filter and n₁ is the number of convolution filters. W₁ applies n₁ convolution filters to the image, each with a kernel of size c*f₁*f₁, and the output consists of n₁ feature maps. B₁ is an n₁-dimensional vector, each element of which is associated with one convolution filter.

6. The super-resolution reconstruction method based on a multipath deep convolutional neural network according to claim 2, characterized in that: the second part of the network uses the multipath structure to map and fuse the features, according to the formulas:

F₂(X)=max (0, W₂*F₁(X)+B₂),

F₃(X)=max (0, W₃*F₂(X)+B₃),

F₂₁(X)=max (0, W₂*F₁(X)+B₂),

F₃ ^*(X)=F₃(X)+F₂₁(X),

where F₂(X) and F₃(X) are the outputs of the second and third convolutional layers and F₂₁(X) is the output of the parallel convolutional layer, whose W and B are shared with the second layer; F₂₁(X) and F₃(X) are then fused by addition to give F₃ ^*(X). As shown in Figure 2, nine such structures form the final multipath convolutional neural network, and the final output of the second part of the network is F₁₉ ^*(X).

7. The super-resolution reconstruction method based on a multipath deep convolutional neural network according to claim 2, characterized in that: the third part of the network performs one more convolution to reconstruct the image, analogous to the averaging step of conventional methods. The formula is as follows:

F₂₀(X)=W₂₀*F₁₉(X)+B₂₀,

where F₂₀(X) is the output of the 20th convolutional layer, with an output dimension of 1, namely the residual image learned by the network. According to the global residual learning structure, with network input X and final reconstructed high-resolution image Y:

Y=X+F₂₀(X),

Assume the training set is [formula image missing in the source text], that is, N pairs of high- and low-resolution images at S magnification factors. For each sample pair, the high-resolution residual image to be learned can be expressed as [formula image missing in the source text]. The nonlinear mapping to be learned by the network is F(X); let the parameters of F(X) be θ={W^k, b^k}, so that the loss function of the network can be expressed as: [formula image missing in the source text]

With the above convolutional neural network, the multipath structure and global residual learning are used to reconstruct the final high-resolution image.

8. The super-resolution reconstruction method based on a multipath deep convolutional neural network according to claim 1, characterized in that: in step (4), the ReLU activation function is the rectified linear unit, ReLU(x) = max(0, x).