Article

Convolutional Neural Networks Using Skip Connections with Layer Groups for Super-Resolution Image Reconstruction Based on Deep Learning

Intelligent Image Processing Laboratory, Konkuk University, Seoul 05029, Korea
* Author to whom correspondence should be addressed.
Appl. Sci. 2020, 10(6), 1959; https://doi.org/10.3390/app10061959
Submission received: 12 February 2020 / Revised: 7 March 2020 / Accepted: 10 March 2020 / Published: 13 March 2020
(This article belongs to the Section Computing and Artificial Intelligence)

Abstract

In this paper, we propose a deep learning method with convolutional neural networks (CNNs) using skip connections with layer groups for super-resolution image reconstruction. In the proposed method, entire CNN layers for residual data processing are divided into several layer groups, and skip connections with different multiplication factors are applied from input data to these layer groups. With the proposed method, the processed data in hidden layer units tend to be distributed in a wider range. Consequently, the feature information from input data is transmitted to the output more robustly. Experimental results show that the proposed method yields a higher peak signal-to-noise ratio and better subjective quality than existing methods for super-resolution image reconstruction.

1. Introduction

Single image super-resolution (SISR) is a method to reconstruct a super-resolution image from a single low-resolution image [1,2]. The reconstruction of a super-resolution image is generally difficult because of various issues, such as blur and noise. Before the advent of deep learning, image processing methods such as interpolation were developed for this purpose. Today, many applications in the image processing and computer vision fields are based on deep learning [1,2,3,4].
The first solution for super-resolution reconstruction from a low-resolution image using deep learning with convolutional neural networks (CNNs) was the super-resolution convolutional neural network (SRCNN) method [1]. However, in the SRCNN method, learning was not performed well in deep layers. The very deep super-resolution (VDSR) method [2] is more efficient for learning in deep layers and achieves better performance than SRCNN for super-resolution reconstruction. Although VDSR has layers much deeper than those in SRCNN, it is efficient because it focuses on generating only residual (high-frequency) information by connecting the input data to the output of the last layer with a skip connection. However, in the VDSR method, the gradient information vanishes, owing to repeated rectified linear unit (ReLU) operations. It was observed that the number of hidden data units with vanishing gradients increases as the training proceeds with many iterations [5]. To resolve the problem of gradient vanishing, batch normalization [6] can be applied, but it may cause data distortion and other negative effects for reconstructing super-resolution images.
The super-resolution image reconstruction performance of the VDSR method [2] is significantly better than that of SRCNN because it uses deep layers and a skip connection. However, the skip connection is applied only once, between the input data and the output of the last layer, which makes it difficult to maintain the characteristics of the input data as they pass through the network. In each layer, a ReLU operation is performed after a convolution operation.
The ReLU operation is defined as
ReLU(z) = max(0, z),  (1)
where negative values are clipped to zero.
In VDSR, the data entering the network layers are residual-type with high-frequency components, so approximately half of the output values of the convolution operation in each layer are negative. Because negative values are clipped to zero, approximately half of the ReLU output values are zero. The remaining positive values are redistributed into positive and negative values by the convolution operation in the next layer. As training proceeds over many epochs, the percentage of zero and near-zero values in the ReLU output of each layer increases, which causes serious extinction of the information used in learning for reconstruction. This appears to be the cause of gradient vanishing after repeated ReLU operations on residual-type data.
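The clipping behavior described above can be checked numerically. The following is an illustrative sketch with synthetic zero-mean data (a stand-in for residual-type convolution outputs), not the paper's code:

```python
import numpy as np

def relu(z):
    # Equation (1): negative values are clipped to zero.
    return np.maximum(0.0, z)

# Residual-type data are roughly zero-mean, so about half of the
# convolution outputs are negative and get clipped by ReLU.
rng = np.random.default_rng(0)
z = rng.normal(loc=0.0, scale=1.0, size=100_000)
frac_zero = float(np.mean(relu(z) == 0.0))
print(f"fraction of values clipped to zero: {frac_zero:.3f}")
```

With zero-mean data the clipped fraction is close to 0.5, matching the observation that roughly half of each layer's output values become zero after ReLU.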
Figure 1 shows the network structure of VDSR. In the figure, G denotes the whole layer group, which is composed of l layers (l = 20 in Figure 1). The layer group G performs all operations on the residual data. Each layer block Li (i = 1, …, l) performs a convolution operation followed by a ReLU activation. In VDSR, the multiplication factor (MF) λ for the skip connection is fixed at 1.0.
Let F(·) represent the function of the layer group G. When a low-resolution image is the input data x for the neural network, the operation result of the layer group G in VDSR can be expressed as F(x). The output data y, which is the super-resolution image, can be expressed as follows:
y = F(x) + λx.  (2)
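The VDSR output above can be sketched as a one-line function. Here F is a stand-in for the learned 20-layer group; the toy residual function is purely illustrative:

```python
import numpy as np

def vdsr_output(x, F, lam=1.0):
    """Equation (2): y = F(x) + lam * x, with lam fixed at 1.0 in VDSR.

    F stands in for the layer group G; any callable that maps an array
    to a same-shaped residual works for this sketch.
    """
    return F(x) + lam * x

# Toy residual function in place of the learned conv+ReLU stack.
F = lambda x: 0.1 * np.maximum(0.0, x)
x = np.array([1.0, -2.0, 3.0])
y = vdsr_output(x, F)
```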
Skip connections have been used in other residual-type networks [7,8]. He et al. used a skip connection within a building block of residual network (ResNet) [7]. The MF for the skip connection is fixed as 1 for identity connection. One building block is composed of two convolution layers and two ReLU operations. These building blocks are concatenated serially for the whole ResNet. Hence, the data processed by a building block are transferred to the next building block, but the original input data are not transferred to most building blocks. ResNet was developed for image recognition [7].
In [8], skip connections are applied for residual function blocks. There is a fixed scaling (multiplication) factor for each residual function block. The output of residual function is multiplied by this scaling factor, whereas the input is not multiplied by this scaling factor. The data processed by the residual function are added to the input data, and the added data are transferred to the next residual function block. Hence, the original input data are not connected directly to the output of most residual function blocks. The structure in [8] was developed for selective pruning of the unimportant parts of CNNs rather than image reconstruction.
In this paper, we propose a deep learning method with CNNs for reconstructing super-resolution images. The proposed method divides entire layers into several layer groups with skip connections. The proposed method is designed to resolve the problem of gradient vanishing by reducing the extinction of data in hidden layer units through skip connections from the input data to layer groups. Experimental results show that the proposed method yields better results than previous methods for super-resolution image reconstruction.

2. Proposed Method

In this section, we propose a deep learning method with CNNs for super-resolution image reconstruction. In the proposed method, the output of each layer group is connected to the input data with a skip connection. Each skip connection has a multiplication factor (MF) applied to the input data, which is a parameter that is also learned during the training process. Hence, the proposed method connects the input data to the layer groups through multiple skip connections at regular intervals, whereas the existing VDSR method uses only one skip connection.
If the input image data are repetitively skipped and connected to a predetermined number of layer groups, the problem of data extinction due to ReLU operation at the output of each layer would be significantly alleviated. The first advantage of skip connections with regular intervals is that the data processed in each layer group would retain the characteristics of the input data to be learned without loss of input data information by data extinction and gradient vanishing. This phenomenon has a positive effect on learning for super-resolution reconstruction because the number of contributing units for learning in the proposed method would be greater than that in the structure without repetitive skip connections. The second advantage is that repetitive skip connections to all layer groups would maintain the features of the input image to be learned more robustly. The neural network for super-resolution reconstruction aims to improve the image quality by generating the optimum high-frequency information from the input low-resolution image. The repetitive skip connections would be advantageous to preserving the features of the input image to be learned for super-resolution reconstruction.
In the proposed method, because the number of skip connections with the input data is relatively large, the network may overfit as training proceeds over many epochs. To transmit the information of the input data while preserving its characteristics, each skip connection is associated with an MF, which represents the weight on the input data. The MFs are learned during the training process like other parameters, such as the filter kernels. We experimentally observed that super-resolution reconstruction performance improves when the MFs are learned and set through the training process.
Figure 2 shows the proposed network structure with a different skip connection for each layer group. Let k represent the number of layers in a layer group and n the number of layer groups. The total number of layers l is given by l = kn. In Figure 2, Gi (i = 1, …, n) represents the layer groups, each composed of k layers.
Each λi represents the MF for layer group Gi, and the last MF λn is fixed at 1.0 as in the existing VDSR method. The values of λ1 through λn−1 are initialized to some values and then learned and updated during the optimization of parameters in the training process.
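Learning a scalar MF alongside the other parameters can be sketched with a minimal gradient-descent loop. The paper updates the MFs with the Adam optimizer together with the filter kernels; plain SGD on a scalar and the hypothetical target factor 2.0 below are illustrative only:

```python
import numpy as np

# Minimal sketch: learning a scalar multiplication factor by gradient
# descent on an MSE loss. The target factor 2.0 is a made-up example;
# the paper learns the MFs with Adam, not plain SGD.
rng = np.random.default_rng(1)
x = rng.normal(size=1000)
target = 2.0 * x                 # hypothetical data whose "true" MF is 2.0

lam = 2.5                        # the paper's initial value for learnable MFs
lr = 0.01
for _ in range(500):
    grad = np.mean(2.0 * (lam * x - target) * x)   # d(MSE)/d(lam)
    lam -= lr * grad
print(f"learned MF: {lam:.3f}")
```

The point of the sketch is only that the MF converges to whatever value minimizes the loss, rather than staying at its initial value as a fixed hyperparameter would.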
Note that the structure of each Li in the proposed method (Figure 2) is the same as that of each Li in VDSR (Figure 1): one convolution layer followed by one ReLU layer. Each convolution layer has 64 convolutions with 3 × 3 filter kernels. The parameters relating to the convolution operations in the proposed method are the same as those in VDSR, and the ReLU operation is performed as in (1), exactly as in VDSR.
The only additional parameters in the proposed method are the MFs λ1, …, λn−1 (Figure 2), which do not exist in VDSR (Figure 1). The other differences are the skip connections for the layer groups and the associated multiplications (⊗) and additions (⊕), as shown in Figure 2.
Let Fi(·) represent the function of layer group Gi. The output of layer group Gi is Fi(·), and the input to the next layer group Gi+1 is Fi(·) + λi x. For the case of l = 20, k = 5, and n = 4, as shown in Figure 2, the relation among the input data x, the output data y, and the function Fi(·) of each layer group can be expressed as follows:
y = F4(F3(F2(F1(x) + λ1x) + λ2x) + λ3x) + λ4x.  (3)
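The nested expression above generalizes to a simple loop over groups. The following sketch uses toy callables in place of the learned 5-layer conv+ReLU stacks; only the control flow (a skip connection from the input to every group) reflects the paper:

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def grouped_forward(x, groups, lambdas):
    """Forward pass of Equation (3) for n layer groups: each group output
    F_i(.) is added to lambda_i * x, a skip connection from the *input*.

    groups  : n callables, each standing in for k conv+ReLU layers
    lambdas : n multiplication factors; the last is fixed at 1.0
    """
    h = x
    for F_i, lam_i in zip(groups, lambdas):
        h = F_i(h) + lam_i * x
    return h

# n = 4 toy groups in place of the learned 5-layer stacks.
groups = [lambda z: 0.1 * relu(z)] * 4
lambdas = [2.5, 2.5, 2.5, 1.0]   # initial MF values; last fixed as in VDSR
x = np.array([1.0, -1.0, 0.5])
y = grouped_forward(x, groups, lambdas)
```

Setting n = 1 and lambdas = [1.0] recovers the VDSR case of Equation (2).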

3. Results and Discussion

Experiments on the proposed method were performed using TensorFlow code on a computer with a 3.40 GHz Intel (R) Core (TM) i7-6700 CPU, 16 GB of memory, and an NVIDIA GeForce RTX 2080 graphics card. (The code and dataset are available at https://github.com/hg-a/MFVDSR.)
The training dataset includes 215,316 sub-images created by data augmentation from 291 images that combine the image data reported by Yang et al. [9] and the Berkeley Segmentation Dataset [10]. Four datasets, namely Set5, Set14, B100, and Urban100, were used as the test datasets for performance evaluation. The results were compared with those of other super-resolution image reconstruction methods, namely A+ [11], SRCNN [1], and VDSR [2], by measuring the average peak signal-to-noise ratio (PSNR) for each test dataset. For parameter optimization during training, the Adam optimizer [12] was used for both VDSR and the proposed method. In the proposed method, the k value was set to 2, 5, or 10, and different MFs λi were tested for various cases.
Table 1 presents a comparison of PSNR among the proposed method, A+ [11], SRCNN [1], and VDSR [2]. The A+ method performs super-resolution reconstruction using sparse dictionaries [11]. Table 1 shows that the PSNR performance of the proposed method, having repetitive skip connections with MFs for layer groups, is better than those of A+, SRCNN, and VDSR. Among A+, SRCNN, and VDSR, the VDSR method shows the best PSNR performance for all datasets. Experiments were conducted with different combinations of k, n, and λ i values to find the optimal combination that yields the best performance.
The experimental results show that the proposed method achieves the best PSNR performance when k = 5 and n = 4. Each MF λi (i = 1, …, n−1) is initialized to 2.5 and optimized during training, while λn is fixed at 1.0. For the optimization of the MF values, the Adam optimizer [12] is used with a learning rate (step size) of 0.0001.
For these datasets, the proposed method shows PSNR improvements of 0.03–0.16 dB over the VDSR method. For scales ×2, ×3, and ×4, the proposed method shows average PSNR improvements of 0.11, 0.08, and 0.06 dB, respectively, over the VDSR method.
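The reported metric is the standard MSE-based PSNR. A minimal sketch follows, assuming 8-bit images with peak value 255; the paper's exact evaluation protocol (e.g., which color channel is measured and how borders are handled) may differ:

```python
import numpy as np

def psnr(ref, est, peak=255.0):
    """PSNR in dB between a reference image and an estimate.

    Assumes both arrays share the same scale with maximum value `peak`.
    This is the generic definition, not necessarily the paper's exact
    evaluation script.
    """
    mse = np.mean((ref.astype(np.float64) - est.astype(np.float64)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

# Example: a reconstruction that is off by one gray level everywhere.
ref = np.zeros((8, 8))
est = ref + 1.0
print(f"PSNR = {psnr(ref, est):.2f} dB")
```

On this scale a 0.1 dB gain corresponds to roughly a 2.3% reduction in mean squared error, which puts the 0.03–0.16 dB improvements above in context.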
Table 2 compares the PSNRs of the proposed method for different combinations of k, n, and λi. Fixed λi values are used in Cases 1, 3, and 6, while in Cases 2, 4, 5, and 7, initial values are set for λi and then learned and updated during training. The results in Table 2 indicate that the cases where λi is initialized and updated during training show better results than those where λi is fixed. Hence, treating the MFs λi as parameters updated during the training process improves the performance of the proposed method. Comparing the results in terms of k and n, Cases 3, 4, and 5, with k = 5 and n = 4, show better PSNR performance than the other cases.
In the case of k = 5 and n = 4, as shown in Figure 2, there are four skip connections from the input data to the outputs of the four layer groups. Figure 3 and Figure 4 show the data distributions as histograms for data generated at the 60th epoch of the training process in layers L6, L11, and L16, which are the first layers of layer groups G2, G3, and G4, respectively. In the captions of Figure 3 and Figure 4, μ and σ denote the mean and standard deviation, respectively. Figure 3 compares the distributions of data generated before ReLU (i.e., the input to ReLU) in VDSR and the proposed method, while Figure 4 compares the distributions after ReLU (i.e., the output of ReLU). Comparing Figure 3 and Figure 4 shows that negative values are clipped to zero and that the standard deviation σ decreases significantly after the ReLU operation in each case.
In Figure 4a–c, the repeated ReLU operations in VDSR force the data distribution to be concentrated in a very narrow range, and the σ values are small. Meanwhile, in Figure 4d–f, the repetitive skip connections from the input data to the layer groups in the proposed method result in data distributions in wider ranges, and the σ values are much larger than in the VDSR method. This effect increases the number of contributing data units for learning and maintains the features of the input image for super-resolution reconstruction more robustly. Therefore, the proposed method shows better super-resolution reconstruction performance than the VDSR method, even though it has the same number of layers.
Table 3 compares the training and test running times (in seconds) for SRCNN, VDSR, and the proposed method. The training running times of VDSR and the proposed method are very similar. Since the network structure of SRCNN is relatively simple, its training times (per epoch and per 60 epochs) are smaller than those of VDSR and the proposed method. The test running times per image are very close for all three methods.
Note that the training loss (or test PSNR) converges at around 60 epochs for VDSR and the proposed method, whereas it does not converge at 60 epochs for SRCNN. For the training of SRCNN, 24,800 sub-images were used [1], and the experimental results in [1] indicate that at least 5 × 10^8 backpropagations (more than 20,000 epochs) are needed for training convergence. In our experiments for VDSR and the proposed method, 215,316 sub-images were used; achieving more than 5 × 10^8 backpropagations with 215,316 sub-images requires at least 2300 epochs. This means that SRCNN would require at least 5 × 10^5 s for training convergence, i.e., much more training time than VDSR and the proposed method. Comparing the convergence times (per 60 epochs) for training, the proposed method takes slightly more time (by 1.3%) than VDSR, owing to the extra time for optimizing the MF values and the associated multiplication and addition operations.
Figure 5, Figure 6 and Figure 7 show the results of super-resolution image reconstruction using VDSR and the proposed method from bicubic interpolated images with the scale factor ×4. The bars on the right of Figure 5 are low-brightness straight objects arranged side by side at regular intervals. In the image reconstructed using VDSR, the linear objects do not maintain their shape, while the proposed method maintains the linear shape well. In Figure 6, the proposed method outperforms VDSR in maintaining the grid pattern and the correct shape of the ceiling, where high-contrast grids are arranged regularly. Because the linear objects in the building in Figure 7 are very tightly spaced, the VDSR result is blurry and the straight lines are not well preserved, while the proposed method maintains the straight lines relatively well. These results demonstrate that the proposed method yields better subjective quality than the VDSR method.

4. Conclusions

In this paper, a deep learning method with CNNs using skip connections from input data is proposed to alleviate the problems of gradual extinction of the input data information and very narrow distribution of the data in hidden layer units when the ReLU operation is repeatedly applied to residual-type data. The proposed method divides whole layers into several layer groups and uses skip connections with different MFs to the outputs of layer groups.
The ReLU operation in the proposed method is exactly the same as in VDSR. Compared with the VDSR method, the data processed in intermediate hidden layers in the proposed method are distributed over a wider range, with greater similarity to a normal distribution. This effect is obtained by the repetitive skip connections with MFs from the input data to the layer groups.
The proposed method shows higher PSNR and better subjective quality in super-resolution image reconstruction with a similar amount of computation. In future work, the proposed method can be applied to other residual-type deep learning networks and applications to improve performance with relatively low computational complexity.

Author Contributions

Conceptualization, H.A. and C.Y.; methodology, H.A.; software, H.A.; validation, H.A.; formal analysis, H.A. and C.Y.; investigation, H.A. and C.Y.; resources, H.A. and C.Y.; data curation, H.A.; writing—original draft preparation, H.A.; writing—review and editing, C.Y.; visualization, H.A.; supervision, C.Y.; project administration, C.Y.; funding acquisition, C.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Konkuk University grant number 2018-A019-0691.

Acknowledgments

This paper was supported by Konkuk University in 2018.

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; collection, analyses, or interpretation of data; the writing of the manuscript; or the decision to publish the results.

Abbreviations

The following abbreviations are used in this manuscript:
SISR: single image super-resolution
SRCNN: super-resolution convolutional neural network
VDSR: very deep super-resolution
CNN: convolutional neural network
ReLU: rectified linear unit
ResNet: residual network
MF: multiplication factor
PSNR: peak signal-to-noise ratio

References

1. Dong, C.; Loy, C.C.; He, K.; Tang, X. Image super-resolution using deep convolutional networks. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 38, 295–307.
2. Kim, J.; Lee, J.; Lee, K. Accurate image super-resolution using very deep convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 1646–1654.
3. Jiang, Q.; Tan, D.C.; Li, Y.; Ji, S.; Cai, C.; Zheng, Q. Object detection and classification of metal polishing shaft surface defects based on convolutional neural network deep learning. Appl. Sci. 2019, 10, 87.
4. Dang, L.M.; Min, K.; Lee, S.; Han, D.; Moon, H. Tampered and computer-generated face images identification based on deep learning. Appl. Sci. 2020, 10, 505.
5. Hochreiter, S. The vanishing gradient problem during learning recurrent neural nets and problem solutions. Int. J. Uncertain. Fuzziness Knowl. Based Syst. 1998, 6, 107–116.
6. Ioffe, S.; Szegedy, C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. Proc. Int. Conf. Mach. Learn. 2015, 37, 448–456.
7. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
8. Huang, Z.; Wang, N. Data-driven sparse structure selection for deep neural networks. arXiv 2018, arXiv:1707.01213.
9. Yang, J.; Wright, J.; Huang, T.S.; Ma, Y. Image super-resolution via sparse representation. IEEE Trans. Image Process. 2010, 19, 2861–2873.
10. Martin, D.; Fowlkes, C.; Tal, D.; Malik, J. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In Proceedings of the Eighth IEEE International Conference on Computer Vision, Vancouver, BC, Canada, 7–14 July 2001; pp. 416–423.
11. Timofte, R.; Smet, V.D.; Gool, L.V. A+: Adjusted anchored neighborhood regression for fast super-resolution. In Asian Conference on Computer Vision; Springer: Cham, Switzerland, 2014; pp. 111–126.
12. Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. In Proceedings of the International Conference on Learning Representations, San Diego, CA, USA, 7–9 May 2015.
Figure 1. VDSR network consisting of one layer group with one skip connection (l = 20, λ = 1.0).
Figure 2. Proposed network structure with skip connections for each layer group (l = 20, k = 5, n = 4).
Figure 3. Comparison of the distributions as histograms for data generated before ReLU in layers L6, L11, and L16. (a) VDSR (L6), μ = −0.09, σ = 0.17. (b) VDSR (L11), μ = −0.12, σ = 0.25. (c) VDSR (L16), μ = −0.19, σ = 0.23. (d) Proposed method (L6), μ = −0.61, σ = 2.85. (e) Proposed method (L11), μ = −0.97, σ = 6.01. (f) Proposed method (L16), μ = −0.28, σ = 0.89.
Figure 4. Comparison of the distributions, shown as histograms, of the data generated after ReLU in the L6, L11, and L16 layers. (a) VDSR (L6), μ = 0.09, σ = 0.04. (b) VDSR (L11), μ = 0.08, σ = 0.05. (c) VDSR (L16), μ = 0.05, σ = 0.02. (d) Proposed method (L6), μ = 0.27, σ = 0.26. (e) Proposed method (L11), μ = 0.35, σ = 0.46. (f) Proposed method (L16), μ = 0.12, σ = 0.04.
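The shift in activation statistics between Figures 3 and 4 is a direct consequence of ReLU clipping negative values to zero. A minimal NumPy sketch (synthetic Gaussian data loosely mimicking the VDSR histograms in Figure 3, not the actual network activations) reproduces the qualitative effect:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic pre-activation values with a slightly negative mean,
# roughly like the VDSR (L6) histogram in Figure 3a.
pre = rng.normal(loc=-0.1, scale=0.2, size=100_000)

# ReLU clips all negative values to zero.
post = np.maximum(pre, 0.0)

print(f"before ReLU: mean = {pre.mean():.2f}, std = {pre.std():.2f}")
print(f"after  ReLU: mean = {post.mean():.2f}, std = {post.std():.2f}")
```

Clipping the negative tail raises the mean and shrinks the spread, which is consistent with the smaller σ values reported in Figure 4 relative to Figure 3.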
Figure 5. Super-resolution image reconstruction results. (a) Ground truth image. (b) The result of the VDSR method (PSNR = 18.94 dB). (c) The result of the proposed method (PSNR = 19.18 dB).
Figure 6. Super-resolution image reconstruction results. (a) Ground truth image. (b) The result of the VDSR method (PSNR = 29.34 dB). (c) The result of the proposed method (PSNR = 30.31 dB).
Figure 7. Super-resolution image reconstruction results. (a) Ground truth image. (b) The result of the VDSR method (PSNR = 23.93 dB). (c) The result of the proposed method (PSNR = 24.20 dB).
Table 1. Comparison of peak signal-to-noise ratio (PSNR) among the proposed method, A+ [11], SRCNN [1], and VDSR [2]. (The results of A+ and SRCNN are taken from [2].) For the proposed method, k = 5, n = 4, and λi = 2.5 (initial).
Dataset    Scale  A+     SRCNN  VDSR   Proposed
Set5       ×2     36.54  36.66  37.16  37.30
           ×3     32.58  32.75  33.26  33.34
           ×4     30.28  30.48  30.92  31.03
Set14      ×2     32.28  32.42  32.69  32.85
           ×3     29.13  29.28  29.52  29.62
           ×4     27.32  27.49  27.79  27.82
B100       ×2     31.21  31.36  31.69  31.75
           ×3     28.29  28.41  28.64  28.68
           ×4     26.82  26.90  27.12  27.15
Urban100   ×2     29.20  29.50  30.29  30.36
           ×3     26.03  26.24  26.68  26.78
           ×4     24.32  24.52  24.84  24.91
Average    ×2     32.31  32.49  32.96  33.07
           ×3     29.01  29.17  29.53  29.61
           ×4     27.19  27.35  27.67  27.73
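Tables 1 and 2 report PSNR in dB. For reference, the standard PSNR computation can be sketched as follows (this is the generic metric for 8-bit images, not the authors' evaluation code; super-resolution papers typically compute it on the luminance channel only):

```python
import numpy as np

def psnr(reference, reconstructed, peak=255.0):
    """Peak signal-to-noise ratio in dB; peak = 255 for 8-bit images."""
    mse = np.mean((reference.astype(np.float64) -
                   reconstructed.astype(np.float64)) ** 2)
    if mse == 0.0:
        return float("inf")  # identical images
    return 10.0 * np.log10(peak ** 2 / mse)

# Toy example: a constant error of 1 gray level gives MSE = 1,
# so PSNR = 10 * log10(255^2) ≈ 48.13 dB.
ref = np.zeros((8, 8), dtype=np.uint8)
rec = np.ones((8, 8), dtype=np.uint8)
print(f"{psnr(ref, rec):.2f} dB")
```

Because PSNR is logarithmic, the 0.1–0.3 dB gains in Table 1 correspond to a small but consistent reduction in mean squared reconstruction error.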
Table 2. Comparison of PSNRs of the proposed method with different combinations of k, n, and λi. Case 1: k = 2, n = 10, λi = 1.0 (fixed); Case 2: k = 2, n = 10, λi = 0.5 (initial); Case 3: k = 5, n = 4, λi = 1.0 (fixed); Case 4: k = 5, n = 4, λi = 1.5 (initial); Case 5: k = 5, n = 4, λi = 2.5 (initial); Case 6: k = 10, n = 2, λi = 1.0 (fixed); Case 7: k = 10, n = 2, λi = 1.0 (initial).
Dataset    Scale  Case 1  Case 2  Case 3  Case 4  Case 5  Case 6  Case 7
Set5       ×2     37.24   37.26   37.19   37.23   37.30   37.17   37.21
           ×3     33.27   33.32   33.30   33.27   33.34   33.20   33.20
           ×4     30.99   30.95   30.87   30.97   31.03   30.67   30.91
Set14      ×2     32.83   32.76   32.80   32.82   32.85   32.68   32.76
           ×3     29.58   29.59   29.63   29.62   29.62   29.48   29.59
           ×4     27.77   27.80   27.79   27.80   27.82   27.65   27.77
B100       ×2     31.73   31.73   31.75   31.75   31.75   31.69   31.71
           ×3     28.68   28.69   28.70   28.70   28.68   28.63   28.68
           ×4     27.14   27.15   27.14   27.16   27.15   27.07   27.12
Urban100   ×2     30.31   30.33   30.35   30.38   30.36   30.25   30.25
           ×3     26.73   26.78   26.76   26.82   26.78   26.63   26.73
           ×4     24.84   24.88   24.83   24.92   24.91   24.72   24.84
Average    ×2     33.03   33.02   33.02   33.04   33.07   32.95   32.98
           ×3     29.56   29.60   29.60   29.60   29.61   29.49   29.55
           ×4     27.69   27.70   27.66   27.71   27.73   27.53   27.66
Table 3. Comparison of training and testing running times (in seconds) for SRCNN [1], VDSR [2], and the proposed method.
Running Case              SRCNN   VDSR    Proposed
Training (per epoch)      220     432     441
Training (per 60 epochs)  13,769  25,153  25,487
Test (per image)          0.103   0.101   0.105


Ahn, H.; Yim, C. Convolutional Neural Networks Using Skip Connections with Layer Groups for Super-Resolution Image Reconstruction Based on Deep Learning. Appl. Sci. 2020, 10, 1959. https://doi.org/10.3390/app10061959
