Abstract
With the rise of deep neural network, convolutional neural networks show superior performances on many different computer vision recognition tasks. The convolution is used as one of the most efficient ways for extracting the details features of an image, while the deconvolution is mostly used for semantic segmentation and significance detection to obtain the contour information of the image and rarely used for image classification. In this paper, we propose a novel network named bi-branch deconvolution-based convolutional neural network (BB-deconvNet), which is constructed by mainly stacking a proposed simple module named Zoom. The Zoom module has two branches to extract multi-scale features from the same feature map. Especially, the deconvolution is borrowed to one of the branches, which can provide distinct features differently from regular convolution through the zoom of learned feature maps. To verify the effectiveness of the proposed network, we conduct several experiments on three object classification benchmarks (CIFAR-10, CIFAR-100, SVHN). The BB-deconvNet shows encouraging performances compared with other state-of-the-art deep CNNs.
Similar content being viewed by others
References
Clevert D-A, Unterthiner T, Hochreiter S (2015) Fast and accurate deep network learning by exponential linear units (elus). arXiv:1511.07289
Glorot X, Bordes A, Bengio Y (2012) Deep sparse rectifier neural networks. In: International conference on artificial intelligence and statistics
Goodfellow IJ, Warde-Farley D, Mirza M, Courville A, Bengio Y (2013) Maxout networks. arXiv:1302.4389
He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778
He K, Zhang X, Ren S, Sun J (2016) Delving deep into rectifiers: surpassing human-level performance on imagenet classification. In: IEEE International conference on computer vision, pp 1026– 1034
He K, Zhang X, Ren S, Sun J (2016) Identity mappings in deep residual networks. In: European Conference on computer vision. Springer, pp 630–645
Iandola FN, Han S, Moskewicz MW, Ashraf K, Dally WJ, Keutzer K (2016) Squeezenet: alexnet-level accuracy with 50x fewer parameters and 0.5mb model size. arXiv:1602.07360
Ioffe S (2017) Batch renormalization: towards reducing minibatch dependence in batch-normalized models. arXiv:1702.03275
Ioffe, Szegedy C (2015) Batch normalization: accelerating deep network training by reducing internal covariate shift. In: International conference on machine learning, pp 448–456
Jia Y, Shelhamer E, Donahue J, Karayev S, Long J, Girshick R, Guadarrama S, Darrell T (2014) Caffe: convolutional architecture for fast feature embedding. arXiv:1408.5093
Krizhevsky A (2009) Learning multiple layers of features from tiny images. Technical report, University of Toronto
Krizhevsky A, Sutskever I, Hinton GE (2012) Imagenet classification with deep convolutional neural networks. In: International conference on neural information processing systems, pp 1097–1105
Lécun Y, Bottou L, Bengio Y, Haffner P (1998) Gradient-based learning applied to document recognition. Proc IEEE 86(11):2278–2324
Lee CY, Xie S, Gallagher P, Zhang Z, Zhuowen T (2015) Deeply-supervised nets. Artif Intell Statist, 562–570
Li J, Liang X, Shen SM, Xu T, Feng J, Yan S (2015) Scale-aware fast r-cnn for pedestrian detection. arXiv:1510.08160
Lin M, Chen Q, Yan S (2013) Network in network. arXiv:1312.4400
Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu CY, Berg AC (2016) Ssd: single shot multibox detector. In: European conference on computer vision, pp 21–37
Long J, Shelhamer E, Darrell T (2015) Fully convolutional networks for semantic segmentation. Comput Vis Pattern Recogn, 3431–3440
Nair V, Hinton GE (2010) Rectified linear units improve restricted boltzmann machines. In: International Conference on international conference on machine learning, pp 807–814
Netzer Y, Wang T, Coates A, Bissacco A, Wu B, Ng AY (2011) Reading digits in natural images with unsupervised feature learning. Nips Workshop on Deep Learning & Unsupervised Feature Learning
Noh H, Hong S, Han B (2016) Learning deconvolution network for semantic segmentation. In: IEEE International conference on computer vision, pp 1520–1528
Ren S, He K, Girshick R, Sun J (2015) Faster r-cnn: towards real-time object detection with region proposal networks. In: International Conference on neural information processing systems, pp 91– 99
Romero A, Ballas N, Kahou SE, Chassang A, Gatta C, Bengio Y (2014) Fitnets: Hints for thin deep nets. arXiv:1412.6550
Russakovsky O, Deng J, Su H, Krause J, Satheesh S, Ma S, Huang Z, Karpathy A, Khosla A, Bernstein M et al (2015) Imagenet large scale visual recognition challenge. Int J Comput Vis 115(3):211–252
Sermanet P, Kavukcuoglu K, Chintala S, Lecun Y (2013) Pedestrian detection with unsupervised multi-stage feature learning. In: IEEE Conference on computer vision and pattern recognition, pp 3626– 3633
Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556
Srivastava N, Hinton G, Krizhevsky A, Sutskever I, Salakhutdinov R (2014) Dropout: a simple way to prevent neural networks from overfitting. J Mach Learn Res 15(1):1929–1958
Srivastava R K, Greff K, Schmidhuber J (2015) Training very deep networks. In: Advances in neural information processing systems, pp 2377–2385
Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A (2015) Going deeper with convolutions. In: Computer vision and pattern recognition, pp 1–9
Szegedy C, Vanhoucke V, Ioffe S, Shlens J, Wojna Z (2016) Rethinking the inception architecture for computer vision. In: Computer Vision and pattern recognition, pp 2818–2826
Szegedy C, Ioffe S, Vanhoucke V, Alemi A (2017) Inception-v4, inception-resnet and the impact of residual connections on learning. AAAI, 4278–4284
Wu C, Wen W, Afzal T, Zhang Y, Chen Y, Li H (2017) A compact dnn: approaching googlenet-level accuracy of classification and domain adaptation. arXiv:1703.04071
Xie S, Girshick R, Dollár P, Tu Z, He K (2016) Aggregated residual transformations for deep neural networks. arXiv:1611.05431
Zagoruyko S, Komodakis N (2016) Wide residual networks. arXiv:1605.07146
Zagoruyko S, Komodakis N (2017) Diracnets: training very deep neural networks without skip-connections. arXiv:1706.00388
Zeiler MD, Fergus R (2013) Stochastic pooling for regularization of deep convolutional neural networks. arXiv:1301.3557
Zeiler MD, Fergus R (2014) Visualizing and understanding convolutional networks. In: European conference on computer vision. Springer, pp 818–833
Zeiler M D, Krishnan D, Taylor GW, Fergus R (2010) Deconvolutional networks. In: Computer IEEE Conference on computer vision and pattern recognition (CVPR). IEEE, pp 2528–2535
Zeiler MD, Taylor GW, Fergus R (2011) Adaptive deconvolutional networks for mid and high level feature learning. In: International conference on computer vision, pp 2018–2025
Acknowledgments
This work is supported by the Natural Science Foundation of China (Grant 61572214 and U1536203), Independent Innovation Research Fund Sponsored by Huazhong university of science and technology (Project No. 2016YXMS089).
Author information
Authors and Affiliations
Corresponding authors
Rights and permissions
About this article
Cite this article
Guo, J., Yuan, C., Zhao, Z. et al. Bi-branch deconvolution-based convolutional neural network for image classification. Multimed Tools Appl 77, 30233–30250 (2018). https://doi.org/10.1007/s11042-018-6130-2
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11042-018-6130-2