research-article

Deep learning for visual understanding

Authors:

Michael S. Lew

Neurocomputing, Volume 187, Issue C

Pages 27 - 48

https://doi.org/10.1016/j.neucom.2015.09.116

Published: 26 April 2016

Abstract

Deep learning algorithms are a subset of machine learning algorithms that aim to discover multiple levels of distributed representations. Recently, numerous deep learning algorithms have been proposed to solve traditional artificial intelligence problems. This work reviews the state of the art in deep learning algorithms for computer vision by highlighting the contributions and challenges from over 210 recent research papers. It first gives an overview of various deep learning approaches and their recent developments, and then briefly describes their applications in diverse vision tasks, such as image classification, object detection, image retrieval, semantic segmentation and human pose estimation. Finally, the paper summarizes future trends and challenges in designing and training deep neural networks.

References

[1]

A. Bordes, X. Glorot, J. Weston, et al. Joint learning of words and meaning representations for open-text semantic parsing, in: Proceedings of the AISTATS, 2012.

[2]

D.C. Ciresan, U. Meier, J.¿Schmidhuber, Transfer learning for Latin and Chinese characters with deep neural networks, in: Proceedings of the IJCNN, 2012.

[3]

J.S.J. Ren, L.¿Xu, On vectorization of deep convolutional neural networks for vision tasks, in: Proceedings of the AAAI, 2015.

[4]

T. Mikolov, I. Sutskever, K. Chen, et al., Distributed representations of words and phrases and their compositionality, in: Proceedings of the NIPS, 2013.

Digital Library

[5]

D. Ciresan, U. Meier, J.¿Schmidhuber, Multi-column deep neural networks for image classification, in: Proceedings of the CVPR, 2012.

[6]

A. Krizhevsky, I. Sutskever, G.E. Hinton, Imagenet classification with deep convolutional neural networks, in: Proceedings of the NIPS, 2012.

Digital Library

[7]

{http://www.image-net.org/challenges/LSVRC/2014/results}

[8]

Y. Bengio, Learning deep architectures for AI, Found. Trends¿ Mach. Learn., 2 (2009) 1-127.

Digital Library

[9]

L. Deng, A tutorial survey of architectures, algorithms, and applications for deep learning, APSIPA Trans. Signal Inf. Process., 3 (2014) e2.

[10]

J. Schmidhuber, Deep learning in neural networks: an overview, Neural Netw., 61 (2015) 85-117.

Digital Library

[11]

Y. Bengio, Deep Learning of Representations: Looking Forward, Statistical Language and Speech Processing, Springer, Berlin Heidelberg, 2013.

[12]

Y. Bengio, A. Courville, P. Vincent, Representation learning: a review and new perspectives, Pattern Anal. Mach. Intell. IEEE Trans., 35 (2013) 1798-1828.

Digital Library

[13]

Y. LeCun, Learning invariant feature hierarchies, in: Proceedings of the ECCV workshop, 2012.

Digital Library

[14]

R. Goroshin, Y. LeCun, Saturating auto-encoders, in: Proceedings of the ICLR, 2013.

[15]

H. Li, R. Zhao, X. Wang, Highly efficient forward and backward propagation of convolutional neural networks for pixelwise classification, arXiv preprint, arXiv: 1412.4526, 2014.

[16]

D. Erhan, Y. Bengio, A. Courville, Why does unsupervised pre-training help deep learning?, J. Mach. Learn. Res., 11 (2010) 625-660.

Digital Library

[17]

Y. LeCun, L. Bottou, Y. Bengio, Gradient-based learning applied to document recognition, Proc. IEEE, 86 (1998) 2278-2324.

[18]

K. He, J. Sun, Convolutional neural networks at constrained time cost, in: Proceedings of the CVPR, 2015.

[19]

M. Zeiler, Hierarchical Convolutional Deep Learning in Computer Vision (Ph.D. thesis), New York University, 2014.

[20]

C. Szegedy, W. Liu, Y. Jia, et al., Going deeper with convolutions, in: Proceedings of the CVPR, 2015.

[21]

Min Lin, Qiang Chen, Shuicheng Yan, Network in network, in: Proceedings of the ICLR, 2013.

[22]

Y.L. Boureau, J. Ponce, Y. LeCun, A theoretical analysis of feature pooling in visual recognition, in: Proceedings of the ICML, 2010.

[23]

D. Scherer, A. Müller, S. Behnke, Evaluation of pooling operations in convolutional architectures for object recognition, in: Proceedings of the ICANN, 2010.

[24]

D.C. Cireşan, U. Meier, J. Masci, et al., High-performance neural networks for visual object classification, in: Proceedings of the IJCAI, 2011

[25]

M.D. Zeiler, R. Fergus, Stochastic pooling for regularization of deep convolutional neural networks, in: Proceedings of the ICLR, 2013.

[26]

K. He, X. Zhang, S. Ren, et al., Spatial pyramid pooling in deep convolutional networks for visual recognition, in: Proceedings of the ECCV, 2014.

[27]

W. Ouyang, P. Luo, X. Zeng, et al., DeepID-Net: multi-stage and deformable deep convolutional neural networks for object detection, in: Proceedings of the CVPR, 2015.

[28]

Y. Gong, L. Wang, R. Guo, et al., Multi-scale orderless pooling of deep convolutional activation features, in: Proceedings of the ECCV, 2014.

[29]

R. Girshick, J. Donahue, T. Darrell, et al., Rich feature hierarchies for accurate object detection and semantic segmentation, in: Proceedings of the CVPR, 2014.

Digital Library

[30]

M. Oquab, L. Bottou, I. Laptev, et al., Learning and transferring mid-level image representations using convolutional neural networks, in: Proceedings of the CVPR, 2014.

Digital Library

[31]

K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recognition, in: Proceedings of the ICLR, 2015.

[32]

X. Zeng, W. Ouyang, X. Wang, Multi-stage contextual deep learning for pedestrian detection, in: Proceedings of the ICCV, 2013.

[33]

Y. Sun, X. Wang, X. Tang, Deep convolutional network cascade for facial point detection, in: Proceedings of the CVPR, 2013.

[34]

B. Miclut, Committees of deep feedforward networks trained with few data, Pattern Recognition, Springer International Publishing, pp. 736-742, 2014.

[35]

J. Weston, F. Ratle, H. Mobahi.¿et al., Deep learning via semi-supervised embedding, Neural Networks: Tricks of the Trade, Springer, Berlin Heidelberg, pp. 639-655.

[36]

K. Simonyan, A. Vedaldi, A. Zisserman, Deep Fisher networks for large-scale image classification, in: Proceedings of the NIPS, 2013.

[37]

Q. Chen, Z. Song, Z. Huang, et al., Contextualizing object detection and classification, in: Proceedings of the CVPR, 2011.

[38]

G.E. Hinton, N. Srivastava, A. Krizhevsky, et al., Improving neural networks by preventing co-adaptation of feature detectors, arXiv preprint, arXiv: 1207.0580, 2012.

[39]

P. Baldi, P.J. Sadowski, Understanding dropout, in: Proceedings of the NIPS, 2013.

[40]

J. Ba, B. Frey, Adaptive dropout for training deep neural networks, in: Proceedings of the NIPS, 2013.

[41]

D. McAllester, A PAC-Bayesian tutorial with a dropout bound, arXiv preprint, arXiv: 1307.2118, 2013.

[42]

S. Wager, S. Wang, P. Liang, Dropout training as adaptive regularization, in: Proceedings of the NIPS, 2013.

[43]

S. Wang, C. Manning, Fast dropout training, in: Proceedings of the ICML, 2013.

[44]

N. Srivastava, G. Hinton, A. Krizhevsky, Dropout: a simple way to prevent neural networks from overfitting, J. Mach. Learn. Res., 15 (2014) 1929-1958.

Digital Library

[45]

D. Warde-Farley, I.J. Goodfellow, A. Courville, et al., An empirical analysis of dropout in piecewise linear networks, in: Proceedings of the ICLR, 2014.

[46]

L. Wan L, M. Zeiler, S. Zhang, et al., Regularization of neural networks using dropconnect, in: Proceedings of the ICML, 2013.

[47]

A.G. Howard, Some improvements on deep convolutional neural network based image classification, arXiv preprint, arXiv: 1312.5402, 2013.

[48]

A. Dosovitskiy, J.T. Springenberg, T. Brox, Unsupervised feature learning by augmenting single images, arXiv preprint, arXiv: 1312.5242, 2013.

[49]

G. Hinton, S. Osindero, Y.W. Teh, A fast learning algorithm for deep belief nets, Neural Comput., 18 (2006) 1527-1554.

Digital Library

[50]

C. Poultney, S. Chopra, Y.L. Cun, Efficient learning of sparse representations with an energy-based model, in: Proceedings of the NIPS 2006.

[51]

H.O. Song, Y.J. Lee, S. Jegelka, et al., Weakly-supervised discovery of visual pattern configurations, in: Proceedings of the NIPS, 2014.

[52]

M.D. Zeiler, R. Fergus, Visualizing and understanding convolutional neural networks, in: Proceedings of the ECCV, 2014.

[53]

G.E. Hinton, T.J. Sejnowski, MIT Press, Cambridge, MA, 1986.

[54]

M.A. Carreira-Perpinan, G.E. Hinton, On contrastive divergence learning, in: Proceedings of the tenth international workshop on artificial intelligence and statistics. NP: Society for Artificial Intelligence and Statistics, 2005, pp. 33-40.

[55]

G. Hinton, A practical guide to training restricted Boltzmann machines, Momentum, 9 (2010) 926.

[56]

K.H. Cho, T. Raiko, A.T. Ihler, Enhanced gradient and adaptive learning rate for training restricted Boltzmann machines, in: Proceedings of the ICML, 2011.

[57]

V. Nair, G.E. Hinton, Rectified linear units improve restricted boltzmann machines, in: Proceedings of the ICML, 2010.

[58]

I. Arel, D.C. Rose, T.P. Karnowski, Deep machine learning-a new frontier in artificial intelligence research research frontier, Comput. Intell. Mag. IEEE, 5 (2010) 13-18.

Digital Library

[59]

H. Lee, C. Ekanadham, A.Y. Ng, Sparse deep belief net model for visual area V2, in: Proceedings of the NIPS, 2008.

[60]

V. Nair, G.E. Hinton, 3D object recognition with deep belief nets, in: Proceedings of the NIPS, 2009.

[61]

H. Lee, R. Grosse, R. Ranganath, et al., Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations, in: Proceedings of the ICML, 2009.

[62]

H. Lee, R. Grosse, R. Ranganath, Unsupervised learning of hierarchical representations with convolutional deep belief networks, Commun. ACM, 54 (2011) 95-103.

Digital Library

[63]

Y. Tang, C. Eliasmith, Deep networks for robust visual recognition, in: Proceedings of the ICML, 2010.

[64]

G.B. Huang, H. Lee, E. Learned-Miller, Learning hierarchical representations for face verification with convolutional deep belief networks, in: Proceedings of the CVPR, 2012.

[65]

R. Salakhutdinov, G.E. Hinton, Deep boltzmann machines, in: Proceedings of the AISTATS, 2009.

[66]

R. Salakhutdinov, H. Larochelle, Efficient learning of deep Boltzmann machines, in: Proceedings of the AISTATS, 2010.

[67]

R. Salakhutdinov, G. Hinton, An efficient learning procedure for deep Boltzmann machines, Neural Comput., 24 (2012) 1967-2006.

Digital Library

[68]

G.E. Hinton, R. Salakhutdinov, A better way to pretrain deep Boltzmann machines, in: Proceedings of the NIPS, 2012.

[69]

K.H. Cho, T. Raiko, A. Ilin, et al., A two-stage pretraining algorithm for deep boltzmann machines, in: Proceedings of the ICANN, 2013.

Digital Library

[70]

G. Montavon K.R. Müller, Deep Boltzmann machines and the centering trick, Neural Networks: Tricks of the Trade, Springer, Berlin Heidelberg 2012, pp. 621-637.

[71]

I.J. Goodfellow, A. Courville, Y. Bengio, Joint training deep boltzmann machines for classification, arXiv preprint, arXiv: 1301.3568, 2013.

[72]

I. Goodfellow, M. Mirza, A. Courville, et al., Multi-prediction deep Boltzmann machines, in: Proceedings of the NIPS, 2013.

[73]

J. Ngiam, Z. Chen, P.W. Koh, et al., Learning deep energy models, in: Proceedings of the ICML, 2011.

[74]

S. Elfwing, E. Uchibe, K. Doya, Expected energy-based restricted Boltzmann machine for classification, Neural Netw. (2014).

[75]

C.Y. Liou, W.C. Cheng, J.W. Liou, Autoencoder for words, Neurocomputing, 139 (2014) 84-96.

Digital Library

[76]

G.E. Hinton, R.R. Salakhutdinov, Reducing the dimensionality of data with neural networks, Science, 313 (2006) 504-507.

[77]

J. Zhang, S. Shan, M. Kan, et al., Coarse-to-fine auto-encoder networks (cfan) for real-time face alignment, in: Proceedings of the ECCV, 2014.

[78]

X. Jiang, Y. Zhang, W. Zhang, et al., A novel sparse auto-encoder for deep unsupervised learning, in: Proceedings of the ICACI, 2013.

[79]

Y. Zhou, D. Arpit, I. Nwogu, et al., Is joint training better for deep auto-encoders? arXiv preprint, arXiv: 1405,1380, 2014.

[80]

I. Goodfellow, H. Lee, Q.V. Le, et al., Measuring invariances in deep networks, in: Proceedings of the NIPS, 2009.

[81]

J. Ngiam, A. Coates, A. Lahiri, et al., On optimization methods for deep learning, in: Proceedings of the ICML, 2011.

[82]

W.Y. Zou, A.Y. Ng, K. Yu, Unsupervised learning of visual invariance with temporal coherence, in: Proceedings of the NIPS workshop, 2011.

[83]

Simoncelli E P. 4.7 Statistical Modeling of Photographic Images, 2005.

[84]

Q.V. Le, Building high-level features using large scale unsupervised learning, in: Proceedings of the ICASSP, 2013.

[85]

P. Vincent, H. Larochelle, Y. Bengio, et al., Extracting and composing robust features with denoising autoencoders, in: Proceedings of the ICML, 2008.

Digital Library

[86]

P. Vincent, H. Larochelle, I. Lajoie, Stacked denoising autoencoders: learning useful representations in a deep network with a local denoising criterion, J. Mach. Learn. Res., 11 (2010) 3371-3408.

Digital Library

[87]

S. Rifai, P. Vincent, X. Muller, et al., Contractive auto-encoders: explicit invariance during feature extraction, in: Proceedings of the ICML, 2011.

[88]

G. Alain, Y. Bengio, What regularized auto-encoders learn from the data generating distribution, in: Proceedings of the ICLR, 2013.

[89]

G. Mesnil, Y. Dauphin, X. Glorot, et al., Unsupervised and transfer learning challenge: a deep learning approach, in: Proceedings of the ICML, 2012.

[90]

J. Masci, U. Meier, D. Cireşan, et al., Stacked convolutional auto-encoders for hierarchical feature extraction, in: Proceedings of the ICANN, 2011.

[91]

M. Baccouche, F. Mamalet, C. Wolf, et al., Spatio-temporal convolutional sparse auto-encoder for sequence classification, in: Proceedings of the BMVC, 2012.

[92]

B. Leng, S. Guo, X. Zhang, 3D object retrieval with stacked local convolutional autoencoder, Signal Process. (2014).

[93]

R. Memisevic, K. Konda, D. Krueger, Zero-bias autoencoders and the benefits of co-adapting features, in: Proceedings of the ICLR, 2015.

[94]

B.A. Olshausen, D.J. Field, Sparse coding with an overcomplete basis set: a strategy employed by V1?, Vis. Res., 37 (1997) 3311-3325.

[95]

K. Yu, T. Zhang, Y. Gong, Nonlinear learning using local coordinate coding, in: Proceedings of the NIPS, 2009.

Digital Library

[96]

R. Raina, A. Battle, H. Lee, et al., Self-taught learning: transfer learning from unlabeled data, in: Proceedings of the ICML, 2007.

[97]

J. Wang, J. Yang, K. Yu, et al., Locality-constrained linear coding for image classification, in: Proceedings of the CVPR, 2010.

[98]

J. Yang, K. Yu, Y. Gong, et al., Linear spatial pyramid matching using sparse coding for image classification, in: Proceedings of the CVPR, 2009.

[99]

D.L. Donoho, For most large underdetermined systems of linear equations the minimal ¿1¿norm solution is also the sparsest solution, Commun. Pure Appl. Math., 59 (2006) 797-829.

[100]

Y. Censor, Oxford University Press, Oxford, United Kingdom, 1997.

[101]

D.E. Rumelhart, G.E. Hinton, R.J. Williams, Learning representations by back-propagating errors, Nature, 323 (1986) 533-536.

[102]

H. Lee, A. Battle, R. Raina, et al., Efficient sparse coding algorithms, in: Proceedings of the NIPS, 2006.

[103]

J. Mairal, F. Bach, J. Ponce, et al., Online dictionary learning for sparse coding, in: Proceedings of the ICML, 2009.

[104]

J. Mairal, F. Bach, J. Ponce, Online learning for matrix factorization and sparse coding, J. Mach. Learn. Res., 11 (2010) 19-60.

Digital Library

[105]

J. Friedman, T. Hastie, H. Höfling, Pathwise coordinate optimization, Ann. Appl. Stat., 1 (2007) 302-332.

[106]

K. Gregor, Y. LeCun, Learning fast approximations of sparse coding, in: Proceedings of the ICML, 2010.

[107]

A. Chambolle, R.A. De Vore, N.Y. Lee, Nonlinear wavelet image processing: variational problems, compression, and noise removal through wavelet shrinkage, Image Process. IEEE Trans., 7 (1998) 319-335.

Digital Library

[108]

A. Beck, M. Teboulle, A fast iterative shrinkage-thresholding algorithm with application to wavelet-based image deblurring, in: Proceedings of the ICASSP, 2009.

Digital Library

[109]

K. Kavukcuoglu, M.A. Ranzato, Y. LeCun, Fast inference in sparse coding algorithms with applications to object recognition, arXiv preprint, arXiv: 1010.3467, 2010.

[110]

K. Balasubramanian, K. Yu, G. Lebanon, Smooth sparse coding via marginal regression for learning sparse representations, in: Proceedings of the ICML, 2013.

[111]

S. Lazebnik, C. Schmid, J. Ponce, Beyond bags of features: spatial pyramid matching for recognizing natural scene categories, in: Proceedings of the CVPR, 2006.

[112]

A. Coates, A.Y. Ng, The importance of encoding versus training with sparse coding and vector quantization, in: Proceedings of the ICML, 2011.

[113]

S. Gao, I.W. Tsang, L.T. Chia, et al., Local features are not lonely-Laplacian sparse coding for image classification, in: Proceedings of the CVPR, 2010.

[114]

S. Gao, I.W.H. Tsang, L.T. Chia, Laplacian sparse coding, hypergraph laplacian sparse coding, and applications, Pattern Anal. Mach. Intell. IEEE Trans., 35 (2013) 92-104.

Digital Library

[115]

K. Yu, Y. Lin, J. Lafferty, Learning image representations from the pixel level via hierarchical sparse coding, in:¿Proceedings of the CVPR, 2011.

[116]

M.D. Zeiler, D. Krishnan, G.W. Taylor, et al., Deconvolutional networks, in: Proceedings of the CVPR, 2010.

[117]

M.D. Zeile, G.W. Taylor, R. Fergus, Adaptive deconvolutional networks for mid and high level feature learning, in: Proceedings of the ICCV, 2011.

[118]

X. Zhou, K. Yu, T. Zhang, et al., Image classification using super-vector coding of local image descriptors, in: Proceedings of the ECCV, 2010.

[119]

Y. Lin, F. Lv, S. Zhu, et al., Large-scale image classification: fast feature extraction and svm training, in: Proceedings of the CVPR, 2011.

Digital Library

[120]

Y. He, K. Kavukcuoglu, Y. Wang, et al., Unsupervised feature learning by deep sparse coding, in: Proceedings of the SDM, 2014.

[121]

C. Szegedy, A. Toshev, D. Erhan, Deep neural networks for object detection, in: Proceedings of the NIPS, 2013.

[122]

P. Agrawal, R. Girshick, J. Malik, Analyzing the performance of multilayer neural networks for object recognition, in: Proceedings of the ECCV, 2014.

[123]

C.F. Cadieu, H. Hong, D.L.K. Yamins, Deep neural networks rival the representation of primate IT cortex for core visual object recognition, PloS Comput. Biol., 10 (2014) e1003963.

[124]

A. Nguyen, J. Yosinski, J. Clune, Deep neural networks are easily fooled: high confidence predictions for unrecognizable images, in: Proceedings of the CVPR 2015.

[125]

O. Firat, E. Aksan, I. Oztekin, et al., Learning deep temporal representations for brain decoding, arXiv preprint, arXiv: 1412.7522, 2014.

[126]

X. Chen, A. Shrivastava, A. Gupta, Neil: extracting visual knowledge from web data, in: Proceedings of the ICCV, 2013.

Digital Library

[127]

S.K. Divvala, A. Farhadi, C. Guestrin, Learning everything about anything: webly-supervised visual concept learning, in: Proceedings of the CVPR, 2014.

[128]

B. Zhou, V. Jagadeesh, R. Piramuthu, ConceptLearner: discovering visual concepts from weakly labeled image collections, in: Proceedings of the CVPR, 2015.

[129]

S. MASTER, Czech Technical University, 2014.

[130]

G. Csurka, C. Dance, L. Fan, et al., Visual categorization with bags of keypoints, in: Proceedings of the ECCV workshop, 2004.

[131]

B.E. Boser, I.M. Guyon, V.N. Vapnik, A training algorithm for optimal margin classifiers, in: Proceedings of the Fifth Annual Workshop on Computational Learning Theory. ACM, 1992.

[132]

N. Dalal, B. Triggs, Histograms of oriented gradients for human detection, in: Proceedings of the CVPR, 2005.

[133]

X. Wang, T.X. Han, S. Yan, An HOG-LBP human detector with partial occlusion handling, in: Proceedings of the ICCV, 2009.

[134]

F. Perronnin, J. Sánchez, T. Mensink, Improving the fisher kernel for large-scale image classification, in: Proceedings of the ECCV, 2010.

[135]

T. Jaakkola, D. Haussler, Exploiting generative models in discriminative classifiers, in: Proceedings of the NIPS, 1999.

[136]

J. Deng, W. Dong, R. Socher, et al., Imagenet: a large-scale hierarchical image database, in: Proceedings of the CVPR, 2009.

[137]

H. Noh, S. Hong, B. Han, Learning deconvolution network for semantic segmentation, in: Proceedings of the ICCV, 2015.

[138]

B. Hariharan, P. Arbeláez, R. Girshick, et al., Hypercolumns for object segmentation and fine-grained localization, in: Proceedings of the CVPR, 2015.

[139]

M. Mostajabi, P. Yadollahpour, G. Shakhnarovich, Feedforward semantic segmentation with zoom-out features, in: Proceedings of the CVPR, 2015.

[140]

J.L. Chu, A. Krzy¿ak, Analysis of feature maps selection in supervised learning using convolutional neural networks. Advances in Artificial Intelligence, Springer International Publishing, 2014, pp. 59-70.

[141]

W. Yu, K. Yang, Y. Bai, et al., Visualizing and comparing convolutional neural networks, arXiv preprint, arXiv: 1412.6631, 2014.

[142]

J. Hoffman, S. Guadarrama, E. Tzeng, et al., LSDA: Large Scale Detection Through Adaptation, in: Proceedings of the NIPS, 2014.

[143]

J. Hoffman, S. Guadarrama, E. Tzeng, et al., From large-scale object classifiers to large-scale object detectors: an adaptation approach, 2014

[144]

L.C. Chen, G. Papandreou, I. Kokkinos, et al., Semantic image segmentation with deep convolutional nets and fully connected CRFs, in: Proceedings of the ICLR, 2015.

[145]

P. Sermanet, D. Eigen, X. Zhang, et al., Overfeat: integrated recognition, localization and detection using convolutional networks, in: Proceedings of the ICLR, 2014.

[146]

J. Long, E. Shelhamer, T. Darrell, Fully convolutional networks for semantic segmentation, in: Proceedings of the CVPR, 2015.

[147]

D. Erhan, C. Szegedy, A. Toshev, et al., Scalable object detection using deep neural networks, in: Proceedings of the CVPR, 2014.

Digital Library

[148]

J. Dai, K. He, J. Sun, Convolutional feature masking for joint object and stuff segmentation, in: Proceedings of the CVPR, 2015.

[149]

Y. Liu, Y. Guo, S. Wu, et al., Deep index for accurate and efficient image retrieval, in: Proceedings of the ICMR, 2015.

Digital Library

[150]

B. Alexe, T. Deselaers, V. Ferrari, Measuring the objectness of image windows, Pattern Anal. Mach. Intell. IEEE Trans., 34 (2012) 2189-2202.

Digital Library

[151]

J.R.R. Uijlings, K.E.A. van de Sande, T. Gevers, Selective search for object recognition, Int. J. Comput. Vis., 104 (2013) 154-171.

Digital Library

[152]

I. Endres, D. Hoiem, Category independent object proposals, in: Proceedings of the ECCV, 2010.

[153]

M.M. Cheng, Z. Zhang, W.Y. Lin, et al., BING: binarized normed gradients for objectness estimation at 300fps, in: Proceedings of the CVPR, 2014.

Digital Library

[154]

C.L. Zitnick, P. Dollár, Edge boxes: locating object proposals from edges, in: Proceedings of the ECCV, 2014.

[155]

J. Hosang, R. Benenson, B. Schiele, How good are detection proposals, really?, in: Proceedings of the BMVC, 2014.

[156]

Y. Liu, Y. Guo, S. Wu, M. Lew, DeepIndex for accurate and efficient image retrieval, in: Proceedings of the ICMR, 2015.

Digital Library

[157]

L. Zheng, S. Wang, F. He, Q. Tian, Seeing the big picture: deep embedding with contextual evidences, arXiv preprint, arXiv: 1406.0132, 2014.

[158]

Z. Yan, V. Jagadeesh, D. DeCoste, et al., HD-CNN: Hierarchical Deep Convolutional Neural Network for Image Classification, in: Proceedings of the ICCV, 2015.

[159]

R. Wu, S. Yan, Y. Shan, et al., Deep image: scaling up image recognition, arXiv preprint, arXiv: 1501.02876, 2015.

[160]

J. Ngiam, Z. Chen, D. Chia, et al., Tiled convolutional neural networks, in: Proceedings of the NIPS, 2010.

[161]

L. Younes, On the convergence of Markovian stochastic algorithms with rapidly decreasing ergodicity rates, Stoch.: Int. J. Probab. Stoch. Process., 65 (1999) 177-228.

[162]

K. He, X. Zhang, S. Ren, et al., Delving deep into rectifiers: Surpassing human-level performance on imagenet classification, in: Proceedings of the ICCV, 2015.

Digital Library

[163]

S. Ioffe, C. Szegedy, Batch normalization: accelerating deep network training by reducing internal covariate shift, in: Proceedings of the NIPS, 2015.

[164]

B. Hariharan, P. Arbeláez, R. Girshick, et al., Simultaneous detection and segmentation, in: Proceedings of the ECCV, 2014.

[165]

A.S. Razavian, H. Azizpour, J. Sullivan, S. Carlsson, CNN features off-the-shelf an astounding baseline for recognition, in: Proceedings of the CVPR Workshop, 2014.

[166]

J. Wan, D. Wang, S. Hoi, et al., Deep Learning for content-based image retrieval: a comprehensive study, in: Proceedings of the Multimedia, 2014.

Digital Library

[167]

J. Yosinski, J. Clune, Y. Bengio, et al., How transferable are features in deep neural networks, in: Proceedings of the NIPS, 2014.

[168]

A. Eslami, N. Heess, J. Winn, The shape Boltzmann machine: a strong model of object shape, in: Proceedings of the CVPR, 2012.

[169]

A. Kae, K. Sohn, H. Lee, et al., Augmenting CRFs with Boltzmann machine shape priors for image labeling, in: Proceedings of the CVPR, 2013.

[170]

G.E. Dahl, M.A. Ranzato, A. Mohamed, et al., Phone Recognition with the mean-covariance restricted Boltzmann machine, in: Proceedings of the NIPS, 2010.

[171]

S. Sun, W. Zhou, H. Li, et al., Search by detection-object-level feature for image retrieval, in: Proceedings of the ICIMCS, 2014.

[172]

A. Babenko, A. Slesarev, A. Chigorin, et al., Neural codes for image retrieval, in: Proceedings of the ECCV, 2014.

[173]

M. Oquab, L. Bottou, I. Laptev, et al., Is object localization for free? - Weakly-supervised learning with convolutional neural networks, in: Proceedings of the CVPR, 2015.

[174]

N. Srivastava, R.R. Salakhutdinov, Multimodal learning with deep boltzmann machines, in: Proceedings of the NIPS, 2012.

[175]

M.A. Carreira-Perpinán, W. Wang, Distributed optimization of deeply nested systems, in: Proceedings of the AISTATS, 2014.

[176]

P.F. Felzenszwalb, R.B. Girshick, D. McAllester, Object detection with discriminatively trained part-based models, Pattern Anal. Mach. Intell. IEEE Trans., 32 (2010) 1627-1645.

Digital Library

[177]

R. Girshick, Fast R-CNN, in: Proceedings of the ICCV, 2015.

[178]

S. Ren, K. He, R. Girshick, et al., Faster R-CNN: towards real-time object detection with region proposal networks, in: Proceedings of the NIPS, 2015.

[179]

J. Redmon, S. Divvala, R. Girshick, et al., You only look once: unified, real-time object detection, arXiv preprint, arXiv: 1506.02640, 2015.

[180]

Q. Dai, D. Hoiem, Learning to localize detected objects, in: Proceedings of the CVPR, 2012.

[181]

D. Hoiem, Y. Chodpathumwan, Q. Dai, Diagnosing error in object detectors, in: Proceedings of the ECCV, 2012.

[182]

J. Dong, Q. Chen, S. Yan, et al., Towards unified object detection and semantic segmentation, in: Proceedings of the ECCV, 2014.

[183]

Y. Zhu, R. Urtasun, R. Salakhutdinov, et al., segDeepM: exploiting segmentation and context in deep neural networks for object detection, in: Proceedings of the CVPR, 2015.

[184]

S. Gidaris, N. Komodakis, Object detection via a multi-region and semantic segmentation-aware CNN model, in: Proceedings of the ICCV, 2015.

[185]

Y. Zhang, K. Sohn, R. Villegas, et al., Improving object detection with deep convolutional networks via bayesian optimization and structured prediction, in: Proceedings of the CVPR, 2015.

[186]

S. Ren, K. He, R. Girshick, et al., Object detection networks on convolutional feature maps, arXiv preprint, arXiv: 1504.06066, 2015.

[187]

X. Liang, S. Liu, Y. Wei, et al., Towards computational baby learning: a weakly-supervised approach for object detection, in: Proceedings of the ICCV, 2015.

Digital Library

[188]

S. Xie, Z. Tu,¿Holistically-nested edge detection, in: Proceedings of the ICCV, 2015.

[189]

O. Russakovsky, J. Deng, H. Su, Imagenet large scale visual recognition challenge, Int. J. Comput. Vis., 115 (2015) 211-252.

Digital Library

[190]

X. Wang, L. Zhang, L. Lin, et al., Deep joint task learning for generic object extraction, in: Proceedings of the NIPS, 2014.

[191]

D. Yoo, S. Park, J.Y. Lee, et al., Multi-scale pyramid pooling for deep convolutional representation, in: Proceedings of the CVPR Workshop, 2015.

[192]

A. Jain, J. Tompson, Y. LeCun, et al., Modeep: a deep learning framework using motion features for human pose estimation, in: Proceedings of the ACCV, 2014.

[193]

T. Pfister, K. Simonyan, J. Charles, et al., Deep convolutional neural networks for efficient pose estimation in gesture videos, in: Proceedings of the ACCV, 2015.

[194]

T. Pfister, J. Charles, A. Zisserman, Flowing convnets for human pose estimation in videos, in: Proceedings of the ICCV, 2015.

[195]

J. Yu, Y. Guo, D. Tao, Human pose recovery by supervised spectral embedding, Neurocomputing, 166 (2015) 301-308.

Digital Library

[196]

P.F. Felzenszwalb, D.P. Huttenlocher, Pictorial structures for object recognition, Int. J. Comput. Vis., 99 (2005) 190-214.

[197]

Y. Tian, C.L. Zitnick, S.G. Narasimhan, Exploring the spatial hierarchy of mixture models for human pose estimation, in: Proceedings of the ECCV, 2012.

[198]

F. Wang, Y. Li, Beyond physical connections: tree models in human pose estimation, in: Proceedings of the CVPR, 2013.

[199]

L. Pishchulin, M. Andriluka, P. Gehler, et al., Poselet conditioned pictorial structures, in: Proceedings of the CVPR, 2013.

Digital Library

[200]

M. Dantone, J. Gall, C. Leistner, et al., Human pose estimation using body parts dependent joint regressors, in: Proceedings of the CVPR, 2013.

Digital Library

[201]

B. Sapp, B. Taskar, Modec: multimodal decomposable models for human pose estimation, in: Proceedings of the CVPR, 2013.

Digital Library

[202]

S. Johnson, M. Everingham, Clustered pose and nonlinear appearance models for human pose estimation, in: Proceedings of the BMVC, 2010.

[203]

M. Eichner, M. Marin-Jimenez, A. Zisserman, 2d articulated human pose estimation and retrieval in (almost) unconstrained still images, Int. J. Comput. Vis., 99 (2012) 190-214.

Digital Library

[204]

A. Toshev, C. Szegedy, Deeppose: human pose estimation via deep neural networks, in: Proceedings of the CVPR, 2014.

Digital Library

[205]

X. Chen, A.L. Yuille, Articulated pose estimation by a graphical model with image dependent pairwise relations, in: Proceedings of the NIPS, 2014.

[206]

A. Jain, J. Tompson, M. Andriluka, et al., Learning human pose estimation features with convolutional networks, in: Proceedings of the ICLR, 2014.

[207]

J.J. Tompson, A. Jain, Y. LeCun, et al., Joint training of a convolutional network and a graphical model for human pose estimation, in: Proceedings of the NIPS, 2014.

[208]

J. Tompson, R. Goroshin, A. Jain, et al., Efficient object localization using convolutional networks, in: Proceedings of the CVPR, 2015.

[209]

W. Ouyang, X. Chu, X. Wang, Multi-source deep learning for human pose estimation, in: Proceedings of the CVPR, 2014.

[210]

X. Fan, K. Zheng, Y. Lin, et al., Combining local appearance and holistic view: dual-source deep neural networks for human pose estimation, in: Proceedings of the CVPR, 2015.

[211]

J. Carreira, P. Agrawal, K. Fragkiadaki, et al., Human pose estimation with iterative error feedback, arXiv preprint, arXiv: 1507.06550, 2015.

[212]

C.H. Huang, E. Boyer, S. Ilic, Robust human body shape and pose tracking, in: Proceedings of the 3D Vision-3DV, 2013.

[213]

G. Lin, C. Shen, I. Reid, et al., Efficient piecewise training of deep structured models for semantic segmentation, arXiv preprint, arXiv: 1504.01013, 2015.

[214]

S. Zheng, S. Jayasumana, B. Romera-Paredes, et al., Conditional random fields as recurrent neural networks, in: Proceedings of the ICCV, 2015.

[215]

G. Papandreou, L. Chen, K. Murphy, et al., Weakly- and semi-supervised learning of a DCNN for semantic image segmentation, in: Proceedings of the ICCV, 2015.

[216]

J. Dai, K. He, J. Sun, Boxsup: exploiting bounding boxes to supervise convolutional networks for semantic segmentation, in: Proceedings of the ICCV, 2015.

Published In

Neurocomputing Volume 187, Issue C

April 2016

133 pages

ISSN:0925-2312

Copyright © Elsevier B.V.

Publisher

Elsevier Science Publishers B. V.

Netherlands

Publication History

Published: 26 April 2016

