[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
research-article

Deep learning for visual understanding

Published: 26 April 2016 Publication History

Abstract

Deep learning algorithms are a subset of the machine learning algorithms, which aim at discovering multiple levels of distributed representations. Recently, numerous deep learning algorithms have been proposed to solve traditional artificial intelligence problems. This work aims to review the state-of-the-art in deep learning algorithms in computer vision by highlighting the contributions and challenges from over 210 recent research papers. It first gives an overview of various deep learning approaches and their recent developments, and then briefly describes their applications in diverse vision tasks, such as image classification, object detection, image retrieval, semantic segmentation and human pose estimation. Finally, the paper summarizes the future trends and challenges in designing and training deep neural networks.

References

[1]
A. Bordes, X. Glorot, J. Weston, et al. Joint learning of words and meaning representations for open-text semantic parsing, in: Proceedings of the AISTATS, 2012.
[2]
D.C. Ciresan, U. Meier, J.¿Schmidhuber, Transfer learning for Latin and Chinese characters with deep neural networks, in: Proceedings of the IJCNN, 2012.
[3]
J.S.J. Ren, L.¿Xu, On vectorization of deep convolutional neural networks for vision tasks, in: Proceedings of the AAAI, 2015.
[4]
T. Mikolov, I. Sutskever, K. Chen, et al., Distributed representations of words and phrases and their compositionality, in: Proceedings of the NIPS, 2013.
[5]
D. Ciresan, U. Meier, J.¿Schmidhuber, Multi-column deep neural networks for image classification, in: Proceedings of the CVPR, 2012.
[6]
A. Krizhevsky, I. Sutskever, G.E. Hinton, Imagenet classification with deep convolutional neural networks, in: Proceedings of the NIPS, 2012.
[7]
{http://www.image-net.org/challenges/LSVRC/2014/results}
[8]
Y. Bengio, Learning deep architectures for AI, Found. Trends¿ Mach. Learn., 2 (2009) 1-127.
[9]
L. Deng, A tutorial survey of architectures, algorithms, and applications for deep learning, APSIPA Trans. Signal Inf. Process., 3 (2014) e2.
[10]
J. Schmidhuber, Deep learning in neural networks: an overview, Neural Netw., 61 (2015) 85-117.
[11]
Y. Bengio, Deep Learning of Representations: Looking Forward, Statistical Language and Speech Processing, Springer, Berlin Heidelberg, 2013.
[12]
Y. Bengio, A. Courville, P. Vincent, Representation learning: a review and new perspectives, Pattern Anal. Mach. Intell. IEEE Trans., 35 (2013) 1798-1828.
[13]
Y. LeCun, Learning invariant feature hierarchies, in: Proceedings of the ECCV workshop, 2012.
[14]
R. Goroshin, Y. LeCun, Saturating auto-encoders, in: Proceedings of the ICLR, 2013.
[15]
H. Li, R. Zhao, X. Wang, Highly efficient forward and backward propagation of convolutional neural networks for pixelwise classification, arXiv preprint, arXiv: 1412.4526, 2014.
[16]
D. Erhan, Y. Bengio, A. Courville, Why does unsupervised pre-training help deep learning?, J. Mach. Learn. Res., 11 (2010) 625-660.
[17]
Y. LeCun, L. Bottou, Y. Bengio, Gradient-based learning applied to document recognition, Proc. IEEE, 86 (1998) 2278-2324.
[18]
K. He, J. Sun, Convolutional neural networks at constrained time cost, in: Proceedings of the CVPR, 2015.
[19]
M. Zeiler, Hierarchical Convolutional Deep Learning in Computer Vision (Ph.D. thesis), New York University, 2014.
[20]
C. Szegedy, W. Liu, Y. Jia, et al., Going deeper with convolutions, in: Proceedings of the CVPR, 2015.
[21]
Min Lin, Qiang Chen, Shuicheng Yan, Network in network, in: Proceedings of the ICLR, 2013.
[22]
Y.L. Boureau, J. Ponce, Y. LeCun, A theoretical analysis of feature pooling in visual recognition, in: Proceedings of the ICML, 2010.
[23]
D. Scherer, A. Müller, S. Behnke, Evaluation of pooling operations in convolutional architectures for object recognition, in: Proceedings of the ICANN, 2010.
[24]
D.C. Cireşan, U. Meier, J. Masci, et al., High-performance neural networks for visual object classification, in: Proceedings of the IJCAI, 2011
[25]
M.D. Zeiler, R. Fergus, Stochastic pooling for regularization of deep convolutional neural networks, in: Proceedings of the ICLR, 2013.
[26]
K. He, X. Zhang, S. Ren, et al., Spatial pyramid pooling in deep convolutional networks for visual recognition, in: Proceedings of the ECCV, 2014.
[27]
W. Ouyang, P. Luo, X. Zeng, et al., DeepID-Net: multi-stage and deformable deep convolutional neural networks for object detection, in: Proceedings of the CVPR, 2015.
[28]
Y. Gong, L. Wang, R. Guo, et al., Multi-scale orderless pooling of deep convolutional activation features, in: Proceedings of the ECCV, 2014.
[29]
R. Girshick, J. Donahue, T. Darrell, et al., Rich feature hierarchies for accurate object detection and semantic segmentation, in: Proceedings of the CVPR, 2014.
[30]
M. Oquab, L. Bottou, I. Laptev, et al., Learning and transferring mid-level image representations using convolutional neural networks, in: Proceedings of the CVPR, 2014.
[31]
K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recognition, in: Proceedings of the ICLR, 2015.
[32]
X. Zeng, W. Ouyang, X. Wang, Multi-stage contextual deep learning for pedestrian detection, in: Proceedings of the ICCV, 2013.
[33]
Y. Sun, X. Wang, X. Tang, Deep convolutional network cascade for facial point detection, in: Proceedings of the CVPR, 2013.
[34]
B. Miclut, Committees of deep feedforward networks trained with few data, Pattern Recognition, Springer International Publishing, pp. 736-742, 2014.
[35]
J. Weston, F. Ratle, H. Mobahi.¿et al., Deep learning via semi-supervised embedding, Neural Networks: Tricks of the Trade, Springer, Berlin Heidelberg, pp. 639-655.
[36]
K. Simonyan, A. Vedaldi, A. Zisserman, Deep Fisher networks for large-scale image classification, in: Proceedings of the NIPS, 2013.
[37]
Q. Chen, Z. Song, Z. Huang, et al., Contextualizing object detection and classification, in: Proceedings of the CVPR, 2011.
[38]
G.E. Hinton, N. Srivastava, A. Krizhevsky, et al., Improving neural networks by preventing co-adaptation of feature detectors, arXiv preprint, arXiv: 1207.0580, 2012.
[39]
P. Baldi, P.J. Sadowski, Understanding dropout, in: Proceedings of the NIPS, 2013.
[40]
J. Ba, B. Frey, Adaptive dropout for training deep neural networks, in: Proceedings of the NIPS, 2013.
[41]
D. McAllester, A PAC-Bayesian tutorial with a dropout bound, arXiv preprint, arXiv: 1307.2118, 2013.
[42]
S. Wager, S. Wang, P. Liang, Dropout training as adaptive regularization, in: Proceedings of the NIPS, 2013.
[43]
S. Wang, C. Manning, Fast dropout training, in: Proceedings of the ICML, 2013.
[44]
N. Srivastava, G. Hinton, A. Krizhevsky, Dropout: a simple way to prevent neural networks from overfitting, J. Mach. Learn. Res., 15 (2014) 1929-1958.
[45]
D. Warde-Farley, I.J. Goodfellow, A. Courville, et al., An empirical analysis of dropout in piecewise linear networks, in: Proceedings of the ICLR, 2014.
[46]
L. Wan L, M. Zeiler, S. Zhang, et al., Regularization of neural networks using dropconnect, in: Proceedings of the ICML, 2013.
[47]
A.G. Howard, Some improvements on deep convolutional neural network based image classification, arXiv preprint, arXiv: 1312.5402, 2013.
[48]
A. Dosovitskiy, J.T. Springenberg, T. Brox, Unsupervised feature learning by augmenting single images, arXiv preprint, arXiv: 1312.5242, 2013.
[49]
G. Hinton, S. Osindero, Y.W. Teh, A fast learning algorithm for deep belief nets, Neural Comput., 18 (2006) 1527-1554.
[50]
C. Poultney, S. Chopra, Y.L. Cun, Efficient learning of sparse representations with an energy-based model, in: Proceedings of the NIPS 2006.
[51]
H.O. Song, Y.J. Lee, S. Jegelka, et al., Weakly-supervised discovery of visual pattern configurations, in: Proceedings of the NIPS, 2014.
[52]
M.D. Zeiler, R. Fergus, Visualizing and understanding convolutional neural networks, in: Proceedings of the ECCV, 2014.
[53]
G.E. Hinton, T.J. Sejnowski, MIT Press, Cambridge, MA, 1986.
[54]
M.A. Carreira-Perpinan, G.E. Hinton, On contrastive divergence learning, in: Proceedings of the tenth international workshop on artificial intelligence and statistics. NP: Society for Artificial Intelligence and Statistics, 2005, pp. 33-40.
[55]
G. Hinton, A practical guide to training restricted Boltzmann machines, Momentum, 9 (2010) 926.
[56]
K.H. Cho, T. Raiko, A.T. Ihler, Enhanced gradient and adaptive learning rate for training restricted Boltzmann machines, in: Proceedings of the ICML, 2011.
[57]
V. Nair, G.E. Hinton, Rectified linear units improve restricted boltzmann machines, in: Proceedings of the ICML, 2010.
[58]
I. Arel, D.C. Rose, T.P. Karnowski, Deep machine learning-a new frontier in artificial intelligence research research frontier, Comput. Intell. Mag. IEEE, 5 (2010) 13-18.
[59]
H. Lee, C. Ekanadham, A.Y. Ng, Sparse deep belief net model for visual area V2, in: Proceedings of the NIPS, 2008.
[60]
V. Nair, G.E. Hinton, 3D object recognition with deep belief nets, in: Proceedings of the NIPS, 2009.
[61]
H. Lee, R. Grosse, R. Ranganath, et al., Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations, in: Proceedings of the ICML, 2009.
[62]
H. Lee, R. Grosse, R. Ranganath, Unsupervised learning of hierarchical representations with convolutional deep belief networks, Commun. ACM, 54 (2011) 95-103.
[63]
Y. Tang, C. Eliasmith, Deep networks for robust visual recognition, in: Proceedings of the ICML, 2010.
[64]
G.B. Huang, H. Lee, E. Learned-Miller, Learning hierarchical representations for face verification with convolutional deep belief networks, in: Proceedings of the CVPR, 2012.
[65]
R. Salakhutdinov, G.E. Hinton, Deep boltzmann machines, in: Proceedings of the AISTATS, 2009.
[66]
R. Salakhutdinov, H. Larochelle, Efficient learning of deep Boltzmann machines, in: Proceedings of the AISTATS, 2010.
[67]
R. Salakhutdinov, G. Hinton, An efficient learning procedure for deep Boltzmann machines, Neural Comput., 24 (2012) 1967-2006.
[68]
G.E. Hinton, R. Salakhutdinov, A better way to pretrain deep Boltzmann machines, in: Proceedings of the NIPS, 2012.
[69]
K.H. Cho, T. Raiko, A. Ilin, et al., A two-stage pretraining algorithm for deep boltzmann machines, in: Proceedings of the ICANN, 2013.
[70]
G. Montavon K.R. Müller, Deep Boltzmann machines and the centering trick, Neural Networks: Tricks of the Trade, Springer, Berlin Heidelberg 2012, pp. 621-637.
[71]
I.J. Goodfellow, A. Courville, Y. Bengio, Joint training deep boltzmann machines for classification, arXiv preprint, arXiv: 1301.3568, 2013.
[72]
I. Goodfellow, M. Mirza, A. Courville, et al., Multi-prediction deep Boltzmann machines, in: Proceedings of the NIPS, 2013.
[73]
J. Ngiam, Z. Chen, P.W. Koh, et al., Learning deep energy models, in: Proceedings of the ICML, 2011.
[74]
S. Elfwing, E. Uchibe, K. Doya, Expected energy-based restricted Boltzmann machine for classification, Neural Netw. (2014).
[75]
C.Y. Liou, W.C. Cheng, J.W. Liou, Autoencoder for words, Neurocomputing, 139 (2014) 84-96.
[76]
G.E. Hinton, R.R. Salakhutdinov, Reducing the dimensionality of data with neural networks, Science, 313 (2006) 504-507.
[77]
J. Zhang, S. Shan, M. Kan, et al., Coarse-to-fine auto-encoder networks (cfan) for real-time face alignment, in: Proceedings of the ECCV, 2014.
[78]
X. Jiang, Y. Zhang, W. Zhang, et al., A novel sparse auto-encoder for deep unsupervised learning, in: Proceedings of the ICACI, 2013.
[79]
Y. Zhou, D. Arpit, I. Nwogu, et al., Is joint training better for deep auto-encoders? arXiv preprint, arXiv: 1405,1380, 2014.
[80]
I. Goodfellow, H. Lee, Q.V. Le, et al., Measuring invariances in deep networks, in: Proceedings of the NIPS, 2009.
[81]
J. Ngiam, A. Coates, A. Lahiri, et al., On optimization methods for deep learning, in: Proceedings of the ICML, 2011.
[82]
W.Y. Zou, A.Y. Ng, K. Yu, Unsupervised learning of visual invariance with temporal coherence, in: Proceedings of the NIPS workshop, 2011.
[83]
Simoncelli E P. 4.7 Statistical Modeling of Photographic Images, 2005.
[84]
Q.V. Le, Building high-level features using large scale unsupervised learning, in: Proceedings of the ICASSP, 2013.
[85]
P. Vincent, H. Larochelle, Y. Bengio, et al., Extracting and composing robust features with denoising autoencoders, in: Proceedings of the ICML, 2008.
[86]
P. Vincent, H. Larochelle, I. Lajoie, Stacked denoising autoencoders: learning useful representations in a deep network with a local denoising criterion, J. Mach. Learn. Res., 11 (2010) 3371-3408.
[87]
S. Rifai, P. Vincent, X. Muller, et al., Contractive auto-encoders: explicit invariance during feature extraction, in: Proceedings of the ICML, 2011.
[88]
G. Alain, Y. Bengio, What regularized auto-encoders learn from the data generating distribution, in: Proceedings of the ICLR, 2013.
[89]
G. Mesnil, Y. Dauphin, X. Glorot, et al., Unsupervised and transfer learning challenge: a deep learning approach, in: Proceedings of the ICML, 2012.
[90]
J. Masci, U. Meier, D. Cireşan, et al., Stacked convolutional auto-encoders for hierarchical feature extraction, in: Proceedings of the ICANN, 2011.
[91]
M. Baccouche, F. Mamalet, C. Wolf, et al., Spatio-temporal convolutional sparse auto-encoder for sequence classification, in: Proceedings of the BMVC, 2012.
[92]
B. Leng, S. Guo, X. Zhang, 3D object retrieval with stacked local convolutional autoencoder, Signal Process. (2014).
[93]
R. Memisevic, K. Konda, D. Krueger, Zero-bias autoencoders and the benefits of co-adapting features, in: Proceedings of the ICLR, 2015.
[94]
B.A. Olshausen, D.J. Field, Sparse coding with an overcomplete basis set: a strategy employed by V1?, Vis. Res., 37 (1997) 3311-3325.
[95]
K. Yu, T. Zhang, Y. Gong, Nonlinear learning using local coordinate coding, in: Proceedings of the NIPS, 2009.
[96]
R. Raina, A. Battle, H. Lee, et al., Self-taught learning: transfer learning from unlabeled data, in: Proceedings of the ICML, 2007.
[97]
J. Wang, J. Yang, K. Yu, et al., Locality-constrained linear coding for image classification, in: Proceedings of the CVPR, 2010.
[98]
J. Yang, K. Yu, Y. Gong, et al., Linear spatial pyramid matching using sparse coding for image classification, in: Proceedings of the CVPR, 2009.
[99]
D.L. Donoho, For most large underdetermined systems of linear equations the minimal ¿1¿norm solution is also the sparsest solution, Commun. Pure Appl. Math., 59 (2006) 797-829.
[100]
Y. Censor, Oxford University Press, Oxford, United Kingdom, 1997.
[101]
D.E. Rumelhart, G.E. Hinton, R.J. Williams, Learning representations by back-propagating errors, Nature, 323 (1986) 533-536.
[102]
H. Lee, A. Battle, R. Raina, et al., Efficient sparse coding algorithms, in: Proceedings of the NIPS, 2006.
[103]
J. Mairal, F. Bach, J. Ponce, et al., Online dictionary learning for sparse coding, in: Proceedings of the ICML, 2009.
[104]
J. Mairal, F. Bach, J. Ponce, Online learning for matrix factorization and sparse coding, J. Mach. Learn. Res., 11 (2010) 19-60.
[105]
J. Friedman, T. Hastie, H. Höfling, Pathwise coordinate optimization, Ann. Appl. Stat., 1 (2007) 302-332.
[106]
K. Gregor, Y. LeCun, Learning fast approximations of sparse coding, in: Proceedings of the ICML, 2010.
[107]
A. Chambolle, R.A. De Vore, N.Y. Lee, Nonlinear wavelet image processing: variational problems, compression, and noise removal through wavelet shrinkage, Image Process. IEEE Trans., 7 (1998) 319-335.
[108]
A. Beck, M. Teboulle, A fast iterative shrinkage-thresholding algorithm with application to wavelet-based image deblurring, in: Proceedings of the ICASSP, 2009.
[109]
K. Kavukcuoglu, M.A. Ranzato, Y. LeCun, Fast inference in sparse coding algorithms with applications to object recognition, arXiv preprint, arXiv: 1010.3467, 2010.
[110]
K. Balasubramanian, K. Yu, G. Lebanon, Smooth sparse coding via marginal regression for learning sparse representations, in: Proceedings of the ICML, 2013.
[111]
S. Lazebnik, C. Schmid, J. Ponce, Beyond bags of features: spatial pyramid matching for recognizing natural scene categories, in: Proceedings of the CVPR, 2006.
[112]
A. Coates, A.Y. Ng, The importance of encoding versus training with sparse coding and vector quantization, in: Proceedings of the ICML, 2011.
[113]
S. Gao, I.W. Tsang, L.T. Chia, et al., Local features are not lonely-Laplacian sparse coding for image classification, in: Proceedings of the CVPR, 2010.
[114]
S. Gao, I.W.H. Tsang, L.T. Chia, Laplacian sparse coding, hypergraph laplacian sparse coding, and applications, Pattern Anal. Mach. Intell. IEEE Trans., 35 (2013) 92-104.
[115]
K. Yu, Y. Lin, J. Lafferty, Learning image representations from the pixel level via hierarchical sparse coding, in:¿Proceedings of the CVPR, 2011.
[116]
M.D. Zeiler, D. Krishnan, G.W. Taylor, et al., Deconvolutional networks, in: Proceedings of the CVPR, 2010.
[117]
M.D. Zeile, G.W. Taylor, R. Fergus, Adaptive deconvolutional networks for mid and high level feature learning, in: Proceedings of the ICCV, 2011.
[118]
X. Zhou, K. Yu, T. Zhang, et al., Image classification using super-vector coding of local image descriptors, in: Proceedings of the ECCV, 2010.
[119]
Y. Lin, F. Lv, S. Zhu, et al., Large-scale image classification: fast feature extraction and svm training, in: Proceedings of the CVPR, 2011.
[120]
Y. He, K. Kavukcuoglu, Y. Wang, et al., Unsupervised feature learning by deep sparse coding, in: Proceedings of the SDM, 2014.
[121]
C. Szegedy, A. Toshev, D. Erhan, Deep neural networks for object detection, in: Proceedings of the NIPS, 2013.
[122]
P. Agrawal, R. Girshick, J. Malik, Analyzing the performance of multilayer neural networks for object recognition, in: Proceedings of the ECCV, 2014.
[123]
C.F. Cadieu, H. Hong, D.L.K. Yamins, Deep neural networks rival the representation of primate IT cortex for core visual object recognition, PloS Comput. Biol., 10 (2014) e1003963.
[124]
A. Nguyen, J. Yosinski, J. Clune, Deep neural networks are easily fooled: high confidence predictions for unrecognizable images, in: Proceedings of the CVPR 2015.
[125]
O. Firat, E. Aksan, I. Oztekin, et al., Learning deep temporal representations for brain decoding, arXiv preprint, arXiv: 1412.7522, 2014.
[126]
X. Chen, A. Shrivastava, A. Gupta, Neil: extracting visual knowledge from web data, in: Proceedings of the ICCV, 2013.
[127]
S.K. Divvala, A. Farhadi, C. Guestrin, Learning everything about anything: webly-supervised visual concept learning, in: Proceedings of the CVPR, 2014.
[128]
B. Zhou, V. Jagadeesh, R. Piramuthu, ConceptLearner: discovering visual concepts from weakly labeled image collections, in: Proceedings of the CVPR, 2015.
[129]
S. MASTER, Czech Technical University, 2014.
[130]
G. Csurka, C. Dance, L. Fan, et al., Visual categorization with bags of keypoints, in: Proceedings of the ECCV workshop, 2004.
[131]
B.E. Boser, I.M. Guyon, V.N. Vapnik, A training algorithm for optimal margin classifiers, in: Proceedings of the Fifth Annual Workshop on Computational Learning Theory. ACM, 1992.
[132]
N. Dalal, B. Triggs, Histograms of oriented gradients for human detection, in: Proceedings of the CVPR, 2005.
[133]
X. Wang, T.X. Han, S. Yan, An HOG-LBP human detector with partial occlusion handling, in: Proceedings of the ICCV, 2009.
[134]
F. Perronnin, J. Sánchez, T. Mensink, Improving the fisher kernel for large-scale image classification, in: Proceedings of the ECCV, 2010.
[135]
T. Jaakkola, D. Haussler, Exploiting generative models in discriminative classifiers, in: Proceedings of the NIPS, 1999.
[136]
J. Deng, W. Dong, R. Socher, et al., Imagenet: a large-scale hierarchical image database, in: Proceedings of the CVPR, 2009.
[137]
H. Noh, S. Hong, B. Han, Learning deconvolution network for semantic segmentation, in: Proceedings of the ICCV, 2015.
[138]
B. Hariharan, P. Arbeláez, R. Girshick, et al., Hypercolumns for object segmentation and fine-grained localization, in: Proceedings of the CVPR, 2015.
[139]
M. Mostajabi, P. Yadollahpour, G. Shakhnarovich, Feedforward semantic segmentation with zoom-out features, in: Proceedings of the CVPR, 2015.
[140]
J.L. Chu, A. Krzy¿ak, Analysis of feature maps selection in supervised learning using convolutional neural networks. Advances in Artificial Intelligence, Springer International Publishing, 2014, pp. 59-70.
[141]
W. Yu, K. Yang, Y. Bai, et al., Visualizing and comparing convolutional neural networks, arXiv preprint, arXiv: 1412.6631, 2014.
[142]
J. Hoffman, S. Guadarrama, E. Tzeng, et al., LSDA: Large Scale Detection Through Adaptation, in: Proceedings of the NIPS, 2014.
[143]
J. Hoffman, S. Guadarrama, E. Tzeng, et al., From large-scale object classifiers to large-scale object detectors: an adaptation approach, 2014
[144]
L.C. Chen, G. Papandreou, I. Kokkinos, et al., Semantic image segmentation with deep convolutional nets and fully connected CRFs, in: Proceedings of the ICLR, 2015.
[145]
P. Sermanet, D. Eigen, X. Zhang, et al., Overfeat: integrated recognition, localization and detection using convolutional networks, in: Proceedings of the ICLR, 2014.
[146]
J. Long, E. Shelhamer, T. Darrell, Fully convolutional networks for semantic segmentation, in: Proceedings of the CVPR, 2015.
[147]
D. Erhan, C. Szegedy, A. Toshev, et al., Scalable object detection using deep neural networks, in: Proceedings of the CVPR, 2014.
[148]
J. Dai, K. He, J. Sun, Convolutional feature masking for joint object and stuff segmentation, in: Proceedings of the CVPR, 2015.
[149]
Y. Liu, Y. Guo, S. Wu, et al., Deep index for accurate and efficient image retrieval, in: Proceedings of the ICMR, 2015.
[150]
B. Alexe, T. Deselaers, V. Ferrari, Measuring the objectness of image windows, Pattern Anal. Mach. Intell. IEEE Trans., 34 (2012) 2189-2202.
[151]
J.R.R. Uijlings, K.E.A. van de Sande, T. Gevers, Selective search for object recognition, Int. J. Comput. Vis., 104 (2013) 154-171.
[152]
I. Endres, D. Hoiem, Category independent object proposals, in: Proceedings of the ECCV, 2010.
[153]
M.M. Cheng, Z. Zhang, W.Y. Lin, et al., BING: binarized normed gradients for objectness estimation at 300fps, in: Proceedings of the CVPR, 2014.
[154]
C.L. Zitnick, P. Dollár, Edge boxes: locating object proposals from edges, in: Proceedings of the ECCV, 2014.
[155]
J. Hosang, R. Benenson, B. Schiele, How good are detection proposals, really?, in: Proceedings of the BMVC, 2014.
[156]
Y. Liu, Y. Guo, S. Wu, M. Lew, DeepIndex for accurate and efficient image retrieval, in: Proceedings of the ICMR, 2015.
[157]
L. Zheng, S. Wang, F. He, Q. Tian, Seeing the big picture: deep embedding with contextual evidences, arXiv preprint, arXiv: 1406.0132, 2014.
[158]
Z. Yan, V. Jagadeesh, D. DeCoste, et al., HD-CNN: Hierarchical Deep Convolutional Neural Network for Image Classification, in: Proceedings of the ICCV, 2015.
[159]
R. Wu, S. Yan, Y. Shan, et al., Deep image: scaling up image recognition, arXiv preprint, arXiv: 1501.02876, 2015.
[160]
J. Ngiam, Z. Chen, D. Chia, et al., Tiled convolutional neural networks, in: Proceedings of the NIPS, 2010.
[161]
L. Younes, On the convergence of Markovian stochastic algorithms with rapidly decreasing ergodicity rates, Stoch.: Int. J. Probab. Stoch. Process., 65 (1999) 177-228.
[162]
K. He, X. Zhang, S. Ren, et al., Delving deep into rectifiers: Surpassing human-level performance on imagenet classification, in: Proceedings of the ICCV, 2015.
[163]
S. Ioffe, C. Szegedy, Batch normalization: accelerating deep network training by reducing internal covariate shift, in: Proceedings of the NIPS, 2015.
[164]
B. Hariharan, P. Arbeláez, R. Girshick, et al., Simultaneous detection and segmentation, in: Proceedings of the ECCV, 2014.
[165]
A.S. Razavian, H. Azizpour, J. Sullivan, S. Carlsson, CNN features off-the-shelf an astounding baseline for recognition, in: Proceedings of the CVPR Workshop, 2014.
[166]
J. Wan, D. Wang, S. Hoi, et al., Deep Learning for content-based image retrieval: a comprehensive study, in: Proceedings of the Multimedia, 2014.
[167]
J. Yosinski, J. Clune, Y. Bengio, et al., How transferable are features in deep neural networks, in: Proceedings of the NIPS, 2014.
[168]
A. Eslami, N. Heess, J. Winn, The shape Boltzmann machine: a strong model of object shape, in: Proceedings of the CVPR, 2012.
[169]
A. Kae, K. Sohn, H. Lee, et al., Augmenting CRFs with Boltzmann machine shape priors for image labeling, in: Proceedings of the CVPR, 2013.
[170]
G.E. Dahl, M.A. Ranzato, A. Mohamed, et al., Phone Recognition with the mean-covariance restricted Boltzmann machine, in: Proceedings of the NIPS, 2010.
[171]
S. Sun, W. Zhou, H. Li, et al., Search by detection-object-level feature for image retrieval, in: Proceedings of the ICIMCS, 2014.
[172]
A. Babenko, A. Slesarev, A. Chigorin, et al., Neural codes for image retrieval, in: Proceedings of the ECCV, 2014.
[173]
M. Oquab, L. Bottou, I. Laptev, et al., Is object localization for free? - Weakly-supervised learning with convolutional neural networks, in: Proceedings of the CVPR, 2015.
[174]
N. Srivastava, R.R. Salakhutdinov, Multimodal learning with deep boltzmann machines, in: Proceedings of the NIPS, 2012.
[175]
M.A. Carreira-Perpinán, W. Wang, Distributed optimization of deeply nested systems, in: Proceedings of the AISTATS, 2014.
[176]
P.F. Felzenszwalb, R.B. Girshick, D. McAllester, Object detection with discriminatively trained part-based models, Pattern Anal. Mach. Intell. IEEE Trans., 32 (2010) 1627-1645.
[177]
R. Girshick, Fast R-CNN, in: Proceedings of the ICCV, 2015.
[178]
S. Ren, K. He, R. Girshick, et al., Faster R-CNN: towards real-time object detection with region proposal networks, in: Proceedings of the NIPS, 2015.
[179]
J. Redmon, S. Divvala, R. Girshick, et al., You only look once: unified, real-time object detection, arXiv preprint, arXiv: 1506.02640, 2015.
[180]
Q. Dai, D. Hoiem, Learning to localize detected objects, in: Proceedings of the CVPR, 2012.
[181]
D. Hoiem, Y. Chodpathumwan, Q. Dai, Diagnosing error in object detectors, in: Proceedings of the ECCV, 2012.
[182]
J. Dong, Q. Chen, S. Yan, et al., Towards unified object detection and semantic segmentation, in: Proceedings of the ECCV, 2014.
[183]
Y. Zhu, R. Urtasun, R. Salakhutdinov, et al., segDeepM: exploiting segmentation and context in deep neural networks for object detection, in: Proceedings of the CVPR, 2015.
[184]
S. Gidaris, N. Komodakis, Object detection via a multi-region and semantic segmentation-aware CNN model, in: Proceedings of the ICCV, 2015.
[185]
Y. Zhang, K. Sohn, R. Villegas, et al., Improving object detection with deep convolutional networks via bayesian optimization and structured prediction, in: Proceedings of the CVPR, 2015.
[186]
S. Ren, K. He, R. Girshick, et al., Object detection networks on convolutional feature maps, arXiv preprint, arXiv: 1504.06066, 2015.
[187]
X. Liang, S. Liu, Y. Wei, et al., Towards computational baby learning: a weakly-supervised approach for object detection, in: Proceedings of the ICCV, 2015.
[188]
S. Xie, Z. Tu,¿Holistically-nested edge detection, in: Proceedings of the ICCV, 2015.
[189]
O. Russakovsky, J. Deng, H. Su, Imagenet large scale visual recognition challenge, Int. J. Comput. Vis., 115 (2015) 211-252.
[190]
X. Wang, L. Zhang, L. Lin, et al., Deep joint task learning for generic object extraction, in: Proceedings of the NIPS, 2014.
[191]
D. Yoo, S. Park, J.Y. Lee, et al., Multi-scale pyramid pooling for deep convolutional representation, in: Proceedings of the CVPR Workshop, 2015.
[192]
A. Jain, J. Tompson, Y. LeCun, et al., Modeep: a deep learning framework using motion features for human pose estimation, in: Proceedings of the ACCV, 2014.
[193]
T. Pfister, K. Simonyan, J. Charles, et al., Deep convolutional neural networks for efficient pose estimation in gesture videos, in: Proceedings of the ACCV, 2015.
[194]
T. Pfister, J. Charles, A. Zisserman, Flowing convnets for human pose estimation in videos, in: Proceedings of the ICCV, 2015.
[195]
J. Yu, Y. Guo, D. Tao, Human pose recovery by supervised spectral embedding, Neurocomputing, 166 (2015) 301-308.
[196]
P.F. Felzenszwalb, D.P. Huttenlocher, Pictorial structures for object recognition, Int. J. Comput. Vis., 99 (2005) 190-214.
[197]
Y. Tian, C.L. Zitnick, S.G. Narasimhan, Exploring the spatial hierarchy of mixture models for human pose estimation, in: Proceedings of the ECCV, 2012.
[198]
F. Wang, Y. Li, Beyond physical connections: tree models in human pose estimation, in: Proceedings of the CVPR, 2013.
[199]
L. Pishchulin, M. Andriluka, P. Gehler, et al., Poselet conditioned pictorial structures, in: Proceedings of the CVPR, 2013.
[200]
M. Dantone, J. Gall, C. Leistner, et al., Human pose estimation using body parts dependent joint regressors, in: Proceedings of the CVPR, 2013.
[201]
B. Sapp, B. Taskar, Modec: multimodal decomposable models for human pose estimation, in: Proceedings of the CVPR, 2013.
[202]
S. Johnson, M. Everingham, Clustered pose and nonlinear appearance models for human pose estimation, in: Proceedings of the BMVC, 2010.
[203]
M. Eichner, M. Marin-Jimenez, A. Zisserman, 2d articulated human pose estimation and retrieval in (almost) unconstrained still images, Int. J. Comput. Vis., 99 (2012) 190-214.
[204]
A. Toshev, C. Szegedy, Deeppose: human pose estimation via deep neural networks, in: Proceedings of the CVPR, 2014.
[205]
X. Chen, A.L. Yuille, Articulated pose estimation by a graphical model with image dependent pairwise relations, in: Proceedings of the NIPS, 2014.
[206]
A. Jain, J. Tompson, M. Andriluka, et al., Learning human pose estimation features with convolutional networks, in: Proceedings of the ICLR, 2014.
[207]
J.J. Tompson, A. Jain, Y. LeCun, et al., Joint training of a convolutional network and a graphical model for human pose estimation, in: Proceedings of the NIPS, 2014.
[208]
J. Tompson, R. Goroshin, A. Jain, et al., Efficient object localization using convolutional networks, in: Proceedings of the CVPR, 2015.
[209]
W. Ouyang, X. Chu, X. Wang, Multi-source deep learning for human pose estimation, in: Proceedings of the CVPR, 2014.
[210]
X. Fan, K. Zheng, Y. Lin, et al., Combining local appearance and holistic view: dual-source deep neural networks for human pose estimation, in: Proceedings of the CVPR, 2015.
[211]
J. Carreira, P. Agrawal, K. Fragkiadaki, et al., Human pose estimation with iterative error feedback, arXiv preprint, arXiv: 1507.06550, 2015.
[212]
C.H. Huang, E. Boyer, S. Ilic, Robust human body shape and pose tracking, in: Proceedings of the 3D Vision-3DV, 2013.
[213]
G. Lin, C. Shen, I. Reid, et al., Efficient piecewise training of deep structured models for semantic segmentation, arXiv preprint, arXiv: 1504.01013, 2015.
[214]
S. Zheng, S. Jayasumana, B. Romera-Paredes, et al., Conditional random fields as recurrent neural networks, in: Proceedings of the ICCV, 2015.
[215]
G. Papandreou, L. Chen, K. Murphy, et al., Weakly- and semi-supervised learning of a DCNN for semantic image segmentation, in: Proceedings of the ICCV, 2015.
[216]
J. Dai, K. He, J. Sun, Boxsup: exploiting bounding boxes to supervise convolutional networks for semantic segmentation, in: Proceedings of the ICCV, 2015.

Cited By

View all
  • (2024)Spatial Computing Opportunities in Biomedical Decision Support: The Atlas-EHR VisionACM Transactions on Spatial Algorithms and Systems10.1145/367920110:3(1-36)Online publication date: 23-Jul-2024
  • (2024)Parallel Optimization for Accelerating the Generation of Correctly Rounded Elementary FunctionsProceedings of the 53rd International Conference on Parallel Processing10.1145/3673038.3673125(21-31)Online publication date: 12-Aug-2024
  • (2024)A Survey of Trustworthy Representation Learning Across DomainsACM Transactions on Knowledge Discovery from Data10.1145/365730118:7(1-53)Online publication date: 12-Apr-2024
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image Neurocomputing
Neurocomputing  Volume 187, Issue C
April 2016
133 pages

Publisher

Elsevier Science Publishers B. V.

Netherlands

Publication History

Published: 26 April 2016

Author Tags

  1. Applications
  2. Challenges
  3. Computer vision
  4. Deep learning
  5. Developments
  6. Trends

Qualifiers

  • Research-article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)0
  • Downloads (Last 6 weeks)0
Reflects downloads up to 20 Dec 2024

Other Metrics

Citations

Cited By

View all
  • (2024)Spatial Computing Opportunities in Biomedical Decision Support: The Atlas-EHR VisionACM Transactions on Spatial Algorithms and Systems10.1145/367920110:3(1-36)Online publication date: 23-Jul-2024
  • (2024)Parallel Optimization for Accelerating the Generation of Correctly Rounded Elementary FunctionsProceedings of the 53rd International Conference on Parallel Processing10.1145/3673038.3673125(21-31)Online publication date: 12-Aug-2024
  • (2024)A Survey of Trustworthy Representation Learning Across DomainsACM Transactions on Knowledge Discovery from Data10.1145/365730118:7(1-53)Online publication date: 12-Apr-2024
  • (2024)Tiny-CNN: Structuring Convolutional Neural Networks for Accurate Classification of Rice Leaf Diseases in Resource-Constrained EnvironmentsProceedings of the 2024 9th International Conference on Intelligent Information Technology10.1145/3654522.3654570(171-178)Online publication date: 23-Feb-2024
  • (2024)GAS-Norm: Score-Driven Adaptive Normalization for Non-Stationary Time Series Forecasting in Deep LearningProceedings of the 33rd ACM International Conference on Information and Knowledge Management10.1145/3627673.3679822(2282-2291)Online publication date: 21-Oct-2024
  • (2024)First Realization of Batch Normalization in Flash-Based Binary Neural Networks Using a Single Voltage ShifterIEEE Transactions on Nanotechnology10.1109/TNANO.2024.346612823(677-683)Online publication date: 1-Jan-2024
  • (2024)Contextual Dependency Vision Transformer for spectrogram-based multivariate time series analysisNeurocomputing10.1016/j.neucom.2023.127215572:COnline publication date: 1-Mar-2024
  • (2024)Deep learning-powered biomedical photoacoustic imagingNeurocomputing10.1016/j.neucom.2023.127207573:COnline publication date: 16-May-2024
  • (2024)Toward explainable artificial intelligenceNeurocomputing10.1016/j.neucom.2023.126919563:COnline publication date: 1-Jan-2024
  • (2024)Deep learning-based flatness prediction via multivariate industrial data for steel strip during tandem cold rollingExpert Systems with Applications: An International Journal10.1016/j.eswa.2023.121777237:PCOnline publication date: 1-Mar-2024
  • Show More Cited By

View Options

View options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media