Abstract
Scenes are closely related to the kinds of objects that appear in them, so objects are widely used as features for scene categorization. On the other hand, the spatial structures of a scene, such as its overall landscape, are also representative of its category. In this paper, we propose a deep learning based algorithm for scene categorization. Specifically, we design two-pathway convolutional neural networks that exploit both the object attributes and the spatial structures of scene images. Unlike conventional deep learning methods, which usually focus on only one aspect of an image, each pathway of the proposed architecture is tuned to capture a different aspect. As a result, complementary information about image content can be used effectively. In addition, to deal with the feature redundancy caused by combining features from different sources, we adopt the ℓ2,1 norm during classifier training to control the selectivity of each type of feature. Extensive experiments are conducted to evaluate the proposed method, and the results show that it outperforms conventional methods. Moreover, the proposed method is a general framework that can easily be extended to more pathways and applied to other problems.
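The abstract does not spell out how the ℓ2,1 norm enters classifier training, so the following is only a minimal sketch of the standard definition, not the authors' implementation. It assumes a weight matrix whose rows correspond to features and whose columns correspond to classes; the names `l21_norm`, `prox_l21`, and the example matrix `W` are hypothetical. The norm sums the Euclidean norms of the rows, and the associated row-wise shrinkage step zeroes out whole rows, which is what lets such a penalty discard redundant features.

```python
import math


def l21_norm(W):
    """Return the l2,1 norm of W: the sum of the Euclidean norms of its rows."""
    return sum(math.sqrt(sum(v * v for v in row)) for row in W)


def prox_l21(W, t):
    """Row-wise shrinkage, i.e. the proximal operator of t * ||W||_{2,1}.

    Each row is scaled by max(0, 1 - t / ||row||), so rows whose norm is
    at most t become exactly zero (the feature is dropped for all classes).
    """
    out = []
    for row in W:
        norm = math.sqrt(sum(v * v for v in row))
        scale = max(0.0, 1.0 - t / norm) if norm > 0 else 0.0
        out.append([scale * v for v in row])
    return out


# Hypothetical weights: 4 features (rows) x 2 classes (columns).
# Assume rows 0-1 come from one pathway and rows 2-3 from the other.
W = [[3.0, 4.0], [0.0, 0.5], [0.0, 0.0], [6.0, 8.0]]

norm_value = l21_norm(W)   # row norms are 5, 0.5, 0 and 10
shrunk = prox_l21(W, 1.0)  # the weak second row is zeroed out entirely
print(norm_value)
print(shrunk)
```

In this toy example the second feature has a small weight for every class, so a shrinkage threshold of 1.0 removes it completely while the strong features are only scaled down.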
Acknowledgments
This work was supported in part by the Fundamental Research Funds for the Central Universities (2014JBM017), in part by A Project Funded by the Priority Academic Program Development of Jiangsu Higher Education Institutions (PAPD), and in part by Jiangsu Collaborative Innovation Center on Atmospheric Environment and Equipment Technology (CICAEET).
Cite this article
Bai, S., Li, Z. & Hou, J. Learning two-pathway convolutional neural networks for categorizing scene images. Multimed Tools Appl 76, 16145–16162 (2017). https://doi.org/10.1007/s11042-016-3900-6