research-article

Building discriminative features of scene recognition using multi-stages of inception-ResNet-v2

Authors:

Alexander Chefranov,

Hasan Demirel

Applied Intelligence, Volume 53, Issue 15

Pages 18431 - 18449

https://doi.org/10.1007/s10489-023-04460-4

Published: 30 January 2023

Abstract

Scene recognition is a challenging problem due to intra-class variations and inter-class similarities. Traditional methods and convolutional neural networks (CNNs) represent the global spatial structure, which suits general scene classification and object recognition, but they represent particular indoor or outdoor medium-scale scene datasets poorly. In this manuscript, we study the local and global structures of an image scene and combine both types of information for indoor and outdoor scenes to improve scene recognition accuracy. Local region structure describes a sub-part of the scene, such as sky or ground, while global structure describes the whole scene, such as a sky-background-ground outdoor scene type. For this purpose, multi-layer convolutional features of an inception- and residual-based architecture are used at intermediate and higher layers to preserve both the local and global structures of the image scene. Each layer used for feature extraction is connected to global average pooling to obtain a discriminative representation of the image scenes. In this way, local structure is explored at the intermediate convolutional layers, and global spatial structure is obtained from the higher layers. The proposed method is evaluated on the challenging 8-scene, 15-scene, UMC-21, MIT67, and 12-scene datasets, achieving 98.51%, 96.49%, 99.05%, 80.31%, and 84.88%, respectively, significantly outperforming state-of-the-art approaches.
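As a rough illustration of the idea in the abstract, the sketch below applies global average pooling to feature maps taken from two stages and joins the pooled vectors into one descriptor. It is not the authors' implementation: the feature maps are tiny hand-written stand-ins rather than Inception-ResNet-v2 activations, the function names are hypothetical, and simple concatenation is an assumption, since the abstract does not say how the local and global information is combined.

```python
from typing import List, Sequence

# A feature map is stored as channels x height x width (nested lists).
FeatureMap = List[List[List[float]]]


def global_average_pool(feature_map: FeatureMap) -> List[float]:
    """Average each channel over its spatial grid: one value per channel."""
    pooled = []
    for channel in feature_map:
        values = [v for row in channel for v in row]
        pooled.append(sum(values) / len(values))
    return pooled


def multi_stage_descriptor(stages: Sequence[FeatureMap]) -> List[float]:
    """Concatenate the pooled vectors of several stages (assumed fusion rule)."""
    descriptor: List[float] = []
    for feature_map in stages:
        descriptor.extend(global_average_pool(feature_map))
    return descriptor


# Hypothetical stand-ins for activations from two stages of a network.
intermediate = [[[1.0, 3.0], [5.0, 7.0]], [[0.0, 0.0], [2.0, 2.0]]]  # 2 channels, 2x2
higher = [[[4.0]], [[6.0]], [[8.0]]]  # 3 channels, 1x1

descriptor = multi_stage_descriptor([intermediate, higher])
print(descriptor)  # [4.0, 1.0, 4.0, 6.0, 8.0]
```

Because pooling removes the spatial dimensions, stages with different resolutions each contribute a fixed-length vector (one entry per channel), so the descriptor length depends only on the channel counts of the chosen stages.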

Cited By

Sun Y, Li P, Sun H, Xu H, Wang R (2025) Feature selection through adaptive sparse learning for scene recognition. Applied Soft Computing 169:C. doi:10.1016/j.asoc.2024.112439. Online publication date: 1-Jan-2025
https://dl.acm.org/doi/10.1016/j.asoc.2024.112439

Published In

Applied Intelligence Volume 53, Issue 15

Aug 2023

850 pages

ISSN:0924-669X

© The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2023. Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Publisher

Kluwer Academic Publishers

United States

Publication History

Published: 30 January 2023

Accepted: 08 January 2023
