
A novel scene classification model combining ResNet based transfer learning and data augmentation with a filter

Published: 21 April 2019

Abstract

Scene classification is a significant aspect of computer vision. Convolutional neural networks (CNNs), a development of deep learning, are a well-understood tool for image classification, but training CNNs requires large-scale datasets. Transfer learning addresses this problem and offers a solution for small-scale datasets. Because scene image classification is more complex than common image classification, we propose a novel ResNet-based transfer learning model utilizing multi-layer feature fusion, which takes full advantage of discriminating interlayer features and fuses them for classification by softmax regression. In addition, a novel data augmentation method with a filter, useful for small-scale datasets, is presented: new image patches are generated by sliding-block cropping of a raw image and are then filtered to ensure that they sufficiently represent the original category. Our ResNet-based transfer learning model with enhanced data augmentation is evaluated on six benchmark scene datasets (LF, OT, FP, LS, MIT67, SUN397). Extensive experimental results show that our method obtains better accuracy on all six datasets than other state-of-the-art models.

Published In

Neurocomputing, Volume 338, Issue C, April 2019, 442 pages

Publisher

Elsevier Science Publishers B. V., Netherlands

Author Tags

1. Scene classification
2. Transfer learning
3. ResNet
4. Data augmentation
5. CNN

Cited By

• (2025) Semantic image representation for image recognition and retrieval using multilayer variational auto-encoder, InceptionNet and low-level image features, The Journal of Supercomputing 81(1). doi:10.1007/s11227-024-06792-5
• (2024) Scene Classification on Fine Arts with Style Transfer, Proceedings of the 6th Workshop on the analySis, Understanding and proMotion of heritAge Contents, pp. 18–27. doi:10.1145/3689094.3689468
• (2024) Deep Learning Approach for Driver Speed Intention Recognition Based on Naturalistic Driving Data, IEEE Transactions on Intelligent Transportation Systems 25(10), pp. 14546–14559. doi:10.1109/TITS.2024.3398083
• (2024) On the ideal number of groups for isometric gradient propagation, Neurocomputing 573(C). doi:10.1016/j.neucom.2023.127217
• (2024) Inter-object discriminative graph modeling for indoor scene recognition, Knowledge-Based Systems 302(C). doi:10.1016/j.knosys.2024.112371
• (2024) Recent advances in scene image representation and classification, Multimedia Tools and Applications 83(3), pp. 9251–9278. doi:10.1007/s11042-023-15005-9
• (2024) A single-stream adaptive scene layout modeling method for scene recognition, Neural Computing and Applications 36(22), pp. 13703–13714. doi:10.1007/s00521-024-09772-1
• (2024) Scene representation using a new two-branch neural network model, The Visual Computer 40(9), pp. 6219–6244. doi:10.1007/s00371-023-03162-9
• (2023) Deep pyramidal residual networks with inception sub-structure in image classification, Journal of Intelligent & Fuzzy Systems 45(4), pp. 5885–5906. doi:10.3233/JIFS-230569
• (2023) Building discriminative features of scene recognition using multi-stages of inception-ResNet-v2, Applied Intelligence 53(15), pp. 18431–18449. doi:10.1007/s10489-023-04460-4
