
A novel scene classification model combining ResNet based transfer learning and data augmentation with a filter

Published: 21 April 2019

Abstract

Scene classification is a significant aspect of computer vision. Convolutional neural networks (CNNs), a development of deep learning, are a well-understood tool for image classification, but training CNNs requires large-scale datasets. Transfer learning addresses this problem and offers a solution for small-scale datasets. Because scene image classification is more complex than common image classification, we propose a novel ResNet-based transfer learning model utilizing multi-layer feature fusion, which takes full advantage of discriminating interlayer features and fuses them for classification by softmax regression. In addition, a novel data augmentation method with a filter, useful for small-scale datasets, is presented: new image patches are generated by sliding-block cropping of a raw image and are then filtered to ensure that they sufficiently represent the original category. Our ResNet-based transfer learning model with enhanced data augmentation is evaluated on six benchmark scene datasets (LF, OT, FP, LS, MIT67, SUN397). Extensive experimental results show that our method obtains better accuracy on all six datasets than other state-of-the-art models.

Published In

Neurocomputing, Volume 338, Issue C, April 2019, 442 pages

Publisher

Elsevier Science Publishers B. V., Netherlands

Author Tags

1. Scene classification
2. Transfer learning
3. ResNet
4. Data augmentation
5. CNN

Cited By

• (2025) Semantic image representation for image recognition and retrieval using multilayer variational auto-encoder, InceptionNet and low-level image features, The Journal of Supercomputing 81(1). doi:10.1007/s11227-024-06792-5
• (2024) Scene Classification on Fine Arts with Style Transfer, Proceedings of the 6th Workshop on the analySis, Understanding and proMotion of heritAge Contents, pp. 18–27. doi:10.1145/3689094.3689468
• (2024) Deep Learning Approach for Driver Speed Intention Recognition Based on Naturalistic Driving Data, IEEE Transactions on Intelligent Transportation Systems 25(10), pp. 14546–14559. doi:10.1109/TITS.2024.3398083
• (2024) On the ideal number of groups for isometric gradient propagation, Neurocomputing 573(C). doi:10.1016/j.neucom.2023.127217
• (2024) Inter-object discriminative graph modeling for indoor scene recognition, Knowledge-Based Systems 302(C). doi:10.1016/j.knosys.2024.112371
• (2024) Recent advances in scene image representation and classification, Multimedia Tools and Applications 83(3), pp. 9251–9278. doi:10.1007/s11042-023-15005-9
• (2024) A single-stream adaptive scene layout modeling method for scene recognition, Neural Computing and Applications 36(22), pp. 13703–13714. doi:10.1007/s00521-024-09772-1
• (2024) Scene representation using a new two-branch neural network model, The Visual Computer 40(9), pp. 6219–6244. doi:10.1007/s00371-023-03162-9
• (2023) Deep pyramidal residual networks with inception sub-structure in image classification, Journal of Intelligent & Fuzzy Systems 45(4), pp. 5885–5906. doi:10.3233/JIFS-230569
• (2023) Building discriminative features of scene recognition using multi-stages of inception-ResNet-v2, Applied Intelligence 53(15), pp. 18431–18449. doi:10.1007/s10489-023-04460-4
