Abstract
Terrain texture classification plays an important role in computer vision applications such as robot navigation and autonomous driving. Traditional methods based on hand-crafted features often perform sub-optimally because they cannot efficiently model complex terrain variations. In this paper, we propose a residual attention encoding network (RAENet) for terrain texture classification. Specifically, RAENet incorporates a stack of residual attention blocks (RABs) followed by an encoding block (EB). By generating attention feature maps jointly with residual learning, an RAB differs from commonly used blocks that combine the current layer's features with those of only the immediately preceding layer: it connects all preceding layers to the current layer, which not only minimizes the information loss incurred in the convolution process but also enhances the weights of the features that are conducive to distinguishing between classes. The EB then applies an orderless encoder that preserves invariance to spatial layout, extracting feature details before classification. The effectiveness of RAENet is evaluated on two terrain texture datasets, and experimental results show that it achieves state-of-the-art performance.
This work is partially supported by the National Natural Science Foundation of China under Grant Nos. 61872188, U1713208, 61602244, 61672287, 61702262, and 61773215.
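The abstract describes the two mechanisms without implementation details, so the following is a minimal PyTorch sketch of how an RAB-style block (dense connectivity plus residual attention) and an orderless encoding layer could look. The growth rate, the SE-style channel attention, and the soft-assignment encoder are illustrative assumptions on our part, not the authors' exact design.

```python
# A minimal sketch of the two mechanisms described in the abstract.
# Layer widths, SE-style channel attention, and the soft-assignment
# encoder are illustrative assumptions, not the paper's exact design.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ResidualAttentionBlock(nn.Module):
    """Dense-style block: concatenates ALL preceding feature maps, computes
    new features, and reweights them with residual attention (assumption)."""

    def __init__(self, in_channels: int, growth: int = 32, reduction: int = 8):
        super().__init__()
        self.conv = nn.Sequential(
            nn.BatchNorm2d(in_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(in_channels, growth, kernel_size=3, padding=1, bias=False),
        )
        # SE-style channel attention producing per-channel weights in (0, 1).
        self.attn = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(growth, growth // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(growth // reduction, growth, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, features: list) -> list:
        x = torch.cat(features, dim=1)       # combine all preceding layers
        new = self.conv(x)
        new = new * (1.0 + self.attn(new))   # residual attention: F + M*F
        return features + [new]              # dense connectivity


class OrderlessEncoding(nn.Module):
    """Orderless encoder: soft-assigns each local descriptor to K learned
    codewords and pools the residuals, discarding spatial layout."""

    def __init__(self, channels: int, num_codes: int = 8):
        super().__init__()
        self.codewords = nn.Parameter(torch.randn(num_codes, channels))
        self.scale = nn.Parameter(torch.ones(num_codes))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        desc = x.view(b, c, h * w).transpose(1, 2)             # (B, N, C)
        resid = desc.unsqueeze(2) - self.codewords             # (B, N, K, C)
        assign = F.softmax(-self.scale * resid.pow(2).sum(-1), dim=2)
        return (assign.unsqueeze(-1) * resid).sum(1).flatten(1)  # (B, K*C)


# Usage: two stacked blocks, then spatially orderless encoding.
feats = [torch.randn(2, 64, 32, 32)]
feats = ResidualAttentionBlock(64)(feats)    # 64 -> 64 + 32 channels
feats = ResidualAttentionBlock(96)(feats)    # 96 -> 96 + 32 channels
vector = OrderlessEncoding(channels=128)(torch.cat(feats, dim=1))
print(vector.shape)  # torch.Size([2, 1024])
```

Note the two design points the abstract emphasizes: the block consumes the concatenation of every preceding feature map rather than only the last one, and the encoder pools over all spatial positions so the final descriptor is invariant to the spatial layout of the texture.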