A Multiscale Self-Adaptive Attention Network for Remote Sensing Scene Classification
"> Figure 1
<p>Examples of object scale variation in remote sensing images.</p> "> Figure 2
<p>Overview of the proposed method.</p> "> Figure 3
<p>Effects of different receptive fields on different samples. (<b>a</b>) Receptive field of 3 × 3 convolutional kernel with dilation rate = 1. (<b>b</b>) Receptive field of 3 × 3 convolutional kernel with dilation rate = 2. (<b>c</b>) Receptive field of 3 × 3 convolutional kernel with dilation rate = 3.</p> "> Figure 4
<p>Illustration of multiscale feature-fusion module.</p> "> Figure 5
<p>Structure of a basic convolution block with attention.</p> "> Figure 6
<p>The channel attention module.</p> "> Figure 7
<p>The spatial attention module.</p> "> Figure 8
<p>Representation of classes in the UC Merced dataset. (1) Agriculture. (2) Airplane. (3) Baseball diamond. (4) Beach. (5) Buildings. (6) Chaparral. (7) Dense residential. (8) Forest. (9) Freeway. (10) Golf course. (11) Harbor. (12) Intersection. (13) Medium residential. (14) Mobile home park. (15) Overpass. (16) Parking lot. (17) River. (18) Runway. (19) Sparse residential. (20) Storage tanks. (21) Tennis court.</p> "> Figure 9
<p>Classification results of different methods for the UC Merced dataset.</p> "> Figure 10
<p>Accuracy in each class of ResNet-18 and multiscale self-adaptive attention network (MSAA-Net) for the UC Merced dataset.</p> "> Figure 11
<p>Representation of classes in the NWPU-RESISC45 dataset. (1) Airplane. (2) Airport. (3) Baseball diamond. (4) Basketball court. (5) Beach. (6) Bridge. (7) Chaparral. (8) Church. (9) Circular farmland. (10) Cloud. (11) Commercial area. (12) Dense residential. (13) Desert. (14) Forest. (15) Freeway. (16) Golf course. (17) Ground track field. (18) Harbor. (19) Industrial area. (20) Intersection. (21) Island. (22) Lake. (23) Meadow. (24) Medium residential. (25) Mobile home park. (26) Mountain. (27) Overpass. (28) Palace. (29) Parking lot. (30) Railway. (31) Railway station. (32) Rectangular farmland. (33) River. (34) Roundabout. (35) Runway. (36) Sea ice. (37) Ship. (38) Snowberg. (39) Sparse residential. (40) Stadium. (41) Storage tank. (42) Tennis court. (43) Terrace. (44) Thermal power station. (45) Wetland.</p> "> Figure 12
<p>Classification results of different methods for NWPU-RESISC45 dataset.</p> "> Figure 13
<p>Accuracy in each class of ResNet-18 and MSAA-Net for the NWPU-RESISC45 dataset.</p> "> Figure 14
<p>Representation of classes in the SIRI-WHU dataset. (1) Agriculture. (2) Commercial. (3) Harbor. (4) Idle land. (5) Industrial. (6) Meadow. (7) Overpass. (8) Park. (9) Pond. (10) Residential. (11) River. (12) Water.</p> "> Figure 15
<p>Classification results of different methods for the Google dataset of SIRI-WHU.</p> "> Figure 16
<p>Accuracy in each class of ResNet-18 and our method for the Google dataset of SIRI-WHU.</p> "> Figure 17
<p>Example of cropped and resized images.</p> ">
Abstract
1. Introduction
- A novel multiscale feature-extraction module with two convolution-block branches is proposed to extract features at different scales. The two branches share the same structure but have different receptive fields. Convolutional kernels with different receptive-field sizes capture features at different scales, and no additional parameters are introduced because the parameters of the two branches are shared.
- A multiscale feature-fusion module is designed for the proposed network. In this module, a squeeze process obtains global information, and an excitation process learns weights for the different channels. With this global information, the proposed method can select the more useful information from the two feature maps for adaptive fusion.
- A deep classification module with an attention mechanism is proposed to extract high-level semantic features and generate the final classification results. In this module, the skip connections mitigate the vanishing-gradient problem, and the attention mechanism sparsifies and enhances the features.
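As a rough illustration of the adaptive fusion described in the second contribution, the following NumPy sketch squeezes global information from two branch feature maps, learns per-channel weights with a small excitation MLP, and blends the branches as a per-channel convex combination. The array shapes, the reduction ratio `R`, and the exact blending rule `a·F1 + (1 − a)·F2` are illustrative assumptions, not the authors' precise formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

def squeeze(feat):
    # Global average pooling: (C, H, W) -> (C,)
    return feat.mean(axis=(1, 2))

def excitation(z, w1, w2):
    # Two-layer MLP (ReLU then sigmoid), as in squeeze-and-excitation
    h = np.maximum(z @ w1, 0.0)
    return 1.0 / (1.0 + np.exp(-(h @ w2)))

def adaptive_fuse(f1, f2, w1, w2):
    # Derive channel weights from the summed global descriptors, then
    # blend the two branch feature maps per channel.
    z = squeeze(f1) + squeeze(f2)
    a = excitation(z, w1, w2)          # (C,), values in (0, 1)
    a = a[:, None, None]               # broadcast over H and W
    return a * f1 + (1.0 - a) * f2

C, H, W, R = 8, 4, 4, 2                # R: channel reduction ratio (assumed)
f1 = rng.standard_normal((C, H, W))    # branch 1 feature map
f2 = rng.standard_normal((C, H, W))    # branch 2 feature map
w1 = rng.standard_normal((C, C // R)) * 0.1
w2 = rng.standard_normal((C // R, C)) * 0.1

fused = adaptive_fuse(f1, f2, w1, w2)
print(fused.shape)  # (8, 4, 4)
```

Because the weights are sigmoid outputs, each fused value lies between the corresponding values of the two branches, so neither branch is ever discarded outright.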
2. Materials and Methods
2.1. Multiscale Feature-Extraction Module
2.2. Multiscale Feature-Fusion Module
2.3. Deep Classification Module
2.3.1. Channel Attention
2.3.2. Spatial Attention
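The channel and spatial attention modules named above (and pictured in Figures 6 and 7) can be sketched in the CBAM style as follows. The pooling choices, the shared two-layer MLP, and the 1 × 1 spatial gating (used here in place of a larger convolution) are illustrative assumptions, not the paper's exact modules.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat, w1, w2):
    # Pool each channel (avg + max), pass both through a shared MLP,
    # and gate the channels with a sigmoid.
    avg = feat.mean(axis=(1, 2))                 # (C,)
    mx = feat.max(axis=(1, 2))                   # (C,)
    mlp = lambda z: np.maximum(z @ w1, 0.0) @ w2
    a = sigmoid(mlp(avg) + mlp(mx))              # (C,), in (0, 1)
    return feat * a[:, None, None]

def spatial_attention(feat, k):
    # Pool across channels (avg + max), combine the two maps with a
    # 1x1 weighting, and gate every spatial position with a sigmoid.
    avg = feat.mean(axis=0)                      # (H, W)
    mx = feat.max(axis=0)                        # (H, W)
    m = sigmoid(k[0] * avg + k[1] * mx)          # (H, W), in (0, 1)
    return feat * m[None, :, :]

C, H, W, R = 8, 4, 4, 2                          # R: reduction ratio (assumed)
x = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C, C // R)) * 0.1
w2 = rng.standard_normal((C // R, C)) * 0.1
k = rng.standard_normal(2)

y = spatial_attention(channel_attention(x, w1, w2), k)
print(y.shape)  # (8, 4, 4)
```

Both gates output values in (0, 1), so the cascade only rescales activations: salient channels and positions are preserved while the rest are suppressed.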
3. Experimental Results
3.1. UC Merced Land Use Dataset
3.2. NWPU-RESISC45 Dataset
3.3. SIRI-WHU Dataset
4. Discussion
4.1. Fusion Method
4.2. Dilation Rate
4.3. Depth of Multiscale Extraction Module
4.4. Computational Time
4.5. Self-Adaptive Selection
5. Conclusions
Author Contributions
Funding
Acknowledgments
Conflicts of Interest
References
Classification accuracy of different methods on the UC Merced dataset:

Method | Classification Accuracy (%) |
---|---|
LLC | 82.85 |
BoVW | 73.46 |
AlexNet | 85.63 ± 2.6 |
GoogLeNet | 92.81 ± 0.64 |
VGG-16 | 89.09 ± 2.01 |
GSoP-Net | 92.62 ± 1.2 |
ECA-Net | 94.05 ± 0.96 |
ResNet-18 | 90.95 ± 0.42 |
MSAA-Net without attention | 92.38 ± 0.35 |
MSAA-Net (ours) | 94.524 ± 0.74 |
Classification accuracy of different methods on the NWPU-RESISC45 dataset:

Method | Classification Accuracy (%) |
---|---|
LLC | 59.92 |
BoVW | 67.65 |
AlexNet | 79.92 ± 2.1 |
VGG-16 | 90.26 ± 0.74 |
GoogLeNet | 91.45 ± 1.24 |
GSoP-Net | 91.206 ± 1.32 |
ECA-Net | 93.378 ± 0.26 |
ResNet-18 | 89.93 ± 0.34 |
MSAA-Net without attention | 94.03 ± 0.72 |
MSAA-Net (ours) | 95.01 ± 0.54 |
Classification accuracy of different methods on the Google dataset of SIRI-WHU:

Method | Classification Accuracy (%) |
---|---|
BoVW | 73.93 |
LLC | 70.89 |
AlexNet | 87.27 ± 1.63 |
GoogLeNet | 92.31 ± 1.64 |
VGG-16 | 90.83 ± 1.9 |
GSoP-Net | 94.37 ± 0.56 |
ECA-Net | 93.52 ± 0.4 |
ResNet-18 | 92.23 ± 0.9 |
MSAA-Net without attention | 93.958 ± 1.12 |
MSAA-Net (ours) | 95.21 ± 0.65 |
Fusion Method | Adaptive Fusion (Ours) | Element-Wise Sum (Eltsum) | Concatenation (Concat) |
---|---|---|---|
Classification Accuracy (%) | 95.21 | 94.58 | 94.79 |
Dilation Rates in the Two Branches | 1, 2 | 1, 3 | 2, 3 |
---|---|---|---|
Classification Accuracy (%) | 95.21 | 94.58 | 94.14 |
Number of Blocks in the Multiscale Feature-Extraction Module | 1 | 2 | 3 |
---|---|---|---|
Classification Accuracy (%) | 95.21 | 94.375 | 94.164 |
Method | Time (Each Epoch) (s) | Time (Total) (s) | Classification Accuracy (%) |
---|---|---|---|
AlexNet | 10 | 750 | 87.27 |
VGG-16 | 23 | 1909 | 90.83 |
GoogLeNet | 16 | 1088 | 92.31 |
ResNet-18 | 10 | 790 | 92.23 |
GSoP-Net | 17.6 | 1672 | 94.37 |
ECA-Net | 10.5 | 840 | 93.52 |
MSAA-Net (ours) | 11 | 1050 | 95.21 |
Scale of Cropped Samples (Pixels) | 256 (original) | 224 | 192 | 160 | 128 |
---|---|---|---|---|---|
Mean Attention Difference | 0.065 | 0.063 | 0.060 | 0.057 | 0.054 |
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Share and Cite
Li, L.; Liang, P.; Ma, J.; Jiao, L.; Guo, X.; Liu, F.; Sun, C. A Multiscale Self-Adaptive Attention Network for Remote Sensing Scene Classification. Remote Sens. 2020, 12, 2209. https://doi.org/10.3390/rs12142209