SFA-Net: Semantic Feature Adjustment Network for Remote Sensing Image Segmentation
Figure 1. The overall architecture of the SFA-Net.
Figure 2. Transformer-based decoder. (a)–(c) show the decoder block, the weighted function, and the feature refinement head, respectively.
Figure 3. Visualization of the segmentation results on the UAVid dataset. (a) is ID 000300 in sequence 23, (b) is ID 000500 in sequence 28, (c) is ID 000000 in sequence 30, and (d) is ID 000500 in sequence 39.
Figure 4. Visualization of full images and their segmentation results on the ISPRS Potsdam dataset. (a) is ID 3_14 and (b) is ID 5_13.
Figure 5. Visualization of the segmentation results on the ISPRS Potsdam dataset. (a) is the 10th split of ID 3_14, (b) is the 12th split of ID 5_13, (c) is the 10th split of ID 6_14, and (d) is the 22nd split of ID 7_13.
Figure 6. Visualization of full images and their segmentation results on the ISPRS Vaihingen test set. (a) is area 6 and (b) is area 27.
Figure 7. Visualization of the segmentation results on the ISPRS Vaihingen dataset. (a) is the 4th split of area 31, (b) is the 5th split from the 2nd of area 33, (c) is the 4th split of area 38, and (d) is the 11th split of area 38.
Figure 8. Visualization of the segmentation results on the LoveDA dataset. (a) is ID 4430, (b) is ID 4378, and (c) is ID 5458.
Figure 9. Complexity vs. performance on each dataset. The horizontal axis denotes FLOPs, the vertical axis denotes the evaluation metric, and the bubble diameter denotes the number of parameters.
Abstract
1. Introduction
- We propose feature adjustment modules (FAMs) to refine the multiscale feature maps extracted by the CNN encoder.
- We present the SFA-Net, which consists of a CNN encoder, a transformer decoder, and two FAMs.
- We demonstrate the effectiveness of the proposed model on four benchmark datasets, namely UAVid, ISPRS Potsdam, ISPRS Vaihingen, and LoveDA.
2. Related Work
3. Proposed Method
3.1. Efficient CNN-Based Encoder
3.2. Feature Adjustment Module
3.3. Transformer-Based Decoder
3.4. Feature Refinement Head
3.5. Loss Function
4. Experimental Results
4.1. Datasets
4.1.1. UAVid
4.1.2. ISPRS Potsdam
4.1.3. ISPRS Vaihingen
4.1.4. LoveDA
4.2. Experimental Setting and Evaluation Measure
4.3. Experimental Results on the UAVid Dataset
4.4. Experimental Results on the ISPRS Potsdam Dataset
4.5. Experimental Results on the ISPRS Vaihingen Dataset
4.6. Experimental Results on the LoveDA Dataset
4.7. Ablation Study
4.8. Efficiency
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
1. Mnih, V.; Hinton, G.E. Learning to Detect Roads in High-Resolution Aerial Images. In Computer Vision–ECCV 2010, Proceedings of the 11th European Conference on Computer Vision, Heraklion, Crete, Greece, 5–11 September 2010; Daniilidis, K., Maragos, P., Paragios, N., Eds.; Springer: Berlin/Heidelberg, Germany, 2010; pp. 210–223.
2. Demir, I.; Koperski, K.; Lindenbaum, D.; Pang, G.; Huang, J.; Basu, S.; Hughes, F.; Tuia, D.; Raskar, R. DeepGlobe 2018: A Challenge to Parse the Earth Through Satellite Images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, Salt Lake City, UT, USA, 18–23 June 2018.
3. Mohajerani, S.; Saeedi, P. Cloud-Net: An End-to-End Cloud Detection Algorithm for Landsat 8 Imagery. In Proceedings of the IGARSS 2019 - 2019 IEEE International Geoscience and Remote Sensing Symposium, Yokohama, Japan, 28 July–2 August 2019; pp. 1029–1032.
4. Maji, S.; Rahtu, E.; Kannala, J.; Blaschko, M.; Vedaldi, A. Fine-Grained Visual Classification of Aircraft. arXiv 2013, arXiv:1306.5151.
5. Chen, H.; Shi, Z. A Spatial-Temporal Attention-Based Method and a New Dataset for Remote Sensing Image Change Detection. Remote Sens. 2020, 12, 1662.
6. Zhang, C.; Yue, P.; Tapete, D.; Jiang, L.; Shangguan, B.; Huang, L.; Liu, G. A deeply supervised image fusion network for change detection in high resolution bi-temporal remote sensing images. ISPRS J. Photogramm. Remote Sens. 2020, 166, 183–200.
7. Zhang, Z.; Guo, W.; Zhu, S.; Yu, W. Toward Arbitrary-Oriented Ship Detection With Rotated Region Proposal and Discrimination Networks. IEEE Geosci. Remote Sens. Lett. 2018, 15, 1745–1749.
8. Karra, K.; Kontgis, C.; Statman-Weil, Z.; Mazzariello, J.C.; Mathis, M.; Brumby, S.P. Global land use/land cover with Sentinel 2 and deep learning. In Proceedings of the 2021 IEEE International Geoscience and Remote Sensing Symposium IGARSS, Brussels, Belgium, 11–16 July 2021; pp. 4704–4707.
9. Ma, X.; Zhang, X.; Pun, M.O. RS3Mamba: Visual State Space Model for Remote Sensing Image Semantic Segmentation. IEEE Geosci. Remote Sens. Lett. 2024, 21.
10. Kang, X.; Hong, Y.; Duan, P.; Li, S. Fusion of hierarchical class graphs for remote sensing semantic segmentation. Inf. Fusion 2024, 109, 102409.
11. Yamazaki, K.; Hanyu, T.; Tran, M.; Garcia, A.; Tran, A.; McCann, R.; Liao, H.; Rainwater, C.; Adkins, M.; Molthan, A.; et al. AerialFormer: Multi-resolution Transformer for Aerial Image Segmentation. arXiv 2023, arXiv:2306.06842.
12. Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 10012–10022.
13. Wang, L.; Li, R.; Duan, C.; Zhang, C.; Meng, X.; Fang, S. A Novel Transformer Based Semantic Segmentation Scheme for Fine-Resolution Remote Sensing Images. IEEE Geosci. Remote Sens. Lett. 2022, 19, 1–5.
14. Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 3431–3440.
15. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. arXiv 2015, arXiv:1505.04597.
16. Badrinarayanan, V.; Kendall, A.; Cipolla, R. SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation. arXiv 2016, arXiv:1511.00561.
17. Zhao, H.; Shi, J.; Qi, X.; Wang, X.; Jia, J. Pyramid Scene Parsing Network. arXiv 2017, arXiv:1612.01105.
18. Fu, J.; Liu, J.; Tian, H.; Li, Y.; Bao, Y.; Fang, Z.; Lu, H. Dual Attention Network for Scene Segmentation. arXiv 2019, arXiv:1809.02983.
19. Sun, K.; Zhao, Y.; Jiang, B.; Cheng, T.; Xiao, B.; Liu, D.; Mu, Y.; Wang, X.; Liu, W.; Wang, J. High-Resolution Representations for Labeling Pixels and Regions. arXiv 2019, arXiv:1904.04514.
20. Islam, M.A.; Kowal, M.; Jia, S.; Derpanis, K.G.; Bruce, N.D.B. Global Pooling, More than Meets the Eye: Position Information is Encoded Channel-Wise in CNNs. arXiv 2021, arXiv:2108.07884.
21. Gu, X.; Li, S.; Ren, S.; Zheng, H.; Fan, C.; Xu, H. Adaptive enhanced swin transformer with U-net for remote sensing image segmentation. Comput. Electr. Eng. 2022, 102, 108223.
22. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is All you Need. In Advances in Neural Information Processing Systems; Guyon, I., Luxburg, U.V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., Garnett, R., Eds.; Curran Associates, Inc.: Long Beach, CA, USA, 2017; Volume 30.
23. Xie, E.; Wang, W.; Yu, Z.; Anandkumar, A.; Alvarez, J.M.; Luo, P. SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers. In Advances in Neural Information Processing Systems; Ranzato, M., Beygelzimer, A., Dauphin, Y., Liang, P., Vaughan, J.W., Eds.; Curran Associates, Inc.: San Diego, CA, USA, 2021; Volume 34, pp. 12077–12090.
24. Strudel, R.; Garcia, R.; Laptev, I.; Schmid, C. Segmenter: Transformer for Semantic Segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021; pp. 7262–7272.
25. Zhang, C.; Wang, L.; Cheng, S.; Li, Y. SwinSUNet: Pure Transformer Network for Remote Sensing Image Change Detection. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–13.
26. Liu, M.; Chai, Z.; Deng, H.; Liu, R. A CNN-Transformer Network With Multiscale Context Aggregation for Fine-Grained Cropland Change Detection. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2022, 15, 4297–4306.
27. Cheng, H.K.; Chung, J.; Tai, Y.W.; Tang, C.K. CascadePSP: Toward Class-Agnostic and Very High-Resolution Segmentation via Global and Local Refinement. arXiv 2020, arXiv:2005.02551.
28. Qin, X.; Fan, D.P.; Huang, C.; Diagne, C.; Zhang, Z.; Sant’Anna, A.C.; Suàrez, A.; Jagersand, M.; Shao, L. Boundary-Aware Segmentation Network for Mobile and Web Applications. arXiv 2021, arXiv:2101.04704.
29. Dong, Z.; Li, J.; Fang, T.; Shao, X. Lightweight boundary refinement module based on point supervision for semantic segmentation. Image Vis. Comput. 2021, 110, 104169.
30. Tan, M.; Le, Q.V. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. arXiv 2020, arXiv:1905.11946.
31. Wang, L.; Li, R.; Zhang, C.; Fang, S.; Duan, C.; Meng, X.; Atkinson, P.M. UNetFormer: A UNet-like transformer for efficient semantic segmentation of remote sensing urban scene imagery. ISPRS J. Photogramm. Remote Sens. 2022, 190, 196–214.
32. Lyu, Y.; Vosselman, G.; Xia, G.S.; Yilmaz, A.; Yang, M.Y. UAVid: A semantic segmentation dataset for UAV imagery. ISPRS J. Photogramm. Remote Sens. 2020, 165, 108–119.
33. Potsdam and Vaihingen Datasets. International Society for Photogrammetry and Remote Sensing. Available online: https://www.isprs.org/education/benchmarks/UrbanSemLab (accessed on 20 June 2024).
34. Wang, J.; Zheng, Z.; Ma, A.; Lu, X.; Zhong, Y. LoveDA: A Remote Sensing Land-Cover Dataset for Domain Adaptive Semantic Segmentation. In Neural Information Processing Systems Track on Datasets and Benchmarks; Vanschoren, J., Yeung, S., Eds.; Curran: San Diego, CA, USA, 2021; Volume 1.
35. Li, R.; Zheng, S.; Zhang, C.; Duan, C.; Wang, L.; Atkinson, P.M. ABCNet: Attentive bilateral contextual network for efficient semantic segmentation of Fine-Resolution remotely sensed imagery. ISPRS J. Photogramm. Remote Sens. 2021, 181, 84–98.
36. Wang, L.; Li, R.; Wang, D.; Duan, C.; Wang, T.; Meng, X. Transformer meets convolution: A bilateral awareness network for semantic segmentation of very fine resolution urban scene images. Remote Sens. 2021, 13, 3065.
37. Chen, J.; Lu, Y.; Yu, Q.; Luo, X.; Adeli, E.; Wang, Y.; Lu, L.; Yuille, A.L.; Zhou, Y. TransUNet: Transformers make strong encoders for medical image segmentation. arXiv 2021, arXiv:2102.04306.
38. Chen, Y.; Fang, P.; Yu, J.; Zhong, X.; Zhang, X.; Li, T. Hi-ResNet: A High-Resolution Remote Sensing Network for Semantic Segmentation. arXiv 2023, arXiv:2305.12691.

Dataset | Train | Validation | Test | Categories (count) |
---|---|---|---|---|
UAVid | 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 31, 32, 33, 34, 35 (20) | 16, 17, 18, 19, 20, 36, 37 (7) | 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 38, 39, 40, 41, 42 (15) | building, road, tree, low vegetation, static car, moving car, human, clutter (8) |
ISPRS Potsdam | 2_11, 2_12, 3_10, 3_11, 3_12, 4_10, 4_11, 4_12, 5_10, 5_11, 5_12, 6_7, 6_8, 6_9, 6_10, 6_11, 6_12, 7_7, 7_8, 7_9, 7_11, 7_12 (22) | 2_10 (1) | 2_13, 2_14, 3_13, 3_14, 4_13, 4_14, 4_15, 5_13, 5_14, 5_15, 6_13, 6_14, 6_15, 7_13 (14) | Imp. Surf., building, low vegetation, tree, car, background (6) |
ISPRS Vaihingen | 1, 3, 5, 7, 11, 13, 15, 17, 21, 23, 26, 28, 32, 34, 37 (15) | 30 (1) | 2, 4, 6, 8, 10, 12, 14, 16, 20, 22, 24, 27, 29, 31, 33, 35, 38 (17) | Imp. Surf., building, low vegetation, tree, car, background (6) |
LoveDA | 0–2521 (2522) | 2522–4190 (1669) | 4191–5986 (1796) | background, building, road, water, barren, forest, agriculture (7) |
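
The splits in the table above can be transcribed directly when preparing data loaders. The following Python sketch is illustrative only: `inclusive`, `UAVID_SPLIT`, `LOVEDA_SPLIT`, `split_sizes`, and `splits_are_disjoint` are hypothetical names, not part of the authors' code. It encodes the UAVid sequence numbers and the LoveDA index ranges, then checks that the per-split counts match the parenthesized totals and that no sample falls into two splits.

```python
# Hypothetical encoding of two of the dataset splits listed in the table.
# Names are illustrative; they are not taken from the SFA-Net code base.

def inclusive(start: int, stop: int) -> list[int]:
    """Return the integers start..stop, both ends included."""
    return list(range(start, stop + 1))

# UAVid: sequence numbers per split
UAVID_SPLIT = {
    "train": inclusive(1, 15) + inclusive(31, 35),
    "val": inclusive(16, 20) + [36, 37],
    "test": inclusive(21, 30) + inclusive(38, 42),
}

# LoveDA: image indices per split (the table's ranges are inclusive)
LOVEDA_SPLIT = {
    "train": inclusive(0, 2521),
    "val": inclusive(2522, 4190),
    "test": inclusive(4191, 5986),
}

def split_sizes(split: dict[str, list[int]]) -> dict[str, int]:
    """Number of samples in each split."""
    return {name: len(ids) for name, ids in split.items()}

def splits_are_disjoint(split: dict[str, list[int]]) -> bool:
    """True if no sample ID occurs in more than one split."""
    seen: set[int] = set()
    for ids in split.values():
        if seen & set(ids):
            return False
        seen |= set(ids)
    return True

print(split_sizes(UAVID_SPLIT))   # {'train': 20, 'val': 7, 'test': 15}
print(split_sizes(LOVEDA_SPLIT))  # {'train': 2522, 'val': 1669, 'test': 1796}
```

The Potsdam and Vaihingen splits are lists of tile IDs rather than ranges, so they would be written out literally in the same dictionary form.
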

Method | Backbone | Parameters (M) | Clutter | Building | Road | Tree | Low Vegetation | Moving Car | Static Car | Human | mIoU |
---|---|---|---|---|---|---|---|---|---|---|---|
DANet [18] | ResNet18 | 12.6 | 64.9 | 58.9 | 77.9 | 68.3 | 61.5 | 59.6 | 47.4 | 9.1 | 60.6 |
ABCNet [35] | ResNet18 | 14.0 | 67.4 | 86.4 | 81.2 | 79.9 | 63.1 | 69.8 | 48.4 | 13.9 | 63.8 |
BANet [36] | ResT-Lite | 12.7 | 66.7 | 85.4 | 80.7 | 78.9 | 62.1 | 69.3 | 52.8 | 21.0 | 64.6 |
SegFormer [23] | MiT-B1 | 13.7 | 66.6 | 86.3 | 80.1 | 79.6 | 62.3 | 72.5 | 52.5 | 28.5 | 66.0 |
UNetFormer [31] | ResNet18 | 11.7 | 68.4 | 87.4 | 81.5 | 80.2 | 63.5 | 73.6 | 56.4 | 31.0 | 67.8 |
SFA-Net (ours) | EfficientNet-B3 | 10.7 | 70.2 | 89.0 | 82.7 | 80.8 | 64.6 | 77.5 | 67.5 | 30.7 | 70.4 |
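
The UAVid results are per-class IoU and their mean (mIoU). This excerpt does not spell out the metric, so the sketch below assumes the standard definition IoU = TP / (TP + FP + FN), computed from a pixel-level confusion matrix, with mIoU as the unweighted mean over classes. As a consistency check, averaging the eight per-class IoUs of SFA-Net in the table gives the reported 70.4. The function names are illustrative.

```python
# Sketch of per-class IoU and mIoU from a confusion matrix (standard definitions,
# assumed here; the paper's evaluation code is not shown in this section).

def per_class_iou(conf: list[list[int]]) -> list[float]:
    """conf[i][j] counts pixels whose ground truth is class i and prediction is class j."""
    n = len(conf)
    ious = []
    for c in range(n):
        tp = conf[c][c]
        fn = sum(conf[c]) - tp                       # class-c pixels predicted as another class
        fp = sum(conf[r][c] for r in range(n)) - tp  # other pixels predicted as class c
        denom = tp + fp + fn
        ious.append(tp / denom if denom else float("nan"))
    return ious

def mean_score(scores: list[float]) -> float:
    """Unweighted mean over classes."""
    return sum(scores) / len(scores)

# Per-class IoU (%) of SFA-Net on UAVid, copied from the table above.
sfa_net_uavid = [70.2, 89.0, 82.7, 80.8, 64.6, 77.5, 67.5, 30.7]
print(round(mean_score(sfa_net_uavid), 1))  # 70.4

# Toy two-class example: IoU_0 = 3 / (3 + 2 + 1), IoU_1 = 4 / (4 + 1 + 2)
print(per_class_iou([[3, 1], [2, 4]]))
```

Because mIoU weights every class equally, the weak human class (30.7) pulls the mean down far more than its pixel share would suggest.
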

Method | Backbone | Parameters (M) | Imp. Surf. | Building | Low Veg. | Tree | Car | mF1 |
---|---|---|---|---|---|---|---|---|
DANet [18] | ResNet18 | 12.6 | 91.0 | 95.6 | 86.1 | 87.6 | 84.3 | 88.9 |
ABCNet [35] | ResNet18 | 14.0 | 93.5 | 96.9 | 87.9 | 89.1 | 95.8 | 92.7 |
Segmenter [24] | ViT-Tiny | 6.7 | 91.5 | 95.3 | 85.4 | 85.0 | 88.5 | 89.2 |
BANet [36] | ResT-Lite | 12.7 | 93.3 | 96.7 | 87.4 | 89.1 | 96.0 | 92.5 |
SwinUperNet [12] | Swin-Tiny | 60 | 93.2 | 96.4 | 87.6 | 88.6 | 95.4 | 92.2 |
DC-Swin [13] | Swin-Small | 66.9 | 94.2 | 97.6 | 88.6 | 89.6 | 96.3 | 93.3 |
UNetFormer [31] | ResNet18 | 11.7 | 93.6 | 97.2 | 87.7 | 88.9 | 96.5 | 92.8 |
AerialFormer-B [11] | Swin-Base | 113.8 | 95.5 | 98.1 | 89.8 | 89.8 | 97.5 | 94.1 |
SFA-Net (ours) | EfficientNet-B3 | 10.7 | 95.0 | 97.5 | 88.3 | 89.6 | 97.1 | 93.5 |
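
The ISPRS results use per-class F1 and mF1. Assuming the standard definition F1 = 2TP / (2TP + FP + FN), which equals the harmonic mean of precision and recall, the sketch below computes per-class F1 from a confusion matrix and averages it while skipping selected classes; the `ignore` parameter is an illustrative name. The five per-class scores of SFA-Net in the table average to exactly the reported 93.5, which is consistent with the background class being left out of mF1.

```python
# Sketch: per-class F1 and mF1 with optional ignored classes (assumed standard definitions).

def per_class_f1(conf: list[list[int]]) -> list[float]:
    """F1 = 2TP / (2TP + FP + FN); conf[i][j] = ground truth i predicted as j."""
    n = len(conf)
    scores = []
    for c in range(n):
        tp = conf[c][c]
        fn = sum(conf[c]) - tp
        fp = sum(conf[r][c] for r in range(n)) - tp
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else float("nan"))
    return scores

def mean_f1(scores: list[float], ignore: tuple[int, ...] = ()) -> float:
    """Unweighted mean of per-class F1, skipping the class indices in `ignore`."""
    kept = [s for i, s in enumerate(scores) if i not in ignore]
    return sum(kept) / len(kept)

# SFA-Net on Potsdam (table above): Imp. Surf., Building, Low Veg., Tree, Car.
sfa_net_potsdam = [95.0, 97.5, 88.3, 89.6, 97.1]
print(round(mean_f1(sfa_net_potsdam), 1))  # 93.5

# Toy 3-class confusion matrix whose last class plays the role of background.
conf = [[8, 1, 1],
        [2, 6, 2],
        [0, 1, 9]]
print(mean_f1(per_class_f1(conf), ignore=(2,)))
```
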

Method | Backbone | Parameters (M) | Imp. Surf. | Building | Low Veg. | Tree | Car | mF1 |
---|---|---|---|---|---|---|---|---|
DANet [18] | ResNet18 | 12.6 | 90.0 | 93.9 | 82.2 | 87.3 | 44.5 | 79.6 |
ABCNet [35] | ResNet18 | 14.0 | 92.7 | 95.2 | 84.5 | 89.7 | 85.3 | 89.5 |
BANet [36] | ResT-Lite | 12.7 | 92.2 | 95.2 | 83.8 | 89.9 | 86.8 | 89.6 |
Segmenter [24] | ViT-Tiny | 6.7 | 89.8 | 93.0 | 81.2 | 88.9 | 67.6 | 84.1 |
SwinUperNet [12] | Swin-Tiny | 60 | 92.8 | 95.6 | 85.1 | 90.6 | 85.1 | 89.8 |
DC-Swin [13] | Swin-Small | 66.9 | 93.6 | 96.2 | 85.8 | 90.4 | 87.6 | 90.7 |
UNetFormer [31] | ResNet18 | 11.7 | 92.7 | 95.3 | 84.9 | 90.6 | 88.5 | 90.4 |
SFA-Net (ours) | EfficientNet-B3 | 10.7 | 93.5 | 96.3 | 85.4 | 90.2 | 90.7 | 91.2 |
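
To see where SFA-Net's 0.8-point mF1 margin over UNetFormer on Vaihingen comes from, the per-class differences can be taken directly from the two rows of the table. The sketch is plain arithmetic on the reported scores (the constant and function names are illustrative); it shows that the car class contributes the largest gain, while tree is slightly lower.

```python
# Per-class F1 difference between two rows of the Vaihingen table (values copied from it).

CLASSES = ["Imp. Surf.", "Building", "Low Veg.", "Tree", "Car"]
SFA_NET = [93.5, 96.3, 85.4, 90.2, 90.7]
UNETFORMER = [92.7, 95.3, 84.9, 90.6, 88.5]

def per_class_delta(ours: list[float], baseline: list[float]) -> dict[str, float]:
    """Difference ours - baseline for each class, rounded to one decimal like the table."""
    return {name: round(a - b, 1) for name, a, b in zip(CLASSES, ours, baseline)}

deltas = per_class_delta(SFA_NET, UNETFORMER)
print(deltas)
# {'Imp. Surf.': 0.8, 'Building': 1.0, 'Low Veg.': 0.5, 'Tree': -0.4, 'Car': 2.2}
print(max(deltas, key=deltas.get))  # Car
```

Rounding each difference to one decimal avoids printing floating-point noise such as 0.7999999999999972.
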
Method | Backbone | Parameters (M) | Background | Building | Road | Water | Barren | Forest | Agriculture | mIoU |
---|---|---|---|---|---|---|---|---|---|---|
TransUNet [37] | ResNet50 | 90.7 | 43.0 | 56.1 | 53.7 | 78.0 | 9.3 | 44.9 | 56.9 | 48.9 |
DC-Swin [13] | Swin-Tiny | 66.9 | 41.3 | 54.5 | 56.2 | 78.1 | 14.5 | 47.2 | 62.4 | 50.6 |
UNetFormer [31] | ResNet18 | 11.7 | 44.7 | 58.8 | 54.9 | 79.6 | 20.1 | 46.0 | 62.5 | 52.4 |
Hi-ResNet [38] | Hi-ResNet | 54.3 | 46.7 | 58.3 | 55.9 | 80.1 | 17.0 | 46.7 | 62.7 | 52.5 |
AerialFormer-B [11] | Swin-Base | 113.8 | 47.8 | 60.7 | 59.3 | 81.5 | 17.9 | 47.9 | 64.0 | 54.1 |
SFA-Net (ours) | EfficientNet-B3 | 10.7 | 48.4 | 60.3 | 59.1 | 81.9 | 24.1 | 46.2 | 64.0 | 54.9 |
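
The LoveDA table pairs each model's parameter count with its mIoU, so one can ask which models are Pareto-optimal, i.e., not beaten by another model that is at least as small and at least as accurate. The sketch below is a generic dominance check, not a procedure from the paper, applied to the rows of the table. Because SFA-Net has both the fewest parameters and the highest mIoU, it is the only model on the front.

```python
# Generic Pareto-front check over (parameters in M, mIoU) pairs: fewer parameters and
# a higher score are both better. Values are copied from the LoveDA table above.

LOVEDA = {
    "TransUNet": (90.7, 48.9),
    "DC-Swin": (66.9, 50.6),
    "UNetFormer": (11.7, 52.4),
    "Hi-ResNet": (54.3, 52.5),
    "AerialFormer-B": (113.8, 54.1),
    "SFA-Net": (10.7, 54.9),
}

def dominates(a: tuple[float, float], b: tuple[float, float]) -> bool:
    """a dominates b if it is no larger, no less accurate, and strictly better in one."""
    (pa, sa), (pb, sb) = a, b
    return pa <= pb and sa >= sb and (pa < pb or sa > sb)

def pareto_front(models: dict[str, tuple[float, float]]) -> list[str]:
    """Names of models that no other model dominates (a model never dominates itself)."""
    return [name for name, m in models.items()
            if not any(dominates(other, m) for other in models.values())]

print(pareto_front(LOVEDA))  # ['SFA-Net']
```
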

| FAM1 | FAM2 | UAVid (mIoU) | ISPRS Potsdam (mF1) | ISPRS Vaihingen (mF1) | LoveDA (mIoU) |
|---|---|---|---|---|---|
| | | 67.8 | 92.8 | 90.4 | 52.4 |
| ✓ | | 69.4 | 93.4 | 90.5 | 52.4 |
| | ✓ | 69.7 | 93.3 | 90.7 | 53.1 |
| ✓ | ✓ | 70.4 | 93.5 | 91.2 | 54.9 |
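
The ablation can be summarized as the gain of the full model (both FAMs) over the configuration without any FAM, whose scores (67.8, 92.8, 90.4, 52.4) coincide with the UNetFormer rows of the comparison tables. The sketch below subtracts the first row of the ablation table from the last, and also expresses each gain as a share of the baseline's remaining error (100 minus the score), a normalization chosen here for illustration. In absolute terms the gains are largest on UAVid and LoveDA; relative to the remaining error they are more even, at roughly 5 to 10 percent.

```python
# Gain of the full model (both FAMs) over the no-FAM configuration, per dataset.
# Scores are copied from the ablation table (mIoU for UAVid/LoveDA, mF1 for ISPRS).

DATASETS = ["UAVid", "ISPRS Potsdam", "ISPRS Vaihingen", "LoveDA"]
NO_FAM = [67.8, 92.8, 90.4, 52.4]
BOTH_FAMS = [70.4, 93.5, 91.2, 54.9]

def relative_error_reduction(base: float, full: float) -> float:
    """Share (%) of the baseline's remaining error (100 - score) removed by the full model."""
    return 100 * (full - base) / (100 - base)

gains = {d: round(f - b, 1) for d, b, f in zip(DATASETS, NO_FAM, BOTH_FAMS)}
rel = {d: round(relative_error_reduction(b, f), 1) for d, b, f in zip(DATASETS, NO_FAM, BOTH_FAMS)}
print(gains)
# {'UAVid': 2.6, 'ISPRS Potsdam': 0.7, 'ISPRS Vaihingen': 0.8, 'LoveDA': 2.5}
print(rel)
# {'UAVid': 8.1, 'ISPRS Potsdam': 9.7, 'ISPRS Vaihingen': 8.3, 'LoveDA': 5.3}
```
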

Dataset | 0.0 | 0.2 | 0.4 | 0.7 | 1.0 |
---|---|---|---|---|---|
UAVid (mIoU) | 68.4 | 69.4 | 70.4 | 69.9 | 70.3 |
ISPRS Potsdam (mF1) | 93.4 | 93.4 | 93.5 | 93.5 | 93.4 |
ISPRS Vaihingen (mF1) | 90.7 | 90.9 | 91.2 | 90.9 | 90.9 |
LoveDA (mIoU) | 52.4 | 53.4 | 54.9 | 53.8 | 53.4 |
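
The hyperparameter swept in this table is not named in this excerpt, so it is left unnamed below. The sketch returns, for each dataset, every setting that reaches the best score, because ties occur at one decimal place: on ISPRS Potsdam, 0.4 and 0.7 both reach 93.5. For all four datasets 0.4 is among the best settings, and its scores equal the full-model results of the ablation study.

```python
# For each dataset, list every swept value that reaches the best score (ties kept).
# The hyperparameter being swept is not named in this excerpt.

VALUES = [0.0, 0.2, 0.4, 0.7, 1.0]
SWEEP = {
    "UAVid": [68.4, 69.4, 70.4, 69.9, 70.3],
    "ISPRS Potsdam": [93.4, 93.4, 93.5, 93.5, 93.4],
    "ISPRS Vaihingen": [90.7, 90.9, 91.2, 90.9, 90.9],
    "LoveDA": [52.4, 53.4, 54.9, 53.8, 53.4],
}

def best_settings(scores: list[float]) -> list[float]:
    """All swept values whose score equals the maximum."""
    top = max(scores)
    return [v for v, s in zip(VALUES, scores) if s == top]

for dataset, scores in SWEEP.items():
    print(dataset, best_settings(scores))
# UAVid [0.4]
# ISPRS Potsdam [0.4, 0.7]
# ISPRS Vaihingen [0.4]
# LoveDA [0.4]
```

Keeping ties, rather than taking a single `argmax`, avoids silently preferring whichever value happens to come first.
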

Dataset | Backbone | Parameters (M) | FLOPs (G) | mF1 |
---|---|---|---|---|
ISPRS Vaihingen | EfficientNet-B0 | 4.2 | 17.8 | 87.9 |
ISPRS Vaihingen | EfficientNet-B3 | 10.7 | 42.8 | 91.2 |
ISPRS Vaihingen | ResNet18 | 11.9 | 47.8 | 90.1 |
ISPRS Vaihingen | ResNet101 | 46.7 | 186.8 | 88.5 |
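
The parameter and FLOP columns are model-level totals. To show how such counts are usually built up, the sketch below computes them for a single 2-D convolution using the standard formulas: the weights contribute C_out · (C_in / groups) · k² parameters plus C_out biases, and each weight is used once per output position, giving one multiply-accumulate (MAC) each. The `groups` argument lets the same formula cover depthwise convolutions, such as those in EfficientNet's MBConv blocks. This excerpt states neither whether the reported FLOPs count MACs or individual multiplications and additions (a factor-of-two difference) nor the input resolution used; the function name and the 512 × 512 example are illustrative assumptions.

```python
# Parameter and multiply-accumulate (MAC) count of one Conv2d layer (standard formulas).
# Illustrative only: the layer shape and output size below are assumptions.

def conv2d_cost(c_in: int, c_out: int, k: int, h_out: int, w_out: int,
                bias: bool = True, groups: int = 1) -> tuple[int, int]:
    """Return (parameters, MACs) for a k x k convolution producing an h_out x w_out map."""
    weights = c_out * (c_in // groups) * k * k
    params = weights + (c_out if bias else 0)
    macs = weights * h_out * w_out  # each weight is applied once per output position
    return params, macs

params, macs = conv2d_cost(c_in=3, c_out=64, k=3, h_out=512, w_out=512)
print(params)                  # 1792
print(round(macs / 1e9, 3))    # 0.453 (GMACs; about twice that in FLOPs if a MAC counts as 2)

# Depthwise 3 x 3 convolution on 32 channels: groups equals the channel count.
print(conv2d_cost(32, 32, 3, 10, 10, bias=False, groups=32))
```
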
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Hwang, G.; Jeong, J.; Lee, S.J. SFA-Net: Semantic Feature Adjustment Network for Remote Sensing Image Segmentation. Remote Sens. 2024, 16, 3278. https://doi.org/10.3390/rs16173278
Hwang G, Jeong J, Lee SJ. SFA-Net: Semantic Feature Adjustment Network for Remote Sensing Image Segmentation. Remote Sensing. 2024; 16(17):3278. https://doi.org/10.3390/rs16173278
Chicago/Turabian Style: Hwang, Gyutae, Jiwoo Jeong, and Sang Jun Lee. 2024. "SFA-Net: Semantic Feature Adjustment Network for Remote Sensing Image Segmentation" Remote Sensing 16, no. 17: 3278. https://doi.org/10.3390/rs16173278
APA Style: Hwang, G., Jeong, J., & Lee, S. J. (2024). SFA-Net: Semantic Feature Adjustment Network for Remote Sensing Image Segmentation. Remote Sensing, 16(17), 3278. https://doi.org/10.3390/rs16173278