A Hybrid Algorithm with Swin Transformer and Convolution for Cloud Detection
Figure 1. The structure of the STCCD network. The framework primarily consists of four components: encoder, bridge, decoder, and output. The encoder stage contains two branches of Swin transformer layers and convolution layers, as well as the FCMs; the bridge stage includes the FFMAM and AMSFM; the decoder stage contains six convolution blocks and five upsampling interaction layers. Finally, the BRM is a simple boundary refinement module.
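To make the data flow described in Figure 1 concrete, the following is a minimal PyTorch-style skeleton of that layout. All submodules (`conv_stages`, `swin_stages`, `fcms`, `ffmam`, `amsfm`, `decoder`, `brm`) are placeholders for illustration, and the two-argument decoder signature is an assumption; this is a sketch of the described wiring, not the authors' implementation.

```python
import torch.nn as nn

class STCCDSkeleton(nn.Module):
    """Illustrative forward pass: two coupled encoder branches, an
    FFMAM + AMSFM bridge, a convolutional decoder, and a BRM head."""
    def __init__(self, conv_stages, swin_stages, fcms, ffmam, amsfm, decoder, brm):
        super().__init__()
        self.conv_stages = nn.ModuleList(conv_stages)   # convolution branch
        self.swin_stages = nn.ModuleList(swin_stages)   # Swin transformer branch
        self.fcms = nn.ModuleList(fcms)                 # feature coupling modules
        self.ffmam, self.amsfm = ffmam, amsfm           # bridge stage
        self.decoder, self.brm = decoder, brm           # decoder + boundary refinement

    def forward(self, img):
        x = y = img
        skips = []
        for conv, swin, fcm in zip(self.conv_stages, self.swin_stages, self.fcms):
            x, y = conv(x), swin(y)
            x, y = fcm(x, y)          # exchange information between the two branches
            skips.append(x)
        bridged = self.amsfm(self.ffmam(x, y))
        mask = self.decoder(bridged, skips)  # upsampling interaction with skips (assumed)
        return self.brm(mask)
```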
Figure 2. The structure of the basic residual block (a) and the basic convolution block (b).
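For reference, a standard basic residual block in the style of He et al. (Deep Residual Learning) looks like the sketch below; the exact layer ordering in Figure 2 may differ.

```python
import torch.nn as nn

class BasicResidualBlock(nn.Module):
    """Conv-BN-ReLU-Conv-BN with an identity (or 1x1 projection) skip."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1, bias=False),
            nn.BatchNorm2d(out_ch))
        # Project the skip path only when shape changes.
        self.skip = (nn.Identity() if stride == 1 and in_ch == out_ch
                     else nn.Conv2d(in_ch, out_ch, 1, stride=stride, bias=False))
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.body(x) + self.skip(x))
```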
Figure 3. Two consecutive Swin transformer blocks.
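A simplified sketch of the consecutive window-attention / shifted-window-attention pair in Figure 3 follows. It omits the relative position bias and the attention mask for shifted windows, so it is illustrative only; the window size must divide the feature height and width.

```python
import torch
import torch.nn as nn

def window_partition(x, ws):
    """(B, H, W, C) -> (num_windows*B, ws*ws, C)."""
    B, H, W, C = x.shape
    x = x.view(B, H // ws, ws, W // ws, ws, C)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(-1, ws * ws, C)

def window_reverse(win, ws, H, W):
    """Inverse of window_partition."""
    B = win.shape[0] // ((H // ws) * (W // ws))
    x = win.view(B, H // ws, W // ws, ws, ws, -1)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(B, H, W, -1)

class SwinBlockSketch(nn.Module):
    def __init__(self, dim, heads, ws=7, shifted=False):
        super().__init__()
        self.ws, self.shift = ws, ws // 2 if shifted else 0
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
                                 nn.Linear(4 * dim, dim))

    def forward(self, x):                        # x: (B, H, W, C)
        B, H, W, C = x.shape
        h = self.norm1(x)
        if self.shift:                           # cyclic shift for SW-MSA
            h = torch.roll(h, shifts=(-self.shift, -self.shift), dims=(1, 2))
        win = window_partition(h, self.ws)
        attn, _ = self.attn(win, win, win)       # self-attention within each window
        h = window_reverse(attn, self.ws, H, W)
        if self.shift:
            h = torch.roll(h, shifts=(self.shift, self.shift), dims=(1, 2))
        x = x + h                                # residual connections
        return x + self.mlp(self.norm2(x))

# Two consecutive blocks: W-MSA followed by SW-MSA.
pair = nn.Sequential(SwinBlockSketch(96, 3), SwinBlockSketch(96, 3, shifted=True))
```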
Figure 4. The operation process of different FCMs. BN denotes the BatchNorm operation.
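Figure 4 compares several FCM variants; one plausible coupling scheme is sketched below, under the assumption that both branch outputs are kept as (B, C, H, W) maps with matching spatial size: a 1x1 convolution plus BatchNorm (the BN in Figure 4) aligns channels before each branch's features are added into the other.

```python
import torch.nn as nn

class FCMSketch(nn.Module):
    """Hypothetical feature coupling: channel-align, then cross-add."""
    def __init__(self, conv_ch, swin_ch):
        super().__init__()
        self.c2s = nn.Sequential(nn.Conv2d(conv_ch, swin_ch, 1), nn.BatchNorm2d(swin_ch))
        self.s2c = nn.Sequential(nn.Conv2d(swin_ch, conv_ch, 1), nn.BatchNorm2d(conv_ch))

    def forward(self, x_conv, x_swin):  # both (B, C, H, W); equal spatial size assumed
        return x_conv + self.s2c(x_swin), x_swin + self.c2s(x_conv)
```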
Figure 5. The structure of the FFMAM module. The feature tensor X is derived from the convolution branch, and the feature tensor Y is acquired from the Swin transformer branch. The feature tensor F′ is the output feature map of the multihead attention part, and the feature tensor E′ is the output feature map of the channel attention part.
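A hedged sketch of such a fusion follows, with cross multihead attention producing F′ and squeeze-excitation-style channel attention producing E′. The query/key assignment (query from X, key/value from Y) and the final 1x1 fusion are assumptions, not necessarily the paper's exact design.

```python
import torch
import torch.nn as nn

class FFMAMSketch(nn.Module):
    def __init__(self, dim, heads=4, r=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(nn.Linear(dim, dim // r), nn.ReLU(inplace=True),
                                nn.Linear(dim // r, dim), nn.Sigmoid())
        self.fuse = nn.Conv2d(2 * dim, dim, kernel_size=1)

    def forward(self, x, y):             # x: conv branch, y: Swin branch, (B, C, H, W)
        B, C, H, W = x.shape
        xs = x.flatten(2).transpose(1, 2)            # (B, HW, C) token sequences
        ys = y.flatten(2).transpose(1, 2)
        f, _ = self.attn(xs, ys, ys)                 # F': cross attention (assumed Q=X, K=V=Y)
        f = f.transpose(1, 2).reshape(B, C, H, W)
        w = self.fc(self.pool(y).flatten(1)).reshape(B, C, 1, 1)
        e = x * w                                    # E': channel-reweighted features (assumed)
        return self.fuse(torch.cat([f, e], dim=1))
```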
Figure 6. The structure of the AMSFM.
Figure 7. Four dilated convolution layers in parallel.
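A minimal sketch of four parallel dilated 3x3 convolutions, in the ASPP spirit of DeepLab; the dilation rates here are assumptions, not the paper's settings.

```python
import torch
import torch.nn as nn

class ParallelDilatedConvs(nn.Module):
    def __init__(self, in_ch, out_ch, rates=(1, 2, 4, 8)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r, bias=False),
                nn.BatchNorm2d(out_ch),
                nn.ReLU(inplace=True))
            for r in rates])                     # one branch per dilation rate
        self.project = nn.Conv2d(len(rates) * out_ch, out_ch, kernel_size=1)

    def forward(self, x):
        # Concatenate the multiscale responses, then fuse with a 1x1 conv.
        return self.project(torch.cat([b(x) for b in self.branches], dim=1))
```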
Figure 8. The structure of the boundary refinement module.
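Since the BRM is described above as a simple boundary refinement module, a minimal residual formulation would look like the following; the actual module in Figure 8 may contain more than this sketch assumes.

```python
import torch.nn as nn

class BoundaryRefinementSketch(nn.Module):
    """Hypothetical minimal BRM: a residual conv block that sharpens mask edges."""
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1))

    def forward(self, x):
        return x + self.body(x)  # refine, keep the coarse prediction as residual base
```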
Figure 9. Global distribution of the datasets, including GF1-WHU, SPARCS, and L8-Biome. The AIR-CD dataset is not shown due to a lack of geolocation information.
Figure 10. Prediction results of different models on the GF1-WHU dataset. Black and white represent the noncloud and cloud pixels, respectively. (a) RGB image. (b) Label. (c) U²Net. (d) DeepLabV3+. (e) BoundaryNet. (f) ST-UNet. (g) BuildFormer. (h) LiteST-Net. (i) Our network.
Figure 11. Prediction results of different models on the SPARCS dataset. Black and white represent the noncloud and cloud pixels, respectively; green represents cloud pixels that were missed, and red represents noncloud pixels falsely detected as clouds. (a) RGB image. (b) Label. (c) U²Net. (d) DeepLabV3+. (e) BoundaryNet. (f) ST-UNet. (g) BuildFormer. (h) LiteST-Net. (i) Our network.
Figure 12. Prediction results of different models on the AIR-CD dataset. Black and white represent the noncloud and cloud pixels, respectively; green represents cloud pixels that were missed, and red represents noncloud pixels falsely detected as clouds. (a) RGB image. (b) Label. (c) U²Net. (d) DeepLabV3+. (e) BoundaryNet. (f) ST-UNet. (g) BuildFormer. (h) LiteST-Net. (i) Our network.
Figure 13. Prediction results of the STCCD model and the FMask method on the L8-Biome dataset. Black and white represent the noncloud and cloud pixels, respectively; green represents cloud pixels that were missed, and red represents noncloud pixels falsely detected as clouds. (a) RGB image. (b) Label. (c) FMask. (d) Our network.
Figure 14. Prediction results of the STCCD model and the FMask method on the L8-Biome dataset. Black and white represent the noncloud and cloud pixels, respectively. (a) RGB image. (b) Label. (c) FMask. (d) Our network.
Figure 15. Failure sample of the STCCD model. Black and white represent the noncloud and cloud pixels, respectively. (a) RGB image. (b) Label. (c) FMask. (d) Our network.
Figure 16. Predictions of STCCD, LiteST-Net, and BoundaryNet for clouds over snow and clouds over bare soil on the SPARCS dataset. Black and white represent the noncloud and cloud pixels, respectively; green represents cloud pixels that were missed, and red represents noncloud pixels falsely detected as clouds. (a) RGB image. (b) Label. (c) Our network. (d) LiteST-Net. (e) BoundaryNet.
Abstract
1. Introduction
2. Methodology
2.1. Convolution Branch
2.2. Swin Transformer Branch
2.3. Feature Coupling Module
2.4. Feature Fusion Module Based on Attention Mechanism
2.5. Aggregation Multiscale Feature Module
2.6. Boundary Refinement Module
3. Experiment
3.1. Datasets
3.1.1. GF1-WHU Dataset
3.1.2. SPARCS Dataset
3.1.3. AIR-CD Dataset
3.1.4. L8-Biome Dataset
3.2. Training Details
3.2.1. Data Processing
3.2.2. Loss Function
3.2.3. Evaluation Metrics
3.3. Ablation Study
3.3.1. Ablation for Swin Transformer Branch
3.3.2. Ablation for FCM
3.3.3. Ablation for FFMAM
3.3.4. Ablation for AMSFM
3.3.5. Ablation for BRM
3.4. Comparison Test of the GF1-WHU Dataset
3.5. Comparison Test of the SPARCS Dataset
3.6. Comparison Test of the AIR-CD Dataset
3.7. Extended Experiment
3.7.1. Cross-Validation
3.7.2. Extended Validation
4. Discussion
4.1. Advantage Analysis
4.2. Limitations and Future Perspectives
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Long, T.; Zhang, Z.; He, G.; Jiao, W.; Tang, C.; Wu, B.; Zhang, X.; Wang, G.; Yin, R. 30 m Resolution Global Annual Burned Area Mapping Based on Landsat Images and Google Earth Engine. Remote Sens. 2019, 11, 489. [Google Scholar] [CrossRef]
- Yin, R.; He, G.; Jiang, W.; Peng, Y.; Zhang, Z.; Li, M.; Gong, C. Night-Time Light Imagery Reveals China’s City Activity During the COVID-19 Pandemic Period in Early 2020. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 14, 5111–5122. [Google Scholar] [CrossRef]
- Kuma, P.; Bender, F.A.M.; Schuddeboom, A.; McDonald, A.J.; Seland, Ø. Machine learning of cloud types shows higher climate sensitivity is associated with lower cloud biases. Atmos. Chem. Phys. Discuss. 2022, 32, 523–549. [Google Scholar] [CrossRef]
- Zheng, X.; Ye, J.; Chen, Y.; Wistar, S.; Li, J.; Piedra-Fernández, J.A.; Steinberg, M.A.; Wang, J.Z. Detecting Comma-shaped Clouds for Severe Weather Forecasting using Shape and Motion. IEEE Trans. Geosci. Remote. Sens. 2019, 57, 3788–3801. [Google Scholar] [CrossRef]
- Ju, J.; Roy, D.P. The availability of cloud-free Landsat ETM+ data over the conterminous United States and globally. Remote Sens. Environ. 2008, 112, 1196–1211. [Google Scholar] [CrossRef]
- Zhu, X.; Helmer, E.H. An automatic method for screening clouds and cloud shadows in optical satellite image time series in cloudy regions. Remote Sens. Environ. 2018, 214, 135–153. [Google Scholar] [CrossRef]
- Qiu, S.; Zhu, Z.; He, B. Fmask 4.0: Improved cloud and cloud shadow detection in Landsats 4–8 and Sentinel-2 imagery. Remote Sens. Environ. 2019, 231, 111205. [Google Scholar] [CrossRef]
- Ge, K.; Liu, J.; Wang, F.; Chen, B.; Hu, Y. A Cloud Detection Method Based on Spectral and Gradient Features for SDGSAT-1 Multispectral Images. Remote Sens. 2022, 15, 24. [Google Scholar] [CrossRef]
- Main-Knorn, M.; Pflug, B.; Louis, J.; Debaecker, V.; Müller-Wilm, U.; Gascon, F. Sen2Cor for Sentinel-2. In Proceedings of the Image and Signal Processing for Remote Sensing XXIII; Bruzzone, L., Bovolo, F., Benediktsson, J.A., Eds.; SPIE: Tokyo, Japan, 2018; p. 3. [Google Scholar] [CrossRef]
- Irish, R.R.; Barker, J.L.; Goward, S.N.; Arvidson, T. Characterization of the Landsat-7 ETM+ Automated Cloud-Cover Assessment (ACCA) Algorithm. Photogramm. Eng. Remote Sens. 2006, 72, 1179–1188. [Google Scholar] [CrossRef]
- Li, Z.; Shen, H.; Li, H.; Xia, G.; Gamba, P.; Zhang, L. Multi-feature combined cloud and cloud shadow detection in GaoFen-1 wide field of view imagery. Remote Sens. Environ. 2017, 191, 342–358. [Google Scholar] [CrossRef]
- Zhai, H.; Zhang, H.; Zhang, L.; Li, P. Cloud/shadow detection based on spectral indices for multi/hyperspectral optical remote sensing imagery. ISPRS J. Photogramm. Remote Sens. 2018, 144, 235–253. [Google Scholar] [CrossRef]
- Deng, J.; Wang, H.; Ma, J. An automatic cloud detection algorithm for Landsat remote sensing image. In Proceedings of the 2016 4th International Workshop on Earth Observation and Remote Sensing Applications (EORSA), Guangzhou, China, 4–6 July 2016; pp. 395–399. [Google Scholar] [CrossRef]
- Zhu, Z.; Woodcock, C.E. Object-based cloud and cloud shadow detection in Landsat imagery. Remote Sens. Environ. 2012, 118, 83–94. [Google Scholar] [CrossRef]
- Bai, T.; Li, D.; Sun, K.; Chen, Y.; Li, W. Cloud Detection for High-Resolution Satellite Imagery Using Machine Learning and Multi-Feature Fusion. Remote Sens. 2016, 8, 715. [Google Scholar] [CrossRef]
- Zi, Y.; Xie, F.; Jiang, Z. A Cloud Detection Method for Landsat 8 Images Based on PCANet. Remote Sens. 2018, 10, 877. [Google Scholar] [CrossRef]
- Yang, L.; Zhuo, W.; Qi, L.; Shi, Y.; Gao, Y. ST++: Make Self-trainingWork Better for Semi-supervised Semantic Segmentation. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 19–24 June 2022; pp. 4258–4267. [Google Scholar] [CrossRef]
- Cao, R.; Fang, L.; Lu, T.; He, N. Self-Attention-Based Deep Feature Fusion for Remote Sensing Scene Classification. IEEE Geosci. Remote Sens. Lett. 2021, 18, 43–47. [Google Scholar] [CrossRef]
- Zhu, X.X.; Tuia, D.; Mou, L.; Xia, G.S.; Zhang, L.; Xu, F.; Fraundorfer, F. Deep Learning in Remote Sensing: A Comprehensive Review and List of Resources. IEEE Geosci. Remote Sens. Mag. 2017, 5, 8–36. [Google Scholar] [CrossRef]
- Mountrakis, G.; Li, J.; Lu, X.; Hellwich, O. Deep learning for remotely sensed data. ISPRS J. Photogramm. Remote Sens. 2018, 145, 1–2. [Google Scholar] [CrossRef]
- Yin, R.; He, G.; Wang, G.; Long, T.; Li, H.; Zhou, D.; Gong, C. Automatic Framework of Mapping Impervious Surface Growth With Long-Term Landsat Imagery Based on Temporal Deep Learning Model. IEEE Geosci. Remote Sens. Lett. 2021, 19, 1–5. [Google Scholar] [CrossRef]
- Li, J.; Wu, Z.; Sheng, Q.; Wang, B.; Hu, Z.; Zheng, S.; Camps-Valls, G.; Molinier, M. A hybrid generative adversarial network for weakly-supervised cloud detection in multispectral images. Remote Sens. Environ. 2022, 280, 113197. [Google Scholar] [CrossRef]
- Liu, C.C.; Zhang, Y.C.; Chen, P.Y.; Lai, C.C.; Chen, Y.H.; Cheng, J.H.; Ko, M.H. Clouds Classification from Sentinel-2 Imagery with Deep Residual Learning and Semantic Image Segmentation. Remote Sens. 2019, 11, 119. [Google Scholar] [CrossRef]
- Yin, M.; Wang, P.; Ni, C.; Hao, W. Cloud and snow detection of remote sensing images based on improved Unet3+. Sci. Rep. 2022, 12, 14415. [Google Scholar] [CrossRef] [PubMed]
- Wu, K.; Xu, Z.; Lyu, X.; Ren, P. Cloud detection with boundary nets. ISPRS J. Photogramm. Remote Sens. 2022, 186, 218–231. [Google Scholar] [CrossRef]
- Mazza, A.; Sepe, P.; Poggi, G.; Scarpa, G. Cloud Segmentation of Sentinel-2 Images Using Convolutional Neural Network with Domain Adaptation. In Proceedings of the 2021 IEEE International Geoscience and Remote Sensing Symposium IGARSS, Brussels, Belgium, 11–16 July 2021; pp. 7236–7239. [Google Scholar] [CrossRef]
- Pang, S.; Sun, L.; Tian, Y.; Ma, Y.; Wei, J. Convolutional Neural Network-Driven Improvements in Global Cloud Detection for Landsat 8 and Transfer Learning on Sentinel-2 Imagery. Remote Sens. 2023, 15, 1706. [Google Scholar] [CrossRef]
- Zhang, C.; Weng, L.; Ding, L.; Xia, M.; Lin, H. CRSNet: Cloud and Cloud Shadow Refinement Segmentation Networks for Remote Sensing Imagery. Remote Sens. 2023, 15, 1664. [Google Scholar] [CrossRef]
- Chen, K.; Xia, M.; Lin, H.; Qian, M. Multi-scale Attention Feature Aggregation Network for Cloud and Cloud Shadow Segmentation. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5612216. [Google Scholar] [CrossRef]
- Guo, J.; Yang, J.; Yue, H.; Liu, X.; Li, K. Unsupervised Domain-Invariant Feature Learning for Cloud Detection of Remote Sensing Images. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5405715. [Google Scholar] [CrossRef]
- Fu, J.; Liu, J.; Tian, H.; Li, Y.; Bao, Y.; Fang, Z.; Lu, H. Dual Attention Network for Scene Segmentation. arXiv 2019, arXiv:1809.02983. [Google Scholar]
- Huang, Z.; Wang, X.; Wei, Y.; Huang, L.; Shi, H.; Liu, W.; Huang, T.S. CCNet: Criss-Cross Attention for Semantic Segmentation. arXiv 2020, arXiv:1811.11721. [Google Scholar]
- Zhang, G.; Gao, X.; Yang, Y.; Wang, M.; Ran, S. Controllably Deep Supervision and Multi-Scale Feature Fusion Network for Cloud and Snow Detection Based on Medium- and High-Resolution Imagery Dataset. Remote Sens. 2021, 13, 4805. [Google Scholar] [CrossRef]
- Wang, Y.; Gu, L.; Li, X.; Gao, F.; Jiang, T. Coexisting Cloud and Snow Detection based on a Hybrid Features Network applied to Remote Sensing Images. IEEE Trans. Geosci. Remote. Sens. 2023, 61, 5405515. [Google Scholar] [CrossRef]
- Zhao, C.; Zhang, X.; Kuang, N.; Luo, H.; Zhong, S.; Fan, J. Boundary-Aware Bilateral Fusion Network for Cloud Detection. IEEE Trans. Geosci. Remote Sens. 2023, 61, 1–14. [Google Scholar] [CrossRef]
- Hu, K.; Zhang, D.; Xia, M. CDUNet: Cloud Detection UNet for Remote Sensing Imagery. Remote Sens. 2021, 13, 4533. [Google Scholar] [CrossRef]
- Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. arXiv 2021, arXiv:2010.11929. [Google Scholar]
- Azad, R.; Heidari, M.; Shariatnia, M.; Aghdam, E.K.; Karimijafarbigloo, S.; Adeli, E.; Merhof, D. TransDeepLab: Convolution-Free Transformer-based DeepLab v3+ for Medical Image Segmentation. arXiv 2022, arXiv:2208.00713. [Google Scholar]
- Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin Transformer: Hierarchical Vision Transformer using Shifted Windows. arXiv 2021, arXiv:2103.14030. [Google Scholar]
- Cao, H.; Wang, Y.; Chen, J.; Jiang, D.; Zhang, X.; Tian, Q.; Wang, M. Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation. arXiv 2021, arXiv:2105.05537. [Google Scholar]
- Gulati, A.; Qin, J.; Chiu, C.C.; Parmar, N.; Zhang, Y.; Yu, J.; Han, W.; Wang, S.; Zhang, Z.; Wu, Y.; et al. Conformer: Convolution-augmented Transformer for Speech Recognition. arXiv 2020, arXiv:2005.08100. [Google Scholar]
- Feng, D.; Zhang, Z.; Yan, K. A Semantic Segmentation Method for Remote Sensing Images Based on the Swin Transformer Fusion Gabor Filter. IEEE Access 2022, 10, 77432–77451. [Google Scholar] [CrossRef]
- Chen, H.; Qi, Z.; Shi, Z. Remote Sensing Image Change Detection With Transformers. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–14. [Google Scholar] [CrossRef]
- He, X.; Zhou, Y.; Zhao, J.; Zhang, D.; Yao, R.; Xue, Y. Swin Transformer Embedding UNet for Remote Sensing Image Semantic Segmentation. IEEE Trans. Geosci. Remote Sens. 2022, 60, 4408715. [Google Scholar] [CrossRef]
- Wang, L.; Fang, S.; Meng, X.; Li, R. Building Extraction With Vision Transformer. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5625711. [Google Scholar] [CrossRef]
- Yuan, W.; Zhang, X.; Shi, J.; Wang, J. LiteST-Net: A Hybrid Model of Lite Swin Transformer and Convolution for Building Extraction from Remote Sensing Image. Remote Sens. 2023, 15, 1996. [Google Scholar] [CrossRef]
- Alrfou, K.; Zhao, T.; Kordijazi, A. Transfer Learning for Microstructure Segmentation with CS-UNet: A Hybrid Algorithm with Transformer and CNN Encoders. arXiv 2023, arXiv:2308.13917. [Google Scholar]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. arXiv 2015, arXiv:1512.03385. [Google Scholar]
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention Is All You Need. arXiv 2023, arXiv:1706.03762. [Google Scholar]
- Shaw, P.; Uszkoreit, J.; Vaswani, A. Self-Attention with Relative Position Representations. arXiv 2018, arXiv:1803.02155. [Google Scholar]
- Ma, H.; Yang, H.; Huang, D. Boundary Guided Context Aggregation for Semantic Segmentation. arXiv 2021, arXiv:2110.14587. [Google Scholar]
- Wang, Q.; Wu, B.; Zhu, P.; Li, P.; Zuo, W.; Hu, Q. ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks. arXiv 2020, arXiv:1910.03151. [Google Scholar]
- Luo, W.; Li, Y.; Urtasun, R.; Zemel, R. Understanding the Effective Receptive Field in Deep Convolutional Neural Networks. arXiv 2017, arXiv:1701.04128. [Google Scholar]
- Yu, F.; Koltun, V. Multi-Scale Context Aggregation by Dilated Convolutions. arXiv 2016, arXiv:1511.07122. [Google Scholar]
- Zhao, H.; Shi, J.; Qi, X.; Wang, X.; Jia, J. Pyramid Scene Parsing Network. arXiv 2017, arXiv:1612.01105. [Google Scholar]
- Chen, L.C.; Zhu, Y.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation. arXiv 2018, arXiv:1802.02611. [Google Scholar] [CrossRef]
- Zhu, Z.; Liu, G.; Hui, G.; Guo, X.; Cao, Y.; Wu, H.; Liu, T.; Tian, G. Semantic Segmentation of FOD Using an Improved Deeplab V3+ Model. In Proceedings of the 2022 12th International Conference on CYBER Technology in Automation, Control, and Intelligent Systems (CYBER), Baishan, China, 27–31 July 2022; pp. 791–796. [Google Scholar] [CrossRef]
- Chen, L.C.; Zhu, Y.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation. In Computer Vision—ECCV 2018; Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y., Eds.; Lecture Notes in Computer Science; Springer International Publishing: Cham, Switzerland, 2018; Volume 11211, pp. 833–851. [Google Scholar] [CrossRef]
- Su, J.; Li, J.; Zhang, Y.; Xia, C.; Tian, Y. Selectivity or Invariance: Boundary-Aware Salient Object Detection. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 3798–3807. [Google Scholar] [CrossRef]
- Hughes, M.; Hayes, D. Automated Detection of Cloud and Cloud Shadow in Single-Date Landsat Imagery Using Neural Networks and Spatial Post-Processing. Remote Sens. 2014, 6, 4907–4926. [Google Scholar] [CrossRef]
- Hughes, M. L8 SPARCS Cloud Validation Masks; US Geological Survey: Sioux Falls, SD, USA, 2016.
- He, Q.; Sun, X.; Yan, Z.; Fu, K. DABNet: Deformable Contextual and Boundary-Weighted Network for Cloud Detection in Remote Sensing Images. IEEE Trans. Geosci. Remote Sens. 2021, 60, 1–16. [Google Scholar] [CrossRef]
- Foga, S.; Scaramuzza, P.L.; Guo, S.; Zhu, Z.; Dilley, R.D.; Beckmann, T.; Schmidt, G.L.; Dwyer, J.L.; Joseph Hughes, M.; Laue, B. Cloud detection algorithm comparison and validation for operational Landsat data products. Remote Sens. Environ. 2017, 194, 379–390. [Google Scholar] [CrossRef]
- USGS. Landsat 8 Cloud Cover Assessment Validation Data; USGS: Reston, VA, USA, 2016.
- Paszke, A.; Gross, S.; Chintala, S.; Chanan, G.; Yang, E.; DeVito, Z.; Lin, Z.; Desmaison, A.; Antiga, L.; Lerer, A. Automatic differentiation in PyTorch. In Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA, 4 December 2017. [Google Scholar]
- Kingma, D.P.; Ba, J. Adam: A Method for Stochastic Optimization. arXiv 2017, arXiv:1412.6980. [Google Scholar]
- De Boer, P.T.; Kroese, D.P.; Mannor, S.; Rubinstein, R.Y. A Tutorial on the Cross-Entropy Method. Ann. Oper. Res. 2005, 134, 19–67. [Google Scholar] [CrossRef]
- Mattyus, G.; Luo, W.; Urtasun, R. DeepRoadMapper: Extracting Road Topology from Aerial Images. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 3458–3466. [Google Scholar] [CrossRef]
- Navab, N.; Hornegger, J.; Wells, W.M.; Frangi, A.F. (Eds.) Medical Image Computing and Computer-Assisted Intervention—MICCAI 2015: 18th International Conference, Munich, Germany, 5–9 October 2015; Proceedings, Part III; Lecture Notes in Computer Science; Springer International Publishing: Cham, Switzerland, 2015; Volume 9351. [Google Scholar] [CrossRef]
- Qin, X.; Zhang, Z.; Huang, C.; Dehghan, M.; Zaiane, O.R.; Jagersand, M. U2-Net: Going Deeper with Nested U-Structure for Salient Object Detection. Pattern Recognit. 2020, 106, 107404. [Google Scholar] [CrossRef]
- Landsat 8 (L8) Data Users Handbook. Available online: https://www.usgs.gov/landsat-missions/landsat-8-data-users-handbook (accessed on 15 September 2023).
Dataset Name | Number | Resolution | Size | References |
---|---|---|---|---|
GF1-WHU | 76 | 16 m | | Li et al., 2017 [11] |
SPARCS | 80 | 30 m | | Hughes and Hayes, 2014; USGS, 2016c [60,61] |
AIR-CD | 34 | 4 m | | He et al., 2021 [62] |
L8-Biome | 96 | 30 m | | Foga et al., 2017; USGS, 2016b [63,64] |
Method | MIoU (%) | F1_cloud (%) |
---|---|---|
Base | 91.14 | 91.32 |
Base+SwinTransformer | 91.52 | 91.59 |
Base+SwinTransformer+FCM | 91.67 | 91.73 |
Base+SwinTransformer+FFMAM | 91.70 | 91.79 |
Base+SwinTransformer+FCM+FFMAM | 91.79 | 92.08 |
Base+SwinTransformer+FCM+FFMAM+AMSFM | 91.90 | 92.42 |
Base+SwinTransformer+FCM+FFMAM+AMSFM+BRM | 91.96 | 92.45 |
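The OA, MIoU, and F1_cloud values reported in these tables follow the standard confusion-matrix definitions; a sketch for binary cloud/noncloud masks (assuming both classes occur, so no zero divisions):

```python
import numpy as np

def cloud_metrics(pred, label):
    """OA, MIoU, and F1 for the cloud class from binary masks (1 = cloud, 0 = noncloud)."""
    pred, label = np.asarray(pred), np.asarray(label)
    tp = np.sum((pred == 1) & (label == 1))
    tn = np.sum((pred == 0) & (label == 0))
    fp = np.sum((pred == 1) & (label == 0))
    fn = np.sum((pred == 0) & (label == 1))
    oa = (tp + tn) / (tp + tn + fp + fn)                       # overall accuracy
    miou = 0.5 * (tp / (tp + fp + fn) + tn / (tn + fp + fn))   # mean IoU over both classes
    f1_cloud = 2 * tp / (2 * tp + fp + fn)                     # F1 of the cloud class
    return oa, miou, f1_cloud
```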
Method | OA (%) | MIoU (%) | F1_cloud (%) |
---|---|---|---|
DeepLabv3+ | 97.88 | 90.83 | 91.06 |
UNet | 97.92 | 90.90 | 91.04 |
U²Net | 97.91 | 91.19 | 91.54
BoundaryNet | 97.92 | 91.13 | 91.76 |
Swin-Unet | 97.72 | 90.34 | 90.54 |
ST-UNet | 97.66 | 90.70 | 91.30 |
BuildFormer | 97.13 | 90.15 | 91.15 |
LiteST-Net | 97.61 | 91.07 | 91.52 |
Our network | 98.06 | 91.96 | 92.45 |
Method | OA (%) | MIoU (%) | F1_cloud (%) |
---|---|---|---|
DeepLabv3+ | 96.89 | 87.40 | 87.09 |
UNet | 97.23 | 88.31 | 87.95 |
U²Net | 97.14 | 89.08 | 89.39
BoundaryNet | 97.85 | 91.22 | 91.65 |
Swin-Unet | 96.09 | 86.21 | 86.75 |
ST-UNet | 96.14 | 88.26 | 90.08 |
BuildFormer | 96.91 | 89.44 | 90.67 |
LiteST-Net | 96.79 | 89.87 | 91.16 |
Our network | 97.78 | 91.48 | 92.20
Method | OA (%) | MIoU (%) | F1_cloud (%) |
---|---|---|---|
DeepLabv3+ | 97.50 | 87.82 | 84.62 |
UNet | 96.76 | 86.96 | 85.38 |
U²Net | 97.16 | 87.15 | 84.31
BoundaryNet | 97.12 | 89.18 | 88.81 |
Swin-Unet | 96.89 | 87.43 | 85.67 |
ST-UNet | 96.32 | 87.29 | 87.08 |
BuildFormer | 96.54 | 86.72 | 86.36 |
LiteST-Net | 97.21 | 88.25 | 85.81 |
Our network | 97.59 | 90.47 | 90.12 |
Method | OA (%) | MIoU (%) | F1_cloud (%) |
---|---|---|---|
BoundaryNet | |||
LiteST-Net | |||
Our network |
Method | OA (%) | MIoU (%) | F1_cloud (%) |
---|---|---|---|
FMask | 91.19 | 75.65 | 77.54 |
Our network | 92.07 | 76.37 | 80.62 |
Method | F1_cloud (%) (Snow) | F1_cloud (%) (Bare Soil) |
---|---|---|
BoundaryNet | 81.89 | 88.90 |
LiteST-Net | 83.40 | 87.60 |
Our network | 86.41 | 90.45 |
Method | Parameters (MB) | FPS |
---|---|---|
DeepLabv3+ | 17.59 | 405 |
UNet | 31.04 | 224 |
U²Net | 44.01 | 160
BoundaryNet | 53.32 | 86 |
Swin-Unet | 27.17 | 378 |
ST-UNet | 168.8 | 23 |
BuildFormer | 40.52 | 140 |
LiteST-Net | 18.03 | 55 |
Our network | 164.71 | 63 |