A Hybrid Attention-Aware Fusion Network (HAFNet) for Building Extraction from High-Resolution Imagery and LiDAR Data
"> Figure 1
Figure 1. SegNet architecture for building extraction from EO data [23,33].
Figure 2. An overview of the hybrid attention-aware fusion network (HAFNet).
Figure 3. Att-MFBlock: attention-aware multimodal fusion block. Different colors of the vector elements indicate the different values of the channel-wise statistics or learned weights of each modality.
Figure 4. Overall accuracy (OA) and loss of the proposed MAFNet using different learning rates: (a) (l_e_RGB, l_other) = (0.1, 0.1), (b) (0.01, 0.01), (c) (0.001, 0.001), (d) (0.01, 0.1), (e) (0.001, 0.01), and (f) (0.0001, 0.001).
Figure 5. Accuracy (OA, F1 score, and IoU) and loss of the proposed MAFNet during training on the Potsdam dataset (a) and the Vaihingen dataset (b).
Figure 6. Visualized features in different layers and output predictions of the RGB-specific, DSM-specific, and cross-modal streams. Feat_n refers to the feature map derived from the n-th convolutional block.
Figure 7. Visualized cross-modal features in different layers with/without an attention module.
Figure 8. Predictions from different streams and the fused results using different decision-level fusion methods: (a) GT, (b) DSM prediction, (c) RGB prediction, (d) cross-modal prediction, (e) RGB image, (f) majority voting, (g) averaging, and (h) Att-MFBlock.
Figure 9. Building extraction results using FuseNet, V-FuseNet, RC-SegNet, and HAFNet.
Figure 10. Building extraction results using SegNet, FCN-8s, U-Net, and DeepLab v3+ based MAFNet.
Abstract
1. Introduction
2. Related Work
2.1. Fully Convolutional Networks (FCNs) for Semantic Labeling
2.2. Attention Mechanism
3. Hybrid Attention-Aware Fusion Network (HAFNet)
3.1. Network Architecture
3.2. Attention-Aware Multimodal Fusion Block
- (1) Channel-wise global pooling
- (2) Modality and channel-wise relationship modeling (a minimal sketch of both steps follows below)
4. Experiment Design
4.1. Datasets
4.2. Accuracy Assessment
4.3. Method Implementation
5. Results and Discussions
5.1. Effect of the Hybrid Fusion Architecture
5.2. Effect of Att-MFBlock
5.3. Comparisons to Other Classical Fusion Networks
5.4. Comparisons between Different Basic Networks of the MAFNet
6. Conclusions
Author Contributions
Funding
Acknowledgments
Conflicts of Interest
References
- Huang, J.; Zhang, X.; Xin, Q.; Sun, Y.; Zhang, P. Automatic building extraction from high-resolution aerial images and LiDAR data using gated residual refinement network. ISPRS J. Photogramm. Remote Sens. 2019, 151, 91–105. [Google Scholar] [CrossRef]
- Zhang, K.; Yan, J.; Chen, S.C. Automatic Construction of Building Footprints From Airborne LIDAR Data. IEEE Trans. Geosci. Remote Sens. 2006, 44, 2523–2533. [Google Scholar] [CrossRef] [Green Version]
- Zhou, G.; Zhou, X. Seamless fusion of LiDAR and aerial imagery for building extraction. IEEE Trans. Geosci. Remote Sens. 2014, 52, 7393–7407. [Google Scholar] [CrossRef]
- Dalponte, M.; Bruzzone, L.; Gianelle, D. Fusion of Hyperspectral and LIDAR Remote Sensing Data for Classification of Complex Forest Areas. IEEE Trans. Geosci. Remote Sens. 2008, 46, 1416–1427. [Google Scholar] [CrossRef] [Green Version]
- Lee, D.S.; Shan, J. Combining Lidar Elevation Data and IKONOS Multispectral Imagery for Coastal Classification Mapping. Mar. Geod. 2003, 26, 117–127. [Google Scholar] [CrossRef]
- Chen, Y.; Li, C.; Ghamisi, P.; Jia, X.; Gu, Y. Deep Fusion of Remote Sensing Data for Accurate Classification. IEEE Geosci. Remote Sens. Lett. 2017, 14, 1253–1257. [Google Scholar] [CrossRef]
- Karsli, F.; Dihkan, M.; Acar, H.; Ozturk, A. Automatic building extraction from very high-resolution image and LiDAR data with SVM algorithm. Arabian J. Geosci. 2016, 9. [Google Scholar] [CrossRef]
- Zarea, A.; Mohammadzadeh, A. A Novel Building and Tree Detection Method From LiDAR Data and Aerial Images. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2016, 9, 1864–1875. [Google Scholar] [CrossRef]
- Du, P.; Bai, X.; Tan, K.; Xue, Z.; Samat, A.; Xia, J.; Li, E.; Su, H.; Liu, W. Advances of Four Machine Learning Methods for Spatial Data Handling: A Review. J. Geovis. Spat. Anal. 2020, 4. [Google Scholar] [CrossRef]
- Li, E.; Xia, J.; Du, P.; Lin, C.; Samat, A. Integrating Multilayer Features of Convolutional Neural Networks for Remote Sensing Scene Classification. IEEE Trans. Geosci. Remote Sens. 2017, 55, 5653–5665. [Google Scholar] [CrossRef]
- Zhong, Y.; Zhu, Q.; Zhang, L. Scene Classification Based on the Multifeature Fusion Probabilistic Topic Model for High Spatial Resolution Remote Sensing Imagery. IEEE Trans. Geosci. Remote Sens. 2015, 53, 6207–6222. [Google Scholar] [CrossRef]
- Zou, Q.; Ni, L.; Zhang, T.; Wang, Q. Deep Learning Based Feature Selection for Remote Sensing Scene Classification. IEEE Geosci. Remote Sens. Lett. 2015, 12, 2321–2325. [Google Scholar] [CrossRef]
- Ienco, D.; Interdonato, R.; Gaetano, R.; Minh, H.T.D. Combining Sentinel-1 and Sentinel-2 Satellite Image Time Series for land cover mapping via a multi-source deep learning architecture. ISPRS J. Photogramm. Remote Sens. 2019, 158, 11–22. [Google Scholar] [CrossRef]
- Storie, C.D.; Henry, C.J. Deep Learning Neural Networks for Land Use Land Cover Mapping. In Proceedings of the IGARSS 2018–2018 IEEE International Geoscience and Remote Sensing Symposium, Valencia, Spain, 22–27 July 2018; IEEE: New York, NY, USA, 2018; pp. 3445–3448. [Google Scholar]
- Abdollahi, A.; Pradhan, B.; Shukla, N.; Chakraborty, S.; Alamri, A. Deep Learning Approaches Applied to Remote Sensing Datasets for Road Extraction: A State-Of-The-Art Review. Remote Sens. 2020, 12, 1444. [Google Scholar] [CrossRef]
- Pan, X.; Yang, F.; Gao, L.; Chen, Z.; Zhang, B.; Fan, H.; Ren, J. Building Extraction from High-Resolution Aerial Imagery Using a Generative Adversarial Network with Spatial and Channel Attention Mechanisms. Remote Sens. 2019, 11, 917. [Google Scholar] [CrossRef] [Green Version]
- Sun, G.; Huang, H.; Zhang, A.; Li, F.; Zhao, H.; Fu, H. Fusion of Multiscale Convolutional Neural Networks for Building Extraction in Very High-Resolution Images. Remote Sens. 2019, 11, 227. [Google Scholar] [CrossRef] [Green Version]
- Du, L.; You, X.; Li, K.; Meng, L.; Cheng, G.; Xiong, L.; Wang, G. Multi-modal deep learning for landform recognition. ISPRS J. Photogramm. Remote Sens. 2019, 158, 63–75. [Google Scholar] [CrossRef]
- Long, J.; Shelhamer, E.; Darrell, T. Fully Convolutional Networks for Semantic Segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 640–651. [Google Scholar]
- Xin, J.; Zhang, X.; Zhang, Z.; Fang, W. Road Extraction of High-Resolution Remote Sensing Images Derived from DenseUNet. Remote Sens. 2019, 11, 2499. [Google Scholar] [CrossRef] [Green Version]
- Yang, H.; Wu, P.; Yao, X.; Wu, Y.; Wang, B.; Xu, Y. Building Extraction in Very High Resolution Imagery by Dense-Attention Networks. Remote Sens. 2018, 10, 1768. [Google Scholar] [CrossRef] [Green Version]
- Liu, W.; Yang, M.; Xie, M.; Guo, Z.; Li, E.; Zhang, L.; Pei, T.; Wang, D. Accurate Building Extraction from Fused DSM and UAV Images Using a Chain Fully Convolutional Neural Network. Remote Sens. 2019, 11, 2912. [Google Scholar] [CrossRef] [Green Version]
- Audebert, N.; Le Saux, B.; Lefèvre, S. Beyond RGB: Very high resolution urban remote sensing with multimodal deep networks. ISPRS J. Photogramm. Remote Sens. 2018, 140, 20–32. [Google Scholar] [CrossRef] [Green Version]
- Sun, Y.; Zhang, X.; Xin, Q.; Huang, J. Developing a multi-filter convolutional neural network for semantic segmentation using high-resolution aerial imagery and LiDAR data. ISPRS J. Photogramm. Remote Sens. 2018, 143, 3–14. [Google Scholar] [CrossRef]
- Xu, Y.; Du, B.; Zhang, L. Multi-source remote sensing data classification via fully convolutional networks and post-classification processing. In Proceedings of the IGARSS 2018–2018 IEEE International Geoscience and Remote Sensing Symposium, Valencia, Spain, 22–27 July 2018; pp. 3852–3855. [Google Scholar]
- Hazirbas, C.; Ma, L.; Domokos, C.; Cremers, D. FuseNet: Incorporating Depth into Semantic Segmentation via Fusion-Based CNN Architecture. In Computer Vision – ACCV 2016, Part I; Lai, S.H., Lepetit, V., Nishino, K., Sato, Y., Eds.; Springer: Cham, Switzerland, 2017; Volume 10111, pp. 213–228. [Google Scholar]
- Zhang, W.; Huang, H.; Schmitz, M.; Sun, X.; Wang, H.; Mayer, H. Effective Fusion of Multi-Modal Remote Sensing Data in a Fully Convolutional Network for Semantic Labeling. Remote Sens. 2018, 10, 52. [Google Scholar] [CrossRef] [Green Version]
- Marmanis, D.; Schindler, K.; Wegner, J.D.; Galliani, S.; Datcu, M.; Stilla, U. Classification with an edge: Improving semantic image segmentation with boundary detection. ISPRS J. Photogramm. Remote Sens. 2018, 135, 158–172. [Google Scholar] [CrossRef] [Green Version]
- Marcos, D.; Hamid, R.; Tuia, D. Geospatial Correspondences for Multimodal Registration. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; IEEE: New York, NY, USA, 2016; pp. 5091–5100. [Google Scholar] [CrossRef]
- Hu, J.; Shen, L.; Sun, G. Squeeze-and-Excitation Networks. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; IEEE: New York, NY, USA, 2018; pp. 7132–7141. [Google Scholar] [CrossRef] [Green Version]
- Chen, H.; Li, Y. Three-stream Attention-aware Network for RGB-D Salient Object Detection. IEEE Trans. Image Process. 2019. [Google Scholar] [CrossRef]
- Mohla, S.; Pande, S.; Banerjee, B.; Chaudhuri, S. FusAtNet: Dual Attention based SpectroSpatial Multimodal Fusion Network for Hyperspectral and LiDAR Classification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Seattle, WA, USA, 14–19 June 2020; pp. 92–93. [Google Scholar]
- Badrinarayanan, V.; Handa, A.; Cipolla, R. SegNet: A Deep Convolutional Encoder-Decoder Architecture for Robust Semantic Pixel-Wise Labelling. arXiv 2015, arXiv:1505.07293. [Google Scholar]
- Audebert, N.; Le Saux, B.; Lefèvre, S. Semantic segmentation of earth observation data using multimodal and multi-scale deep networks. In Proceedings of the Asian Conference on Computer Vision, Taipei, Taiwan, 20–24 November 2016; pp. 180–196. [Google Scholar]
- Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Medical Image Computing and Computer-Assisted Intervention, Part III; Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F., Eds.; Springer International Publishing AG: Cham, Switzerland, 2015; Volume 9351, pp. 234–241. [Google Scholar]
- Chen, L.-C.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A.L. Semantic image segmentation with deep convolutional nets and fully connected crfs. arXiv 2014, arXiv:1412.7062. [Google Scholar]
- Chen, L.-C.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A.L. Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 40, 834–848. [Google Scholar] [CrossRef]
- Chen, L.-C.; Papandreou, G.; Schroff, F.; Adam, H. Rethinking atrous convolution for semantic image segmentation. arXiv 2017, arXiv:1706.05587. [Google Scholar]
- Chen, L.-C.; Zhu, Y.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-decoder with atrous separable convolution for semantic image segmentation. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 801–818. [Google Scholar]
- Guo, M.; Liu, H.; Xu, Y.; Huang, Y. Building Extraction Based on U-Net with an Attention Block and Multiple Losses. Remote Sens. 2020, 12, 1400. [Google Scholar] [CrossRef]
- Wagner, F.H.; Dalagnol, R.; Tarabalka, Y.; Segantine, T.Y.; Thomé, R.; Hirye, M. U-Net-Id, an Instance Segmentation Model for Building Extraction from Satellite Images—Case Study in the Joanópolis City, Brazil. Remote Sens. 2020, 12, 1544. [Google Scholar] [CrossRef]
- Lin, Y.; Xu, D.; Wang, N.; Shi, Z.; Chen, Q. Road Extraction from Very-High-Resolution Remote Sensing Images via a Nested SE-Deeplab Model. Remote Sens. 2020, 12, 2985. [Google Scholar] [CrossRef]
- Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556. [Google Scholar]
- Schlemper, J.; Oktay, O.; Schaap, M.; Heinrich, M.; Kainz, B.; Glocker, B.; Rueckert, D. Attention gated networks: Learning to leverage salient regions in medical images. Med. Image Anal. 2019, 53, 197–207. [Google Scholar] [CrossRef]
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In Advances in Neural Information Processing Systems; MIT Press: Cambridge, MA, USA, 2017; pp. 5998–6008. [Google Scholar]
- Chen, L.; Zhang, H.; Xiao, J.; Nie, L.; Shao, J.; Liu, W.; Chua, T.-S. Sca-cnn: Spatial and channel-wise attention in convolutional networks for image captioning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 5659–5667. [Google Scholar]
- Fu, J.; Liu, J.; Tian, H.; Li, Y.; Bao, Y.; Fang, Z.; Lu, H. Dual Attention Network for Scene Segmentation. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 16–20 June 2019. [Google Scholar]
- Lin, G.; Shen, C.; van den Hengel, A.; Reid, I. Efficient Piecewise Training of Deep Structured Models for Semantic Segmentation. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 3194–3203. [Google Scholar]
- Wang, X.; Girshick, R.; Gupta, A.; He, K. Non-local neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7794–7803. [Google Scholar]
- Yuan, Y.; Wang, J. Ocnet: Object context network for scene parsing. arXiv 2018, arXiv:1809.00916. [Google Scholar]
- Zhao, H.; Zhang, Y.; Liu, S.; Shi, J.; Loy, C.C.; Lin, D.; Jia, J. Psanet: Point-wise spatial attention network for scene parsing. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 267–283. [Google Scholar]
- Jin, Y.; Xu, W.; Hu, Z.; Jia, H.; Luo, X.; Shao, D. GSCA-UNet: Towards Automatic Shadow Detection in Urban Aerial Imagery with Global-Spatial-Context Attention Module. Remote Sens. 2020, 12, 2864. [Google Scholar] [CrossRef]
- Tian, Z.; Zhan, R.; Hu, J.; Wang, W.; He, Z.; Zhuang, Z. Generating Anchor Boxes Based on Attention Mechanism for Object Detection in Remote Sensing Images. Remote Sens. 2020, 12, 2416. [Google Scholar] [CrossRef]
- Li, L.; Liang, P.; Ma, J.; Jiao, L.; Guo, X.; Liu, F.; Sun, C. A Multiscale Self-Adaptive Attention Network for Remote Sensing Scene Classification. Remote Sens. 2020, 12, 2209. [Google Scholar] [CrossRef]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; pp. 1026–1034. [Google Scholar]
- Hazirbas, C.; Ma, L.; Domokos, C.; Cremers, D. Fusenet: Incorporating depth into semantic segmentation via fusion-based cnn architecture. In Proceedings of the Asian Conference on Computer Vision, Taipei, Taiwan, 20–24 November 2016; pp. 213–228. [Google Scholar]
| Fusion Level | Architectures | Advantages | Limitations |
|---|---|---|---|
| Data-level | | Relatively simple and easily implemented | Ignores the qualitative distinctions between modalities; cannot be initialized with pre-trained CNNs |
| Feature-level | | Capable of learning cross-modal features | Fails to fully exploit modality-specific features |
| Decision-level | | Performs well at learning modality-specific features | Insufficient learning of cross-modal features |
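To make the three levels concrete, a minimal sketch is given below; `net`, `enc_rgb`, `enc_dsm`, `decoder`, `net_rgb`, and `net_dsm` are hypothetical callables for a generic encoder-decoder segmentation network and are not taken from the paper.

```python
import torch

def data_level_fusion(rgb, dsm, net):
    # concatenate modalities before the network: the first layer sees RGB + DSM channels,
    # so ImageNet-pretrained RGB weights cannot be reused directly
    return net(torch.cat([rgb, dsm], dim=1))

def feature_level_fusion(rgb, dsm, enc_rgb, enc_dsm, decoder):
    # encode each modality separately, merge the intermediate features, then decode
    return decoder(enc_rgb(rgb) + enc_dsm(dsm))

def decision_level_fusion(rgb, dsm, net_rgb, net_dsm):
    # run modality-specific networks to the end and average their class scores
    return (net_rgb(rgb) + net_dsm(dsm)) / 2
```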
| l_other / l_e_RGB | l_e_RGB | l_other | OA (%) | F1 Score (%) | IoU (%) |
|---|---|---|---|---|---|
| 1 | 0.1 | 0.1 | 90.15 | 92.51 | 74.85 |
| 10 | 0.01 | 0.1 | 92.33 | 94.03 | 80.65 |
| 1 | 0.01 | 0.01 | 96.90 | 97.61 | 91.54 |
| 10 | 0.001 | 0.01 | 97.15 | 97.79 | 92.27 |
| 1 | 0.001 | 0.001 | 96.60 | 97.39 | 90.71 |
| 10 | 0.0001 | 0.001 | 96.83 | 97.55 | 91.35 |
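The accuracy tables in this section report overall accuracy (OA), F1 score, and intersection over union (IoU) for the building class. As a reference for how such values can be computed from binary prediction and ground-truth masks, here is an illustrative NumPy sketch (not the evaluation code used by the authors):

```python
import numpy as np

def building_metrics(pred: np.ndarray, gt: np.ndarray):
    """OA, F1, and IoU for binary building masks (1 = building, 0 = background)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.sum(pred & gt)        # building pixels correctly predicted
    fp = np.sum(pred & ~gt)       # background predicted as building
    fn = np.sum(~pred & gt)       # building predicted as background
    tn = np.sum(~pred & ~gt)      # background correctly predicted
    oa = (tp + tn) / pred.size
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    iou = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    return oa, f1, iou
```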
| Streams | Potsdam OA (%) | Potsdam F1 Score (%) | Potsdam IoU (%) | Vaihingen OA (%) | Vaihingen F1 Score (%) | Vaihingen IoU (%) |
|---|---|---|---|---|---|---|
| RGB | 96.06 | 97.43 | 88.78 | 96.11 | 97.40 | 86.12 |
| DSM | 93.88 | 96.06 | 86.85 | 92.17 | 94.83 | 82.48 |
| Cross-modal | 97.60 | 98.43 | 89.65 | 96.60 | 97.71 | 86.89 |
| RGB + DSM | 96.87 | 97.95 | 89.12 | 96.32 | 97.58 | 86.67 |
| RGB + DSM + cross-modal | 97.96 | 98.78 | 90.10 | 97.04 | 98.17 | 87.32 |
| Attention | Potsdam OA (%) | Potsdam F1 Score (%) | Potsdam IoU (%) | Vaihingen OA (%) | Vaihingen F1 Score (%) | Vaihingen IoU (%) |
|---|---|---|---|---|---|---|
| √ | 97.96 | 98.78 | 90.10 | 97.04 | 98.17 | 87.32 |
| × | 96.56 | 97.48 | 89.36 | 96.32 | 97.46 | 86.44 |
| Decision-Level Fusion Method | Potsdam OA (%) | Potsdam F1 Score (%) | Potsdam IoU (%) | Vaihingen OA (%) | Vaihingen F1 Score (%) | Vaihingen IoU (%) |
|---|---|---|---|---|---|---|
| Averaging | 97.43 | 98.32 | 89.44 | 96.48 | 97.65 | 86.79 |
| Majority voting | 97.44 | 98.32 | 89.42 | 96.48 | 97.64 | 86.83 |
| Att-MFBlock | 97.96 | 98.78 | 90.10 | 97.04 | 98.17 | 87.32 |
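Averaging and majority voting combine the three per-stream outputs with fixed rules, whereas the Att-MFBlock learns the combination weights (see the block sketched in Section 3.2). A minimal sketch of the two fixed rules, assuming each stream yields a (B, classes, H, W) softmax map, is:

```python
import torch

def average_fusion(probs):
    # probs: list of per-stream softmax maps of shape (B, classes, H, W)
    return torch.stack(probs).mean(dim=0).argmax(dim=1)     # (B, H, W) label map

def majority_vote_fusion(probs):
    # each stream votes with its own hard prediction; the most frequent label wins
    votes = torch.stack([p.argmax(dim=1) for p in probs])   # (n_streams, B, H, W)
    return votes.mode(dim=0).values
```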
| Fusion Network | Potsdam OA (%) | Potsdam F1 Score (%) | Potsdam IoU (%) | Vaihingen OA (%) | Vaihingen F1 Score (%) | Vaihingen IoU (%) |
|---|---|---|---|---|---|---|
| FuseNet | 97.39 | 98.29 | 89.32 | 96.12 | 97.41 | 86.45 |
| V-FuseNet | 97.50 | 98.36 | 89.78 | 96.62 | 97.74 | 86.73 |
| RC-SegNet | 97.42 | 98.31 | 89.63 | 96.71 | 97.79 | 86.92 |
| HAFNet | 97.96 | 98.78 | 90.10 | 97.04 | 98.17 | 87.32 |
| Basic Networks | Potsdam OA (%) | Potsdam F1 Score (%) | Potsdam IoU (%) | Vaihingen OA (%) | Vaihingen F1 Score (%) | Vaihingen IoU (%) |
|---|---|---|---|---|---|---|
| SegNet | 97.96 | 98.78 | 90.10 | 97.04 | 98.17 | 87.32 |
| FCN-8s | 97.29 | 98.21 | 89.36 | 95.99 | 92.25 | 86.21 |
| U-Net | 97.83 | 98.68 | 90.17 | 96.92 | 98.07 | 87.14 |
| DeepLab v3+ | 97.60 | 98.42 | 90.49 | 96.10 | 97.38 | 86.45 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Zhang, P.; Du, P.; Lin, C.; Wang, X.; Li, E.; Xue, Z.; Bai, X. A Hybrid Attention-Aware Fusion Network (HAFNet) for Building Extraction from High-Resolution Imagery and LiDAR Data. Remote Sens. 2020, 12, 3764. https://doi.org/10.3390/rs12223764