EU-Net: An Efficient Fully Convolutional Network for Building Extraction from Optical Remote Sensing Images
"> Figure 1
<p>Architecture of the proposed EU-Net, which consists of three parts: encoder, DSPP and decoder.</p> "> Figure 2
<p>A comparison of which features on the input image can be used by a single pixel on the output image when a different size convolution kernel is used at the end of the DSPP. (<b>a</b>) Use a 1*1 convolution. (<b>b</b>) Use a 3*3 convolution.</p> "> Figure 3
<p>An example of the WHU dataset. (<b>a</b>) Original image. (<b>b</b>) Ground truth label.</p> "> Figure 4
<p>An example of the Massachusetts dataset. (<b>a</b>) Original image; (<b>b</b>) Ground truth label.</p> "> Figure 5
<p>An example of the Inria dataset. (<b>a</b>) Original image; (<b>b</b>) Ground truth label.</p> "> Figure 6
<p>The form of normalized confusion matrix.</p> "> Figure 7
<p>An example of the contour label extracted from the mask label. (<b>a</b>) Original image. (<b>b</b>) Mask label. (<b>c</b>) Contour label.</p> "> Figure 8
<p>The normalized confusion matrix of EU-Net on WHU dataset.</p> "> Figure 9
<p>Examples of building extraction results produced by four models on the WHU dataset. The first three rows are examples of oversized building, medium-sized buildings and small-sized buildings, respectively. The last row is an example which has no buildings. Columns 2-6 are the ground truth labels and prediction maps from SRI-Net, FastFCN, DeepLabv3+, and EU-Net, respectively.</p> "> Figure 10
<p>The normalized confusion matrix of EU-Net on Massachusetts dataset.</p> "> Figure 11
<p>Examples of building extraction results produced by four models on the Massachusetts dataset. The even rows are the enlargements of the red box selected areas in the odd rows. The red box selected areas in the odd rows are error label examples. Columns 2–6 are the ground truth labels and prediction maps from JointNet, FastFCN, DeepLabv3+, and EU-Net, respectively.</p> "> Figure 12
<p>The normalized confusion matrix of EU-Net on Inria dataset.</p> "> Figure 13
<p>Examples of building extraction results produced by three models on the Inria dataset. Five scenarios from top to bottom are chosen from Austin, Chicago, Kitsap, Tyrol and Vienna. Columns 2–5 are the ground truth labels and prediction maps from FastFCN, DeepLabv3+, and EU-Net, respectively.</p> "> Figure 14
<p>The enlargements of red box selected areas in <a href="#remotesensing-11-02813-f013" class="html-fig">Figure 13</a>. Five scenarios from top to bottom are Austin, Chicago, Kitsap, Tyrol and Vienna. Columns 2–5 are the ground truth labels and prediction maps from FastFCN, DeepLabv3+, and EU-Net, respectively. The red box selected areas have error labels and correct predictions. The yellow box selected area has correct label and error prediction.</p> "> Figure 15
<p>Mask IoU histogram and contour IoU line for different channel ratios of shallow features to deep features used in the short connections.</p> "> Figure 16
<p>Sample comparison of prediction results with or without DSPP. Three scenarios from top to bottom are from WHU, Massachusetts and Inria. (<b>a</b>) Original image. (<b>b</b>) Ground truth label. (<b>c</b>) Prediction of EU-Net. (<b>d</b>) Prediction of EU-Net-simple. The red box selected areas have error labels.</p> "> Figure 17
<p>The EU-Net training loss curves for different learning rate on Massachusetts dataset.</p> "> Figure 18
<p>The loss curves and accuracy curves on training set and validation set for three datasets. (<b>a</b>) WHU dataset. (<b>b</b>) Massachusetts dataset. (<b>c</b>) Inria dataset.</p> ">
Abstract
1. Introduction
- A simple but efficient model, EU-Net, is proposed for building extraction from optical remote sensing images. It can be trained efficiently with a large learning rate and a large batch size.
- By applying the dense spatial pyramid pooling (DSPP) structure, multi-scale dense features can be extracted simultaneously from a more compact receptive field, so buildings of different scales can be better detected. By using the focal loss in reverse, we reduce the impact of erroneous labels in the datasets on model training, leading to a significant improvement in accuracy (see the sketch after this list).
- Exhaustive experiments were performed for evaluation and comparison on three public remote sensing building datasets. Compared with the state-of-the-art models on each dataset, the results demonstrate the universality of the proposed model for the building extraction task.
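The exact loss formulation is given in Section 3.4. Purely as an illustration, the snippet below is a minimal PyTorch sketch of the idea of applying a focal-style modulating factor "in reverse", i.e., down-weighting pixels whose predictions strongly disagree with the (possibly noisy) label instead of up-weighting them. The function name, the gamma value and the way the factor is combined with cross-entropy are assumptions for this sketch, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def reversed_focal_bce(logits, targets, gamma=2.0):
    """Binary cross-entropy modulated 'in reverse' (illustrative sketch only).

    The standard focal loss multiplies BCE by (1 - p_t)**gamma, emphasizing hard
    pixels. Here the factor p_t**gamma is used instead, so pixels whose prediction
    strongly disagrees with the label (often mislabeled pixels) contribute less to
    the gradient. The exact form and gamma are assumptions, not the paper's loss.
    """
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p = torch.sigmoid(logits)
    p_t = p * targets + (1.0 - p) * (1.0 - targets)  # probability of the labeled class
    return (p_t ** gamma * bce).mean()

# Toy usage: logits and labels for a batch of two single-channel 4*4 masks.
logits = torch.randn(2, 1, 4, 4)
labels = torch.randint(0, 2, (2, 1, 4, 4)).float()
print(reversed_focal_bce(logits, labels).item())
```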
2. Preliminaries
- Standard convolutional layer: Standard convolutional layers are used for different purposes depending on the kernel size. For example, convolution with a 3*3 kernel is the most common choice for feature extraction, while convolution with a 1*1 kernel is typically used to reintegrate features from different sources after a concatenate layer or to reduce the number of feature channels. To exploit the spatial context around a pixel, the kernel must be at least 3*3. Compared with a single larger kernel, cascaded 3*3 convolutions obtain the same receptive field with fewer parameters while introducing more nonlinearities. As for the 1*1 kernel, it reduces feature channels with the fewest possible parameters (see the sketch after this list).
- ReLU layer: The rectified linear unit (ReLU) [45] is the preferred nonlinear activation function for most neural network models. Its operation is simple: positive values are kept and negative values are set to zero, i.e., max(0, x).
- Pooling layer: Pooling is a general option for downsampling feature maps along the spatial dimensions. Max-pooling is adopted by most models, and we also use it in our model.
- Dilated convolution layer: By adjusting the dilation rate, a dilated convolution can change its receptive field without changing the number of parameters. Therefore, dilated convolutions are used to expand the receptive field and simultaneously acquire features of different scales.
- Transposed convolution layer: Transposed convolution is used to recover the resolution of the feature maps and implement pixel-to-pixel prediction. Unlike the unpooling used in U-Net or SegNet, the transposed convolution is trainable and therefore more flexible.
- Batch normalization layer: Batch normalization (BN) accelerates model training by normalizing the layer inputs [46]. In this way, the internal covariate shift is suppressed and a much higher learning rate can be used.
- Concatenate layer: Concatenate layer is used to connect feature maps from different sources.
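To make these building blocks concrete, the following minimal PyTorch sketch stacks them the way a typical encoder/decoder stage might: two cascaded 3*3 convolutions with BN and ReLU, a max-pooling step, a dilated 3*3 convolution, a transposed convolution for upsampling, and a channel-reducing 1*1 convolution after concatenation. The layer widths and ordering are illustrative assumptions, not the exact EU-Net configuration; the sketch also prints parameter counts showing that two cascaded 3*3 convolutions (receptive field 5*5) use fewer parameters than a single 5*5 convolution.

```python
import torch
import torch.nn as nn

channels = 64  # illustrative width, not the EU-Net setting

# Two cascaded 3*3 convolutions: receptive field 5*5, fewer parameters than one 5*5 kernel.
cascade_3x3 = nn.Sequential(
    nn.Conv2d(channels, channels, kernel_size=3, padding=1),
    nn.BatchNorm2d(channels),
    nn.ReLU(inplace=True),
    nn.Conv2d(channels, channels, kernel_size=3, padding=1),
    nn.BatchNorm2d(channels),
    nn.ReLU(inplace=True),
)
single_5x5 = nn.Conv2d(channels, channels, kernel_size=5, padding=2)

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(cascade_3x3), "vs", count(single_5x5))  # cascaded 3*3 uses fewer parameters

pool = nn.MaxPool2d(kernel_size=2)                                              # spatial downsampling
dilated = nn.Conv2d(channels, channels, kernel_size=3, padding=2, dilation=2)   # larger receptive field, same parameter count as a 3*3
upsample = nn.ConvTranspose2d(channels, channels, kernel_size=2, stride=2)      # trainable resolution recovery
reduce_1x1 = nn.Conv2d(2 * channels, channels, kernel_size=1)                   # 1*1 reintegration after concatenation

x = torch.randn(1, channels, 64, 64)
shallow = cascade_3x3(x)                 # encoder features at full resolution
deep = dilated(pool(shallow))            # downsampled, dilated features
restored = upsample(deep)                # back to the shallow resolution
fused = reduce_1x1(torch.cat([shallow, restored], dim=1))  # concatenate + channel reduction
print(fused.shape)                       # torch.Size([1, 64, 64, 64])
```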
3. Methodology
3.1. Encoder
3.2. DSPP
3.3. Decoder
3.4. Loss Function
4. Experimental Results
4.1. Dataset Description
4.2. Implementation Settings
4.3. Evaluation Metrics
4.4. Comparing Methods
4.5. Comparison with Deep Models
4.5.1. WHU Dataset
4.5.2. Massachusetts Dataset
4.5.3. Inria Dataset
5. Discussion
5.1. Channel Ratio in Short Connection
5.2. Larger Sample Size or Larger Batch Size
5.3. DSPP
5.4. Loss Function
5.5. Learning Rate and Epoch
6. Conclusions
Author Contributions
Funding
Acknowledgments
Conflicts of Interest
Appendix A. Derivatives
References
- Liu, Y.; Fan, B.; Wang, L.; Bai, J.; Xiang, S.; Pan, C. Semantic labeling in very high resolution images via a self-cascaded convolutional neural network. ISPRS J. Photogramm. Remote Sens. 2018, 145, 78–95. [Google Scholar] [CrossRef]
- Mohammadimanesh, F.; Salehi, B.; Mahdianpari, M.; Gill, E.; Molinier, M. A new fully convolutional neural network for semantic segmentation of polarimetric SAR imagery in complex land cover ecosystem. ISPRS J. Photogramm. Remote Sens. 2019, 151, 223–236. [Google Scholar] [CrossRef]
- Kemker, R.; Salvaggio, C.; Kanan, C. Algorithms for semantic segmentation of multispectral remote sensing imagery using deep learning. ISPRS J. Photogramm. Remote Sens. 2018, 145, 60–77. [Google Scholar] [CrossRef]
- Huang, J.; Zhang, X.; Xin, Q.; Sun, Y.; Zhang, P. Automatic building extraction from high-resolution aerial images and LiDAR data using gated residual refinement network. ISPRS J. Photogramm. Remote Sens. 2019, 151, 91–105. [Google Scholar] [CrossRef]
- Ji, S.; Wei, S.; Lu, M. Fully convolutional networks for multisource building extraction from an open aerial and satellite imagery data set. IEEE Trans. Geosci. Remote Sens. 2018, 57, 574–586. [Google Scholar] [CrossRef]
- Mnih, V. Machine Learning for Aerial Image Labeling; University of Toronto: Toronto, ON, Canada, 2013. [Google Scholar]
- Maggiori, E.; Tarabalka, Y.; Charpiat, G.; Alliez, P. Can semantic labeling methods generalize to any city? the inria aerial image labeling benchmark. In Proceedings of the 2017 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Fort Worth, TX, USA, 23–28 July 2017; pp. 3226–3229. [Google Scholar]
- Pan, X.; Yang, F.; Gao, L.; Chen, Z.; Zhang, B.; Fan, H.; Ren, J. Building Extraction from High-Resolution Aerial Imagery Using a Generative Adversarial Network with Spatial and Channel Attention Mechanisms. Remote Sens. 2019, 11, 917. [Google Scholar] [CrossRef]
- Xu, Y.; Wu, L.; Xie, Z.; Chen, Z. Building extraction in very high resolution remote sensing imagery using deep learning and guided filters. Remote Sens. 2018, 10, 144. [Google Scholar] [CrossRef]
- Ok, A.O.; Senaras, C.; Yuksel, B. Automated detection of arbitrarily shaped buildings in complex environments from monocular VHR optical satellite imagery. IEEE Trans. Geosci. Remote Sens. 2012, 51, 1701–1717. [Google Scholar] [CrossRef]
- Huang, X.; Yuan, W.; Li, J.; Zhang, L. A new building extraction postprocessing framework for high-spatial-resolution remote-sensing imagery. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2016, 10, 654–668. [Google Scholar] [CrossRef]
- Chen, R.; Li, X.; Li, J. Object-based features for house detection from RGB high-resolution images. Remote Sens. 2018, 10, 451. [Google Scholar] [CrossRef]
- Senaras, C.; Ozay, M.; Vural, F.T.Y. Building detection with decision fusion. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2013, 6, 1295–1304. [Google Scholar] [CrossRef]
- Saito, S.; Aoki, Y. Building and road detection from large aerial imagery. Proc. SPIE 2015, 9405, 94050K. [Google Scholar]
- Vakalopoulou, M.; Karantzalos, K.; Komodakis, N.; Paragios, N. Building detection in very high resolution multispectral data with deep learning features. In Proceedings of the 2015 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Milan, Italy, 13–18 July 2015; pp. 1873–1876. [Google Scholar]
- Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 3431–3440. [Google Scholar]
- Noh, H.; Hong, S.; Han, B. Learning deconvolution network for semantic segmentation. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1520–1528. [Google Scholar]
- Badrinarayanan, V.; Kendall, A.; Cipolla, R. Segnet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 2481–2495. [Google Scholar] [CrossRef] [PubMed]
- Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Munich, Germany, 5–9 October 2015; pp. 234–241. [Google Scholar]
- Wu, H.; Zhang, J.; Huang, K.; Liang, K.; Yu, Y. FastFCN: Rethinking Dilated Convolution in the Backbone for Semantic Segmentation. arXiv 2019, arXiv:1903.11816. [Google Scholar]
- Chen, L.C.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A.L. Semantic image segmentation with deep convolutional nets and fully connected crfs. arXiv 2014, arXiv:1412.7062. [Google Scholar]
- Liu, P.; Liu, X.; Liu, M.; Shi, Q.; Yang, J.; Xu, X.; Zhang, Y. Building Footprint Extraction from High-Resolution Images via Spatial Residual Inception Convolutional Neural Network. Remote Sens. 2019, 11, 830. [Google Scholar] [CrossRef]
- Lin, J.; Jing, W.; Song, H.; Chen, G. ESFNet: Efficient Network for Building Extraction From High-Resolution Aerial Images. IEEE Access 2019, 7, 54285–54294. [Google Scholar] [CrossRef]
- Zhang, Z.; Wang, Y. JointNet: A Common Neural Network for Road and Building Extraction. Remote Sens. 2019, 11, 696. [Google Scholar] [CrossRef]
- Li, X.; Yao, X.; Fang, Y. Building-A-Nets: Robust Building Extraction From High-Resolution Remote Sensing Images With Adversarial Networks. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2018, 11, 3680–3687. [Google Scholar] [CrossRef]
- Mou, L.; Zhu, X.X. RiFCN: Recurrent network in fully convolutional network for semantic segmentation of high resolution remote sensing images. arXiv 2018, arXiv:1805.02091. [Google Scholar]
- Krähenbühl, P.; Koltun, V. Efficient inference in fully connected crfs with gaussian edge potentials. Adv. Neural Inf. Process. Syst. 2011, 24, 109–117. [Google Scholar]
- Shrestha, S.; Vanneschi, L. Improved fully convolutional network with conditional random fields for building extraction. Remote Sens. 2018, 10, 1135. [Google Scholar] [CrossRef]
- Clevert, D.A.; Unterthiner, T.; Hochreiter, S. Fast and accurate deep network learning by exponential linear units (elus). arXiv 2015, arXiv:1511.07289. [Google Scholar]
- Alshehhi, R.; Marpu, P.R.; Woon, W.L.; Dalla Mura, M. Simultaneous extraction of roads and buildings in remote sensing imagery with convolutional neural networks. ISPRS J. Photogramm. Remote Sens. 2017, 130, 139–149. [Google Scholar] [CrossRef]
- Goodfellow, I. NIPS 2016 tutorial: Generative adversarial networks. arXiv 2016, arXiv:1701.00160. [Google Scholar]
- Zhu, J.Y.; Park, T.; Isola, P.; Efros, A.A. Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2223–2232. [Google Scholar]
- Liu, M.Y.; Breuel, T.; Kautz, J. Unsupervised image-to-image translation networks. arXiv 2017, arXiv:1703.00848. [Google Scholar]
- Ledig, C.; Theis, L.; Huszár, F.; Caballero, J.; Cunningham, A.; Acosta, A.; Aitken, A.; Tejani, A.; Totz, J.; Wang, Z.; et al. Photo-realistic single image super-resolution using a generative adversarial network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 4681–4690. [Google Scholar]
- Wang, X.; Yu, K.; Wu, S.; Gu, J.; Liu, Y.; Dong, C.; Qiao, Y.; Change Loy, C. Esrgan: Enhanced super-resolution generative adversarial networks. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018. [Google Scholar]
- Pan, B.; Shi, Z.; Xu, X. MugNet: Deep learning for hyperspectral image classification using limited samples. ISPRS J. Photogramm. Remote Sens. 2018, 145, 108–119. [Google Scholar] [CrossRef]
- Bittner, K.; Adam, F.; Cui, S.; Körner, M.; Reinartz, P. Building footprint extraction from VHR remote sensing images combined with normalized DSMs using fused fully convolutional networks. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2018, 11, 2615–2629. [Google Scholar] [CrossRef]
- Zhang, Y.; Gong, W.; Sun, J.; Li, W. Web-Net: A Novel Nest Networks with Ultra-Hierarchical Sampling for Building Extraction from Aerial Imageries. Remote Sens. 2019, 11, 1897. [Google Scholar] [CrossRef]
- Wu, G.; Shao, X.; Guo, Z.; Chen, Q.; Yuan, W.; Shi, X.; Xu, Y.; Shibasaki, R. Automatic building segmentation of aerial imagery using multi-constraint fully convolutional networks. Remote Sens. 2018, 10, 407. [Google Scholar] [CrossRef]
- Ji, S.; Wei, S.; Lu, M. A scale robust convolutional neural network for automatic building extraction from aerial and satellite imagery. Int. J. Remote Sens. 2019, 40, 3308–3322. [Google Scholar] [CrossRef]
- Audebert, N.; Le Saux, B.; Lefèvre, S. Beyond RGB: Very high resolution urban remote sensing with multimodal deep networks. ISPRS J. Photogramm. Remote Sens. 2018, 140, 20–32. [Google Scholar] [CrossRef]
- Bischke, B.; Helber, P.; Folz, J.; Borth, D.; Dengel, A. Multi-task learning for segmentation of building footprints with deep neural networks. In Proceedings of the 2019 IEEE International Conference on Image Processing (ICIP), Taipei, Taiwan, 22–29 September 2019; pp. 1480–1484. [Google Scholar]
- Srivastava, N.; Hinton, G.; Krizhevsky, A.; Sutskever, I.; Salakhutdinov, R. Dropout: A simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 2014, 15, 1929–1958. [Google Scholar]
- Bridle, J.S. Probabilistic interpretation of feedforward classification network outputs, with relationships to statistical pattern recognition. In Neurocomputing; Springer: Berlin/Heidelberg, Germany, 1990; pp. 227–236. [Google Scholar]
- Nair, V.; Hinton, G.E. Rectified linear units improve restricted boltzmann machines. In Proceedings of the 27th International Conference on Machine Learning (ICML-10), Haifa, Israel, 21–25 June 2010; pp. 807–814. [Google Scholar]
- Ioffe, S.; Szegedy, C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv 2015, arXiv:1502.03167. [Google Scholar]
- Wu, Y.; He, K. Group normalization. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19. [Google Scholar]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Identity mappings in deep residual networks. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; pp. 630–645. [Google Scholar]
- Chen, L.C.; Zhu, Y.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-decoder with atrous separable convolution for semantic image segmentation. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 801–818. [Google Scholar]
- Marcu, A.; Costea, D.; Slusanschi, E.; Leordeanu, M. A multi-stage multi-task neural network for aerial scene interpretation and geolocalization. arXiv 2018, arXiv:1804.01322. [Google Scholar]
- Ruan, T.; Liu, T.; Huang, Z.; Wei, Y.; Wei, S.; Zhao, Y. Devil in the details: Towards accurate single and multiple human parsing. In Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February 2019; Volume 33, pp. 4814–4821. [Google Scholar]
- Lu, T.; Ming, D.; Lin, X.; Hong, Z.; Bai, X.; Fang, J. Detecting building edges from high spatial resolution remote sensing imagery using richer convolution features network. Remote Sens. 2018, 10, 1496. [Google Scholar] [CrossRef]
- Luo, W.; Li, Y.; Urtasun, R.; Zemel, R. Understanding the effective receptive field in deep convolutional neural networks. arXiv 2016, arXiv:1701.04128. [Google Scholar]
- Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2980–2988. [Google Scholar]
- Glorot, X.; Bengio, Y. Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, Sardinia, Italy, 13–15 May 2010; pp. 249–256. [Google Scholar]
- Han, W.; Feng, R.; Wang, L.; Cheng, Y. A semi-supervised generative framework with deep learning features for high-resolution remote sensing image scene classification. ISPRS J. Photogramm. Remote Sens. 2018, 145, 23–43. [Google Scholar] [CrossRef]
- Kang, J.; Körner, M.; Wang, Y.; Taubenböck, H.; Zhu, X.X. Building instance classification using street view images. ISPRS J. Photogramm. Remote Sens. 2018, 145, 44–59. [Google Scholar] [CrossRef]
- Chen, L.C.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A.L. Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 40, 834–848. [Google Scholar] [CrossRef]
- Chen, L.C.; Papandreou, G.; Schroff, F.; Adam, H. Rethinking atrous convolution for semantic image segmentation. arXiv 2017, arXiv:1706.05587. [Google Scholar]
- Khalel, A.; El-Saban, M. Automatic pixelwise object labeling for aerial imagery using stacked u-nets. arXiv 2018, arXiv:1803.04953. [Google Scholar]
- Tan, M.; Le, Q.V. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. arXiv 2019, arXiv:1905.11946. [Google Scholar]
Encoders | FLOPs (M) | Parameters (M) |
---|---|---|
VGG16 [16] | 268.52 | 134.27 |
VGG16 (first 13 layers) | 29.42 | 14.71 |
ResNet50 [48] | 76.06 | 38.07 |
ResNet101 [48] | 149.66 | 74.91 |
Xception41 [49] | 62.54 | 31.33 |
Xception65 [49] | 96.94 | 48.57 |
Model | Type | Recall (%) | Precision (%) | F1 (%) | IoU (%) | Images/s
---|---|---|---|---|---|---
Web-Net [38] (report) | mask | - | - | - | 88.76 | -
SRI-Net [22] (report) | mask | 93.28 | 95.21 | 94.23 | 89.09 | -
SRI-Net [22] | mask | 91.92 | 92.75 | 92.33 | 85.75 | 8.51
SRI-Net [22] | contour | 36.65 | 37.36 | 37.00 | 22.70 |
FastFCN [20] | mask | 81.37 | 87.98 | 84.55 | 73.23 | 9.78
FastFCN [20] | contour | 20.27 | 16.61 | 18.26 | 10.05 |
DeepLabv3+ [49] | mask | 92.99 | 93.11 | 93.05 | 87.00 | 9.44
DeepLabv3+ [49] | contour | 40.49 | 40.67 | 40.58 | 25.45 |
EU-Net | mask | 95.10 | 94.98 | 95.04 | 90.56 | 16.78
EU-Net | contour | 48.73 | 49.10 | 48.91 | 32.38 |
Model | Type | Recall (%) | Precision (%) | F1 (%) | IoU (%) | Time (s)
---|---|---|---|---|---|---
JointNet [24] (report) | mask | 81.29 | 86.21 | 83.68 | 71.99 | -
JointNet [24] | mask | 79.85 | 85.21 | 82.44 | 70.13 | 4.16
JointNet [24] | contour | 27.30 | 27.32 | 27.31 | 15.82 |
FastFCN [20] | mask | 65.70 | 78.83 | 71.67 | 55.85 | 2.19
FastFCN [20] | contour | 13.13 | 14.38 | 13.73 | 7.37 |
DeepLabv3+ [49] | mask | 69.90 | 83.21 | 75.98 | 61.26 | 2.20
DeepLabv3+ [49] | contour | 21.38 | 22.50 | 21.92 | 12.31 |
EU-Net | mask | 83.40 | 86.70 | 85.01 | 73.93 | 1.13
EU-Net | contour | 28.23 | 29.44 | 28.83 | 16.84 |
Model | Type | Recall (%) | Precision (%) | F1 (%) | IoU (%) | Time (s)
---|---|---|---|---|---|---
SRI-Net [22] (report) | mask | 81.46 | 85.77 | 83.56 | 71.76 | -
2-levels U-Nets [60] (report) | mask | - | - | - | 74.55 | 208.8
Building-A-Net [25] (report) | mask | - | - | - | 78.73 | 150.50
Web-Net [38] (report) | mask | - | - | - | 80.10 | 56.50
FastFCN [20] | mask | 83.55 | 87.51 | 85.48 | 74.64 | 29.12
FastFCN [20] | contour | 11.31 | 11.07 | 11.18 | 5.92 |
DeepLabv3+ [49] | mask | 84.00 | 87.88 | 85.90 | 75.28 | 29.61
DeepLabv3+ [49] | contour | 13.94 | 15.85 | 14.84 | 8.01 |
EU-Net | mask | 88.14 | 90.28 | 89.20 | 80.50 | 14.79
EU-Net | contour | 19.81 | 21.18 | 20.47 | 11.40 |
Model | Austin | Chicago | Kitsap | Tyrol-w | Vienna | Overall
---|---|---|---|---|---|---
Web-Net [38] (report) | 82.49 | 73.90 | 70.71 | 83.72 | 83.49 | 80.10
FastFCN [20] | 75.56 | 70.05 | 64.37 | 74.10 | 78.97 | 74.64
DeepLabv3+ [49] | 78.89 | 69.93 | 66.11 | 73.09 | 79.24 | 75.28
EU-Net | 82.86 | 76.18 | 70.68 | 80.83 | 83.55 | 80.50
Dataset | Type | (256*256, 64) | (354*354, 32) | (512*512, 16)
---|---|---|---|---
WHU | mask | 90.56 | 90.35 | 89.00
WHU | contour | 32.38 | 32.22 | 29.91
Massachusetts | mask | 73.93 | 73.75 | 70.42
Massachusetts | contour | 16.84 | 16.78 | 14.99
Inria | mask | 80.50 | 80.24 | 80.20
Inria | contour | 11.40 | 11.27 | 11.30
Metric | (224*224, 64) | (256*256, 32) | (256*256, 16)
---|---|---|---
mask IoU | 79.83 | 80.22 | 80.01
contour IoU | 11.12 | 10.91 | 10.87
Model | Type | WHU | Massachusetts | Inria
---|---|---|---|---
EU-Net | mask | 90.56 | 73.93 | 80.50
EU-Net | contour | 32.38 | 16.84 | 11.40
EU-Net-simple | mask | 90.21 | 73.07 | 78.74
EU-Net-simple | contour | 32.13 | 16.52 | 10.61
Loss | Type | Massachusetts | Inria
---|---|---|---
CE+FL | mask | 73.93 | 80.50
CE+FL | contour | 16.84 | 11.40
CE | mask | 72.83 | 80.45
CE | contour | 16.53 | 11.32
Type | 1 | 2 | 3
---|---|---|---
mask | 73.67 | 73.93 | 73.51
contour | 16.62 | 16.84 | 16.65
© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).