Automatic Building Segmentation of Aerial Imagery Using Multi-Constraint Fully Convolutional Networks
"> Figure 1
<p>Aerial imagery of the study area. The area covers 18 km<math display="inline"> <semantics> <msup> <mrow/> <mn>2</mn> </msup> </semantics> </math> of residential area, factories or farms in New Zealand, and is located from –43<math display="inline"> <semantics> <msup> <mrow/> <mo>°</mo> </msup> </semantics> </math>28′(N) to –43<math display="inline"> <semantics> <msup> <mrow/> <mo>°</mo> </msup> </semantics> </math>30′(S) and 172<math display="inline"> <semantics> <msup> <mrow/> <mo>°</mo> </msup> </semantics> </math>34′(W) to 172<math display="inline"> <semantics> <msup> <mrow/> <mo>°</mo> </msup> </semantics> </math>38′(E).</p> "> Figure 2
<p>Scheme of the experiment. The MC–FCNs are trained and cross-validated using training dataset, then evaluated by the testing dataset.</p> "> Figure 3
<p>Network architecture of the proposed multi-constraint fully convolutional network (MC–FCN). The MC–FCN model adopts the basic structure of U–Net and adds three extra multi-scale constraints between upsampled layers and their corresponding ground truths.</p> "> Figure 4
<p>Segmentation results of HOG–Ada, FCN, U–Net and MC–FCN for the Test-1, Test-2 and Test-3 regions. The green, red, blue and black pixels of the maps represent the predictions of true positive, false positive, false negative and true negative, respectively.</p> "> Figure 5
<p>Center patches of segmentation results of Test-1, Test-2 and Test-3 regions. The green, red, blue and black pixels of the maps represent the predictions of true positive, false positive, false negative and true negative, respectively.</p> "> Figure 6
<p>Segmentation results of randomly sampled buildings from the HOG–Ada, FCN, U–Net and MC–FCN methods. Images within columns (<b>a</b>–<b>h</b>) are sampled buildings from Test-1, Test-2 and Test-3 regions. The green, red and blue channels of results represent true positive, false positive and false negative predictions, respectively, of every pixel.</p> "> Figure 7
<p>Representative results of building segmentation through the U–Net and MC–FCN methods. The green, red and blue channels of results represent true positive, false positive and false negative predictions, respectively, of every pixel.</p> "> Figure 8
<p>Representative results of building segmentation through MC–FCN models with variant numbers of subconstraints. The green, red and blue channels of results represent true positive, false positive and false negative predictions, respectively, of every pixel.</p> "> Figure 9
<p>Representative results of segmented buildings from MC–FCN models with different constraint combinations. The green, red and blue channels of results represent true positive, false positive and false negative predictions, respectively, of every pixel.</p> "> Figure 10
<p>Zoomed-in image of the middle-left corner of the Test-1 region in <a href="#remotesensing-10-00407-f004" class="html-fig">Figure 4</a>, where a small lake was misclassified as a building by all CNN-based methods. The green, red, blue and black pixels denote the predictions of true positive, false positive, false negative and true negative, respectively.</p> ">
Abstract
1. Introduction
2. Materials and Methods
2.1. Data
2.2. Method
2.2.1. Data Preprocessing
2.2.2. MC–FCN
- According to the backpropagation algorithm, the parameter updates at each iteration are largest in the layers closest to the output; thus, during training, the model’s performance is most sensitive to the layer used to compute the final loss.
- Models that minimize only the difference between the final output and the ground truth therefore impose insufficient constraints on the middle layers; applying additional constraints to these intermediate layers enhances the multi-scale feature representation by further optimizing their parameters.
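These two points motivate a multi-constraint loss: besides the loss on the final output, extra losses are applied to intermediate upsampled layers against correspondingly downsampled ground truths. The following is a minimal NumPy sketch of such a weighted multi-scale loss, assuming binary cross-entropy at each scale; the function names and the toy scales are ours, not from the paper.

```python
import numpy as np

def binary_cross_entropy(pred, truth, eps=1e-7):
    """Mean binary cross-entropy between a sigmoid output and a 0/1 mask."""
    pred = np.clip(pred, eps, 1.0 - eps)
    return float(-np.mean(truth * np.log(pred) + (1.0 - truth) * np.log(1.0 - pred)))

def multi_constraint_loss(preds, truths, weights):
    """Weighted sum of per-scale losses: the final output plus subconstraints."""
    return sum(w * binary_cross_entropy(p, t)
               for w, p, t in zip(weights, preds, truths))

# Final 64x64 output plus three subconstraints at 32, 16 and 8 pixels,
# each compared against a correspondingly downsampled ground truth,
# with equal weights of 1/4 (the three-subconstraint setting).
rng = np.random.default_rng(0)
preds = [rng.uniform(0.01, 0.99, (s, s)) for s in (64, 32, 16, 8)]
truths = [(rng.uniform(size=p.shape) > 0.5).astype(float) for p in preds]
loss = multi_constraint_loss(preds, truths, [0.25, 0.25, 0.25, 0.25])
```

Because the subconstraint terms receive their own gradients, the intermediate layers are optimized directly rather than only through the final output.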
2.3. Experimental Setup
2.3.1. Architecture of the MC–FCN
2.3.2. Various Combinations of Constraints
3. Results
3.1. Qualitative Result Comparison
3.2. Quantitative Result Comparison
3.3. Sensitivity Analysis of Constraints
3.4. Computational Efficiency
4. Discussions
4.1. About the Proposed MC–FCN Model
4.2. Accuracies, Uncertainties and Limitations
5. Conclusions
Acknowledgments
Author Contributions
Conflicts of Interest
Abbreviations
Abbreviation | Full Form |
---|---|
AdaBoost | Adaptive Boosting |
CRF | Conditional Random Field |
CNN | Convolutional Neural Network |
HOG | Histogram of Oriented Gradients |
FCN | Fully Convolutional Network |
MC–FCN | Multi-Constraint Fully Convolutional Network |
Layer | Output Shape | Kernel Size | Scale | Number of Kernels | Connect to |
---|---|---|---|---|---|
Conv_1 | (h, w, k) | (3, 3) | - | k | Input |
ReLU_1 | (h, w, k) | - | - | - | Conv_1 |
BN_1 | (h, w, k) | - | - | - | ReLU_1 |
Conv_2 | (h, w, k) | (3, 3) | - | k | BN_1 |
ReLU_2 | (h, w, k) | - | - | - | Conv_2 |
BN_2 | (h, w, k) | - | - | - | ReLU_2 |
Maxpool_1 | (h/2, w/2, k) | - | (2, 2) | - | BN_2 |
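The encoder block in the table above (two 3×3 same-padding convolutions with ReLU, followed by 2×2 max pooling) can be sketched in NumPy to verify the output shapes; batch normalization is omitted because it rescales activations without changing shapes, and all function names below are ours.

```python
import numpy as np

def conv2d_same(x, kernels):
    """Naive 3x3 convolution with 'same' padding: (h, w, c) -> (h, w, k)."""
    h, w, c = x.shape
    k = kernels.shape[0]                          # kernels: (k, 3, 3, c)
    padded = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    out = np.empty((h, w, k))
    for i in range(h):
        for j in range(w):
            patch = padded[i:i + 3, j:j + 3, :]
            out[i, j] = np.tensordot(kernels, patch, axes=([1, 2, 3], [0, 1, 2]))
    return out

def relu(x):
    return np.maximum(x, 0.0)

def maxpool_2x2(x):
    """2x2 max pooling: (h, w, c) -> (h/2, w/2, c)."""
    h, w, c = x.shape
    return x.reshape(h // 2, 2, w // 2, 2, c).max(axis=(1, 3))

# One encoder block with k = 16 kernels on an (8, 8, 3) input patch.
rng = np.random.default_rng(0)
x = rng.normal(size=(8, 8, 3))
bn_1 = relu(conv2d_same(x, rng.normal(size=(16, 3, 3, 3))))      # Conv_1 -> ReLU_1 -> BN_1
bn_2 = relu(conv2d_same(bn_1, rng.normal(size=(16, 3, 3, 16))))  # Conv_2 -> ReLU_2 -> BN_2
pooled = maxpool_2x2(bn_2)                                       # Maxpool_1
print(bn_2.shape, pooled.shape)   # (8, 8, 16) (4, 4, 16)
```

As the table indicates, the convolutions preserve the spatial size (h, w) while changing the channel count to k, and only the pooling layer halves the resolution.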
Layer | Output Shape | Kernel Size | Scale | Number of Kernels | Connect to |
---|---|---|---|---|---|
Upsample_1’ | - | - | (2, 2) | - | Input’ |
Skip_1’ | - | - | - | - | Upsample_1’ & BN_2 |
Conv_1’ | - | (3, 3) | - | k’ | Skip_1’ |
ReLU_1’ | - | - | - | - | Conv_1’ |
BN_1’ | - | - | - | - | ReLU_1’ |
Conv_2’ | - | (3, 3) | - | k’ | BN_1’ |
ReLU_2’ | - | - | - | - | Conv_2’ |
BN_2’ | - | - | - | - | ReLU_2’ |
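The decoder block mirrors the encoder: a 2× upsampling, then a skip connection that concatenates the result with the matching encoder output (BN_2) along the channel axis, followed by two more 3×3 convolutions. A shape-level NumPy sketch of the first two steps (assuming nearest-neighbour upsampling; function names are ours, and the convolutions are omitted since they match the encoder sketch):

```python
import numpy as np

def upsample_2x(x):
    """Nearest-neighbour 2x upsampling: (h, w, c) -> (2h, 2w, c)."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def skip_concat(upsampled, encoder_out):
    """Skip connection: concatenate the two maps along the channel axis."""
    assert upsampled.shape[:2] == encoder_out.shape[:2]
    return np.concatenate([upsampled, encoder_out], axis=-1)

# Upsample_1': a deeper (4, 4, 32) feature map doubled to (8, 8, 32);
# Skip_1': fused with the matching encoder output BN_2 of shape (8, 8, 16).
deep = np.zeros((4, 4, 32))
bn_2 = np.zeros((8, 8, 16))
fused = skip_concat(upsample_2x(deep), bn_2)
print(fused.shape)   # (8, 8, 48)
```

The concatenation is what lets the k’ decoder kernels in Conv_1’ mix coarse semantic features with the finer spatial detail preserved by the encoder.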
Number of Subconstraints | Final-Output Weight | Subconstraint-1 Weight | Subconstraint-2 Weight | Subconstraint-3 Weight |
---|---|---|---|---|
0 | 1/1 | - | - | - |
1 | 1/2 | 1/2 | - | - |
2 | 1/3 | 1/3 | 1/3 | - |
3 | 1/4 | 1/4 | 1/4 | 1/4 |
Constraint Combination | Final-Output Weight | Subconstraint-1 Weight | Subconstraint-2 Weight | Subconstraint-3 Weight |
---|---|---|---|---|
Final output only | 1.0 | - | - | - |
Final output + Subconstraint 1 | 0.5 | 0.5 | - | - |
Final output + Subconstraint 2 | 0.5 | - | 0.5 | - |
Final output + Subconstraint 3 | 0.5 | - | - | 0.5 |
Regions | Methods | Precision | Recall | Jaccard Index | Overall Accuracy | Kappa |
---|---|---|---|---|---|---|
Test-1 | HOG–Ada | 0.497 | 0.715 | 0.414 | 0.826 | 0.479 |
FCN | 0.613 | 0.935 | 0.588 | 0.888 | 0.672 | |
U–Net | 0.869 | 0.928 | 0.815 | 0.964 | 0.875 | |
MC–FCN | 0.892 | 0.937 | 0.841 | 0.968 | 0.895 | |
Test-2 | HOG–Ada | 0.476 | 0.684 | 0.390 | 0.785 | 0.424 |
FCN | 0.671 | 0.934 | 0.641 | 0.894 | 0.713 | |
U–Net | 0.897 | 0.935 | 0.844 | 0.965 | 0.893 | |
MC–FCN | 0.916 | 0.938 | 0.863 | 0.971 | 0.908 | |
Test-3 | HOG–Ada | 0.363 | 0.690 | 0.307 | 0.921 | 0.418 |
FCN | 0.356 | 0.925 | 0.342 | 0.919 | 0.425 | |
U–Net | 0.826 | 0.903 | 0.762 | 0.987 | 0.855 | |
MC–FCN | 0.862 | 0.908 | 0.794 | 0.989 | 0.877 | |
Mean | HOG–Ada | 0.445 | 0.696 | 0.370 | 0.844 | 0.440 |
FCN | 0.547 | 0.931 | 0.524 | 0.900 | 0.603 | |
U–Net | 0.864 | 0.922 | 0.807 | 0.972 | 0.874 | |
MC–FCN | 0.890 | 0.928 | 0.833 | 0.976 | 0.893 |
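The five scores reported above all derive from the pixel-wise confusion counts (TP, FP, FN, TN). A self-contained sketch, assuming binary masks with 1 = building; the helper name is ours:

```python
import numpy as np

def segmentation_scores(pred, truth):
    """Precision, recall, Jaccard index, overall accuracy and Cohen's kappa
    from two binary masks (1 = building, 0 = background)."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    tp = int(np.sum(pred & truth))
    fp = int(np.sum(pred & ~truth))
    fn = int(np.sum(~pred & truth))
    tn = int(np.sum(~pred & ~truth))
    n = tp + fp + fn + tn
    oa = (tp + tn) / n
    # Expected chance agreement for kappa
    pe = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    return {
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        "jaccard": tp / (tp + fp + fn),
        "overall_accuracy": oa,
        "kappa": (oa - pe) / (1 - pe),
    }

# Toy 2x2 example with one TP, one FP, one FN and one TN.
scores = segmentation_scores(np.array([[1, 1], [0, 0]]),
                             np.array([[1, 0], [1, 0]]))
```

Note that overall accuracy is dominated by the abundant background pixels (e.g., the high accuracies in Test-3 despite low HOG–Ada precision), which is why the Jaccard index and kappa are the more discriminative columns in the table.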
Number of Subconstraints | Precision | Recall | Jaccard Index | Overall Accuracy | Kappa |
---|---|---|---|---|---|
0 | 0.864 | 0.922 | 0.807 | 0.972 | 0.874 |
1 | 0.864 | 0.932 | 0.814 | 0.972 | 0.880 |
2 | 0.903 | 0.901 | 0.823 | 0.974 | 0.886 |
3 | 0.882 | 0.923 | 0.823 | 0.974 | 0.886 |
Constraints | Precision | Recall | Jaccard Index | Overall Accuracy | Kappa |
---|---|---|---|---|---|
Final output only | 0.864 | 0.922 | 0.807 | 0.972 | 0.874 |
Final output + Subconstraint 1 | 0.864 | 0.932 | 0.814 | 0.972 | 0.880 |
Final output + Subconstraint 2 | 0.896 | 0.917 | 0.830 | 0.976 | 0.891 |
Final output + Subconstraint 3 | 0.890 | 0.928 | 0.833 | 0.976 | 0.893 |
© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Share and Cite
Wu, G.; Shao, X.; Guo, Z.; Chen, Q.; Yuan, W.; Shi, X.; Xu, Y.; Shibasaki, R. Automatic Building Segmentation of Aerial Imagery Using Multi-Constraint Fully Convolutional Networks. Remote Sens. 2018, 10, 407. https://doi.org/10.3390/rs10030407