Semantic Labeling of High Resolution Aerial Imagery and LiDAR Data with Fine Segmentation Network
"> Figure 1
<p>Illustration of sparse connectivity and parameter sharing for CNN: (<b>a</b>) sparse connectivity; and (<b>b</b>) parameter sharing.</p> "> Figure 2
<p>Illustration of convolutional layer.</p> "> Figure 3
<p>Network architecture of FSN, where structure depicted in the solid-line box is encoder and structure depicted in the dashed-line box is decoder.</p> "> Figure 4
<p>Architecture of inception module.</p> "> Figure 5
<p>Example of: (<b>a</b>) transpose convolutional layer; and (<b>b</b>) sub-pixel convolution layer.</p> "> Figure 6
<p>General procedure of the image segmentation.</p> "> Figure 7
<p>Structure of multi-scale extra branches: (<b>a</b>) lightweight branch; (<b>b</b>) middleweight branch; and (<b>c</b>) heavyweight branch.</p> "> Figure 8
<p>Semantic Labeling results for two patches of Potsdam validation set. Classes: impervious surface (white); buildings (blue); low vegetation (cyan); tree (green); car (yellow); clutter (red). In the first row, (<b>a</b>) is true orthophoto, (<b>b</b>) is ground truth, (<b>c</b>–<b>e</b>) are inference results of FSN-noMLP, FSN-noSC and the proposed FSN for image patch with building with roof lawn; in the second row, (<b>f</b>) is true orthophoto, (<b>g</b>) is ground truth, (<b>h</b>–<b>j</b>) are inference results of FSN-noMLP, FSN-noSC and the proposed FSN for image patch with a street between buildings.</p> "> Figure 9
<p>Errors of commission and omission of each model per classes (Potsdam dataset): (<b>a</b>) error of commission; and (<b>b</b>) error of omission. Lower values indicate the better segmentation performance.</p> "> Figure 10
<p>Semantic Labeling results for three patches of Potsdam test set. Classes: impervious surface (white); buildings (blue); low vegetation (cyan); tree (green); car (yellow); clutter (red). In the first row, (<b>a</b>) is true orthophoto, (<b>b</b>) is ground truth, (<b>c</b>–<b>h</b>) are inference results of FCN-8s, SegNet, FSN-noL, HSN, FSN and FSN+CRFs for image patch with buildings; in the second row, (<b>i</b>) is true orthophoto, (<b>j</b>) is ground truth, (<b>k</b>–<b>p</b>) are inference results of FCN-8s, SegNet, FSN-noL, HSN, FSN and FSN+CRFs for image patch with a street between buildings; in the third row, (<b>q</b>) is true orthophoto, (<b>r</b>) is ground truth, (<b>s</b>–<b>x</b>) are inference results of FCN-8s, SegNet, FSN-noL, HSN, FSN and FSN+CRFs for image patch with clutters.</p> "> Figure 11
<p>Errors of commission and omission of each model per classes (Vaihingen dataset): (<b>a</b>) error of commission; and (<b>b</b>) error of omission. Lower values indicate the better segmentation performance.</p> "> Figure 12
<p>Semantic Labeling results for three patches of Vaihingen test set. Classes: impervious surface (white); buildings (blue); low vegetation (cyan); tree (green); car (yellow); clutter (red). In the first row, (<b>a</b>) is true orthophoto, (<b>b</b>) is ground truth, (<b>c</b>–<b>h</b>) are inference results of FCN-8s, SegNet, FSN-noL, HSN, FSN and FSN+CRFs for image patch with trees and low vegetation areas; in the second row, (<b>i</b>) is true orthophoto, (<b>j</b>) is ground truth, (<b>k</b>–<b>p</b>) are inference results of FCN-8s, SegNet, FSN-noL, HSN, FSN and FSN+CRFs for image patch with a low-rise building; in the third row, (<b>q</b>) is true orthophoto, (<b>r</b>) is ground truth, (<b>s</b>–<b>x</b>) are inference results of FCN-8s, SegNet, FSN-noL, HSN, FSN and FSN+CRFs for image patch with cars parked around buildings.</p> "> Figure A1
<p>Architecture of FCN-8s, where structure depicted in the solid-line box is encoder and structure depicted in the dashed-line box is decoder.</p> "> Figure A2
<p>Architecture of SegNet, where structure depicted in the solid-line box is encoder and structure depicted in the dashed-line box is decoder.</p> "> Figure A3
<p>Architecture of HSN, where structure depicted in the solid-line box is encoder and structure depicted in the dashed-line box is decoder.</p> "> Figure A4
<p>Full tile prediction for NO. 5_11 of Potsdam dataset. Classes: impervious surface (white); buildings (blue); low vegetation (cyan); tree (green); car (yellow); clutter (red). (<b>a</b>) TOP, true orthophoto; (<b>b</b>) nDSM, normalized DSM; (<b>c</b>) GT, ground truth; (<b>d</b>–<b>f</b>) inference result of FCN-8s, SegNet, and FSN-noL, respectively; (<b>g</b>–<b>i</b>) Red/Green Images of FCN-8s, SegNet, and FSN-noL, respectively; (<b>j</b>–<b>l</b>) inference result of HSN, FSN and FSN + CRFs respectively; and (<b>m</b>–<b>o</b>) Red/Green Images of HSN, FSN and FSN + CRFs, respectively.</p> "> Figure A4 Cont.
<p>Full tile prediction for NO. 5_11 of Potsdam dataset. Classes: impervious surface (white); buildings (blue); low vegetation (cyan); tree (green); car (yellow); clutter (red). (<b>a</b>) TOP, true orthophoto; (<b>b</b>) nDSM, normalized DSM; (<b>c</b>) GT, ground truth; (<b>d</b>–<b>f</b>) inference result of FCN-8s, SegNet, and FSN-noL, respectively; (<b>g</b>–<b>i</b>) Red/Green Images of FCN-8s, SegNet, and FSN-noL, respectively; (<b>j</b>–<b>l</b>) inference result of HSN, FSN and FSN + CRFs respectively; and (<b>m</b>–<b>o</b>) Red/Green Images of HSN, FSN and FSN + CRFs, respectively.</p> "> Figure A5
<p>Full tile prediction for NO. 3 of Vaihingen dataset. Classes: impervious surface (white); buildings (blue); low vegetation (cyan); tree (green); car (yellow); clutter (red). (<b>a</b>) TOP, true orthophoto; (<b>b</b>) nDSM, normalized DSM; (<b>c</b>) GT, ground truth; (<b>d</b>,<b>e</b>) inference result and Red/Green Image of FCN-8s; (<b>f</b>,<b>g</b>) inference result and Red/Green Image of SegNet; (<b>h</b>,<b>i</b>) inference result and Red/Green Image of FSN-noL; (<b>j</b>,<b>k</b>) inference result and Red/Green Image of HSN; (<b>l</b>,<b>m</b>) inference result and Red/Green Image of FSN; and (<b>n</b>,<b>o</b>) inference result and Red/Green Image of FSN + CRFs.</p> ">
Abstract
1. Introduction
- The encoder is structured into two parts: a main encoder and a lightweight branch. The main encoder, based on VGG16 [28], processes the CIR images, while the lightweight branch independently handles the corresponding LiDAR-derived images, i.e., the digital surface models (DSMs) and the normalized DSMs (nDSMs). This design accomplishes feature extraction from multi-sensor data with relatively few parameters.
- Sub-pixel convolution layers, originally proposed for image and video super-resolution [29], replace the traditional deconvolution layers in the proposed FSN. Without introducing any artificial values, a sub-pixel convolution layer computes convolutions on the low-resolution feature maps and upscales them in a single step, so a filter of the same size as that of a common up-sampling layer covers a larger contextual area (a minimal sketch is given after this list).
- An MLP accomplishes effective feature-level fusion of the multi-sensor remote sensing data at the back end of the network. Moreover, multi-resolution feature maps are also fed into the MLP to mitigate the recognition/localization trade-off.
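To make the sub-pixel up-sampling concrete, the snippet below is a minimal PyTorch sketch, not the authors' implementation; the module name `SubPixelUp` and the channel counts are illustrative assumptions. It convolves in the low-resolution feature map and then rearranges channels into space, so no interpolated values are introduced.

```python
import torch
import torch.nn as nn

class SubPixelUp(nn.Module):
    """Sub-pixel up-sampling: convolve at low resolution, then rearrange
    channels into space (PixelShuffle), avoiding artificial interpolated values."""
    def __init__(self, in_ch, out_ch, scale):
        super().__init__()
        # The 3x3 kernel is applied to the small feature map, so it covers a
        # larger contextual area than the same kernel applied after interpolation.
        self.conv = nn.Conv2d(in_ch, out_ch * scale ** 2, kernel_size=3, padding=1)
        self.shuffle = nn.PixelShuffle(scale)  # (B, C*s^2, H, W) -> (B, C, H*s, W*s)

    def forward(self, x):
        return self.shuffle(self.conv(x))

x = torch.randn(1, 256, 16, 16)          # low-resolution feature maps
print(SubPixelUp(256, 64, 2)(x).shape)   # torch.Size([1, 64, 32, 32])
```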
2. Convolutional Neural Network
2.1. Convolutional Layer
2.2. Nonlinear Activation Layer
2.3. Spatial Pooling Layer
2.4. Transposed Convolutional Layer
2.5. Unpooling Layer
3. Proposed FSN Method
3.1. Network Architecture of FSN
3.2. Post-Processing Method for FSN-Based Segmentation
4. Experiments and Results Analysis
4.1. Experimental Design
4.2. Validation of Lightweight Branches
4.3. Validation of Sub-Pixel Convolution Layers and Multi-Layer Perceptron
4.4. Potsdam Dataset Results
4.5. Vaihingen Dataset Results
4.6. Submission to the ISPRS Challenge
4.7. Trainable Weights and Receptive Fields
5. Conclusions
Author Contributions
Acknowledgments
Conflicts of Interest
Appendix A. Related Works of CNN Architectures
Appendix A.1. Architecture 1: FCN
Appendix A.2. Architecture 2: SegNet
Appendix A.3. Architecture 3: HSN
Appendix B. Full Tile Prediction
References
- Sherrah, J. Fully convolutional networks for dense semantic labelling of high-resolution aerial imagery. arXiv, 2016; arXiv:1606.02585v1. [Google Scholar]
- Sun, J.; Yang, J.; Zhang, C.; Yun, W.; Qu, J. Automatic remotely sensed image classification in a grid environment based on the maximum likelihood method. Math. Comput. Model. 2013, 58, 573–581. [Google Scholar] [CrossRef]
- Toth, D.; Aach, T. Improved minimum distance classification with gaussian outlier detection for industrial inspection. In Proceedings of the 11th International Conference on Image Analysis and Processing, Palermo, Italy, 26–28 September 2001; pp. 584–588. [Google Scholar]
- Jumb, V.; Sohani, M.; Shrivas, A. Color image segmentation using k-means clustering and Otsu's adaptive thresholding. Int. J. Innov. Technol. Explor. Eng. 2014, 3, 71–76. [Google Scholar]
- Ratle, F.; Camps-Valls, G.; Weston, J. Semisupervised neural networks for efficient hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2010, 48, 2271–2282. [Google Scholar] [CrossRef]
- Yu, H.; Gao, L.; Li, J.; Li, S.S.; Zhang, B.; Benediktsson, J. Spectral-spatial hyperspectral image classification using subspace-based support vector machines and adaptive markov random fields. Remote Sens. 2016, 8, 355. [Google Scholar] [CrossRef]
- Sugg, Z.; Finke, T.; Goodrich, D.; Susan Moran, M.; Yool, S. Mapping impervious surfaces using object-oriented classification in a semiarid urban region. Photogramm. Eng. Remote Sens. 2014, 80, 343–352. [Google Scholar] [CrossRef]
- Song, B.; Li, P.; Li, J.; Plaza, A. One-class classification of remote sensing images using kernel sparse representation. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2016, 9, 1613–1623. [Google Scholar] [CrossRef]
- Wang, Q.; Lin, J.; Yuan, Y. Salient band selection for hyperspectral image classification via manifold ranking. IEEE Trans. Neural Netw. Learn. Syst. 2016, 27, 1–11. [Google Scholar] [CrossRef] [PubMed]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 37, 1904–1916. [Google Scholar] [CrossRef] [PubMed]
- Sermanet, P.; Eigen, D.; Zhang, X.; Mathieu, M.; Fergus, R.; Lecun, Y. Overfeat: Integrated recognition, localization and detection using convolutional networks. In Proceedings of the International Conference on Learning Representations (ICLR), Banff, AB, Canada, 14–16 April 2014. [Google Scholar]
- Cireşan, D.C.; Giusti, A.; Gambardella, L.M.; Schmidhuber, J. Deep neural networks segment neuronal membranes in electron microscopy images. In Proceedings of the Advances in Neural Information Processing Systems, Lake Tahoe, NV, USA, 3–6 December 2012; pp. 2852–2860. [Google Scholar]
- Pinheiro, P.; Collobert, R. Recurrent convolutional neural networks for scene labeling. In Proceedings of the 31st International Conference on Machine Learning, Beijing, China, 21–26 June 2014; pp. 82–90. [Google Scholar]
- Hariharan, B.; Arbeláez, P.; Girshick, R.; Malik, J. Simultaneous detection and segmentation. In Proceedings of the European Conference on Computer Vision, Zurich, Switzerland, 6–12 September 2014; pp. 297–312. [Google Scholar]
- Gupta, S.; Girshick, R.; Arbeláez, P.; Malik, J. Learning rich features from rgb-d images for object detection and segmentation. In Computer Vision—ECCV; Springer: Cham, Switzerland, 2014; pp. 345–360. [Google Scholar]
- Ganin, Y.; Lempitsky, V. N4-fields: Neural network nearest neighbor fields for image transforms. In Proceedings of the Asian Conference on Computer Vision (ACCV), Kent Ridge, Singapore, 1–5 November 2014; pp. 536–551. [Google Scholar]
- Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 3431–3440. [Google Scholar]
- Chen, L.C.; Papandreou, G.; Kokkinos, I.; Murphy, K.L.; Yuille, A.L. Semantic image segmentation with deep convolutional nets and fully connected crfs. In Proceedings of the 3rd International Conference on Learning Representations (ICLR2015), San Diego, CA, USA, 7–9 May 2015. [Google Scholar]
- Noh, H.; Hong, S.; Han, B. Learning deconvolution network for semantic segmentation. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 13–16 December 2015; pp. 1520–1528. [Google Scholar]
- Badrinarayanan, V.; Kendall, A.; Cipolla, R. Segnet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 2481–2495. [Google Scholar] [CrossRef] [PubMed]
- Paisitkriangkrai, S.; Sherrah, J.; Janney, P.; van den Hengel, A. Effective semantic pixel labelling with convolutional networks and conditional random fields. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Boston, MA, USA, 7–12 June 2015; pp. 36–43. [Google Scholar]
- Nogueira, K.; Penatti, O.A.B.; Santos, J.D. Towards better exploiting convolutional neural networks for remote sensing scene classification. Pattern Recogn. 2016, 61, 539–556. [Google Scholar] [CrossRef]
- Audebert, N.; Le Saux, B.; Lefèvre, S. Semantic segmentation of earth observation data using multimodal and multi-scale deep networks. In Proceedings of the Asian Conference on Computer Vision (ACCV), Taipei, Taiwan, 21–23 November 2016; pp. 180–196. [Google Scholar]
- Liu, Y.; Piramanayagam, S.; Monteiro, S.; Saber, E. Dense semantic labeling of very-high-resolution aerial imagery and lidar with fully-convolutional neural networks and higher-order crfs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Honolulu, HI, USA, 21–26 July 2017; pp. 1561–1570. [Google Scholar]
- Volpi, M.; Tuia, D. Dense semantic labeling of subdecimeter resolution images with convolutional neural networks. IEEE Trans. Geosci. Remote Sens. 2017, 55, 881–893. [Google Scholar] [CrossRef]
- Maggiori, E.; Tarabalka, Y.; Charpiat, G.; Alliez, P. High-resolution aerial image labeling with convolutional neural networks. IEEE Trans. Geosci. Remote Sens. 2017, 55, 7092–7103. [Google Scholar] [CrossRef]
- Liu, Y.; Minh Nguyen, D.; Deligiannis, N.; Ding, W.; Munteanu, A. Hourglass-ShapeNetwork based semantic segmentation for high resolution aerial imagery. Remote Sens. 2017, 9, 522. [Google Scholar] [CrossRef]
- Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. In Proceedings of the International Conference on Learning Representations (ICLR), San Diego, CA, USA, 7–9 May 2015. [Google Scholar]
- Shi, W.; Caballero, J.; Huszár, F.; Totz, J.; Aitken, A.; Bishop, R.; Rueckert, D.; Wang, Z. Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 26 June–1 July 2016; pp. 1874–1883. [Google Scholar]
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. Imagenet classification with deep convolutional neural networks. In Proceedings of the Advances in Neural Information Processing Systems, Lake Tahoe, NV, USA, 3–6 December 2012; pp. 1106–1114. [Google Scholar]
- Nair, V.; Hinton, G.E. Rectified linear units improve restricted Boltzmann machines. In Proceedings of the International Conference on Machine Learning (ICML), Haifa, Israel, 21–25 June 2010; pp. 807–814. [Google Scholar]
- Maas, A.L.; Hannun, A.Y.; Ng, A.Y. Rectifier nonlinearities improve neural network acoustic models. In Proceedings of the International Conference on Machine Learning (ICML), Atlanta, GA, USA, 16–21 June 2013. [Google Scholar]
- LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444. [Google Scholar] [CrossRef] [PubMed]
- Zeiler, M.D.; Taylor, G.W.; Fergus, R. Adaptive deconvolutional networks for mid and high level feature learning. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Barcelona, Spain, 6–13 November 2011; pp. 2018–2025. [Google Scholar]
- Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 1–9. [Google Scholar]
- ISPRS Potsdam 2D Semantic Labeling Dataset. Available online: http://www2.isprs.org/commissions/comm3/wg4/2d-sem-label-potsdam.html (accessed on 10 December 2017).
- ISPRS Vaihingen 2D Semantic Labeling Dataset. Available online: http://www2.isprs.org/commissions/comm3/wg4/2d-sem-label-vaihingen.html (accessed on 10 December 2017).
- Gerke, M. Use of the Stair Vision Library within the ISPRS 2d Semantic Labeling Benchmark (Vaihingen); University of Twente: Enschede, The Netherlands, 2015. [Google Scholar]
- Kingma, D.; Ba, J. Adam: A method for stochastic optimization. In Proceedings of the International Conference on Learning Representations (ICLR), San Diego, CA, USA, 7–9 May 2015. [Google Scholar]
- Pretrained Models. Available online: http://www.vlfeat.org/matconvnet/pretrained/ (accessed on 10 December 2017).
- ISPRS Semantic Labeling Contest (2D): Results (Potsdam). Available online: http://www2.isprs.org/potsdam-2d-semantic-labeling.html (accessed on 30 March 2018).
- ISPRS Semantic Labeling Contest (2D): Results (Vaihingen). Available online: http://www2.isprs.org/vaihingen-2d-semantic-labeling-contest.html (accessed on 30 March 2018).
- Newell, A.; Yang, K.; Deng, J. Stacked Hourglass Networks for Human Pose Estimation. In Proceedings of the European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands, 8–16 October 2016; pp. 483–499. [Google Scholar]
- Chen, W.; Fu, Z.; Yang, D.; Deng, J. Single-image depth perception in the wild. In Proceedings of the Advances in Neural Information Processing Systems (NIPS), Barcelona, Spain, 5–10 December 2016; pp. 730–738. [Google Scholar]
Layer | Filter Size | Number of Filters | Stride | Padding | Layer | Filter Size | Number of Filters | Stride | Padding
---|---|---|---|---|---|---|---|---|---
Conv1_1 | 3 | 64 | 1 | 1 | Conv4_3 | 3 | 512 | 1 | 1
Conv1_2 | 3 | 64 | 1 | 1 | Pool4 | 2 | – | 2 | –
Pool1 | 2 | – | 2 | – | Conv5_1 | 3 | 512 | 1 | 1
Conv2_1 | 3 | 128 | 1 | 1 | Conv5_2 | 3 | 512 | 1 | 1
Conv2_2 | 3 | 128 | 1 | 1 | Conv5_3 | 3 | 512 | 1 | 1
Pool2 | 2 | – | 2 | – | Pool5 | 2 | – | 2 | –
Conv3_1 | 3 | 256 | 1 | 1 | ConvL1 | 3 | 64 | 1 | 1
Conv3_2 | 3 | 256 | 1 | 1 | PoolL1 | 2 | – | 2 | –
Conv3_3 | 3 | 256 | 1 | 1 | ConvL2 | 3 | 128 | 1 | 1
Pool3 | 2 | – | 2 | – | PoolL2 | 2 | – | 2 | –
Conv4_1 | 3 | 512 | 1 | 1 | ConvL3 | 3 | 16 | 1 | 1
Conv4_2 | 3 | 512 | 1 | 1 | PoolL3 | 2 | – | 2 | –
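As an illustration of the lightweight branch listed in the right-hand column above (ConvL1–PoolL3), the following is a minimal sketch under assumptions not stated in the table: a single-channel DSM/nDSM input, ReLU activations, and 2 × 2 max pooling.

```python
import torch.nn as nn

def conv_relu(in_ch, out_ch):
    # 3x3 convolution, stride 1, padding 1, as listed in the table, followed by ReLU.
    return nn.Sequential(nn.Conv2d(in_ch, out_ch, 3, stride=1, padding=1),
                         nn.ReLU(inplace=True))

# ConvL1 (64) -> PoolL1 -> ConvL2 (128) -> PoolL2 -> ConvL3 (16) -> PoolL3
lightweight_branch = nn.Sequential(
    conv_relu(1, 64),    nn.MaxPool2d(2, stride=2),
    conv_relu(64, 128),  nn.MaxPool2d(2, stride=2),
    conv_relu(128, 16),  nn.MaxPool2d(2, stride=2),
)
```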
Layer | Filter Size | Number of Filters | Stride | Padding
---|---|---|---|---
Conv6 | 3 | 32 | 1 | 1 |
ConvI1 | 1 | IP1: 64; IP2: 64; IP3: 256 | 1 | 1 |
ConvI2_1 | 1 | IP1: 128; IP2: 128; IP3: 128 | 1 | 1 |
ConvI2_2 | 3 | IP1: 128; IP2: 128; IP3: 512 | 1 | 1 |
ConvI3_1 | 1 | IP1: 64; IP2: 64; IP3: 64 | 1 | 1 |
ConvI3_2 | 5 | IP1: 32; IP2: 32; IP3: 128 | 1 | 1 |
ConvI4_1 | 1 | IP1: 32; IP2: 32; IP3: 64 | 1 | 1 |
ConvI4_2 | 7 | IP1: 32; IP2: 32; IP3: 128 | 1 | 1 |
Conv7_1 | 1 | 256 | 1 | 1 |
Conv7_2 | 3 | 384 | 1 | 1 |
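The ConvI* rows above describe inception-style blocks with four parallel paths: a 1 × 1 path and 1 × 1-reduced 3 × 3, 5 × 5, and 7 × 7 paths. Below is a minimal sketch using the IP1 channel numbers; the channel-wise concatenation, the omission of activations, and the padding values chosen to preserve spatial size (which differ from the padding column above) are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class InceptionIP1(nn.Module):
    """Four parallel convolution paths concatenated along the channel axis."""
    def __init__(self, in_ch):
        super().__init__()
        self.p1 = nn.Conv2d(in_ch, 64, 1)                           # ConvI1
        self.p2 = nn.Sequential(nn.Conv2d(in_ch, 128, 1),           # ConvI2_1
                                nn.Conv2d(128, 128, 3, padding=1))  # ConvI2_2
        self.p3 = nn.Sequential(nn.Conv2d(in_ch, 64, 1),            # ConvI3_1
                                nn.Conv2d(64, 32, 5, padding=2))    # ConvI3_2
        self.p4 = nn.Sequential(nn.Conv2d(in_ch, 32, 1),            # ConvI4_1
                                nn.Conv2d(32, 32, 7, padding=3))    # ConvI4_2

    def forward(self, x):
        # 64 + 128 + 32 + 32 = 256 output channels for the IP1 configuration.
        return torch.cat([self.p1(x), self.p2(x), self.p3(x), self.p4(x)], dim=1)
```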
Layer | Scale | Input Channel | Output Channel
---|---|---|---
SP1 | 2 | 256 | 64 | |
SP2 | 4 | 1024 | 64 | |
SP3 | 8 | 384 | 6 |
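A quick consistency check on the table: each sub-pixel layer trades channels for resolution, so its output channel count equals the input count divided by the square of the scale (256/2² = 64, 1024/4² = 64, 384/8² = 6). A minimal sketch, assuming PyTorch's `PixelShuffle` performs the channel-to-space rearrangement:

```python
import torch.nn as nn

# Output channels = input channels / scale**2 for each sub-pixel layer.
sp1 = nn.PixelShuffle(2)  # SP1: 256 channels  -> 64 channels, 2x upscale
sp2 = nn.PixelShuffle(4)  # SP2: 1024 channels -> 64 channels, 4x upscale
sp3 = nn.PixelShuffle(8)  # SP3: 384 channels  -> 6 channels,  8x upscale
```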
Layer | Filter Size | Number of Filters | Layer | Filter Size | Number of Filters |
---|---|---|---|---|---|
ConvM1_1 | 3 | 64 | ConvH1_3 | 3 | 64 |
ConvM1_2 | 3 | 64 | ConvH2_1 | 3 | 128 |
ConvM2_1 | 3 | 128 | ConvH2_2 | 3 | 128 |
ConvM2_2 | 3 | 128 | ConvH2_3 | 3 | 128 |
ConvM2_2 | 3 | 64 | ConvH3_1 | 3 | 64 |
ConvM2_2 | 3 | 16 | ConvH3_2 | 3 | 64 |
ConvH1_1 | 3 | 64 | ConvH3_3 | 3 | 16 |
ConvH1_2 | 3 | 64 |
Ground Truth | Methods | Imp. Surf. | Build | Low Veg. | Tree | Car | Clutter | Aver. F1 | OA
---|---|---|---|---|---|---|---|---|---
GT | HW | 91.03 | 90.83 | 83.84 | 80.79 | 90.71 | 85.76 | 87.16 | 87.43
GT | MW | 91.01 | 91.30 | 84.27 | 80.59 | 90.85 | 84.79 | 87.14 | 87.54
GT | LW | 91.41 | 91.16 | 84.41 | 81.14 | 91.51 | 83.50 | 87.19 | 87.59
E-GT | HW | 92.90 | 91.60 | 85.70 | 83.71 | 96.13 | 88.05 | 89.68 | 89.26
E-GT | MW | 92.89 | 92.90 | 86.01 | 83.67 | 96.20 | 87.10 | 89.80 | 89.30
E-GT | LW | 93.23 | 91.89 | 86.26 | 84.01 | 96.53 | 85.73 | 89.61 | 89.44
Ground Truth | Methods | Imp. Surf. | Build | Low Veg. | Tree | Car | Clutter | Aver. F1 | OA
---|---|---|---|---|---|---|---|---|---
GT | FSN-noSC | 90.84 | 90.44 | 83.51 | 79.18 | 90.92 | 85.80 | 86.78 | 86.99
GT | FSN-noMLP | 90.21 | 90.02 | 83.21 | 80.11 | 89.81 | 85.56 | 86.49 | 86.72
GT | FSN | 91.41 | 91.16 | 84.41 | 81.14 | 91.51 | 83.50 | 87.19 | 87.59
E-GT | FSN-noSC | 92.70 | 91.19 | 85.23 | 82.27 | 96.43 | 88.02 | 89.31 | 88.79
E-GT | FSN-noMLP | 92.13 | 90.87 | 85.08 | 83.16 | 95.59 | 87.85 | 89.11 | 88.61
E-GT | FSN | 93.23 | 91.89 | 86.26 | 84.01 | 96.53 | 85.73 | 89.61 | 89.44
Ground Truth | Methods | Imp. Surf. | Build | Low Veg. | Tree | Car | Clutter | Aver. F1 | OA
---|---|---|---|---|---|---|---|---|---
GT | FCN-8s | 90.02 | 94.59 | 85.59 | 78.59 | 87.68 | 46.14 | 80.44 | 87.36
GT | SegNet | 89.52 | 93.33 | 85.68 | 79.78 | 88.28 | 44.69 | 80.21 | 87.08
GT | FSN-noL | 90.11 | 94.54 | 86.12 | 80.34 | 88.29 | 44.91 | 80.72 | 87.74
GT | HSN | 89.92 | 93.96 | 85.80 | 79.90 | 84.20 | 44.24 | 79.67 | 87.30
GT | FSN | 90.34 | 94.74 | 86.19 | 80.46 | 88.75 | 51.43 | 81.99 | 87.91
GT | FSN + CRFs | 91.14 | 95.73 | 86.75 | 80.66 | 89.52 | 51.79 | 82.60 | 88.57
E-GT | FCN-8s | 92.34 | 95.84 | 87.95 | 81.63 | 94.01 | 50.87 | 83.77 | 89.80
E-GT | SegNet | 91.76 | 94.58 | 88.08 | 82.87 | 94.76 | 49.30 | 83.56 | 89.51
E-GT | FSN-noL | 92.41 | 95.81 | 88.58 | 83.44 | 94.63 | 49.50 | 84.06 | 90.21
E-GT | HSN | 92.23 | 95.25 | 88.24 | 82.99 | 89.82 | 48.83 | 82.89 | 89.77
E-GT | FSN | 92.95 | 95.85 | 88.61 | 83.76 | 95.19 | 55.92 | 85.38 | 90.51
E-GT | FSN + CRFs | 93.32 | 96.89 | 89.08 | 83.69 | 95.66 | 56.41 | 85.84 | 90.94
Ground Truth | Methods | Imp. Surf. | Build | Low Veg. | Tree | Car | Aver. F1 | OA
---|---|---|---|---|---|---|---|---
GT | FCN-8s | 87.28 | 90.28 | 73.70 | 84.91 | 68.84 | 81.00 | 84.65
GT | SegNet | 87.79 | 91.59 | 74.02 | 84.49 | 76.87 | 82.95 | 85.07
GT | FSN-noL | 88.63 | 92.68 | 73.98 | 84.67 | 76.30 | 83.25 | 85.66
GT | HSN | 89.28 | 92.80 | 74.04 | 83.96 | 74.56 | 82.93 | 85.75
GT | FSN | 88.89 | 92.55 | 75.04 | 85.50 | 78.01 | 84.00 | 86.13
GT | FSN + CRFs | 89.49 | 92.95 | 75.93 | 85.78 | 74.01 | 83.63 | 86.63
E-GT | FCN-8s | 90.58 | 92.70 | 78.66 | 89.43 | 77.02 | 85.68 | 88.59
E-GT | SegNet | 91.13 | 93.89 | 78.80 | 89.09 | 84.60 | 87.50 | 88.98
E-GT | FSN-noL | 91.99 | 95.03 | 78.92 | 89.32 | 84.78 | 88.01 | 89.64
E-GT | HSN | 92.40 | 95.15 | 78.92 | 88.76 | 82.42 | 87.53 | 89.67
E-GT | FSN | 92.27 | 94.89 | 80.19 | 90.14 | 86.31 | 88.76 | 90.14
E-GT | FSN + CRFs | 92.78 | 95.18 | 80.99 | 90.33 | 83.26 | 88.51 | 90.55
Network | FCN-8s | SegNet | HSN | FSN |
---|---|---|---|---|
Trainable Weights | 140.5 M | 29.4 M | 5.56 M | 18.0 M |
Receptive Field (pixels) | 404 | 212 | 212 | 596
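For reference, the theoretical receptive field of a stack of convolution and pooling layers can be computed with the standard recurrence r ← r + (k − 1)·j and j ← j·s, where k is the kernel size, s the stride, and j the cumulative stride. The helper below is a generic sketch, not the code used to obtain the values in the table.

```python
def receptive_field(layers):
    """layers: (kernel_size, stride) tuples in input-to-output order.
    Returns the theoretical receptive field of one output unit, in pixels."""
    r, j = 1, 1  # current receptive field and cumulative stride (jump)
    for k, s in layers:
        r += (k - 1) * j
        j *= s
    return r

# Example: the first two VGG-style blocks (two 3x3 convs + one 2x2/stride-2 pool each).
block = [(3, 1), (3, 1), (2, 2)]
print(receptive_field(block * 2))  # -> 16
```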
© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).