Hourglass-ShapeNetwork Based Semantic Segmentation for High Resolution Aerial Imagery
"> Figure 1
<p>Illustration of elementary modules for the convolutional layer. (<b>a</b>) Convolutional layer and (<b>b</b>) Transposed convolutional layer.</p> "> Figure 2
<p>The fully-convolutional network (FCN) [<a href="#B21-remotesensing-09-00522" class="html-bibr">21</a>], SegNet [<a href="#B19-remotesensing-09-00522" class="html-bibr">19</a>] and full patch labeling (FPL) [<a href="#B22-remotesensing-09-00522" class="html-bibr">22</a>] network designs. A, B, C and D are convolutional layers; E is a pooling layer; F is a transposed convolutional layer or unpooling layer (in SegNet); G is a loss layer.</p> "> Figure 3
<p>The proposed hourglass-shaped network (HSN) architecture. A and B are convolutional layers; C and D are inception modules; E is the max pooling layer; F is the transposed convolutional layer; G is the residuals modules; H is the loss layer.</p> "> Figure 4
<p>Composition modules in the proposed HSN architecture. (<b>a</b>) Inception module; (<b>b</b>) Residual module.</p> "> Figure 5
<p>Full tile prediction for tile No. 34. Legend on the Vaihingen dataset: white: impervious surface; blue: buildings; cyan: low vegetation; green: trees; yellow: cars; red: clutter (best viewed in color). (<b>a</b>) Ground truth; (<b>b</b>) HSN; (<b>c</b>) HSN-NS; (<b>d</b>) HSN-NI.</p> "> Figure 6
<p>Full tile prediction for No. 30. Legend on the Vaihengen dataset: white: impervious surface; blue: buildings; cyan: low vegetation; green: trees; yellow: cars; red: clutter (best viewed in color). (<b>a</b>) TOP, true orthophoto; (<b>b</b>) nDSM, normalized DSM; (<b>c</b>) GT, Ground truth labeling; (<b>d</b>–<b>g</b>) the inference result from FCN, SegNet, FPL and HSN respectively; (<b>h</b>) HSN + WBP, HSN inference result after WBP post-processing.</p> "> Figure 7
<p>Semantic segmentation results for some patches of Vaihingen dataset. white: impervious surface; blue: buildings; {cyan}: low vegetation; {green}: trees; {yellow}: cars; {red}: clutter (best viewed in~color). Four different tiles from Vaihingen are included: (<b>a</b>) a narrow passage; (<b>b</b>) shadowed areas from trees and buildings; (<b>c</b>) cars in the shadow; and (<b>d</b>) building roofs with depth discontinuities.</p> "> Figure 8
<p>Full tile prediction for tile No. 04_12. Legend on the Potsdam dataset: white: impervious surface; blue: buildings; cyan: low vegetation; green: trees; yellow: cars; red: clutter (best viewed in color). (<b>a</b>) TOP, true orthophoto; (<b>b</b>) nDSM, normalized DSM; (<b>c</b>) GT, Ground truth labeling; (<b>d</b>–<b>g</b>) the inference result from FCN, SegNet, FPL and HSN respectively; (<b>h</b>) HSN + WBP, HSN inference result after WBP post-processing.</p> "> Figure 9
<p>Semantic segmentation results for some patches of Potsdam dataset.white: impervious surface; blue: buildings; {cyan}: low vegetation; {green}: trees; {yellow}: cars; {red}: clutter (best viewed in~color). Four tiles from Potsdam are included: (<b>a</b>) buildings with backyards; (<b>b</b>) parking lot; (<b>c</b>) rooftops; and (<b>d</b>) low vegetation areas.</p> ">
Abstract
1. Introduction
- We leverage skip connections with residual units and inception modules in a generic CNN encoder-decoder architecture to improve the semantic segmentation of remote sensing data. This combination supports multi-scale inference and forwards spatial and contextual information directly to the decoding stage (see the module sketch after this list).
- We propose to apply overlapped inference in semantic segmentation, which systematically improves classification performance (see the sliding-window sketch below).
- We propose a weighted belief-propagation (WBP) post-processing module that addresses border effects and smooths the results. This module improves the visual quality, as well as the classification accuracy on segment boundaries (see the message-passing sketch below).
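For concreteness, here is a minimal PyTorch-style sketch of the two building blocks named in the first item. The inception branch widths follow the configuration listed for module C in the inception-module table near the end of this document; the layer counts, activations and padding choices are assumptions for illustration, not the authors' released implementation.

```python
import torch
import torch.nn as nn

class InceptionModule(nn.Module):
    """Parallel 3x3 / 5x5 / 7x7 branches, each preceded by a 1x1 reduction,
    plus a plain 1x1 branch; outputs are concatenated (cf. Figure 4a).
    Widths follow module C: (1x1,128 -> 3x3,128), (1x1,64 -> 5x5,32),
    (1x1,32 -> 7x7,32), (1x1,64), giving 256 output channels."""
    def __init__(self, in_ch):
        super().__init__()
        def branch(mid, out, k):
            return nn.Sequential(
                nn.Conv2d(in_ch, mid, 1), nn.ReLU(inplace=True),
                nn.Conv2d(mid, out, k, padding=k // 2), nn.ReLU(inplace=True))
        self.b1 = branch(128, 128, 3)
        self.b2 = branch(64, 32, 5)
        self.b3 = branch(32, 32, 7)
        self.b4 = nn.Sequential(nn.Conv2d(in_ch, 64, 1), nn.ReLU(inplace=True))

    def forward(self, x):
        return torch.cat([self.b1(x), self.b2(x), self.b3(x), self.b4(x)], dim=1)

class ResidualUnit(nn.Module):
    """Residual unit of the kind used on the skip connections that forward
    encoder features to the decoder (cf. Figure 4b): two 3x3 convolutions
    plus an identity path."""
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1))

    def forward(self, x):
        return torch.relu(x + self.body(x))
```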
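Overlapped inference (second item, detailed in Section 3.3) can be sketched as follows: the tile is covered by overlapping patches, the per-class probabilities predicted for each patch are accumulated, and every pixel is labeled from the averaged probabilities. The patch size, class count and `predict_fn` callback below are placeholders, not values taken from the paper.

```python
import numpy as np

def overlap_inference(image, predict_fn, patch=256, overlap=0.75, n_classes=6):
    """Slide a patch window over the tile with the given overlap ratio,
    accumulate per-pixel class probabilities, and average. predict_fn maps
    an (patch, patch, bands) array to (patch, patch, n_classes) scores."""
    H, W = image.shape[:2]
    assert H >= patch and W >= patch, "tile must be at least one patch large"
    stride = max(1, int(patch * (1 - overlap)))
    acc = np.zeros((H, W, n_classes))
    cnt = np.zeros((H, W, 1))
    ys = list(range(0, H - patch + 1, stride))
    xs = list(range(0, W - patch + 1, stride))
    if ys[-1] != H - patch:
        ys.append(H - patch)   # make sure the bottom edge is covered
    if xs[-1] != W - patch:
        xs.append(W - patch)   # make sure the right edge is covered
    for y in ys:
        for x in xs:
            acc[y:y + patch, x:x + patch] += predict_fn(image[y:y + patch, x:x + patch])
            cnt[y:y + patch, x:x + patch] += 1
    return (acc / cnt).argmax(axis=-1)
```

With these illustrative settings, a 75% overlap means each interior pixel is predicted up to 16 times; the overlap-size table later in this document reports the corresponding accuracy gains on Vaihingen.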
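The WBP module (third item, detailed in Section 3.4) propagates beliefs between neighboring pixels. As a rough stand-in for it, the sketch below runs min-sum loopy belief propagation on a 4-connected pixel grid with a uniform Potts pairwise cost, using the network's class probabilities as unary terms; the paper's actual message weighting is not reproduced here, and `lam` and the iteration count are illustrative.

```python
import numpy as np

def bp_smooth(prob, n_iter=5, lam=0.5, eps=1e-8):
    """Min-sum loopy BP on a 4-connected grid with a Potts pairwise cost.
    prob: (H, W, C) per-pixel class probabilities from the network;
    returns an (H, W) map of smoothed labels."""
    unary = -np.log(prob + eps)                  # unary costs
    # msg[d] holds the message each pixel receives from direction d:
    # 0 = from left neighbour, 1 = from right, 2 = from above, 3 = from below
    msg = np.zeros((4,) + prob.shape)
    # (roll axis, roll step, index of the opposite direction)
    dirs = [(1, 1, 1), (1, -1, 0), (0, 1, 3), (0, -1, 2)]
    for _ in range(n_iter):
        belief = unary + msg.sum(axis=0)
        new = np.zeros_like(msg)
        for d, (ax, step, opp) in enumerate(dirs):
            h = belief - msg[opp]                # exclude message from the target
            # Potts minimisation: either keep the label or pay lam to switch
            m = np.minimum(h, h.min(axis=-1, keepdims=True) + lam)
            m -= m.min(axis=-1, keepdims=True)   # normalise for stability
            new[d] = np.roll(m, step, axis=ax)   # deliver to the neighbour
        # discard messages that wrapped around the image border
        new[0][:, 0] = 0
        new[1][:, -1] = 0
        new[2][0] = 0
        new[3][-1] = 0
        msg = new
    return (unary + msg.sum(axis=0)).argmin(axis=-1)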
2. Convolutional Neural Networks
2.1. Composition Elements
2.1.1. Convolutional Layer
2.1.2. Transposed Convolutional Layer
2.1.3. Non-Linear Function Layer
2.1.4. Spatial Pooling Layer
2.2. CNN Architectures for Semantic Segmentation of Remote Sensing Images
2.2.1. Patch-Based Methods
2.2.2. Pixel-Based Methods
3. Proposed CNN Architecture for Semantic Segmentation
3.1. Proposed Hourglass-Shaped Convolutional Neural Network
3.1.1. Network Design
3.1.2. Median Frequency Balancing
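Median frequency balancing, introduced by Eigen and Fergus (see the references below), reweights each class in the loss by w_c = median(f) / f_c, where f_c is the pixel frequency of class c computed over the images in which the class appears. The snippet below is one plausible implementation of this counting, assuming integer label maps; the paper's exact convention may differ.

```python
import numpy as np

def median_frequency_weights(label_maps, n_classes=6):
    """w_c = median_freq / freq_c, where freq_c = (pixels of class c) /
    (total pixels of the images containing class c)."""
    class_pixels = np.zeros(n_classes)
    image_pixels = np.zeros(n_classes)
    for lbl in label_maps:
        counts = np.bincount(lbl.ravel(), minlength=n_classes)
        class_pixels += counts
        image_pixels += np.where(counts > 0, lbl.size, 0)
    freq = class_pixels / np.maximum(image_pixels, 1)
    # rare classes get large weights; absent classes are left near-untouched
    return np.median(freq[freq > 0]) / np.maximum(freq, 1e-12)
```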
3.2. Training Strategy
3.3. Overlap Inference
3.4. Post-Processing with Weighted Belief Propagation
4. Experimental Results
4.1. Datasets
4.1.1. Vaihingen Dataset
4.1.2. Potsdam Dataset
4.2. Evaluation Metrics
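For reference, the per-class F-score and overall accuracy reported in the tables below presumably follow the standard definitions used by the ISPRS benchmark:

```latex
\mathrm{precision}_c = \frac{TP_c}{TP_c + FP_c}, \qquad
\mathrm{recall}_c = \frac{TP_c}{TP_c + FN_c},
\qquad
F1_c = \frac{2\,\mathrm{precision}_c \cdot \mathrm{recall}_c}
            {\mathrm{precision}_c + \mathrm{recall}_c},
\qquad
\mathrm{OA} = \frac{\sum_c TP_c}{N},
```

where TP_c, FP_c and FN_c are the true positives, false positives and false negatives of class c, and N is the total number of labeled pixels.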
4.3. Overlap Inference Size
4.4. Skip Connections and Inception Modules
4.5. Performance Evaluations
4.5.1. Vaihingen Dataset
Numerical Results
Qualitative Results
4.5.2. Potsdam Dataset
Numerical Results
Qualitative Results
5. Discussion
6. Conclusions
Acknowledgments
Author Contributions
Conflicts of Interest
Appendix A. Confusion Matrices for Vaihingen and Potsdam Datasets
Appendix A.1. Vaihingen Dataset
| Method | Reference → Predictions ↓ | Imp. Surf | Buildings | Low Veg | Tree | Car |
|---|---|---|---|---|---|---|
| FCN | Imp. Surf | 88.99 | 3.14 | 5.39 | 1.09 | 1.38 |
| | Buildings | 3.89 | 93.21 | 2.22 | 0.57 | 0.11 |
| | Low Veg | 5.88 | 2.47 | 74.11 | 17.32 | 0.22 |
| | Tree | 0.92 | 0.37 | 9.36 | 89.35 | 0.01 |
| | Car | 15.60 | 1.71 | 1.00 | 0.57 | 81.11 |
| SegNet | Imp. Surf | 91.68 | 2.46 | 3.87 | 1.18 | 0.81 |
| | Buildings | 4.16 | 93.22 | 2.02 | 0.55 | 0.55 |
| | Low Veg | 6.62 | 2.44 | 73.63 | 17.22 | 0.09 |
| | Tree | 1.09 | 0.34 | 0.90 | 97.66 | 0.01 |
| | Car | 17.31 | 0.80 | 0.90 | 0.72 | 80.27 |
| FPL | Imp. Surf | 91.66 | 2.24 | 4.47 | 3.96 | 1.24 |
| | Buildings | 3.24 | 93.46 | 2.74 | 4.14 | 1.40 |
| | Low Veg | 6.47 | 1.75 | 76.21 | 15.50 | 0.07 |
| | Tree | 1.32 | 0.51 | 9.87 | 88.28 | 0.03 |
| | Car | 10.06 | 1.54 | 2.69 | 0.30 | 85.67 |
| HSN | Imp. Surf | 92.64 | 2.54 | 3.71 | 0.65 | 0.46 |
| | Buildings | 3.50 | 94.11 | 2.18 | 0.18 | 0.03 |
| | Low Veg | 6.73 | 2.44 | 78.09 | 12.67 | 0.08 |
| | Tree | 1.24 | 0.35 | 10.96 | 87.44 | 0.01 |
| | Car | 15.91 | 1.96 | 1.22 | 0.32 | 85.59 |
Appendix A.2. Potsdam Dataset
| Method | Reference → Predictions ↓ | Imp. Surf | Buildings | Low Veg | Tree | Car | Clutter |
|---|---|---|---|---|---|---|---|
| FCN | Imp. Surf | 85.52 | 2.36 | 4.84 | 1.80 | 1.09 | 4.39 |
| | Buildings | 1.69 | 93.79 | 1.59 | 1.55 | 0.30 | 1.08 |
| | Low Veg | 2.24 | 0.74 | 87.19 | 8.30 | 0.12 | 1.41 |
| | Tree | 4.01 | 0.88 | 15.06 | 78.54 | 0.87 | 0.64 |
| | Car | 0.65 | 0.87 | 0.13 | 0.20 | 96.74 | 1.41 |
| | Clutter | 16.87 | 17.99 | 12.83 | 2.87 | 8.27 | 41.17 |
| SegNet | Imp. Surf | 87.42 | 1.81 | 6.72 | 1.81 | 0.80 | 1.43 |
| | Buildings | 2.33 | 94.19 | 1.96 | 0.84 | 0.14 | 0.54 |
| | Low Veg | 2.22 | 0.57 | 89.44 | 7.34 | 0.04 | 0.38 |
| | Tree | 2.97 | 0.94 | 16.04 | 79.07 | 0.81 | 0.17 |
| | Car | 1.39 | 1.07 | 1.30 | 0.32 | 95.73 | 1.37 |
| | Clutter | 27.13 | 17.68 | 2.18 | 2.87 | 7.54 | 22.60 |
| FPL | Imp. Surf | 92.08 | 2.54 | 2.42 | 1.00 | 0.29 | 1.66 |
| | Buildings | 2.56 | 95.21 | 0.71 | 0.42 | 0.15 | 0.95 |
| | Low Veg | 5.79 | 0.91 | 85.45 | 6.84 | 0.01 | 1.00 |
| | Tree | 7.05 | 2.36 | 16.33 | 73.12 | 0.21 | 0.92 |
| | Car | 4.74 | 2.56 | 0.34 | 2.36 | 83.14 | 6.85 |
| | Clutter | 44.03 | 13.67 | 8.42 | 1.93 | 1.59 | 30.37 |
| HSN | Imp. Surf | 90.69 | 2.05 | 4.92 | 0.60 | 0.59 | 1.14 |
| | Buildings | 2.45 | 95.06 | 1.13 | 0.66 | 0.20 | 0.51 |
| | Low Veg | 4.26 | 0.70 | 87.17 | 7.56 | 0.06 | 0.25 |
| | Tree | 3.31 | 0.95 | 17.70 | 77.05 | 0.92 | 0.07 |
| | Car | 1.04 | 1.87 | 0.08 | 0.11 | 96.81 | 0.08 |
| | Clutter | 37.60 | 26.33 | 12.72 | 2.22 | 7.36 | 13.75 |
References
- Rees, W.G. Physical Principles of Remote Sensing; Cambridge University Press: Cambridge, UK, 2013. [Google Scholar]
- Wang, Q.; Lin, J.; Yuan, Y. Salient band selection for hyperspectral image classification via manifold ranking. IEEE Trans. Neural Netw. Learn. Syst. 2016, 27, 1279–1289. [Google Scholar] [CrossRef] [PubMed]
- Yuan, Y.; Ma, D.; Wang, Q. Hyperspectral anomaly detection by graph pixel selection. IEEE Trans. Cybern. 2016, 46, 3123–3134. [Google Scholar] [CrossRef] [PubMed]
- Oliva, A.; Torralba, A. Modeling the shape of the scene: A holistic representation of the spatial envelope. Int. J. Comput. Vis. 2001, 42, 145–175. [Google Scholar] [CrossRef]
- Huang, J.; Kumar, S.R.; Mitra, M.; Zhu, W.J.; Zabih, R. Image indexing using color correlograms. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 17–19 June 1997; pp. 762–768. [Google Scholar]
- Stehling, R.O.; Nascimento, M.A.; Falcão, A.X. A Compact and Efficient Image Retrieval Approach Based on Border/Interior Pixel Classification. In Proceedings of the International Conference on Information and Knowledge Management (CIKM), McLean, VA, USA, 4–9 November 2002; pp. 102–109. [Google Scholar]
- Avila, S.; Thome, N.; Cord, M.; Valle, E.; AraúJo, A.D.A. Pooling in image representation: The visual codeword point of view. Comput. Vis. Image Underst. 2013, 117, 453–465. [Google Scholar] [CrossRef]
- Lazebnik, S.; Schmid, C.; Ponce, J. Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), New York, NY, USA, 17–22 June 2006; pp. 2169–2178. [Google Scholar]
- Agrawal, R.; Gehrke, J.; Gunopulos, D.; Raghavan, P. Automatic subspace clustering of high dimensional data. Data Min. Knowl. Discov. 2005, 11, 5–33. [Google Scholar] [CrossRef]
- Lu, H.; Plataniotis, K.N.; Venetsanopoulos, A.N. A survey of multilinear subspace learning for tensor data. Pattern Recognit. 2011, 44, 1540–1551. [Google Scholar] [CrossRef]
- Peng, X.; Yu, Z.; Yi, Z.; Tang, H. Constructing the L2-graph for robust subspace learning and subspace clustering. IEEE Trans. Cybern. 2017, 47, 1053–1066. [Google Scholar] [CrossRef] [PubMed]
- Tokarczyk, P.; Wegner, J.D.; Walk, S.; Schindler, K. Features, Color Spaces, and Boosting: New Insights on Semantic Classification of Remote Sensing Images. IEEE Trans. Geosci. Remote Sens. 2015, 53, 280–295. [Google Scholar] [CrossRef]
- Cheriyadat, A.M. Unsupervised feature learning for aerial scene classification. IEEE Trans. Geosci. Remote Sens. 2014, 52, 439–451. [Google Scholar] [CrossRef]
- Sun, H.; Sun, X.; Wang, H.; Li, Y.; Li, X. Automatic Target Detection in High-Resolution Remote Sensing Images Using Spatial Sparse Coding Bag-of-Words Model. IEEE Geosci. Remote Sens. Lett. 2012, 9, 109–113. [Google Scholar] [CrossRef]
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet Classification with Deep Convolutional Neural Networks. In Proceedings of the Advances in Neural Information Processing Systems (NIPS), Stateline, NV, USA, 3–8 December 2012; pp. 1097–1105. [Google Scholar]
- Razavian, A.S.; Azizpour, H.; Sullivan, J.; Carlsson, S. CNN Features Off-the-Shelf: An Astounding Baseline for Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Columbus, OH, USA, 23–28 June 2014; pp. 512–519. [Google Scholar]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
- Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 3431–3440. [Google Scholar]
- Badrinarayanan, V.; Kendall, A.; Cipolla, R. SegNet: A deep convolutional encoder-decoder architecture for image segmentation. Comput. Vis. Pattern Recognit. 2015. [Google Scholar]
- Paisitkriangkrai, S.; Sherrah, J.; Janney, P.; van den Hengel, A. Effective semantic pixel labeling with convolutional networks and conditional random fields. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Boston, MA, USA, 7–12 June 2015; pp. 36–43. [Google Scholar]
- Kampffmeyer, M.; Salberg, A.B.; Jenssen, R. Semantic Segmentation of Small Objects and Modeling of Uncertainty in Urban Remote Sensing Images Using Deep Convolutional Neural Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Las Vegas, NV, USA, 26 June–1 July 2016; pp. 1–9. [Google Scholar]
- Volpi, M.; Tuia, D. Dense Semantic Labeling of Subdecimeter Resolution Images With Convolutional Neural Networks. IEEE Trans. Geosci. Remote Sens. 2017, 55, 1–13. [Google Scholar] [CrossRef]
- Zeiler, M.D.; Taylor, G.W.; Fergus, R. Adaptive deconvolutional networks for mid and high level feature learning. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Barcelona, Spain, 6–13 November 2011; pp. 2018–2025. [Google Scholar]
- Glorot, X.; Bordes, A.; Bengio, Y. Deep Sparse Rectifier Neural Networks. In Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS), Fort Lauderdale, FL, USA, 11–13 April 2011; pp. 315–323. [Google Scholar]
- Maas, A.L.; Hannun, A.Y.; Ng, A.Y. Rectifier nonlinearities improve neural network acoustic models. In Proceedings of the International Conference on Machine Learning (ICML), Atlanta, GA, USA, 16–21 June 2013. [Google Scholar]
- Saxe, A.; Koh, P.W.; Chen, Z.; Bhand, M.; Suresh, B.; Ng, A.Y. On random weights and unsupervised feature learning. In Proceedings of the International Conference on Machine Learning (ICML), Bellevue, WA, USA, 28 June–2 July 2011; pp. 1089–1096. [Google Scholar]
- Springenberg, J.T.; Dosovitskiy, A.; Brox, T.; Riedmiller, M. Striving for simplicity: The all convolutional net. In Proceedings of the International Conference on Learning Representations (ICLR), San Diego, CA, USA, 7–9 May 2015. [Google Scholar]
- Lee, C.Y.; Gallagher, P.W.; Tu, Z. Generalizing pooling functions in convolutional neural networks: Mixed, gated, and tree. In Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS), Cadiz, Spain, 9–11 May 2016. [Google Scholar]
- Sermanet, P.; Eigen, D.; Zhang, X.; Mathieu, M.; Fergus, R.; LeCun, Y. Overfeat: Integrated recognition, localization and detection using convolutional networks. In Proceedings of the International Conference on Learning Representations (ICLR), Banff, AB, Canada, 14–16 April 2014. [Google Scholar]
- Farabet, C.; Couprie, C.; Najman, L.; LeCun, Y. Learning hierarchical features for scene labeling. IEEE Trans. Pattern Anal. Mach. Intell. 2013, 35, 1915–1929. [Google Scholar] [CrossRef] [PubMed]
- Pinheiro, P.; Collobert, R. Recurrent convolutional neural networks for scene parsing. In Proceedings of the International Conference on Machine Learning (ICML), Beijing, China, 21–26 June 2014; pp. 82–90. [Google Scholar]
- Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. In Proceedings of the International Conference on Learning Representations (ICLR), San Diego, CA, USA, 7–9 May 2015. [Google Scholar]
- Ioffe, S.; Szegedy, C. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. In Proceedings of the International Conference on Machine Learning (ICML), Lille, France, 6–11 July 2015; pp. 448–456. [Google Scholar]
- Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 1–9. [Google Scholar]
- Chen, W.; Fu, Z.; Yang, D.; Deng, J. Single-image depth perception in the wild. In Proceedings of the Advances in Neural Information Processing Systems (NIPS), Barcelona, Spain, 5–10 December 2016; pp. 730–738. [Google Scholar]
- Newell, A.; Yang, K.; Deng, J. Stacked Hourglass Networks for Human Pose Estimation. In Proceedings of the European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands, 8–16 October 2016; pp. 483–499. [Google Scholar]
- Eigen, D.; Fergus, R. Predicting depth, surface normals and semantic labels with a common multi-scale convolutional architecture. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; pp. 2650–2658. [Google Scholar]
- Tieleman, T.; Hinton, G. Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude. Neural Netw. Mach. Learn. 2012, 4, 26–30. [Google Scholar]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; pp. 1026–1034. [Google Scholar]
- Murphy, K.P.; Weiss, Y.; Jordan, M.I. Loopy belief propagation for approximate inference: An empirical study. In Proceedings of the Conference on Uncertainty in Artificial Intelligence (UAI), Stockholm, Sweden, 30 July–1 August 1999; pp. 467–475. [Google Scholar]
- Kschischang, F.R.; Frey, B.J.; Loeliger, H.A. Factor graphs and the sum-product algorithm. IEEE Trans. Inf. Theory 2001, 47, 498–519. [Google Scholar] [CrossRef]
- ISPRS Vaihingen 2D Semantic Labeling Dataset. Available online: http://www2.isprs.org/commissions/comm3/wg4/2d-sem-label-vaihingen.html (accessed on 24 May 2017).
- ISPRS Potsdam 2D Semantic Labeling Dataset. Available online: http://www2.isprs.org/commissions/comm3/wg4/2d-sem-label-potsdam.html (accessed on 24 May 2017).
- Gerke, M. Use of the Stair Vision Library within the ISPRS 2D Semantic Labeling Benchmark (Vaihingen); Technical Report; University of Twente: Enschede, The Netherlands, 2015. [Google Scholar]
Network \ Layer ID | A | B | C | D | F |
---|---|---|---|---|---|
FCN | 3 × 3, 64 | 3 × 3, 128 | 3 × 3, 256 | 3 × 3, 512 | 16 × 16, 6 |
SegNet | 3 × 3, 64 | 3 × 3, 128 | 3 × 3, 256 | 3 × 3, 512 | Unpooling |
FPL | 7 × 7, 64 | 5 × 5, 64 | 5 × 5, 128 | 5 × 5, 256 | 2 × 2, 512 |
Network | FCN | SegNet | FPL | HSN |
---|---|---|---|---|
#Trainable weights | 7.82M | 15.27M | 5.66M | 5.56M |
Module \ Layer ID | conv1_1 | conv1_2 | conv2_1 | conv2_2 | conv3_1 | conv3_2 | conv4 |
---|---|---|---|---|---|---|---|
C | 1 × 1, 128 | 3 × 3, 128 | 1 × 1, 64 | 5 × 5, 32 | 1 × 1, 32 | 7 × 7, 32 | 1 × 1, 64 |
D | 1 × 1, 256 | 3 × 3, 384 | 1 × 1, 64 | 5 × 5, 32 | 1 × 1, 32 | 7 × 7, 32 | 1 × 1, 64 |
| GT Type | Overlap Percent | Imp. Surf | Buildings | Low Veg | Tree | Car | Average F-Score | Overall Accuracy |
|---|---|---|---|---|---|---|---|---|
| erGT | 0% | 90.89 | 94.51 | 78.83 | 87.84 | 81.87 | 86.79 | 88.32 |
| | 25% | 91.18 | 94.60 | 79.57 | 88.19 | 83.23 | 87.35 | 88.67 |
| | 50% | 91.23 | 94.64 | 79.54 | 88.20 | 83.74 | 87.47 | 88.70 |
| | 75% | 91.32 | 94.66 | 79.73 | 88.30 | 83.60 | 87.52 | 88.79 |
| GT | 0% | 87.57 | 92.20 | 75.03 | 84.44 | 75.16 | 82.88 | 84.92 |
| | 25% | 87.88 | 92.30 | 75.69 | 84.76 | 76.20 | 83.37 | 85.27 |
| | 50% | 87.92 | 92.34 | 75.64 | 84.77 | 76.61 | 83.46 | 85.29 |
| | 75% | 88.01 | 92.37 | 75.83 | 84.86 | 76.50 | 83.51 | 85.38 |
| GT Type | Network | Imp. Surf | Buildings | Low Veg | Tree | Car | Average F-Score | Overall Accuracy |
|---|---|---|---|---|---|---|---|---|
| erGT | HSN | 90.89 | 94.51 | 78.83 | 87.84 | 81.87 | 86.79 | 88.32 |
| | HSN-NS | 89.40 | 93.68 | 78.90 | 87.57 | 62.17 | 82.34 | 87.48 |
| | HSN-NI | 85.63 | 92.83 | 74.60 | 85.74 | 62.18 | 80.17 | 84.89 |
| GT | HSN | 87.57 | 92.20 | 75.03 | 84.44 | 75.16 | 82.88 | 84.92 |
| | HSN-NS | 85.94 | 91.25 | 74.78 | 84.08 | 56.26 | 78.46 | 83.92 |
| | HSN-NI | 82.34 | 90.56 | 71.05 | 82.31 | 55.76 | 76.41 | 81.52 |
| GT Type | Methods | Imp. Surf | Buildings | Low Veg | Tree | Car | Average F-Score | Overall Accuracy |
|---|---|---|---|---|---|---|---|---|
| erGT | FCN [21] | 89.41 | 93.80 | 76.46 | 86.63 | 71.32 | 83.52 | 86.75 |
| | SegNet [19] | 90.15 | 94.11 | 77.35 | 87.40 | 77.31 | 85.27 | 87.59 |
| | FPL [22] | 90.43 | 94.62 | 78.11 | 86.81 | 66.81 | 83.36 | 87.70 |
| | HSN | 90.89 | 94.51 | 78.83 | 87.84 | 81.87 | 86.79 | 88.32 |
| | HSN + OI | 91.32 | 94.66 | 79.73 | 88.30 | 83.60 | 87.52 | 88.79 |
| | HSN + OI + WBP | 91.34 | 94.67 | 79.83 | 88.31 | 83.59 | 87.55 | 88.82 |
| GT | FCN [21] | 85.82 | 91.27 | 72.39 | 83.30 | 63.10 | 79.18 | 83.18 |
| | SegNet [19] | 86.68 | 91.74 | 73.22 | 83.99 | 71.36 | 81.40 | 84.07 |
| | FPL [22] | 86.62 | 92.03 | 73.73 | 82.73 | 57.68 | 78.56 | 83.69 |
| | HSN | 87.57 | 92.20 | 75.03 | 84.44 | 75.16 | 82.88 | 84.92 |
| | HSN + OI | 88.01 | 92.37 | 75.83 | 84.86 | 76.50 | 83.51 | 85.38 |
| | HSN + OI + WBP | 88.00 | 92.34 | 75.92 | 84.86 | 75.95 | 83.41 | 85.39 |
Network | FCN | SegNet | FPL | HSN |
---|---|---|---|---|
Average inference time (s) | 0.78 | 1.54 | 6.2 | 3.17 |
| GT Type | Methods | Imp. Surf | Buildings | Low Veg | Tree | Car | Clutter | Average F-Score | Overall Accuracy |
|---|---|---|---|---|---|---|---|---|---|
| erGT | FCN [21] | 89.73 | 94.87 | 84.24 | 76.67 | 81.64 | 28.39 | 75.92 | 87.40 |
| | SegNet [19] | 90.44 | 95.34 | 83.48 | 78.49 | 84.84 | 25.81 | 76.41 | 88.37 |
| | FPL [22] | 90.59 | 95.34 | 83.54 | 75.58 | 85.62 | 17.59 | 74.71 | 88.12 |
| | HSN | 91.39 | 95.49 | 83.91 | 78.86 | 86.28 | 17.77 | 75.62 | 88.97 |
| | HSN + OI | 91.63 | 95.65 | 84.28 | 79.42 | 87.47 | 17.95 | 76.07 | 89.29 |
| | HSN + OI + WBP | 91.77 | 95.71 | 84.40 | 79.56 | 88.25 | 17.76 | 76.24 | 89.42 |
| GT | FCN [21] | 87.36 | 93.83 | 81.73 | 74.06 | 76.63 | 29.01 | 73.77 | 85.04 |
| | SegNet [19] | 88.10 | 94.37 | 81.05 | 75.76 | 79.40 | 24.72 | 73.90 | 86.02 |
| | FPL [22] | 88.55 | 94.31 | 81.13 | 72.90 | 80.52 | 16.30 | 72.29 | 85.93 |
| | HSN | 89.01 | 94.42 | 81.18 | 76.09 | 81.05 | 15.35 | 72.85 | 86.56 |
| | HSN + OI | 89.26 | 94.60 | 81.54 | 76.63 | 82.08 | 15.36 | 73.25 | 86.89 |
| | HSN + OI + WBP | 89.45 | 94.66 | 81.67 | 76.78 | 82.97 | 15.12 | 73.44 | 87.05 |
© 2017 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).