HsgNet: A Road Extraction Network Based on Global Perception of High-Order Spatial Information
<p>Figure 1. Illustrating the uniqueness and difficulty of road extraction. First row: original DeepGlobe test images. Second row: roads extracted using LinkNet. Green represents areas that were marked as roads but were missed or misidentified by LinkNet. (<b>a</b>) A slender road; (<b>b</b>) geometric features similar to those of a gully caused a road to be misidentified; (<b>c</b>) texture and other features extremely similar to the surrounding environment; (<b>d</b>) tree obscuration; (<b>e</b>) complex topological connectivity left roads unrecognized.</p>
<p>Figure 2. The architecture of HsgNet. The blue rectangles represent multichannel feature maps, and the yellow rectangle represents the high-order spatial information global perception module. The model is divided into three parts: an Encoder, a Middle Block, and a Decoder. ResNet34 is used as the encoder, and the decoder is the original decoder of LinkNet. D-LinkNet uses several dilated convolution layers as its intermediate module, while HsgNet uses the high-order spatial information global perception module instead.</p>
<p>Figure 3. The Middle Block between the Encoder and Decoder. It contains three convolutions, all with 1 × 1 kernels. First, bilinear pooling captures the second-order statistics of the features and generates a feature resource pool. Then, a set of attention coefficients recovers the features of each position from the feature resource pool. This module is inserted into the middle part of LinkNet to form HsgNet (<a href="#ijgi-08-00571-f002" class="html-fig">Figure 2</a>).</p>
<p>Figure 4. The precision–recall curves of D-LinkNet and HsgNet on DeepGlobe.</p>
<p>Figure 5. Visualization of results on the DeepGlobe test set: (<b>a</b>) input images; (<b>b</b>) ground truth; (<b>c</b>) U-Net; (<b>d</b>) LinkNet; (<b>e</b>) D-LinkNet; (<b>f</b>) HsgNet. The red boxes indicate locations where our method showed a significant improvement over the other methods.</p>
<p>Figure 6. Clustering visualization: (<b>a</b>) D-LinkNet, based on dilated convolution; (<b>b</b>) HsgNet, with high-order spatial information global perception. The purple clusters represent road features, and the yellow ones represent background or other features.</p>
<p>Figure 7. Visualization of different channels’ features: (<b>a</b>) before (first row) and after (second row) the dilated convolutions of D-LinkNet, and (<b>b</b>) before (first row) and after (second row) the Middle Block of HsgNet. Each image shows the feature map of a different channel, and brightness represents the magnitude of the activation values.</p>
Abstract
1. Introduction
- A novel encoder–decoder network, HsgNet, is proposed for road extraction; it replaces only the central dilated-convolution part of D-LinkNet with a high-order spatial information global perception module, named the Middle Block (Section 3).
- We introduce bilinear pooling to construct the Middle Block, comprising the three steps presented in Section 3.2. Based on bilinear pooling, the Middle Block not only makes full use of global spatial information but also preserves the high-order (second-order) statistics of the features and the dependencies among different feature channels.
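The three steps of the Middle Block (embed features, pool them into a second-order feature resource pool via bilinear pooling, then redistribute pooled features to each position with attention coefficients) can be sketched in NumPy. This is a minimal illustrative sketch, not the authors' implementation: the function names, the reduced channel count C', and the softmax placements are assumptions.

```python
import numpy as np

def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def middle_block(x, w_a, w_b, w_v):
    """Sketch of a high-order (second-order) global perception block.

    x   : (C, H, W) input feature map from the encoder
    w_* : (C', C) weight matrices of the three 1x1 convolutions
    """
    c, h, w = x.shape
    f = x.reshape(c, h * w)              # N = H*W spatial positions

    A = w_a @ f                          # (C', N) embedded features
    B = softmax(w_b @ f, axis=1)         # (C', N) pooling attention over positions
    G = A @ B.T                          # (C', C') second-order feature resource pool
                                         #   (bilinear pooling of two projections)
    V = softmax(w_v @ f, axis=0)         # (C', N) attention coefficients per position
    z = G @ V                            # (C', N) features recovered at each position
    return z.reshape(-1, h, w)           # (C', H, W) output feature map
```

Because `G` aggregates products of features over all positions, every output location draws on global spatial context and channel co-occurrence statistics, which is the property the Middle Block exploits.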
2. Related Work
3. Methods
3.1. HsgNet
3.2. Middle Block: High-Order Spatial Information Global Perception
4. Experiments and Results
4.1. Data Sets
4.2. Implementation Details
4.3. Metric
4.4. Results
4.5. Analysis
5. Conclusions
Author Contributions
Funding
Acknowledgments
Conflicts of Interest
References
- Alshehhi, R.; Marpu, P.R. Hierarchical graph-based segmentation for extracting road networks from high-resolution satellite images. ISPRS J. Photogramm. Remote Sens. 2017, 126, 245–260. [Google Scholar] [CrossRef]
- Zhang, Z.; Liu, Q.; Wang, Y. Road Extraction by Deep Residual U-Net. IEEE Geosci. Remote Sens. Lett. 2018, 15, 749–753. [Google Scholar] [CrossRef] [Green Version]
- Sujatha, C.; Selvathi, D. Connected component-based technique for automatic extraction of road centerline in high resolution satellite images. J. Image Video Proc. 2015, 2015, 8. [Google Scholar] [CrossRef] [Green Version]
- Laptev, I.; Mayer, H.; Lindeberg, T.; Eckstein, W.; Steger, C.; Baumgartner, A. Automatic extraction of roads from aerial images based on scale space and snakes. Mach. Vis. Appl. 2000, 12, 23–31. [Google Scholar] [CrossRef] [Green Version]
- Zhang, Z.; Zhang, X.; Sun, Y.; Zhang, P. Road Centerline Extraction from Very-High-Resolution Aerial Image and LiDAR Data Based on Road Connectivity. Remote Sens. 2018, 10, 1284. [Google Scholar] [CrossRef] [Green Version]
- Shelhamer, E.; Long, J.; Darrell, T. Fully Convolutional Networks for Semantic Segmentation. arXiv 2016, arXiv:1605.06211. [Google Scholar] [CrossRef]
- Chen, L.-C.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A.L. DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs. arXiv 2016, arXiv:1606.00915. [Google Scholar] [CrossRef]
- Zhao, H.; Shi, J.; Qi, X.; Wang, X.; Jia, J. Pyramid Scene Parsing Network. arXiv 2016, arXiv:1612.01105. [Google Scholar]
- Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. arXiv 2015, arXiv:1505.04597. [Google Scholar]
- Chaurasia, A.; Culurciello, E. LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation. In Proceedings of the 2017 IEEE Visual Communications and Image Processing (VCIP), Saint Petersburg, FL, USA, 10–13 December 2017; pp. 1–4. [Google Scholar]
- Zhou, L.; Zhang, C.; Wu, M. D-LinkNet: LinkNet with Pretrained Encoder and Dilated Convolution for High Resolution Satellite Imagery Road Extraction. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Salt Lake City, UT, USA, 18–22 June 2018; pp. 192–194. [Google Scholar]
- Demir, I.; Koperski, K.; Lindenbaum, D.; Pang, G.; Huang, J.; Basu, S.; Hughes, F.; Tuia, D.; Raskar, R. DeepGlobe 2018: A Challenge to Parse the Earth through Satellite Images. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Salt Lake City, UT, USA, 18–22 June 2018; p. 172. [Google Scholar]
- Van Etten, A.; Lindenbaum, D.; Bacastow, T.M. SpaceNet: A Remote Sensing Dataset and Challenge Series. arXiv 2018, arXiv:1807.01232. [Google Scholar]
- Wegner, J.D.; Montoya-Zegarra, J.A.; Schindler, K. A Higher-Order CRF Model for Road Network Extraction. In Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition, Portland, OR, USA, 23–28 June 2013; pp. 1698–1705. [Google Scholar]
- Chai, D.; Forstner, W.; Lafarge, F. Recovering Line-Networks in Images by Junction-Point Processes. In Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition, Portland, OR, USA, 23–28 June 2013; pp. 1894–1901. [Google Scholar]
- Liu, J.; Qin, Q.; Li, J.; Li, Y. Rural Road Extraction from High-Resolution Remote Sensing Images Based on Geometric Feature Inference. ISPRS Int. J. Geo-Inf. 2017, 6, 314. [Google Scholar] [CrossRef] [Green Version]
- Song, M.; Civco, D. Road Extraction Using SVM and Image Segmentation. Photogramm. Eng. Remote Sens. 2004, 70, 1365–1371. [Google Scholar] [CrossRef] [Green Version]
- Das, S.; Mirnalinee, T.T.; Varghese, K. Use of Salient Features for the Design of a Multistage Framework to Extract Roads From High-Resolution Multispectral Satellite Images. IEEE Trans. Geosci. Remote Sens. 2011, 49, 3906–3931. [Google Scholar] [CrossRef]
- Mnih, V. Machine Learning for Aerial Image Labeling. Ph.D. Thesis, Department of Computer Science, University of Toronto, Toronto, ON, Canada, 2013. [Google Scholar]
- Saito, S.; Yamashita, T.; Aoki, Y. Multiple Object Extraction from Aerial Imagery with Convolutional Neural Networks. J. Imaging Sci. Technol. 2016, 60, 104021–104029. [Google Scholar] [CrossRef]
- Bastani, F.; He, S.; Abbar, S.; Alizadeh, M.; Balakrishnan, H.; Chawla, S.; Madden, S.; DeWitt, D. RoadTracer: Automatic Extraction of Road Networks from Aerial Images. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 4720–4728. [Google Scholar]
- Xia, W.; Zhang, Y.-Z.; Liu, J.; Luo, L.; Yang, K. Road Extraction from High Resolution Image with Deep Convolution Network—A Case Study of GF-2 Image. Proceedings 2018, 2, 325. [Google Scholar] [CrossRef] [Green Version]
- Batra, A.; Singh, S.; Pang, G.; Basu, S.; Jawahar, C.V.; Paluri, M. Improved Road Connectivity by Joint Learning of Orientation and Segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019. [Google Scholar]
- Zhu, Q.; Zhong, Y.; Liu, Y.; Zhang, L.; Li, D. A Deep-Local-Global Feature Fusion Framework for High Spatial Resolution Imagery Scene Classification. Remote Sens. 2018, 10, 568. [Google Scholar] [CrossRef] [Green Version]
- Xu, Y.; Xie, Z.; Feng, Y.; Chen, Z. Road Extraction from High-Resolution Remote Sensing Imagery Using Deep Learning. Remote Sens. 2018, 10, 1461. [Google Scholar] [CrossRef] [Green Version]
- Lin, T.-Y.; RoyChowdhury, A.; Maji, S. Bilinear CNN Models for Fine-Grained Visual Recognition. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; pp. 1449–1457. [Google Scholar]
- Gao, Y.; Beijbom, O.; Zhang, N.; Darrell, T. Compact Bilinear Pooling. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 26 June–1 July 2016; pp. 317–326. [Google Scholar]
- Fukui, A.; Park, D.H.; Yang, D.; Rohrbach, A.; Darrell, T.; Rohrbach, M. Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, Austin, TX, USA, 2–6 November 2016; pp. 457–468. [Google Scholar]
- Kong, S.; Fowlkes, C. Low-Rank Bilinear Pooling for Fine-Grained Classification. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 7025–7034. [Google Scholar]
- Kim, J.-H.; On, K.-W. Hadamard Product for Low-Rank Bilinear Pooling. arXiv 2017, arXiv:1610.04325. [Google Scholar]
- Wei, X.; Zhang, Y.; Gong, Y.; Zhang, J.; Zheng, N. Grassmann Pooling as Compact Homogeneous Bilinear Pooling for Fine-Grained Visual Classification. In Computer Vision—ECCV 2018; Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y., Eds.; Springer International Publishing: Cham, Switzerland, 2018; Volume 11207, pp. 365–380. ISBN 978-3-030-01218-2. [Google Scholar]
- Yu, Z.; Yu, J.; Xiang, C.; Fan, J.; Tao, D. Beyond Bilinear: Generalized Multimodal Factorized High-order Pooling for Visual Question Answering. IEEE Trans. Neural Netw. Learn. Syst. 2018, 29, 5947–5959. [Google Scholar] [CrossRef] [Green Version]
- Yu, C.; Zhao, X.; Zheng, Q.; Zhang, P.; You, X. Hierarchical Bilinear Pooling for Fine-Grained Visual Recognition. In Computer Vision—ECCV 2018; Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y., Eds.; Springer International Publishing: Cham, Switzerland, 2018; Volume 11220, pp. 595–610. ISBN 978-3-030-01269-4. [Google Scholar]
- Li, P.; Xie, J.; Wang, Q.; Zuo, W. Is Second-Order Information Helpful for Large-Scale Visual Recognition? In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 2089–2097. [Google Scholar]
- Carreira, J.; Caseiro, R.; Batista, J.; Sminchisescu, C. Semantic Segmentation with Second-Order Pooling. In Computer Vision—ECCV 2012; Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C., Eds.; Springer: Berlin/Heidelberg, Germany, 2012; Volume 7578, pp. 430–443. ISBN 978-3-642-33785-7. [Google Scholar]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. arXiv 2015, arXiv:1512.03385. [Google Scholar]
- Deng, J.; Dong, W.; Socher, R.; Li, L.-J.; Li, K.; Fei-Fei, L. ImageNet: A Large-Scale Hierarchical Image Database. In Proceedings of the Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; p. 8. [Google Scholar]
- Yosinski, J.; Clune, J.; Bengio, Y.; Lipson, H. How transferable are features in deep neural networks? arXiv 2014, arXiv:1411.1792. [Google Scholar]
- Zeiler, M.D.; Taylor, G.W.; Fergus, R. Adaptive deconvolutional networks for mid and high level feature learning. In Proceedings of the 2011 International Conference on Computer Vision, Barcelona, Spain, 6–13 November 2011; pp. 2018–2025. [Google Scholar]
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is All you Need. In Proceedings of the Neural Information Processing Systems (NIPS), Long Beach, CA, USA, 4–9 December 2017; p. 236. [Google Scholar]
- Chen, L.-C.; Yang, Y.; Wang, J.; Xu, W.; Yuille, A.L. Attention to Scale: Scale-aware Semantic Image Segmentation. arXiv 2016, arXiv:1511.03339. [Google Scholar]
- Liu, M.; Yin, H. Cross Attention Network for Semantic Segmentation. arXiv 2019, arXiv:1907.10958. [Google Scholar]
- Fu, J.; Liu, J.; Tian, H.; Li, Y.; Bao, Y.; Fang, Z.; Lu, H. Dual Attention Network for Scene Segmentation. arXiv 2018, arXiv:1809.02983. [Google Scholar]
- Hu, J.; Shen, L.; Sun, G. Squeeze-and-Excitation Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2019, 10, 1. [Google Scholar] [CrossRef] [Green Version]
- Wang, X.; Girshick, R.; Gupta, A.; He, K. Non-local Neural Networks. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 7794–7803. [Google Scholar]
- Chen, Y.; Kalantidis, Y.; Li, J.; Yan, S.; Feng, J. A^2-Nets: Double Attention Networks. Adv. Neural Inf. Process. Syst. 2018, 10, 352–361. [Google Scholar]
- Kingma, D.P.; Ba, J. Adam: A Method for Stochastic Optimization. arXiv 2015, arXiv:1412.6980. [Google Scholar]
- Huang, Z.; Wang, X.; Huang, L.; Huang, C.; Wei, Y.; Liu, W. CCNet: Criss-Cross Attention for Semantic Segmentation. arXiv 2018, arXiv:1811.11721. [Google Scholar]
- Zhao, H.; Zhang, Y.; Liu, S.; Shi, J.; Loy, C.C.; Lin, D.; Jia, J. PSANet: Point-wise Spatial Attention Network for Scene Parsing. In Computer Vision—ECCV 2018; Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y., Eds.; Springer International Publishing: Cham, Switzerland, 2018; Volume 11213, pp. 270–286. ISBN 978-3-030-01239-7. [Google Scholar]
DeepGlobe (input size 1024 × 1024):

Methods | P | R | F1 | mIoU | Time | Model Size
---|---|---|---|---|---|---
U-Net [9] | 78.6 | 79.7 | 79.2 | 65.3 | 36 | 158.0
LinkNet [10] | 81.7 | 81.7 | 81.7 | 69.1 | 54 | 86.7
D-LinkNet [11] | 82.6 | 82.6 | 82.6 | 70.5 | 59 | 124.5
HsgNet | 83.0 | 82.8 | 82.9 | 71.1 | 57 | 88.9

SpaceNet (input size 512 × 512):

Methods | P | R | F1 | mIoU | Time
---|---|---|---|---|---
U-Net [9] | 80.9 | 79.8 | 80.3 | 67.1 | 15
LinkNet [10] | 81.9 | 82.1 | 81.9 | 69.3 | 20
D-LinkNet [11] | 82.4 | 82.9 | 82.6 | 70.1 | 29
HsgNet | 81.6 | 84.5 | 83.0 | 71.0 | 29
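The precision (P), recall (R), F1, and IoU scores reported above follow the standard pixel-wise definitions for binary road segmentation. A minimal NumPy sketch of these metrics (the function name is illustrative; mIoU is conventionally the IoU averaged over the road and background classes):

```python
import numpy as np

def road_metrics(pred, gt):
    """Pixel-wise metrics for binary masks, where 1 = road and 0 = background."""
    tp = np.logical_and(pred == 1, gt == 1).sum()  # road pixels correctly predicted
    fp = np.logical_and(pred == 1, gt == 0).sum()  # background predicted as road
    fn = np.logical_and(pred == 0, gt == 1).sum()  # road pixels missed

    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    iou = tp / (tp + fp + fn)                      # intersection over union (road class)
    return precision, recall, f1, iou
```

For example, a prediction that hits one of two ground-truth road pixels while also raising one false alarm yields P = R = F1 = 0.5 and IoU = 1/3.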
© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Share and Cite
Xie, Y.; Miao, F.; Zhou, K.; Peng, J. HsgNet: A Road Extraction Network Based on Global Perception of High-Order Spatial Information. ISPRS Int. J. Geo-Inf. 2019, 8, 571. https://doi.org/10.3390/ijgi8120571