research-article

Spatial Structure Preserving Feature Pyramid Network for Semantic Image Segmentation

Published: 31 August 2019

Abstract

Semantic image segmentation has recently made substantial progress, driven by the rapid development of Convolutional Neural Networks (CNNs). Most recently proposed approaches are based on Fully Convolutional Networks (FCNs). However, these FCN-based methods rely on large receptive fields and many pooling layers to capture the discriminative semantic information of an image. On the one hand, convolutional kernels with large receptive fields smooth out detailed edges, because too much contextual information is used to describe the "center pixel." On the other hand, pooling layers enlarge the receptive field by downsampling the feature maps, which discards much of the image's detail, especially in the deeper layers of the network. Together, these operations produce low spatial resolution in the deep layers, which leads to spatially fragmented predictions. To address this problem, we exploit the inherent multi-scale, pyramidal hierarchy of deep convolutional networks to extract feature maps at different resolutions, and we make full use of these maps by fusing them gradually in a stacked manner. Specifically, for two adjacent convolutional layers, we upsample the features from the deeper layer by a factor of 2 and stack them on the features from the shallower layer. A convolutional layer with 1×1 kernels then fuses the stacked features. The fused features preserve the spatial structure of the image while retaining strong discriminative capability for pixel classification. Additionally, to further preserve the spatial structure and regional connectivity of the predicted category label map, we propose a novel loss term for the network.
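The gradual fusion step described above (upsample the deeper feature map by a factor of 2, stack it on the shallower map, then fuse with a 1×1 convolution) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes that "stacking" means channel-wise concatenation, uses nearest-neighbour upsampling (the paper may use bilinear interpolation or a learned deconvolution instead), omits nonlinearities and normalization, and uses random placeholder weights. The function names (`upsample2x`, `conv1x1`, `fuse_adjacent`, `fuse_pyramid`) and the channel counts are hypothetical.

```python
import numpy as np


def upsample2x(x):
    """Nearest-neighbour upsampling by a factor of 2 of a (C, H, W) feature map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)


def conv1x1(x, weight, bias):
    """1x1 convolution = per-pixel linear map over channels.

    x: (C_in, H, W), weight: (C_out, C_in), bias: (C_out,) -> (C_out, H, W)
    """
    return np.einsum("oc,chw->ohw", weight, x) + bias[:, None, None]


def fuse_adjacent(deep, shallow, weight, bias):
    """Upsample the deeper map by 2, stack it on the shallower map along
    the channel axis, then fuse the stacked channels with a 1x1 convolution."""
    up = upsample2x(deep)
    if up.shape[1:] != shallow.shape[1:]:
        raise ValueError("spatial sizes must match after 2x upsampling")
    stacked = np.concatenate([shallow, up], axis=0)
    return conv1x1(stacked, weight, bias)


def fuse_pyramid(features, params):
    """Gradually fuse a feature pyramid from the deepest level upwards.

    features: maps ordered shallowest (highest resolution) to deepest.
    params[i]: (weight, bias) of the 1x1 convolution that fuses level i
    with the already-fused result of the levels below it.
    """
    fused = features[-1]
    for i in range(len(features) - 2, -1, -1):
        weight, bias = params[i]
        fused = fuse_adjacent(fused, features[i], weight, bias)
    return fused


# Toy three-level pyramid with random features and random (untrained) weights.
rng = np.random.default_rng(0)
features = [rng.standard_normal((8, 32, 32)),   # shallow: high resolution
            rng.standard_normal((16, 16, 16)),
            rng.standard_normal((32, 8, 8))]    # deep: low resolution
params = [(rng.standard_normal((8, 8 + 16)), np.zeros(8)),     # fuses level 0 with fused levels 1-2
          (rng.standard_normal((16, 16 + 32)), np.zeros(16))]  # fuses level 1 with level 2
out = fuse_pyramid(features, params)
print(out.shape)  # (8, 32, 32)
```

In this sketch the 1×1 convolution mixes channels at each pixel independently, so the fusion itself adds no spatial smoothing; the output's spatial detail comes from the shallower map and the upsampled deeper map.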
In detail, the proposed loss term is built on two graph model-based spatial affinity matrices, which describe the pixel-level relationships in the input image and in the predicted category label map, respectively; the cosine distance between the two matrices is backpropagated through the network. The proposed architecture, called the Spatial Structure Preserving Feature Pyramid Network, significantly improves the spatial resolution of the predicted category label map. It achieves state-of-the-art results on three challenging public datasets for semantic image segmentation.
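The abstract does not specify how the two graphs are constructed, so the following NumPy sketch is only one plausible reading of the loss term, under explicit assumptions: pixels are nodes of a 4-connected grid graph; the image affinity of an edge is a Gaussian kernel on the colour difference (the bandwidth `sigma` is a hypothetical choice); the label-map affinity of an edge is the inner product of the two pixels' predicted class-probability vectors; and the loss is one minus the cosine similarity of the two flattened matrices. It computes only the forward value; in training, an automatic-differentiation framework would backpropagate it. All function names are hypothetical, and the dense N×N matrices are practical only for small images.

```python
import numpy as np


def grid_adjacency(h, w):
    """Boolean (N, N) adjacency mask of a 4-connected h x w pixel grid, N = h * w."""
    idx = np.arange(h * w).reshape(h, w)
    adj = np.zeros((h * w, h * w), dtype=bool)
    for src, dst in ((idx[:, :-1], idx[:, 1:]),    # horizontal neighbours
                     (idx[:-1, :], idx[1:, :])):   # vertical neighbours
        src, dst = src.ravel(), dst.ravel()
        adj[src, dst] = True
        adj[dst, src] = True
    return adj


def image_affinity(image, adj, sigma=0.1):
    """Assumed image graph: Gaussian colour similarity on edges; image is (C, H, W)."""
    x = image.reshape(image.shape[0], -1).T                    # (N, C)
    d2 = ((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=-1)   # squared colour distances
    return np.where(adj, np.exp(-d2 / (2.0 * sigma ** 2)), 0.0)


def label_affinity(probs, adj):
    """Assumed label graph: probability that adjacent pixels share a label; probs is (K, H, W)."""
    p = probs.reshape(probs.shape[0], -1).T                    # (N, K)
    return np.where(adj, p @ p.T, 0.0)


def cosine_distance(a, b, eps=1e-12):
    """1 - cosine similarity of two matrices, treated as flat vectors."""
    a, b = a.ravel(), b.ravel()
    return 1.0 - float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + eps)


def column_one_hot(col_is_fg, h):
    """Two-class one-hot map (2, h, w) whose label depends only on the column."""
    fg = np.tile(col_is_fg.astype(float), (h, 1))              # (h, w)
    return np.stack([1.0 - fg, fg])


# Toy input: left half black, right half white.
h, w = 8, 8
image = np.zeros((3, h, w))
image[:, :, w // 2:] = 1.0
adj = grid_adjacency(h, w)
a_img = image_affinity(image, adj)

cols = np.arange(w)
probs_aligned = column_one_hot(cols >= w // 2, h)       # boundary matches the image edge
probs_shifted = column_one_hot(cols >= w // 2 + 1, h)   # boundary shifted by one column

d_aligned = cosine_distance(a_img, label_affinity(probs_aligned, adj))
d_shifted = cosine_distance(a_img, label_affinity(probs_shifted, adj))
print(f"aligned: {d_aligned:.3f}, shifted: {d_shifted:.3f}")  # aligned: 0.000, shifted: 0.077
```

The prediction whose boundary matches the image edge scores near zero, while the prediction shifted by one column is penalized, because its label affinities disagree with the image affinities along two columns of edges.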

Information

Published In

ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 15, Issue 3
August 2019
331 pages
ISSN:1551-6857
EISSN:1551-6865
DOI:10.1145/3352586
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 31 August 2019
Accepted: 01 March 2019
Revised: 01 January 2019
Received: 01 August 2018
Published in TOMM Volume 15, Issue 3

Author Tags

  1. Semantic image segmentation
  2. discriminative capability
  3. feature pyramid network
  4. spatial resolution

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

  • Young Top-notch Talent Program of Chinese Academy of Sciences
  • National Key R&D Program of China
  • CAS “Light of West China” Program
  • National Natural Science Foundation of China
  • Key Research Program of Frontier Sciences, CAS

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 27
  • Downloads (last 6 weeks): 1
Reflects downloads up to 21 Dec 2024

Citations

Cited By

  • (2024) Multi-Modal LiDAR Point Cloud Semantic Segmentation with Salience Refinement and Boundary Perception. ACM Transactions on Multimedia Computing, Communications, and Applications 20:10 (1-20). DOI: 10.1145/3674979. Online publication date: 1-Jul-2024.
  • (2024) Learning Nighttime Semantic Segmentation the Hard Way. ACM Transactions on Multimedia Computing, Communications, and Applications 20:7 (1-23). DOI: 10.1145/3650032. Online publication date: 4-Mar-2024.
  • (2024) Multi-Content Interaction Network for Few-Shot Segmentation. ACM Transactions on Multimedia Computing, Communications, and Applications 20:6 (1-20). DOI: 10.1145/3643850. Online publication date: 8-Mar-2024.
  • (2024) E-TPE: Efficient Thumbnail-Preserving Encryption for Privacy Protection in Visual Sensor Networks. ACM Transactions on Sensor Networks 20:4 (1-26). DOI: 10.1145/3592611. Online publication date: 11-May-2024.
  • (2024) Unpaired Multisource Imagery Joint Learning via Lightweight Network for Tread Tire Recognition. 2024 6th International Conference on Natural Language Processing (ICNLP), 585-591. DOI: 10.1109/ICNLP60986.2024.10692996. Online publication date: 22-Mar-2024.
  • (2024) Multiple Feature Fusion Based on Hierarchical Constraint for Crack Detection. 2024 6th International Conference on Natural Language Processing (ICNLP), 439-445. DOI: 10.1109/ICNLP60986.2024.10692429. Online publication date: 22-Mar-2024.
  • (2024) Automatic Generation of a Portuguese Land Cover Map with Machine Learning. Intelligent Systems and Applications, 36-58. DOI: 10.1007/978-3-031-47721-8_3. Online publication date: 10-Jan-2024.
  • (2023) Complementary Coarse-to-Fine Matching for Video Object Segmentation. ACM Transactions on Multimedia Computing, Communications, and Applications 19:6 (1-21). DOI: 10.1145/3596496. Online publication date: 16-May-2023.
  • (2023) Dual-Field-of-View Context Aggregation and Boundary Perception for Airport Runway Extraction. IEEE Transactions on Geoscience and Remote Sensing 61 (1-12). DOI: 10.1109/TGRS.2023.3271676. Online publication date: 2023.
  • (2022) External Attention Based TransUNet and Label Expansion Strategy for Crack Detection. IEEE Transactions on Intelligent Transportation Systems 23:10 (19054-19063). DOI: 10.1109/TITS.2022.3154407. Online publication date: Oct-2022.