
SEB-Net: Revisiting Deep Encoder-Decoder Networks for Scene Understanding

Published: 20 August 2020

Abstract

As a research area of computer vision and deep learning, scene understanding has attracted considerable attention in recent years. A major challenge is achieving high segmentation accuracy while containing the computational cost and time of training and inference; most current algorithms sacrifice one of these metrics for the other depending on the target device. To address this trade-off, this paper proposes a novel deep neural network architecture called Segmentation Efficient Blocks Network (SEB-Net) that seeks the best possible balance between accuracy, computational cost, and real-time inference speed. The model is composed of an encoder path and a decoder path in a symmetric structure. The encoder path consists of 16 convolution layers identical to those of a VGG-19 model, and the decoder path is built from what we call E-Blocks (Efficient Blocks), inspired by the bottleneck module of the widely popular ENet architecture with slight modifications. One advantage of this design is that max-unpooling in the decoder path feeds the expansion and projection convolutions in the E-Blocks, allowing fewer learnable parameters and efficient computation (10.1 frames per second (fps) for a 480x320 input, 11x fewer parameters than DeconvNet, and 52.4 GFLOPs for a 640x360 input on a Tesla K40 GPU). Experimental results on two outdoor scene datasets, the Cambridge-driving Labeled Video Database (CamVid) and Cityscapes, indicate that SEB-Net achieves higher performance than Fully Convolutional Networks (FCN), SegNet, DeepLabV, and Dilation8 in most cases. Moreover, SEB-Net outperforms efficient architectures such as ENet and LinkNet by 16.1 and 11.6 points, respectively, in instance-level Intersection over Union (iIoU). SEB-Net also shows better performance when further evaluated on SUN RGB-D, an indoor scene dataset.
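The parameter savings attributed to the E-Blocks follow the usual bottleneck arithmetic. The sketch below compares a plain full-width 3x3 decoder convolution against a projection/expansion bottleneck of the kind the abstract describes; the channel width (256) and reduction factor (4) are illustrative assumptions, not values taken from the paper.

```python
# Hedged sketch: why a bottleneck-style block (1x1 projection -> reduced-width
# 3x3 conv -> 1x1 expansion) carries far fewer learnable parameters than a
# plain full-width 3x3 convolution in a decoder stage.

def conv_params(in_ch, out_ch, k):
    """Weight count of a k x k convolution (bias terms omitted)."""
    return in_ch * out_ch * k * k

def plain_decoder_layer(channels=256, k=3):
    # A standard decoder stage: one full-width 3x3 convolution.
    return conv_params(channels, channels, k)       # 256*256*9 = 589,824

def e_block(channels=256, reduce=4, k=3):
    # Bottleneck: project down by `reduce`, convolve at reduced width,
    # expand back to the original channel count.
    mid = channels // reduce
    return (conv_params(channels, mid, 1)           # projection:  16,384
            + conv_params(mid, mid, k)              # main conv:   36,864
            + conv_params(mid, channels, 1))        # expansion:   16,384

plain = plain_decoder_layer()
bottleneck = e_block()
print(plain, bottleneck, round(plain / bottleneck, 1))
```

Under these assumed widths the bottleneck uses roughly 8.5x fewer weights per stage, which is the kind of saving that makes real-time decoders feasible; the paper's overall 11x figure versus DeconvNet also reflects replacing learned deconvolutions with parameter-free max-unpooling.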

References

[1]
A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet Classification with Deep Convolutional Neural Networks," Adv. Neural Inf. Process. Syst., pp. 1--9, 2012.
[2]
K. Simonyan and A. Zisserman, "Very Deep Convolutional Networks for Large-Scale Image Recognition," arXiv preprint arXiv:1409.1556, 2014.
[3]
X. Jiang, Y. Wang, W. Liu, S. Li, and J. Liu, "CapsNet, CNN, FCN: Comparative performance evaluation for image classification," Int. J. Mach. Learn. Comput., vol. 9, no. 6, pp. 840--848, 2019.
[4]
O. Russakovsky et al., "ImageNet Large Scale Visual Recognition Challenge," Int. J. Comput. Vis., vol. 115, pp. 211--252, 2015.
[5]
C. Szegedy et al., "Going deeper with convolutions," 2015 IEEE Conf. Comput. Vis. Pattern Recognit., pp. 1--9, 2015.
[6]
U. Iqbal, A. Milan, and J. Gall, "PoseTrack: Joint Multi-Person Pose Estimation and Tracking," 2017 IEEE Conf. Comput. Vis. Pattern Recognit., 2017.
[7]
V. Badrinarayanan, A. Kendall, and R. Cipolla, "SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation," IEEE Trans. Pattern Anal. Mach. Intell., vol. 39, no. 12, pp. 2481--2495, 2017.
[8]
J. Long, E. Shelhamer, and T. Darrell, "Fully Convolutional Networks for Semantic Segmentation," 2015 IEEE Conf. Comput. Vis. Pattern Recognit., pp. 3431--3440, 2015.
[9]
H. Zhao, J. Shi, X. Qi, X. Wang, and J. Jia, "Pyramid scene parsing network," 2017 IEEE Conf. Comput. Vis. Pattern Recognit., pp. 6230--6239, 2017.
[10]
H. Noh, S. Hong, and B. Han, "Learning deconvolution network for semantic segmentation," Proc. IEEE Int. Conf. Comput. Vis., pp. 1520--1528, 2015.
[11]
J. Li, Y. Wu, J. Zhao, L. Guan, C. Ye, and T. Yang, "Pedestrian detection with dilated convolution, region proposal network and boosted decision trees," Proc. Int. Jt. Conf. Neural Networks, pp. 4052--4057, 2017.
[12]
S. Jegou, M. Drozdzal, D. Vazquez, A. Romero, and Y. Bengio, "The One Hundred Layers Tiramisu: Fully Convolutional DenseNets for Semantic Segmentation," IEEE Conf. Comput. Vis. Pattern Recognit. Workshops, pp. 1175--1183, 2017.
[14]
A. Krizhevsky, "Learning Multiple Layers of Features from Tiny Images," Tech. Rep., Univ. of Toronto, 2009.
[15]
Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proc. IEEE, vol. 86, no. 11, pp. 2278--2324, 1998.
[16]
W. Sun and R. Wang, "Fully Convolutional Networks for Semantic Segmentation of Very High Resolution Remotely Sensed Images Combined with DSM," IEEE Geosci. Remote Sens. Lett., vol. 15, no. 3, pp. 474--478, 2018.
[17]
L. C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille, "DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs," IEEE Trans. Pattern Anal. Mach. Intell., vol. 40, no. 4, pp. 834--848, 2018.
[18]
K. He, X. Zhang, S. Ren, and J. Sun, "Deep Residual Learning for Image Recognition," in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
[19]
A. Chaurasia and E. Culurciello, "LinkNet: Exploiting encoder representations for efficient semantic segmentation," 2017 IEEE Vis. Commun. Image Process. (VCIP), pp. 1--4, 2017.
[20]
A. Paszke, A. Chaurasia, S. Kim, and E. Culurciello, "ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation," pp. 1--10, 2016, [Online]. Available: http://arxiv.org/abs/1606.02147.
[21]
P. Sturgess, K. Alahari, L. Ladický, and P. H. S. Torr, "Combining appearance and structure from motion features for road scene understanding," Br. Mach. Vis. Conf. BMVC 2009-Proc., pp. 1--11, 2009.
[22]
A. Kendall, V. Badrinarayanan, and R. Cipolla, "Bayesian SegNet: Model Uncertainty in Deep Convolutional Encoder-Decoder Architectures for Scene Understanding," 2019.
[23]
O. Ronneberger, P. Fischer, and T. Brox, "U-Net: Convolutional networks for biomedical image segmentation," Lect. Notes Comput. Sci., vol. 9351, pp. 234--241, 2015.
[24]
G. Huang, Z. Liu, L. van der Maaten, and K. Q. Weinberger, "Densely Connected Convolutional Networks," 2017 IEEE Conf. Comput. Vis. Pattern Recognit., pp. 2261--2269, 2017.
[26]
A. F. Agarap, "Deep Learning using Rectified Linear Units (ReLU)," no. 1, pp. 2--8, 2018, [Online]. Available: http://arxiv.org/abs/1803.08375.
[27]
K. He, X. Zhang, S. Ren, and J. Sun, "Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification," Proc. IEEE Int. Conf. Comput. Vis., pp. 1026--1034, 2015.
[28]
A. Paszke et al., "Automatic differentiation in PyTorch," no. Nips, pp. 1--4, 2017.
[29]
J. Deng, W. Dong, R. Socher, L. Li, K. Li, and L. Fei-fei, "ImageNet: A Large-Scale Hierarchical Image Database," 2009 IEEE Conf. Comput. Vis. Pattern Recognit., pp. 248--255, 2009.
[30]
G. J. Brostow, J. Fauqueur, and R. Cipolla, "Semantic object classes in video: A high-definition ground truth database," Pattern Recognit. Lett., vol. 30, no. 2, pp. 88--97, 2009.
[31]
M. Cordts et al., "The Cityscapes Dataset for Semantic Urban Scene Understanding," Proc. IEEE Conf. Comput. Vis. Pattern Recognit., pp. 3213--3223, 2016.
[32]
A. Handa, V. Patraucean, V. Badrinarayanan, and S. Stent, "SceneNet: Understanding Real World Indoor Scenes With Synthetic Data," arXiv preprint, 2015.
[33]
N. Silberman, D. Hoiem, P. Kohli, and R. Fergus, "Indoor segmentation and support inference from RGBD images," Lect. Notes Comput. Sci., vol. 7576, pp. 746--760, 2012.
[34]
A. Janoch et al., "A Category-level 3-D Database: Putting the Kinect to Work," ICCV 2011 Work. Consum. Depth Cameras Comput. Vis., 2011.
[35]
J. Xiao, A. Owens, and A. Torralba, "SUN3D: A database of big spaces reconstructed using SfM and object labels," Proc. IEEE Int. Conf. Comput. Vis., pp. 1625--1632, 2013.


    Published In

    cover image ACM Other conferences
    ICCAI '20: Proceedings of the 2020 6th International Conference on Computing and Artificial Intelligence
    April 2020
    563 pages
    ISBN:9781450377089
    DOI:10.1145/3404555
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    In-Cooperation

    • University of Tsukuba

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Semantic segmentation
    2. decoder networks
    3. e-blocks
    4. encoder networks
    5. scene understanding

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Article Metrics

    • 0 Total Citations
    • 40 Total Downloads
    • Downloads (Last 12 months): 3
    • Downloads (Last 6 weeks): 0
    Reflects downloads up to 31 Dec 2024
