
Depth-Adaptive Deep Neural Network for Semantic Segmentation

Published: 01 September 2018

Abstract

In this paper, we present a depth-adaptive deep neural network that uses a depth map for semantic segmentation. Typical deep neural networks receive inputs at predetermined locations regardless of the distance from the camera. This fixed receptive field makes it difficult to generalize the features of objects at various distances: the predetermined receptive fields are too small for nearby objects and too large for distant ones. To overcome this challenge, we develop a neural network that can adapt the receptive field not only for each layer but also for each neuron at each spatial location. To adjust the receptive field, we propose the depth-adaptive multiscale (DaM) convolution layer, which consists of an adaptive perception neuron and an in-layer multiscale neuron. The adaptive perception neuron adjusts the receptive field at each spatial location using the corresponding depth information. The in-layer multiscale neuron applies a different receptive field size in each feature space to learn features at multiple scales. The proposed DaM convolution is applied to two fully convolutional neural networks. We demonstrate the effectiveness of the proposed networks on a publicly available RGB-D dataset for semantic segmentation and on a novel hand segmentation dataset for hand-object interaction. The experimental results show that the proposed method outperforms state-of-the-art methods without any additional layers or preprocessing/postprocessing.
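
The DaM layer can be pictured as a convolution whose sampling span varies with the measured depth at each pixel. The following minimal PyTorch sketch illustrates that idea; it is not the authors' implementation. It assumes a depth map normalized to [0, 1] (smaller values meaning closer pixels) and approximates per-neuron receptive-field adaptation by blending parallel dilated branches with depth-derived weights; the class name, the depth-to-dilation mapping, and the soft-blending scheme are all hypothetical simplifications.

import torch
import torch.nn as nn
import torch.nn.functional as F

class DepthAdaptiveMultiscaleConv(nn.Module):
    """Hypothetical approximation of a DaM-style convolution: parallel
    dilated 3x3 branches blended per pixel according to depth."""

    def __init__(self, in_ch, out_ch, dilations=(1, 2, 4)):
        super().__init__()
        self.dilations = dilations
        self.branches = nn.ModuleList(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=d, dilation=d)
            for d in dilations
        )

    def forward(self, x, depth):
        # depth: (N, 1, H, W) in [0, 1]; smaller values mean closer pixels.
        # Closer pixels (larger apparent objects) get a larger target dilation.
        target = 1.0 + (1.0 - depth) * (max(self.dilations) - 1)
        # Soft, per-pixel assignment to the branch with the nearest dilation.
        scores = torch.stack([-(target - d).abs() for d in self.dilations])
        weights = F.softmax(scores, dim=0)  # sums to 1 over the branch axis
        return sum(w * branch(x) for w, branch in zip(weights, self.branches))

# Quick check on dummy data: an aligned feature map and depth map.
x = torch.randn(2, 16, 64, 64)
depth = torch.rand(2, 1, 64, 64)
layer = DepthAdaptiveMultiscaleConv(16, 32)
print(layer(x, depth).shape)  # torch.Size([2, 32, 64, 64])

A true per-neuron adaptation would resample the kernel support continuously (e.g., via bilinear sampling at depth-scaled offsets); the branch-blending above trades that precision for a simpler, obviously differentiable form.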




    Published In

IEEE Transactions on Multimedia, Volume 20, Issue 9
September 2018
306 pages

    Publisher

    IEEE Press

    Publication History

    Published: 01 September 2018

    Qualifiers

    • Research-article


    Article Metrics

• Downloads (last 12 months): 0
• Downloads (last 6 weeks): 0

Reflects downloads up to 22 Dec 2024.

Cited By
• (2024) Learning Cross-modality Interaction for Robust Depth Perception of Autonomous Driving. ACM Transactions on Intelligent Systems and Technology, 15(3), 1–26. DOI: 10.1145/3650039. Online publication date: 1-Mar-2024.
• (2024) Pyramid Fusion Transformer for Semantic Segmentation. IEEE Transactions on Multimedia, 26, 9630–9643. DOI: 10.1109/TMM.2024.3396281. Online publication date: 28-May-2024.
• (2024) Dual-Guided Frequency Prototype Network for Few-Shot Semantic Segmentation. IEEE Transactions on Multimedia, 26, 8874–8888. DOI: 10.1109/TMM.2024.3383276. Online publication date: 29-Mar-2024.
• (2024) Query-Guided Prototype Evolution Network for Few-Shot Segmentation. IEEE Transactions on Multimedia, 26, 6501–6512. DOI: 10.1109/TMM.2024.3352921. Online publication date: 11-Jan-2024.
• (2024) Enhancing long-term person re-identification using global, local body part, and head streams. Neurocomputing, 580(C). DOI: 10.1016/j.neucom.2024.127480. Online publication date: 1-May-2024.
• (2024) Pixel-level clustering network for unsupervised image segmentation. Engineering Applications of Artificial Intelligence, 127(PB). DOI: 10.1016/j.engappai.2023.107327. Online publication date: 1-Jan-2024.
• (2023) FECANet: Boosting Few-Shot Semantic Segmentation With Feature-Enhanced Context-Aware Network. IEEE Transactions on Multimedia, 25, 8580–8592. DOI: 10.1109/TMM.2023.3238521. Online publication date: 1-Jan-2023.
• (2023) Cellular Binary Neural Network for Accurate Image Classification and Semantic Segmentation. IEEE Transactions on Multimedia, 25, 8064–8075. DOI: 10.1109/TMM.2022.3233255. Online publication date: 1-Jan-2023.
• (2023) Self-Ensembling GAN for Cross-Domain Semantic Segmentation. IEEE Transactions on Multimedia, 25, 7837–7850. DOI: 10.1109/TMM.2022.3229976. Online publication date: 1-Jan-2023.
• (2023) A Boundary-Aware Network for Shadow Removal. IEEE Transactions on Multimedia, 25, 6782–6793. DOI: 10.1109/TMM.2022.3214422. Online publication date: 1-Jan-2023.
