Abstract
In semantic scene segmentation, every pixel of an image is assigned a category label. This task can be made easier by incorporating depth information, which structured light sensors provide. Depth, however, has very different properties from RGB image channels. In this paper, we present a novel method to provide depth information to convolutional neural networks. For this purpose, we apply a simplified version of the histogram of oriented depth (HOD) descriptor to the depth channel. We evaluate the network on the challenging NYU Depth V2 dataset and show that with our method, we can reach competitive performance at a high frame rate.
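The abstract does not say how the HOD descriptor is simplified, so the following is only an illustrative sketch of the general idea behind a histogram of oriented depth: depth gradients are binned by orientation within small cells, weighted by gradient magnitude, and normalised per cell. The function name `oriented_depth_histograms`, the cell size, the number of bins, and the per-cell L2 normalisation are all assumptions made for illustration and are not taken from the paper.

```python
import math


def oriented_depth_histograms(depth, cell=4, bins=8):
    """Per-cell histograms of depth-gradient orientation (HOG-style sketch).

    depth: 2-D list of floats (rows of depth values).
    cell, bins: hypothetical parameters, not values from the paper.
    Returns hist[cell_row][cell_col][bin].
    """
    h, w = len(depth), len(depth[0])
    rows, cols = h // cell, w // cell
    hist = [[[0.0] * bins for _ in range(cols)] for _ in range(rows)]

    for y in range(rows * cell):
        for x in range(cols * cell):
            # Central differences, clamped at the image border.
            gx = depth[y][min(x + 1, w - 1)] - depth[y][max(x - 1, 0)]
            gy = depth[min(y + 1, h - 1)][x] - depth[max(y - 1, 0)][x]
            mag = math.hypot(gx, gy)
            if mag == 0.0:
                continue  # flat regions contribute nothing
            # Unsigned orientation in [0, pi); clamp guards against rounding.
            angle = math.atan2(gy, gx) % math.pi
            b = min(int(angle / math.pi * bins), bins - 1)
            hist[y // cell][x // cell][b] += mag

    # L2-normalise each cell histogram (an assumed normalisation scheme).
    for row in hist:
        for cell_hist in row:
            norm = math.sqrt(sum(v * v for v in cell_hist))
            if norm > 0.0:
                for i in range(bins):
                    cell_hist[i] /= norm
    return hist
```

In a setup like the one the abstract describes, the resulting per-cell histograms could be stacked as additional input channels alongside RGB. How the authors actually feed them to the network is not stated here.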
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Höft, N., Schulz, H., Behnke, S. (2014). Fast Semantic Segmentation of RGB-D Scenes with GPU-Accelerated Deep Neural Networks. In: Lutz, C., Thielscher, M. (eds) KI 2014: Advances in Artificial Intelligence. KI 2014. Lecture Notes in Computer Science, vol 8736. Springer, Cham. https://doi.org/10.1007/978-3-319-11206-0_9
DOI: https://doi.org/10.1007/978-3-319-11206-0_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11205-3
Online ISBN: 978-3-319-11206-0
eBook Packages: Computer Science, Computer Science (R0)