Learning Rich Features from RGB-D Images for Object Detection and Segmentation

Saurabh Gupta¹⁹,
Ross Girshick¹⁹,
Pablo Arbeláez^19,20 &
…
Jitendra Malik¹⁹

Part of the book series: Lecture Notes in Computer Science ((LNIP,volume 8695))

Included in the following conference series:

European Conference on Computer Vision

23k Accesses
6 Altmetric

Abstract

In this paper we study the problem of object detection for RGB-D images using semantically rich image and depth features. We propose a new geocentric embedding for depth images that encodes height above ground and angle with gravity for each pixel in addition to the horizontal disparity. We demonstrate that this geocentric embedding works better than using raw depth images for learning feature representations with convolutional neural networks. Our final object detection system achieves an average precision of 37.3%, which is a 56% relative improvement over existing methods. We then focus on the task of instance segmentation where we label pixels belonging to object instances found by our detector. For this task, we propose a decision forest approach that classifies pixels in the detection window as foreground or background using a family of unary and binary tests that query shape and geocentric pose features. Finally, we use the output from our object detectors in an existing superpixel classification framework for semantic scene segmentation and achieve a 24% relative improvement over current state-of-the-art for the object categories that we study. We believe advances such as those represented in this paper will facilitate the use of perception in fields like robotics.

Download to read the full chapter text

Chapter PDF

Complete 3D Scene Parsing from an RGBD Image

Article 21 November 2018

Fast Semantic Segmentation of RGB-D Scenes with GPU-Accelerated Deep Neural Networks

Deep Learning of Local RGB-D Patches for 3D Object Detection and 6D Pose Estimation

Keywords

References

Arbeláez, P., Pont-Tuset, J., Barron, J., Marques, F., Malik, J.: Multiscale combinatorial grouping. In: CVPR (2014)
Google Scholar
Arbeláez, P., Maire, M., Fowlkes, C., Malik, J.: Contour detection and hierarchical image segmentation. TPAMI (2011)
Google Scholar
Banica, D., Sminchisescu, C.: CPMC-3D-O2P: Semantic segmentation of RGB-D images using CPMC and second order pooling. CoRR abs/1312.7715 (2013)
Google Scholar
Bo, L., Ren, X., Fox, D.: Unsupervised Feature Learning for RGB-D Based Object Recognition. In: ISER (2012)
Google Scholar
Breiman, L.: Random forests. Machine Learning (2001)
Google Scholar
Couprie, C., Farabet, C., Najman, L., LeCun, Y.: Indoor semantic segmentation using depth information. CoRR abs/1301.3572 (2013)
Google Scholar
Deng, J., Berg, A., Satheesh, S., Su, H., Khosla, A., Fei-Fei, L.: ImageNet Large Scale Visual Recognition Competition 2012 (ILSVRC 2012) (2012), http://www.image-net.org/challenges/LSVRC/2012/
Dollár, P.: Piotr’s Image and Video Matlab Toolbox (PMT), http://vision.ucsd.edu/~pdollar/toolbox/doc/index.html
Dollár, P., Zitnick, C.L.: Structured forests for fast edge detection. In: ICCV (2013)
Google Scholar
Dollár, P., Zitnick, C.L.: Fast edge detection using structured forests. CoRR abs/1406.5549 (2014)
Google Scholar
Donahue, J., Jia, Y., Vinyals, O., Hoffman, J., Zhang, N., Tzeng, E., Darrell, T.: Decaf: A deep convolutional activation feature for generic visual recognition. In: ICML (2014)
Google Scholar
Fan, R.E., Chang, K.W., Hsieh, C.J., Wang, X.R., Lin, C.J.: LIBLINEAR: A library for large linear classification. JMRL (2008)
Google Scholar
Farabet, C., Couprie, C., Najman, L., LeCun, Y.: Learning hierarchical features for scene labeling. TPAMI (2013)
Google Scholar
Felzenszwalb, P., Girshick, R., McAllester, D., Ramanan, D.: Object detection with discriminatively trained part based models. TPAMI (2010)
Google Scholar
Geman, D., Amit, Y., Wilder, K.: Joint induction of shape features and tree classifiers. TPAMI (1997)
Google Scholar
Girshick, R., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: CVPR (2014)
Google Scholar
Guo, R., Hoiem, D.: Support surface prediction in indoor scenes. In: ICCV (2013)
Google Scholar
Gupta, S., Arbeláez, P., Malik, J.: Perceptual organization and recognition of indoor scenes from RGB-D images. In: CVPR (2013)
Google Scholar
Hariharan, B., Arbeláez, P., Girshick, R., Malik, J.: Simultaneous detection and segmentation. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014, Part VII. LNCS, vol. 8695, Springer, Heidelberg (2014)
Google Scholar
Janoch, A., Karayev, S., Jia, Y., Barron, J.T., Fritz, M., Saenko, K., Darrell, T.: A category-level 3D object dataset: Putting the kinect to work. In: Consumer Depth Cameras for Computer Vision (2013)
Google Scholar
Jia, Y.: Caffe: An open source convolutional architecture for fast feature embedding (2013), http://caffe.berkeleyvision.org/
Soo Kim, B., Xu, S., Savarese, S.: Accurate localization of 3D objects from RGB-D data using segmentation hypotheses. In: CVPR (2013)
Google Scholar
Koppula, H., Anand, A., Joachims, T., Saxena, A.: Semantic labeling of 3D point clouds for indoor scenes. In: NIPS (2011)
Google Scholar
Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: NIPS (2012)
Google Scholar
Lai, K., Bo, L., Ren, X., Fox, D.: A large-scale hierarchical multi-view rgb-d object dataset. In: ICRA (2011)
Google Scholar
LeCun, Y., Boser, B., Denker, J.S., Henderson, D., Howard, R.E., Hubbard, W., Jackel, L.D.: Backpropagation applied to handwritten zip code recognition. Neural Computation (1989)
Google Scholar
Lim, J.J., Zitnick, C.L., Dollár, P.: Sketch tokens: A learned mid-level representation for contour and object detection. In: CVPR (2013)
Google Scholar
Lin, D., Fidler, S., Urtasun, R.: Holistic scene understanding for 3D object detection with RGBD cameras. In: ICCV (2013)
Google Scholar
Ren, X., Bo, L.: Discriminatively trained sparse code gradients for contour detection. In: NIPS (2012)
Google Scholar
Ren, X., Bo, L., Fox, D.: RGB-(D) scene labeling: Features and algorithms. In: CVPR (2012)
Google Scholar
Shotton, J., Fitzgibbon, A.W., Cook, M., Sharp, T., Finocchio, M., Moore, R., Kipman, A., Blake, A.: Real-time human pose recognition in parts from single depth images. In: CVPR (2011)
Google Scholar
Shrivastava, A., Gupta, A.: Building part-based object detectors via 3D geometry. In: ICCV (2013)
Google Scholar
Silberman, N., Hoiem, D., Kohli, P., Fergus, R.: Indoor segmentation and support inference from RGBD images. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012, Part V. LNCS, vol. 7576, pp. 746–760. Springer, Heidelberg (2012)
Chapter Google Scholar
Socher, R., Huval, B., Bath, B.P., Manning, C.D., Ng, A.Y.: Convolutional-recursive deep learning for 3D object classification. In: NIPS (2012)
Google Scholar
Tang, S., Wang, X., Lv, X., Han, T.X., Keller, J., He, Z., Skubic, M., Lao, S.: Histogram of oriented normal vectors for object recognition with a depth sensor. In: Lee, K.M., Matsushita, Y., Rehg, J.M., Hu, Z. (eds.) ACCV 2012, Part II. LNCS, vol. 7725, pp. 525–538. Springer, Heidelberg (2013)
Chapter Google Scholar
Tighe, J., Niethammer, M., Lazebnik, S.: Scene parsing with object instances and occlusion ordering. In: CVPR (2014)
Google Scholar
Wang, T., He, X., Barnes, N.: Learning structured hough voting for joint object detection and occlusion reasoning. In: CVPR (2013)
Google Scholar
Ye, E.S.: Object Detection in RGB-D Indoor Scenes. Master’s thesis, EECS Department, University of California, Berkeley (January 2013), http://www.eecs.berkeley.edu/Pubs/TechRpts/2013/EECS-2013-3.html

Download references

Author information

Authors and Affiliations

University of California, Berkeley, USA
Saurabh Gupta, Ross Girshick, Pablo Arbeláez & Jitendra Malik
Universidad de los Andes, Colombia
Pablo Arbeláez

Authors

Saurabh Gupta
View author publications
You can also search for this author in PubMed Google Scholar
Ross Girshick
View author publications
You can also search for this author in PubMed Google Scholar
Pablo Arbeláez
View author publications
You can also search for this author in PubMed Google Scholar
Jitendra Malik
View author publications
You can also search for this author in PubMed Google Scholar

Editor information

Editors and Affiliations

Department of Computer Science, University of Toronto, 6 King’s College Road, M5H 3S5, Toronto, ON, Canada
David Fleet
Faculty of Electrical Engineering, Department of Cybernetics, Czech Technical University in Prague, Technicka 2, 166 27, Prague 6, Czech Republic
Tomas Pajdla
Max-Planck-Institut für Informatik, Campus E1 4, 66123, Saarbrücken, Germany
Bernt Schiele
PSI, iMinds, KU Leuven, ESAT, Kasteelpark Arenberg 10, Bus 2441, 3001, Leuven, Belgium
Tinne Tuytelaars

1 Electronic Supplementary Material

Electronic Supplementary Material (PDF 12,149 KB)

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Gupta, S., Girshick, R., Arbeláez, P., Malik, J. (2014). Learning Rich Features from RGB-D Images for Object Detection and Segmentation. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds) Computer Vision – ECCV 2014. ECCV 2014. Lecture Notes in Computer Science, vol 8695. Springer, Cham. https://doi.org/10.1007/978-3-319-10584-0_23

Download citation

DOI: https://doi.org/10.1007/978-3-319-10584-0_23
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-10583-3
Online ISBN: 978-3-319-10584-0
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

Learning Rich Features from RGB-D Images for Object Detection and Segmentation

Abstract

Chapter PDF

Similar content being viewed by others

Complete 3D Scene Parsing from an RGBD Image

Fast Semantic Segmentation of RGB-D Scenes with GPU-Accelerated Deep Neural Networks

Deep Learning of Local RGB-D Patches for 3D Object Detection and 6D Pose Estimation

Keywords

References

Author information

Authors and Affiliations

Editor information

Editors and Affiliations

1 Electronic Supplementary Material

Electronic Supplementary Material (PDF 12,149 KB)

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Publish with us

Navigation

Learning Rich Features from RGB-D Images for Object Detection and Segmentation

Abstract

Chapter PDF

Similar content being viewed by others

Complete 3D Scene Parsing from an RGBD Image

Fast Semantic Segmentation of RGB-D Scenes with GPU-Accelerated Deep Neural Networks

Deep Learning of Local RGB-D Patches for 3D Object Detection and 6D Pose Estimation

Keywords

References

Author information

Authors and Affiliations

Editor information

Editors and Affiliations

1 Electronic Supplementary Material

Electronic Supplementary Material (PDF 12,149 KB)

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Share this paper

Publish with us

Search

Navigation