Joint Spatial-Depth Feature Pooling for RGB-D Object Classification

Hong Pan,
Søren Ingvor Olsen &
Yaping Zhu

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 9127)

Included in the following conference series:

Scandinavian Conference on Image Analysis

Abstract

RGB-D cameras provide an additional depth cue, beyond traditional RGB information, that can effectively support many RGB-D perception tasks. However, current feature representations for RGB-D data use depth information only to extract local features; they do not merge depth cues into feature pooling to improve the robustness and discriminability of the representation. The spatial pyramid model (SPM) has become the standard protocol for splitting the 2D image plane into sub-regions for feature pooling in RGB-D object classification. We argue that SPM may not be the optimal pooling scheme for RGB-D images, as it pools features only spatially and completely discards the depth topological information. Instead, we propose a novel joint spatial-depth pooling scheme (JSDP), which further partitions SPM using the depth cue and pools features simultaneously in the 2D image plane and the depth direction. Embedding JSDP with standard feature extraction and feature encoding modules, we achieve superior performance to state-of-the-art methods on benchmarks for RGB-D object classification and detection.
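
The idea of pooling jointly over the image plane and the depth direction can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how the depth axis is partitioned, so the uniform depth bins, the single pyramid level, the max-pooling operator, and the function name `joint_spatial_depth_pool` with all of its parameters are assumptions made for illustration only. The sketch also assumes at least one feature and non-negative encoded features, so that empty cells can be left at zero.

```python
import numpy as np

def joint_spatial_depth_pool(codes, xy, depth, image_size, grid=(2, 2), depth_bins=2):
    """Max-pool encoded local features over spatial cells x depth bins.

    codes      : (N, K) non-negative encoded local features (assumed)
    xy         : (N, 2) pixel coordinates (x, y) of each feature
    depth      : (N,) depth value at each feature location
    image_size : (width, height) of the image in pixels
    Returns a vector of length grid[0] * grid[1] * depth_bins * K.
    """
    codes = np.asarray(codes, dtype=float)
    xy = np.asarray(xy, dtype=float)
    depth = np.asarray(depth, dtype=float)
    n, k = codes.shape
    gx, gy = grid
    width, height = image_size

    # Spatial cell of each feature; clip so border pixels fall in the last cell.
    cx = np.minimum((xy[:, 0] / width * gx).astype(int), gx - 1)
    cy = np.minimum((xy[:, 1] / height * gy).astype(int), gy - 1)

    # Depth bin of each feature: uniform bins between min and max depth
    # (an assumption; the abstract does not describe the actual partition).
    d_min, d_max = depth.min(), depth.max()
    span = d_max - d_min
    if span > 0:
        cd = np.minimum(((depth - d_min) / span * depth_bins).astype(int),
                        depth_bins - 1)
    else:
        cd = np.zeros(n, dtype=int)

    # Flat index of the joint (spatial cell, depth bin) sub-region.
    cell = (cy * gx + cx) * depth_bins + cd

    # Unbuffered element-wise max, so features sharing a sub-region accumulate.
    pooled = np.zeros((gx * gy * depth_bins, k))
    np.maximum.at(pooled, cell, codes)
    return pooled.ravel()
```

With `depth_bins=1` the sketch reduces to ordinary single-level spatial pooling, which makes the extra depth partition easy to compare against a purely spatial baseline.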

Author information

Authors and Affiliations

Department of Computer Science, University of Copenhagen, 2100, København Ø, Denmark
Hong Pan, Søren Ingvor Olsen & Yaping Zhu
School of Automation, Southeast University, Nanjing, 210096, China
Hong Pan

Corresponding author

Correspondence to Hong Pan.

Editor information

Editors and Affiliations

Technical University of Denmark, Lyngby, Denmark
Rasmus R. Paulsen
University of Copenhagen, Copenhagen, Denmark
Kim S. Pedersen

Cite this paper

Pan, H., Olsen, S.I., Zhu, Y. (2015). Joint Spatial-Depth Feature Pooling for RGB-D Object Classification. In: Paulsen, R., Pedersen, K. (eds) Image Analysis. SCIA 2015. Lecture Notes in Computer Science, vol 9127. Springer, Cham. https://doi.org/10.1007/978-3-319-19665-7_26

DOI: https://doi.org/10.1007/978-3-319-19665-7_26
Published: 09 June 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-19664-0
Online ISBN: 978-3-319-19665-7
eBook Packages: Computer Science, Computer Science (R0)
