Abstract
RGB-D cameras provide a depth cue in addition to traditional RGB information, which benefits many RGB-D perception tasks. However, current RGB-D feature representations use depth information only to extract local features; they do not merge depth cues into feature pooling to improve the robustness and discriminability of the representation. The spatial pyramid model (SPM) has become the standard protocol for splitting the 2D image plane into sub-regions for feature pooling in RGB-D object classification. We argue that SPM may not be the optimal pooling scheme for RGB-D images, as it pools features only spatially and completely discards the depth topological information. Instead, we propose a novel joint spatial-depth pooling scheme (JSDP), which further partitions SPM using the depth cue and pools features simultaneously in the 2D image plane and along the depth direction. Embedding JSDP with standard feature extraction and feature encoding modules, we achieve superior performance to state-of-the-art methods on benchmarks for RGB-D object classification and detection.
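The pooling idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `joint_spatial_depth_pool`, the single pyramid level, the uniform depth bins between the minimum and maximum observed depth, and the choice of max pooling are all assumptions made here for illustration. Each encoded local feature is assigned to a spatial grid cell and, within that cell, to a depth slab; features are then pooled per (cell, slab) pair.

```python
import numpy as np


def joint_spatial_depth_pool(codes, xy, depth, grid=2, depth_bins=2):
    """Max-pool encoded features over a grid x grid x depth_bins partition.

    Hypothetical helper, not taken from the paper.
    codes: (N, K) non-negative encoded local features
    xy:    (N, 2) feature positions (x, y), normalised to [0, 1)
    depth: (N,)   depth value at each feature position
    Assumes N >= 1.
    """
    codes = np.asarray(codes, dtype=float)
    xy = np.asarray(xy, dtype=float)
    depth = np.asarray(depth, dtype=float)
    k = codes.shape[1]

    # Spatial cell of each feature, as in one level of a spatial pyramid.
    col = np.clip((xy[:, 0] * grid).astype(int), 0, grid - 1)
    row = np.clip((xy[:, 1] * grid).astype(int), 0, grid - 1)

    # Depth slab of each feature: uniform bins over the observed depth
    # range (an assumption; other partitions of the depth axis are possible).
    d_min, d_max = depth.min(), depth.max()
    span = max(d_max - d_min, 1e-12)
    slab = ((depth - d_min) / span * depth_bins).astype(int)
    slab = np.clip(slab, 0, depth_bins - 1)

    # Joint index over (row, col, slab); empty regions stay at zero.
    cell = (row * grid + col) * depth_bins + slab
    pooled = np.zeros((grid * grid * depth_bins, k))
    for c in range(pooled.shape[0]):
        mask = cell == c
        if mask.any():
            pooled[c] = codes[mask].max(axis=0)
    return pooled.ravel()
```

With `depth_bins=1` the sketch reduces to purely spatial pooling at one pyramid level. With more depth bins, two features that fall in the same image cell but at different depths are pooled into separate regions, which is the distinction the abstract draws between SPM and JSDP.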
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Pan, H., Olsen, S.I., Zhu, Y. (2015). Joint Spatial-Depth Feature Pooling for RGB-D Object Classification. In: Paulsen, R., Pedersen, K. (eds) Image Analysis. SCIA 2015. Lecture Notes in Computer Science, vol 9127. Springer, Cham. https://doi.org/10.1007/978-3-319-19665-7_26
DOI: https://doi.org/10.1007/978-3-319-19665-7_26
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-19664-0
Online ISBN: 978-3-319-19665-7
eBook Packages: Computer Science (R0)