Learning 6D Object Pose Estimation Using 3D Object Coordinates

Eric Brachmann¹⁹,
Alexander Krull¹⁹,
Frank Michel¹⁹,
Stefan Gumhold¹⁹,
Jamie Shotton²⁰ &
…
Carsten Rother¹⁹

Part of the book series: Lecture Notes in Computer Science ((LNIP,volume 8690))

Included in the following conference series:

European Conference on Computer Vision

21k Accesses
252 Citations
6 Altmetric

Abstract

This work addresses the problem of estimating the 6D Pose of specific objects from a single RGB-D image. We present a flexible approach that can deal with generic objects, both textured and texture-less. The key new concept is a learned, intermediate representation in form of a dense 3D object coordinate labelling paired with a dense class labelling. We are able to show that for a common dataset with texture-less objects, where template-based techniques are suitable and state of the art, our approach is slightly superior in terms of accuracy. We also demonstrate the benefits of our approach, compared to template-based techniques, in terms of robustness with respect to varying lighting conditions. Towards this end, we contribute a new ground truth dataset with 10k images of 20 objects captured each under three different lighting conditions. We demonstrate that our approach scales well with the number of objects and has capabilities to run fast.

Download to read the full chapter text

Chapter PDF

BOP: Benchmark for 6D Object Pose Estimation

FoundPose: Unseen Object Pose Estimation with Foundation Features

Q-PSO: Fast Quaternion-Based Pose Estimation from RGB-D Images

Article 17 October 2017

Keywords

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

References

Bo, L., Ren, X., Fox, D.: Unsupervised feature learning for RGB-D based object recognition. In: ISER (2012)
Google Scholar
Criminisi, A., Shotton, J.: Decision Forests for Computer Vision and Medical Image Analysis. Springer (2013)
Google Scholar
Damen, D., Bunnun, P., Calway, A., Mayol-Cuevas, W.: Real-time learning and detection of 3D texture-less objects: A scalable approach. In: BMVC (2012)
Google Scholar
Drost, B., Ulrich, M., Navab, N., Ilic, S.: Model globally, match locally: Efficient and robust 3D object recognition. In: CVPR (2010)
Google Scholar
Gall, J., Yao, A., Razavi, N., Van Gool, L., Lempitsky, V.: Hough Forests for object detection, tracking, and action recognition. IEEE Trans. on PAMI 33(11) (2011)
Google Scholar
Girshick, R., Shotton, J., Kohli, P., Criminisi, A., Fitzgibbon, A.: Efficient regression of general-activity human poses from depth images. In: ICCV (2011)
Google Scholar
Hinterstoisser, S., Cagniart, C., Ilic, S., Sturm, P., Navab, N., Fua, P., Lepetit, V.: Gradient response maps for real-time detection of texture-less objects. IEEE Trans. on PAMI (2012)
Google Scholar
Hinterstoisser, S., Lepetit, V., Ilic, S., Holzer, S., Bradski, G., Konolige, K., Navab, N.: Model based training, detection and pose estimation of texture-less 3D objects in heavily cluttered scenes. In: Lee, K.M., Matsushita, Y., Rehg, J.M., Hu, Z. (eds.) ACCV 2012, Part I. LNCS, vol. 7724, pp. 548–562. Springer, Heidelberg (2013)
Chapter Google Scholar
Hoiem, D., Rother, C., Winn, J.: 3D LayoutCRF for multi-view object class recognition and segmentation. In: CVPR (2007)
Google Scholar
Holzer, S., Shotton, J., Kohli, P.: Learning to efficiently detect repeatable interest points in depth data. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012, Part I. LNCS, vol. 7572, pp. 200–213. Springer, Heidelberg (2012)
Chapter Google Scholar
Huttenlocher, D., Klanderman, G., Rucklidge, W.: Comparing images using the hausdorff distance. IEEE Trans. on PAMI (1993)
Google Scholar
Izadi, S., Kim, D., Hilliges, O., Molyneaux, D., Newcombe, R., Kohli, P., Shotton, J., Hodges, S., Freeman, D., Davison, A., Fitzgibbon, A.: KinectFusion: real-time 3D reconstruction and interaction using a moving depth camera. In: UIST (2011)
Google Scholar
Lai, K., Bo, L., Ren, X., Fox, D.: A large-scale hierarchical multi-view rgb-d object dataset. In: ICRA. IEEE (2011)
Google Scholar
Lepetit, V., Fua, P.: Keypoint recognition using randomized trees. IEEE Trans. on PAMI 28(9) (2006)
Google Scholar
Lowe, D.G.: Local feature view clustering for 3d object recognition. In: CVPR (2001)
Google Scholar
Martinez, M., Collet, A., Srinivasa, S.S.: Moped: A scalable and low latency object recognition and pose estimation system. In: ICRA (2010)
Google Scholar
Newcombe, R., Izadi, S., Hilliges, O., Molyneaux, D., Kim, D., Davison, A., Kohli, P., Shotton, J., Hodges, S., Fitzgibbon, A.: KinectFusion: Real-time dense surface mapping and tracking. In: ISMAR (2011)
Google Scholar
Nistér, D., Stewénius, H.: Scalable recognition with a vocabulary tree. In: CVPR (2006)
Google Scholar
Ozuysal, M., Calonder, M., Lepetit, V., Fua, P.: Fast keypoint recognition using random ferns. IEEE Trans. on PAMI (2010)
Google Scholar
Philbin, J., Chum, O., Isard, M., Sivic, J., Zisserman, A.: Object retrieval with large vocabularies and fast spatial matching. In: CVPR (2007)
Google Scholar
Rios-Cabrera, R., Tuytelaars, T.: Discriminatively trained templates for 3D object detection: A real time scalable approach. In: ICCV (2013)
Google Scholar
Rosten, E., Porter, R., Drummond, T.: FASTER and better: A machine learning approach to corner detection. IEEE Trans. on PAMI 32 (2010)
Google Scholar
Shotton, J., Fitzgibbon, A., Cook, M., Sharp, T., Finocchio, M., Moore, R., Kipman, A., Blake, A.: Real-time human pose recognition in parts from a single depth image. In: CVPR (2011)
Google Scholar
Shotton, J., Glocker, B., Zach, C., Izadi, S., Criminisi, A., Fitzgibbon, A.: Scene coordinate regression forests for camera relocalization in rgb-d images. In: CVPR (2013)
Google Scholar
Shotton, J., Girshick, R.B., Fitzgibbon, A.W., Sharp, T., Cook, M., Finocchio, M., Moore, R., Kohli, P., Criminisi, A., Kipman, A., Blake, A.: Efficient human pose estimation from single depth images. IEEE Trans. on PAMI 35(12) (2013)
Google Scholar
Steger, C.: Similarity measures for occlusion, clutter, and illumination invariant object recognition. In: DAGM-S (2001)
Google Scholar
Sun, M., Bradski, G., Xu, B.-X., Savarese, S.: Depth-encoded hough voting for joint object detection and shape recovery. In: Daniilidis, K., Maragos, P., Paragios, N. (eds.) ECCV 2010, Part V. LNCS, vol. 6315, pp. 658–671. Springer, Heidelberg (2010)
Chapter Google Scholar
Taylor, J., Shotton, J., Sharp, T., Fitzgibbon, A.: The Vitruvian Manifold: Inferring dense correspondences for one-shot human pose estimation. In: CVPR (2012)
Google Scholar
Ferrari, V., Jurie, F., Schmid, C.: From images to shape models for object detection. In: IJCV (2009)
Google Scholar
Winder, S., Hua, G., Brown, M.: Picking the best DAISY. In: CVPR (2009)
Google Scholar

Download references

Author information

Authors and Affiliations

TU Dresden, Dresden, Germany
Eric Brachmann, Alexander Krull, Frank Michel, Stefan Gumhold & Carsten Rother
Microsoft Research, Cambridge, UK
Jamie Shotton

Authors

Eric Brachmann
View author publications
You can also search for this author in PubMed Google Scholar
Alexander Krull
View author publications
You can also search for this author in PubMed Google Scholar
Frank Michel
View author publications
You can also search for this author in PubMed Google Scholar
Stefan Gumhold
View author publications
You can also search for this author in PubMed Google Scholar
Jamie Shotton
View author publications
You can also search for this author in PubMed Google Scholar
Carsten Rother
View author publications
You can also search for this author in PubMed Google Scholar

Editor information

Editors and Affiliations

Department of Computer Science, University of Toronto, 6 King’s College Road, M5H 3S5, Toronto, ON, Canada
David Fleet
Faculty of Electrical Engineering, Department of Cybernetics, Czech Technical University in Prague, Technicka 2, 166 27, Prague 6, Czech Republic
Tomas Pajdla
Max-Planck-Institut für Informatik, Campus E1 4, 66123, Saarbrücken, Germany
Bernt Schiele
KU Leuven, ESAT - PSI, iMinds, Kasteelpark Arenberg, 10, Bus 2441, 3001, Leuven, Belgium
Tinne Tuytelaars

1 Electronic Supplementary Material

Electronic Supplementary Material (PDF 11,260 KB)

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Brachmann, E., Krull, A., Michel, F., Gumhold, S., Shotton, J., Rother, C. (2014). Learning 6D Object Pose Estimation Using 3D Object Coordinates. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds) Computer Vision – ECCV 2014. ECCV 2014. Lecture Notes in Computer Science, vol 8690. Springer, Cham. https://doi.org/10.1007/978-3-319-10605-2_35

Download citation

DOI: https://doi.org/10.1007/978-3-319-10605-2_35
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-10604-5
Online ISBN: 978-3-319-10605-2
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

Learning 6D Object Pose Estimation Using 3D Object Coordinates

Abstract

Chapter PDF

Similar content being viewed by others

BOP: Benchmark for 6D Object Pose Estimation

FoundPose: Unseen Object Pose Estimation with Foundation Features

Q-PSO: Fast Quaternion-Based Pose Estimation from RGB-D Images

Keywords

References

Author information

Authors and Affiliations

Editor information

Editors and Affiliations

1 Electronic Supplementary Material

Electronic Supplementary Material (PDF 11,260 KB)

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Publish with us

Navigation

Learning 6D Object Pose Estimation Using 3D Object Coordinates

Abstract

Chapter PDF

Similar content being viewed by others

BOP: Benchmark for 6D Object Pose Estimation

FoundPose: Unseen Object Pose Estimation with Foundation Features

Q-PSO: Fast Quaternion-Based Pose Estimation from RGB-D Images

Keywords

References

Author information

Authors and Affiliations

Editor information

Editors and Affiliations

1 Electronic Supplementary Material

Electronic Supplementary Material (PDF 11,260 KB)

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Share this paper

Publish with us

Search

Navigation