[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to main content

HandDGP: Camera-Space Hand Mesh Prediction with Differentiable Global Positioning

  • Conference paper
  • First Online:
Computer Vision – ECCV 2024 (ECCV 2024)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 15096))

Included in the following conference series:

Abstract

Predicting camera-space hand meshes from single RGB images is crucial for enabling realistic hand interactions in 3D virtual and augmented worlds. Previous work typically divided the task into two stages: given a cropped image of the hand, predict meshes in relative coordinates, followed by lifting these predictions into camera space in a separate and independent stage, often resulting in the loss of valuable contextual and scale information. To prevent the loss of these cues, we propose unifying these two stages into an end-to-end solution that addresses the 2D-3D correspondence problem. This solution enables back-propagation from camera space outputs to the rest of the network through a new differentiable global positioning module. We also introduce an image rectification step that harmonizes both the training dataset and the input image as if they were acquired with the same camera, helping to alleviate the inherent scale-depth ambiguity of the problem. We validate the effectiveness of our framework in evaluations against several baselines and state-of-the-art approaches across three public benchmarks.

E. Valassakis—Now at Synthesia. Work done while at Niantic.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic
£29.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Chapter
GBP 19.95
Price includes VAT (United Kingdom)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
GBP 49.99
Price includes VAT (United Kingdom)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
GBP 64.99
Price includes VAT (United Kingdom)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Similar content being viewed by others

References

  1. Antotsiou, D., Garcia-Hernando, G., Kim, T.K.: Task-oriented hand motion retargeting for dexterous manipulation imitation. In: ECCV Workshop (2018)

    Google Scholar 

  2. Apple: Vision Pro. https://www.apple.com/apple-vision-pro/. Accessed 7 Mar 2024

  3. Armagan, A., et al.: Measuring generalisation to unseen viewpoints, articulations, shapes and objects for 3D hand pose estimation under hand-object interaction. In: ECCV (2020)

    Google Scholar 

  4. Baek, S., Kim, K.I., Kim, T.K.: Pushing the envelope for RGB-based dense 3D hand pose estimation via neural rendering. In: CVPR (2019)

    Google Scholar 

  5. Baek, S., Kim, K.I., Kim, T.K.: Weakly-supervised domain adaptation via GAN and mesh model for estimating 3D hand poses interacting objects. In: CVPR (2020)

    Google Scholar 

  6. Bhatnagar, B.L., Sminchisescu, C., Theobalt, C., Pons-Moll, G.: Loopreg: self-supervised learning of implicit surface correspondences, pose and shape for 3D human mesh registration. In: NeurIPS (2020)

    Google Scholar 

  7. Bhowmik, A., Gumhold, S., Rother, C., Brachmann, E.: Reinforced feature points: optimizing feature detection and description for a high-level task. In: CVPR (2020)

    Google Scholar 

  8. Boukhayma, A., Bem, R.D., Torr, P.H.: 3D hand shape and pose from images in the wild. In: CVPR (2019)

    Google Scholar 

  9. Brachmann, E., et al.: DSAC-differentiable RANSAC for camera localization. In: CVPR (2017)

    Google Scholar 

  10. Cao, Z., Radosavovic, I., Kanazawa, A., Malik, J.: Reconstructing hand-object interactions in the wild. In: CVPR (2021)

    Google Scholar 

  11. Chen, B., Parra, A., Cao, J., Li, N., Chin, T.J.: End-to-end learnable geometric vision by backpropagating PNP optimization. In: CVPR (2020)

    Google Scholar 

  12. Chen, H., Wang, P., Wang, F., Tian, W., Xiong, L., Li, H.: EPro-PnP: generalized end-to-end probabilistic perspective-n-points for monocular object pose estimation. In: CVPR (2022)

    Google Scholar 

  13. Chen, P., et al.: I2UV-HandNet: image-to-UV prediction network for accurate and high-fidelity 3D hand mesh modeling. In: ICCV (2021)

    Google Scholar 

  14. Chen, X., et al.: Mobrecon: mobile-friendly hand mesh reconstruction from monocular image. In: CVPR (2022)

    Google Scholar 

  15. Chen, X., et al.: Camera-space hand mesh recovery via semantic aggregation and adaptive 2D-1D registration. In: CVPR (2021)

    Google Scholar 

  16. Chen, X., Wang, B., Shum, H.Y.: Hand avatar: free-pose hand animation and rendering from monocular video. In: CVPR (2023)

    Google Scholar 

  17. Chen, Y., et al.: Model-based 3D hand reconstruction via self-supervised learning. In: CVPR (2021)

    Google Scholar 

  18. Garcia-Hernando, G., Johns, E., Kim, T.K.: Physics-based dexterous manipulations with estimated hand poses and residual reinforcement learning. In: IROS (2020)

    Google Scholar 

  19. Ge, L., et al.: 3D hand shape and pose estimation from a single RGB image. In: CVPR (2019)

    Google Scholar 

  20. Hampali, S., Rad, M., Oberweger, M., Lepetit, V.: Honnotate: a method for 3D annotation of hand and object poses. In: CVPR (2020)

    Google Scholar 

  21. Hampali, S., Sarkar, S.D., Rad, M., Lepetit, V.: Keypoint transformer: solving joint identification in challenging hands and object interactions for accurate 3D pose estimation. In: CVPR (2022)

    Google Scholar 

  22. Han, S., et al.: Megatrack: monochrome egocentric articulated hand-tracking for virtual reality. ACM TOG (2020)

    Google Scholar 

  23. Hartley, R., Zisserman, A.: Multiple View Geometry in Computer Vision. Cambridge University Press, Cambridge (2003)

    Google Scholar 

  24. Hasson, Y., Tekin, B., Bogo, F., Laptev, I., Pollefeys, M., Schmid, C.: Leveraging photometric consistency over time for sparsely supervised hand-object reconstruction. In: CVPR (2020)

    Google Scholar 

  25. Hasson, Y., et al.: Learning joint reconstruction of hands and manipulated objects. In: CVPR (2019)

    Google Scholar 

  26. Huang, L., et al.: Neural voting field for camera-space 3D hand pose estimation. In: CVPR (2023)

    Google Scholar 

  27. Ionescu, C., Papava, D., Olaru, V., Sminchisescu, C.: Human3.6M: large scale datasets and predictive methods for 3D human sensing in natural environments. TPAMI 36(7), 1325–1339 (2013)

    Google Scholar 

  28. Iqbal, U., Molchanov, P., Breuel Juergen Gall, T., Kautz, J.: Hand pose estimation via latent 2.5D heatmap regression. In: ECCV (2018)

    Google Scholar 

  29. Kanazawa, A., Black, M.J., Jacobs, D.W., Malik, J.: End-to-end recovery of human shape and pose. In: CVPR (2018)

    Google Scholar 

  30. Karunratanakul, K., Spurr, A., Fan, Z., Hilliges, O., Tang, S.: A skeleton-driven neural occupancy representation for articulated hands. In: 3DV (2021)

    Google Scholar 

  31. Karunratanakul, K., Yang, J., Zhang, Y., Black, M.J., Muandet, K., Tang, S.: Grasping field: learning implicit representations for human grasps. In: 3DV (2020)

    Google Scholar 

  32. Kulon, D., Guler, R.A., Kokkinos, I., Bronstein, M.M., Zafeiriou, S.: Weakly-supervised mesh-convolutional hand reconstruction in the wild. In: CVPR (2020)

    Google Scholar 

  33. Kuznetsova, A., et al.: The open images dataset V4: unified image classification, object detection, and visual relationship detection at scale. IJCV 128(7), 1956–1981 (2020)

    Article  Google Scholar 

  34. Li, Z., Liu, J., Zhang, Z., Xu, S., Yan, Y.: Cliff: carrying location information in full frames into human pose and shape estimation. In: ECCV (2022)

    Google Scholar 

  35. Lin, K., Wang, L., Liu, Z.: End-to-end human pose and mesh reconstruction with transformers. In: CVPR (2021)

    Google Scholar 

  36. Lin, K., Wang, L., Liu, Z.: Mesh graphormer. In: ICCV (2021)

    Google Scholar 

  37. Meta: Quest 3. https://www.meta.com/us/quest/quest-3/. Accessed 7 Mar 2024

  38. Mihajlovic, M., Zhang, Y., Black, M.J., Tang, S.: Leap: learning articulated occupancy of people. In: CVPR (2021)

    Google Scholar 

  39. Moon, G., Chang, J.Y., Lee, K.M.: Camera distance-aware top-down approach for 3D multi-person pose estimation from a single RGB image. In: ICCV (2019)

    Google Scholar 

  40. Moon, G., Lee, K.M.: I2L-MeshNet: image-to-lixel prediction network for accurate 3D human pose and mesh estimation from a single RGB image. In: ECCV (2020)

    Google Scholar 

  41. Park, J., Oh, Y., Moon, G., Choi, H., Lee, K.M.: Handoccnet: occlusion-robust 3D hand mesh estimation network. In: CVPR (2022)

    Google Scholar 

  42. Pavlakos, G., et al.: Expressive body capture: 3D hands, face, and body from a single image. In: CVPR (2019)

    Google Scholar 

  43. Peng, S., et al.: Neural body: implicit neural representations with structured latent codes for novel view synthesis of dynamic humans. In: CVPR (2021)

    Google Scholar 

  44. Prince, S.J.: Computer Vision: Models, Learning, and Inference. Cambridge University Press, Cambridge (2012)

    Book  Google Scholar 

  45. Remelli, E., Han, S., Honari, S., Fua, P., Wang, R.: Lightweight multi-view 3D pose estimation through camera-disentangled representation. In: CVPR (2020)

    Google Scholar 

  46. Romero, J., Tzionas, D., Black, M.J.: Embodied hands: modeling and capturing hands and bodies together. ACM TOG (2017)

    Google Scholar 

  47. Saito, S., Huang, Z., Natsume, R., Morishima, S., Kanazawa, A., Li, H.: PIFu: pixel-aligned implicit function for high-resolution clothed human digitization. In: CVPR (2019)

    Google Scholar 

  48. Spurr, A., Iqbal, U., Molchanov, P., Hilliges, O., Kautz, J.: Weakly supervised 3D hand pose estimation via biomechanical constraints. In: ECCV (2020)

    Google Scholar 

  49. Tang, X., Wang, T., Fu, C.W.: Towards accurate alignment in real-time 3D hand-mesh reconstruction. In: ICCV (2021)

    Google Scholar 

  50. Wei, T., Patel, Y., Shekhovtsov, A., Matas, J., Barath, D.: Generalized differentiable RANSAC. In: ICCV (2023)

    Google Scholar 

  51. Yin, W., et al.: Metric3D: towards zero-shot metric 3D prediction from a single image. In: ICCV (2023)

    Google Scholar 

  52. Yuan, S., et al.: Depth-based 3D hand pose estimation: from current achievements to future goals. In: CVPR (2018)

    Google Scholar 

  53. Zhang, X., et al.: Hand image understanding via deep multi-task learning. In: ICCV (2021)

    Google Scholar 

  54. Zhang, X., Li, Q., Mo, H., Zhang, W., Zheng, W.: End-to-end hand mesh recovery from a monocular RGB image. In: ICCV (2019)

    Google Scholar 

  55. Zhou, Y., Habermann, M., Xu, W., Habibie, I., Theobalt, C., Xu, F.: Monocular real-time hand shape and motion capture using multi-modal data. In: CVPR (2020)

    Google Scholar 

  56. Zimmermann, C., Ceylan, D., Yang, J., Russell, B., Argus, M., Brox, T.: FreiHAND: a dataset for markerless capture of hand pose and shape from single RGB images. In: ICCV (2019)

    Google Scholar 

Download references

Acknowledgements

We would like to thank Filippo Aleotti for his help with baseline experiments and infrastructure; Jamie Watson, Zawar Qureshi, and Jakub Powierza for their help with infrastructure; Axel Laguna for his insightful discussions on minimal solvers and network architectures; Daniyar Turmukhambetov for valuable technical discussions; and Gabriel Brostow, Sara Vicente, Jessica Van Brummelen, and Michael Firman for their valuable feedback on different versions of the manuscript.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Guillermo Garcia-Hernando .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Valassakis, E., Garcia-Hernando, G. (2025). HandDGP: Camera-Space Hand Mesh Prediction with Differentiable Global Positioning. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15096. Springer, Cham. https://doi.org/10.1007/978-3-031-72920-1_27

Download citation

  • DOI: https://doi.org/10.1007/978-3-031-72920-1_27

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-72919-5

  • Online ISBN: 978-3-031-72920-1

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics