Article
DOI: 10.1007/978-3-031-72920-1_27

HandDGP: Camera-Space Hand Mesh Prediction with Differentiable Global Positioning

Published: 01 October 2024

Abstract

Predicting camera-space hand meshes from single RGB images is crucial for enabling realistic hand interactions in 3D virtual and augmented worlds. Previous work typically divided the task into two stages: given a cropped image of the hand, predict meshes in relative coordinates, followed by lifting these predictions into camera space in a separate and independent stage, often resulting in the loss of valuable contextual and scale information. To prevent the loss of these cues, we propose unifying these two stages into an end-to-end solution that addresses the 2D-3D correspondence problem. This solution enables back-propagation from camera space outputs to the rest of the network through a new differentiable global positioning module. We also introduce an image rectification step that harmonizes both the training dataset and the input image as if they were acquired with the same camera, helping to alleviate the inherent scale-depth ambiguity of the problem. We validate the effectiveness of our framework in evaluations against several baselines and state-of-the-art approaches across three public benchmarks.
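
The abstract does not describe the internals of the global positioning module, so the following is only a minimal sketch of the general 2D-3D correspondence idea, not the authors' implementation. It assumes a pinhole camera with known intrinsics, root-relative 3D points, and their 2D pixel locations, and it recovers the camera-space translation with a closed-form weighted least-squares solve. The function names (`project`, `solve_translation`), the optional per-point weights, and the sample numbers are all hypothetical. Because the solve consists only of sums, products, and divisions, the same arithmetic written in an autodiff framework would let gradients flow from the camera-space output back to the predicted points.

```python
from typing import Optional, Sequence, Tuple

Vec2 = Tuple[float, float]
Vec3 = Tuple[float, float, float]


def project(p: Vec3, t: Vec3, fx: float, fy: float, cx: float, cy: float) -> Vec2:
    """Pinhole projection of the relative point p after translating it by t."""
    x, y, z = p[0] + t[0], p[1] + t[1], p[2] + t[2]
    return (fx * x / z + cx, fy * y / z + cy)


def _det3(m) -> float:
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve_translation(points_rel: Sequence[Vec3], pixels: Sequence[Vec2],
                      fx: float, fy: float, cx: float, cy: float,
                      weights: Optional[Sequence[float]] = None) -> Vec3:
    """Least-squares translation t such that project(p + t) matches each pixel.

    Each correspondence gives two equations that are linear in t:
        tx - un * tz = un * z - x
        ty - vn * tz = vn * z - y
    where (un, vn) are the pixel coordinates normalized by the intrinsics.
    """
    if weights is None:
        weights = [1.0] * len(points_rel)
    ata = [[0.0] * 3 for _ in range(3)]  # accumulates A^T W A
    atb = [0.0] * 3                      # accumulates A^T W b
    for (x, y, z), (u, v), w in zip(points_rel, pixels, weights):
        un, vn = (u - cx) / fx, (v - cy) / fy
        equations = (((1.0, 0.0, -un), un * z - x),
                     ((0.0, 1.0, -vn), vn * z - y))
        for row, rhs in equations:
            for i in range(3):
                atb[i] += w * row[i] * rhs
                for j in range(3):
                    ata[i][j] += w * row[i] * row[j]
    det = _det3(ata)
    if abs(det) < 1e-12:
        raise ValueError("degenerate correspondences: translation is not determined")
    # Cramer's rule: replace column k of A^T W A with A^T W b.
    solution = []
    for k in range(3):
        mk = [[atb[i] if j == k else ata[i][j] for j in range(3)] for i in range(3)]
        solution.append(_det3(mk) / det)
    return (solution[0], solution[1], solution[2])


if __name__ == "__main__":
    # Hypothetical relative points (metres) and intrinsics, for illustration only.
    pts = [(0.0, 0.0, 0.0), (0.03, 0.01, 0.02), (-0.02, 0.05, -0.01), (0.01, -0.04, 0.03)]
    t_true = (0.05, -0.02, 0.5)
    pix = [project(p, t_true, 600.0, 600.0, 320.0, 240.0) for p in pts]
    print(solve_translation(pts, pix, 600.0, 600.0, 320.0, 240.0))
```

In this sketch the depth component of the translation is tied to how spread out the 2D points are relative to the 3D points, which is one way to see why an error in the assumed focal length turns directly into a depth error, the scale-depth ambiguity that the abstract's rectification step is meant to alleviate.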

Published In

Computer Vision – ECCV 2024: 18th European Conference, Milan, Italy, September 29–October 4, 2024, Proceedings, Part XXXVIII
Sep 2024
583 pages
ISBN: 978-3-031-72919-5
DOI: 10.1007/978-3-031-72920-1
Editors: Aleš Leonardis, Elisa Ricci, Stefan Roth, Olga Russakovsky, Torsten Sattler, Gül Varol

Publisher

Springer-Verlag

Berlin, Heidelberg

Author Tags

  1. camera-space hand mesh estimation
  2. hand and body pose shape from RGB images
  3. 3D-to-2D scale ambiguity
  4. differentiable solver
