
Single Depth View Based Real-Time Reconstruction of Hand-Object Interactions

Published: 15 July 2021

Abstract

Reconstructing hand-object interactions is a challenging task due to strong occlusions and complex motions. This article proposes a real-time system that uses a single depth stream to simultaneously reconstruct hand poses, object shape, and rigid/non-rigid motions. To achieve this, we first train a joint learning network to segment the hand and object in a depth image and to predict the 3D keypoints of the hand. With most layers shared between the two tasks, computation cost is reduced, enabling real-time performance. A hybrid dataset is constructed to train the network with real data (to learn real-world distributions) and synthetic data (to cover variations of objects, motions, and viewpoints). Next, the depths of the two targets and the keypoints are used in a uniform optimization to reconstruct the interacting motions. Benefiting from a novel tangential contact constraint, the system not only resolves the remaining ambiguities but also maintains real-time performance. Experiments show that our system handles different hand and object shapes, various interactive motions, and moving cameras.
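The idea behind a tangential contact constraint can be illustrated with a small sketch. The energy formulation below is an assumption for illustration only (the abstract does not give the paper's exact term): it penalizes only the component of each fingertip-to-surface offset along the surface normal, so fingertips may slide tangentially over the object but are discouraged from penetrating it or lifting off.

```python
def tangential_contact_energy(contacts):
    """Sum of squared normal-direction offsets over all contact points.

    Each contact is a tuple (p, q, n):
      p -- 3D hand point (e.g., fingertip position)
      q -- nearest point on the object surface
      n -- unit surface normal at q

    Motion within the tangent plane at q leaves the energy unchanged;
    only motion along the normal (penetration or separation) is penalized.
    """
    energy = 0.0
    for p, q, n in contacts:
        offset = [p[i] - q[i] for i in range(3)]
        normal_dist = sum(offset[i] * n[i] for i in range(3))
        energy += normal_dist ** 2
    return energy


# A fingertip sliding tangentially along the surface costs nothing:
slide = [((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0))]
print(tangential_contact_energy(slide))   # 0.0

# Lifting the fingertip 0.5 units off the surface is penalized:
lift = [((0.0, 0.0, 0.5), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0))]
print(tangential_contact_energy(lift))    # 0.25
```

In a tracking optimizer, such a term would be added to the depth and keypoint objectives, disambiguating occluded contact regions while still allowing relative sliding between hand and object.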




Published In

ACM Transactions on Graphics, Volume 40, Issue 3
June 2021, 264 pages
ISSN: 0730-0301
EISSN: 1557-7368
DOI: 10.1145/3463476

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 15 July 2021
    Accepted: 01 February 2021
    Revised: 01 December 2020
    Received: 01 October 2020
    Published in TOG Volume 40, Issue 3


    Author Tags

    1. Single depth camera
    2. hand tracking
    3. object reconstruction
    4. hand-object interaction

    Qualifiers

    • Research-article
    • Refereed

    Funding Sources

• National Key R&D Program of China
    • NSFC
    • Beijing Natural Science Foundation

