
Single Depth View Based Real-Time Reconstruction of Hand-Object Interactions

Published: 15 July 2021

Abstract

Reconstructing hand-object interactions is a challenging task due to strong occlusions and complex motions. This article proposes a real-time system that uses a single depth stream to simultaneously reconstruct hand poses, object shape, and rigid/non-rigid motions. To achieve this, we first train a joint learning network to segment the hand and object in a depth image and to predict the 3D keypoints of the hand. With most layers shared between the two tasks, computation cost is reduced, enabling real-time performance. A hybrid dataset is constructed to train the network with real data (to learn real-world distributions) and synthetic data (to cover variations of objects, motions, and viewpoints). Next, the depths of the two targets and the keypoints are used in a uniform optimization to reconstruct the interacting motions. Benefiting from a novel tangential contact constraint, the system not only resolves the remaining ambiguities but also maintains real-time performance. Experiments show that our system handles different hand and object shapes, various interactive motions, and moving cameras.
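The idea behind a tangential contact constraint can be illustrated with a small sketch. The energy formulation below is an assumption for illustration only (the abstract does not give the paper's exact term): it penalizes only the component of each fingertip-to-surface offset along the surface normal, so fingertips may slide tangentially over the object but are discouraged from penetrating it or lifting off.

```python
def tangential_contact_energy(contacts):
    """Sum of squared normal-direction offsets over all contact points.

    Each contact is a tuple (p, q, n):
      p -- 3D hand point (e.g., fingertip position)
      q -- nearest point on the object surface
      n -- unit surface normal at q

    Motion within the tangent plane at q leaves the energy unchanged;
    only motion along the normal (penetration or separation) is penalized.
    """
    energy = 0.0
    for p, q, n in contacts:
        offset = [p[i] - q[i] for i in range(3)]
        normal_dist = sum(offset[i] * n[i] for i in range(3))
        energy += normal_dist ** 2
    return energy


# A fingertip sliding tangentially along the surface costs nothing:
slide = [((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0))]
print(tangential_contact_energy(slide))   # 0.0

# Lifting the fingertip 0.5 units off the surface is penalized:
lift = [((0.0, 0.0, 0.5), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0))]
print(tangential_contact_energy(lift))    # 0.25
```

In a tracking optimizer, such a term would be added to the depth and keypoint objectives, disambiguating occluded contact regions while still allowing relative sliding between hand and object.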




Published In

ACM Transactions on Graphics, Volume 40, Issue 3
June 2021, 264 pages
ISSN: 0730-0301
EISSN: 1557-7368
DOI: 10.1145/3463476

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 15 July 2021
    Accepted: 01 February 2021
    Revised: 01 December 2020
    Received: 01 October 2020
    Published in TOG Volume 40, Issue 3


    Author Tags

    1. Single depth camera
    2. hand tracking
    3. object reconstruction
    4. hand-object interaction

    Qualifiers

    • Research-article
    • Refereed

    Funding Sources

• National Key R&D Program of China
    • NSFC
    • Beijing Natural Science Foundation

