DOI: 10.1145/3332165.3347889
research-article
Open access

MeCap: Whole-Body Digitization for Low-Cost VR/AR Headsets

Published: 17 October 2019

Abstract

Low-cost, smartphone-powered VR/AR headsets are becoming more popular. These basic devices - little more than plastic or cardboard shells - lack advanced features, such as controllers for the hands, limiting their interactive capability. Moreover, even high-end consumer headsets lack the ability to track the body and face. For this reason, interactive experiences like social VR are underdeveloped. We introduce MeCap, which enables commodity VR headsets to be augmented with powerful motion capture ("MoCap") and user-sensing capabilities at very low cost (under $5). Using only a pair of hemispherical mirrors and the existing rear-facing camera of a smartphone, MeCap provides real-time estimates of a wearer's 3D body pose, hand pose, facial expression, physical appearance and surrounding environment - capabilities which are either absent in contemporary VR/AR systems or which require specialized hardware and controllers. We evaluate the accuracy of each of our tracking features, the results of which show imminent feasibility.
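The sensing principle described in the abstract is catadioptric: the phone's rear camera views the wearer's body indirectly, via reflection off curved mirrors. The sketch below illustrates that geometry under simplifying assumptions (a pinhole camera at the origin and a single spherical mirror); the function name, coordinate convention, and parameters are illustrative, not taken from the paper. Each camera pixel corresponds to a ray that strikes the mirror and reflects outward, and the set of reflected directions determines which part of the wearer's body each pixel observes.

```python
import math

def reflect_ray(d, center, radius):
    """Trace a unit-length camera ray direction d (from the origin) to a
    mirrored sphere with the given center and radius, and return the
    reflected ray direction. Returns None if the ray misses the sphere.
    Illustrative sketch only -- not the paper's calibration procedure."""
    # Ray-sphere intersection: solve |t*d - center|^2 = radius^2 for t,
    # i.e. t^2 + b*t + c = 0 with the coefficients below (|d| = 1).
    b = -2.0 * sum(di * ci for di, ci in zip(d, center))
    c = sum(ci * ci for ci in center) - radius * radius
    disc = b * b - 4.0 * c
    if disc < 0:
        return None                                   # ray misses the mirror
    t = (-b - math.sqrt(disc)) / 2.0                  # nearest intersection
    if t <= 0:
        return None                                   # mirror is behind camera
    p = [t * di for di in d]                          # hit point on the mirror
    n = [(pi - ci) / radius for pi, ci in zip(p, center)]  # outward surface normal
    dn = sum(di * ni for di, ni in zip(d, n))
    # Mirror reflection: r = d - 2(d.n)n
    return [di - 2.0 * dn * ni for di, ni in zip(d, n)]

# A ray aimed at the mirror's center reflects straight back at the camera;
# rays toward the rim reflect at increasingly steep angles.
center_hit = reflect_ray([0.0, 0.0, 1.0], [0.0, 0.0, 2.0], 0.5)
```

Rays near the mirror's rim reflect at steep grazing angles, which is what gives a small hemispherical mirror its wide, near-omnidirectional view of the wearer and the surrounding environment from a single smartphone camera.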

Supplementary Material

  • MP4 File (ufp3300pv.mp4): Preview video
  • MP4 File (ufp3300vf.mp4): Supplemental video
  • MP4 File (p453-ahuja.mp4)



Index Terms

  1. MeCap: Whole-Body Digitization for Low-Cost VR/AR Headsets

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    cover image ACM Conferences
    UIST '19: Proceedings of the 32nd Annual ACM Symposium on User Interface Software and Technology
    October 2019
    1229 pages
ISBN: 9781450368162
DOI: 10.1145/3332165

Publisher

Association for Computing Machinery, New York, NY, United States

    Badges

    • Honorable Mention

    Author Tags

    1. augmented reality
    2. hand gestures
    3. headset
    4. motion capture
    5. on-body sensing
    6. virtual reality


    Conference

    UIST '19

    Acceptance Rates

    Overall Acceptance Rate 561 of 2,567 submissions, 22%



