
FDLS: A Deep Learning Approach to Production Quality, Controllable, and Retargetable Facial Performances

Published: 07 August 2022

Abstract

Visual effects production commonly requires both the creation of realistic synthetic humans and the retargeting of actors’ performances to humanoid characters such as aliens and monsters. Achieving the expressive performances demanded in entertainment requires manipulating complex models with hundreds of parameters. Full creative control requires the freedom to make edits at any stage of production, which prohibits the use of a fully automatic “black box” solution with uninterpretable parameters. On the other hand, producing realistic animation with these sophisticated models is difficult and laborious.

This paper describes FDLS (Facial Deep Learning Solver), Weta Digital’s solution to these challenges. FDLS adopts a coarse-to-fine, human-in-the-loop strategy that allows a solved performance to be verified and (if needed) edited at several stages of the solving process. To train FDLS, we first transform the raw motion-capture data into robust graph features; the feature extraction algorithms were devised after carefully observing how artists interpret the 3D facial landmarks. Second, based on the observation that artists typically finalize the jaw pass of the animation before proceeding to finer detail, we solve for the jaw motion first and predict the finer expressions with region-based networks conditioned on the jaw position. Finally, artists can optionally invoke a non-linear finetuning process on top of the FDLS solution to follow the motion-captured virtual markers as closely as possible. FDLS supports editing where needed to improve the results of the deep learning solve, and it can handle small day-to-day changes in the actor’s face shape.

FDLS permits reliable, production-quality performance solving with minimal training and little or no manual effort in many cases, while also allowing the solve to be guided and edited in unusual and difficult cases. The system has been under development for several years and has been used in major movies.
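The abstract only outlines the solver, so none of the implementation details below come from the paper. As a rough illustration of the coarse-to-fine idea (jaw solved first, finer regions conditioned on it, with room for artist edits in between), here is a minimal PyTorch sketch; the class names, feature dimensions, 256-unit MLPs, and the jaw_override hook are all assumptions for illustration, not FDLS's actual architecture:

```python
import torch
import torch.nn as nn

class JawSolver(nn.Module):
    """Coarse pass: predict jaw parameters from per-frame graph features.
    (Hypothetical architecture; the paper does not specify one.)"""
    def __init__(self, feat_dim: int, jaw_dim: int = 6):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, 256), nn.ReLU(),
            nn.Linear(256, jaw_dim),
        )

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        return self.net(feats)

class RegionSolver(nn.Module):
    """Fine pass: predict one facial region's rig controls,
    conditioned on the (possibly artist-edited) jaw solution."""
    def __init__(self, feat_dim: int, jaw_dim: int, n_controls: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim + jaw_dim, 256), nn.ReLU(),
            nn.Linear(256, n_controls),
        )

    def forward(self, feats: torch.Tensor, jaw: torch.Tensor) -> torch.Tensor:
        # Condition each region on the jaw by simple concatenation.
        return self.net(torch.cat([feats, jaw], dim=-1))

def solve_frame(feats, jaw_net, region_nets, jaw_override=None):
    """Human-in-the-loop: the jaw pass can be verified or replaced by an
    artist (jaw_override) before the finer regions are solved."""
    jaw = jaw_override if jaw_override is not None else jaw_net(feats)
    regions = {name: net(feats, jaw) for name, net in region_nets.items()}
    return jaw, regions
```

Conditioning every region network on the jaw keeps the fine solve consistent with a jaw pass the artist may already have approved, which matches the staged, verifiable workflow the abstract describes.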

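The optional non-linear finetuning stage can likewise be pictured as a marker-fitting optimization. The sketch below is only a guess at the shape of that step: it uses SciPy's general-purpose L-BFGS-B solver with a simple least-squares objective, and rig_markers(x) is a hypothetical evaluator returning the rig's virtual marker positions for controls x; the paper does not specify its objective, regularizer, or optimizer.

```python
import numpy as np
from scipy.optimize import minimize

def finetune_controls(x0, captured, rig_markers, reg_weight=1e-2):
    """Refine the network-solved controls x0 so the rig's virtual
    markers track the captured markers as closely as possible, with a
    soft prior keeping the result near the network solution.
    (Illustrative objective only; not the production formulation.)"""
    def objective(x):
        residual = rig_markers(x) - captured          # (n_markers, 3)
        data = np.sum(residual ** 2)                  # marker-fit term
        prior = reg_weight * np.sum((x - x0) ** 2)    # stay near the solve
        return data + prior
    return minimize(objective, x0, method="L-BFGS-B").x
```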
Supplemental Material

  • MP4 file: supplemental video
  • PDF file: supplemental document



Published In

DigiPro '22: Proceedings of the 2022 Digital Production Symposium
August 2022
85 pages
ISBN: 9781450394185
DOI: 10.1145/3543664 (article: 10.1145/3543664.3543672)

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Deep learning
  2. Facial animation
  3. Motion capture
  4. Optimization

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

DigiPro '22: The Digital Production Symposium
August 7, 2022
Vancouver, BC, Canada
