Abstract
Large-scale multimedia datasets, such as Internet image and video collections, provide new opportunities to understand and analyze human actions, among which facial performance is one of the most interesting. In this paper, we present an automatic system for reconstructing detailed facial performances. Many existing facial performance reconstruction systems rely on data captured in controlled environments with densely spaced cameras and lights. In contrast, our system reconstructs detailed facial geometry from a single image or a monocular video sequence under unknown lighting. To achieve this, we first track sparse 2D and 3D features simultaneously, then reconstruct the low-frequency facial geometry with a 2D-3D feature trajectory fusion optimization, which we formulate as a linear problem that can be solved efficiently. Finally, we use a per-pixel shape-from-shading algorithm to estimate fine-scale geometric details such as wrinkles, further improving reconstruction fidelity. We demonstrate the accuracy of our system with reconstruction results on both single images and monocular video sequences.
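The abstract does not spell out the fusion objective, so the following is only a minimal sketch of how combining 2D and 3D feature observations can be posed as a linear least-squares problem. It assumes a known orthographic camera, a single feature point in a single frame, and a small regularization term toward a prior position; the function name `fuse_point` and the weights `w2d`, `w3d`, and `lam` are hypothetical and are not taken from the paper.

```python
import numpy as np


def fuse_point(P, u, prior, y=None, w2d=1.0, w3d=1.0, lam=1e-3):
    """Estimate one 3D point from a 2D observation and an optional 3D observation.

    Illustrative assumptions (not the paper's formulation):
      P     : (2, 3) known linear camera projection
      u     : (2,)   observed 2D feature position
      prior : (3,)   prior 3D position, used to fix the unobserved depth
      y     : (3,)   observed 3D feature position, or None if unavailable
    Minimizes w2d*|P x - u|^2 + w3d*|x - y|^2 + lam*|x - prior|^2.
    """
    # Each term contributes rows to one stacked linear system A x = b.
    rows = [np.sqrt(w2d) * P, np.sqrt(lam) * np.eye(3)]
    rhs = [np.sqrt(w2d) * u, np.sqrt(lam) * prior]
    if y is not None:
        rows.append(np.sqrt(w3d) * np.eye(3))
        rhs.append(np.sqrt(w3d) * y)
    A = np.vstack(rows)
    b = np.concatenate(rhs)
    # The objective is quadratic in x, so a single least-squares solve suffices.
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return x


# Orthographic projection that drops the depth coordinate.
P = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])
truth = np.array([1.0, 2.0, 3.0])
estimate = fuse_point(P, P @ truth, prior=np.zeros(3), y=truth, lam=1e-6)
```

Because every term is quadratic in the unknown position, stacking the terms gives one overdetermined linear system, which is why such a formulation can be solved efficiently. When no 3D observation is available, depth is recovered only from the prior in this sketch.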
Acknowledgements
This work is supported by the National Key R&D Program of China (2017YFB1002702).
Cite this article
Wang, S., Shen, X. & Zhang, Y. 3D facial feature and expression computing from Internet image or video. Multimed Tools Appl 77, 22231–22246 (2018). https://doi.org/10.1007/s11042-018-5895-7
DOI: https://doi.org/10.1007/s11042-018-5895-7