Abstract
Landmarks often play a key role in face analysis, but many aspects of identity or expression cannot be represented by sparse landmarks alone. Thus, in order to reconstruct faces more accurately, landmarks are often combined with additional signals like depth images or techniques like differentiable rendering. Can we keep things simple by just using more landmarks? In answer, we present the first method that accurately predicts 10× as many landmarks as usual, covering the whole head, including the eyes and teeth. This is accomplished using synthetic training data, which guarantees perfect landmark annotations. By fitting a morphable model to these dense landmarks, we achieve state-of-the-art results for monocular 3D face reconstruction in the wild. We show that dense landmarks are an ideal signal for integrating face shape information across frames by demonstrating accurate and expressive facial performance capture in both monocular and multi-view scenarios. Finally, our method is highly efficient: we can predict dense landmarks and fit our 3D face model at over 150 FPS on a single CPU thread. Please see our website: https://microsoft.github.io/DenseLandmarks/.
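The pipeline summarized above is: predict dense 2D landmarks (with per-landmark uncertainty), then fit a 3D morphable model to them by minimizing reprojection error. As a rough illustration only, the sketch below shows one way such a fit could be set up in PyTorch; the landmark count, linear basis, pinhole camera, `fit_landmarks` helper, and uncertainty weighting are all assumptions for exposition, not the authors' implementation.

```python
import torch

# Minimal sketch: fit linear morphable-model coefficients to dense 2D landmarks
# by minimizing uncertainty-weighted reprojection error. All quantities below
# (landmark count, basis, focal length, priors) are placeholder assumptions.
NUM_LANDMARKS = 703   # assumed dense landmark count
NUM_PARAMS = 50       # assumed identity + expression coefficients

mean_shape = torch.randn(NUM_LANDMARKS, 3)                 # placeholder mean shape
basis = 0.01 * torch.randn(NUM_LANDMARKS, 3, NUM_PARAMS)   # placeholder linear basis
FOCAL = 500.0                                              # assumed focal length (pixels)


def project(points_3d: torch.Tensor) -> torch.Tensor:
    """Pinhole projection of camera-space points onto the image plane."""
    z = points_3d[:, 2:3] + 10.0   # offset so points sit in front of the camera
    return FOCAL * points_3d[:, :2] / z.clamp(min=1e-3)


def fit_landmarks(landmarks_2d: torch.Tensor, sigmas: torch.Tensor,
                  steps: int = 200) -> torch.Tensor:
    """Recover model coefficients from 2D landmarks and per-landmark uncertainty."""
    coeffs = torch.zeros(NUM_PARAMS, requires_grad=True)
    opt = torch.optim.Adam([coeffs], lr=0.05)
    for _ in range(steps):
        opt.zero_grad()
        shape = mean_shape + basis @ coeffs            # (L, 3) reconstructed shape
        residual = project(shape) - landmarks_2d       # (L, 2) reprojection error
        # Down-weight landmarks the predictor is unsure about (e.g. occluded ones).
        data_term = (residual.pow(2).sum(dim=1) / (2.0 * sigmas.pow(2))).mean()
        prior_term = 1e-3 * coeffs.pow(2).sum()        # simple Gaussian shape prior
        (data_term + prior_term).backward()
        opt.step()
    return coeffs.detach()


if __name__ == "__main__":
    # Dummy targets just to exercise the call signature.
    target = 20.0 * torch.randn(NUM_LANDMARKS, 2)
    sigma = torch.ones(NUM_LANDMARKS)
    print(fit_landmarks(target, sigma)[:5])
```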
Acknowledgements
Thanks to Chirag Raman and Jamie Shotton for their contributions, and Jiaolong Yang and Timo Bolkart for help with evaluation.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Wood, E. et al. (2022). 3D Face Reconstruction with Dense Landmarks. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13673. Springer, Cham. https://doi.org/10.1007/978-3-031-19778-9_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-19777-2
Online ISBN: 978-3-031-19778-9