
Head360: Learning a Parametric 3D Full-Head for Free-View Synthesis in 360

Published: 30 October 2024

Abstract

Creating a 360° parametric model of a human head is a very challenging task. While recent advancements have demonstrated the efficacy of leveraging synthetic data for building such parametric head models, their performance remains inadequate in crucial areas such as expression-driven animation, hairstyle editing, and text-based modifications. In this paper, we build a dataset of artist-designed high-fidelity human heads and propose to create a novel 360° renderable parametric head model from it. Our scheme decouples facial motion/shape from facial appearance, representing them with a classic parametric 3D mesh model and an attached neural texture, respectively. We further propose a training method for decomposing hairstyle and facial appearance, allowing free swapping of the hairstyle. A novel inversion fitting method is presented that operates on a single input image with high generalization and fidelity. To the best of our knowledge, our model is the first parametric 3D full-head model that achieves 360° free-view synthesis, image-based fitting, appearance editing, and animation within a single model. Experiments show that facial motions and appearances are well disentangled in the parametric space, leading to state-of-the-art rendering and animation quality. The code and SynHead100 dataset are released at https://nju-3dv.github.io/projects/Head360.
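The decoupling described above — a classic parametric mesh for motion/shape, a separate neural texture for appearance, and a hairstyle component that can be swapped independently — can be illustrated with a minimal sketch. This is not the paper's implementation: all class names, dimensions, and the linear blendshape/texture mappings are illustrative assumptions, standing in for the learned components of the actual model.

```python
import numpy as np

class ParametricHead:
    """Toy sketch of a decoupled head representation: shape/expression
    parameters deform a template mesh, while separate face and hairstyle
    codes drive a neural-texture-style feature map. All sizes are
    illustrative, not those of the actual Head360 model."""

    def __init__(self, n_verts=100, n_shape=10, n_expr=5,
                 tex_res=16, tex_dim=8, seed=0):
        rng = np.random.default_rng(seed)
        self.template = rng.normal(size=(n_verts, 3))              # mean mesh
        self.shape_basis = rng.normal(size=(n_shape, n_verts, 3)) * 0.01
        self.expr_basis = rng.normal(size=(n_expr, n_verts, 3)) * 0.01
        # Appearance branch: texture features produced from two codes,
        # so hairstyle can be swapped without touching geometry or identity.
        self.W_face = rng.normal(size=(8, tex_res * tex_res * tex_dim)) * 0.1
        self.W_hair = rng.normal(size=(4, tex_res * tex_res * tex_dim)) * 0.1
        self.tex_shape = (tex_res, tex_res, tex_dim)

    def geometry(self, shape, expr):
        """Blendshape-style mesh: template plus shape/expression offsets."""
        return (self.template
                + np.tensordot(shape, self.shape_basis, axes=1)
                + np.tensordot(expr, self.expr_basis, axes=1))

    def neural_texture(self, face_code, hair_code):
        """Appearance is independent of geometry; swapping hair_code
        swaps the hairstyle while the face appearance stays fixed."""
        flat = face_code @ self.W_face + hair_code @ self.W_hair
        return flat.reshape(self.tex_shape)

head = ParametricHead()
verts = head.geometry(shape=np.zeros(10), expr=np.zeros(5))
tex_a = head.neural_texture(np.ones(8), np.zeros(4))
tex_b = head.neural_texture(np.ones(8), np.ones(4))  # same face, new hair
```

In this sketch, animation corresponds to varying `expr` while leaving the appearance codes untouched, and hairstyle editing corresponds to replacing `hair_code` while `face_code` and the mesh parameters stay fixed — the disentanglement the abstract describes, reduced to two independent linear maps.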


Published In

Computer Vision – ECCV 2024: 18th European Conference, Milan, Italy, September 29–October 4, 2024, Proceedings, Part LVI
Sep 2024, 582 pages
ISBN: 978-3-031-72991-1
DOI: 10.1007/978-3-031-72992-8
Editors: Aleš Leonardis, Elisa Ricci, Stefan Roth, Olga Russakovsky, Torsten Sattler, Gül Varol

Publisher

Springer-Verlag, Berlin, Heidelberg
