research-article

Toward 3D Face Reconstruction in Perspective Projection: Estimating 6DoF Face Pose From Monocular Image

Published: 01 January 2023

Abstract

In 3D face reconstruction, orthogonal projection has been widely used in place of perspective projection to simplify the fitting process. This approximation works well when the camera is far from the face. However, when the face is very close to the camera or moving along the camera axis, such methods suffer from inaccurate reconstruction and unstable temporal fitting because of the distortion introduced by perspective projection. In this paper, we address single-image 3D face reconstruction under perspective projection. Specifically, we propose a deep neural network, the Perspective Network (PerspNet), which simultaneously reconstructs the 3D face shape in canonical space and learns the correspondence between 2D pixels and 3D points, from which the 6DoF (6 Degrees of Freedom) face pose representing the perspective projection can be estimated. In addition, we contribute the large-scale ARKitFace dataset, comprising 902,724 2D facial images with ground-truth 3D face meshes and annotated 6DoF pose parameters, to enable training and evaluation of 3D face reconstruction under perspective projection. Experimental results show that our approach outperforms current state-of-the-art methods by a significant margin. The code and data are available at https://github.com/cbsropenproject/6dof_face.
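The close-range distortion the abstract refers to follows directly from the pinhole camera model: under perspective projection, pixel positions depend on depth, so a fixed depth gap on the face shifts pixels far more when the face is near the camera. The following minimal numpy sketch illustrates this; the intrinsics, pose, and point coordinates are illustrative values chosen for the example, not taken from the paper or the ARKitFace dataset.

```python
import numpy as np

def perspective_project(X, K, R, t):
    """Project Nx3 points in canonical space to 2D pixels under a 6DoF pose (R, t)."""
    Xc = X @ R.T + t              # canonical space -> camera coordinates
    x = Xc @ K.T                  # apply camera intrinsics
    return x[:, :2] / x[:, 2:3]   # perspective divide by depth

# Toy intrinsics (focal length 800 px, principal point at image center) and identity rotation.
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])
R = np.eye(3)

# Two face points with the same x-offset but 5 cm apart in depth
# (roughly a nose-tip vs. cheek depth gap, in meters).
X = np.array([[0.05, 0.0, 0.00],
              [0.05, 0.0, 0.05]])

# Far from the camera (2 m), the depth gap barely changes the projection...
far = perspective_project(X, K, R, t=np.array([0.0, 0.0, 2.0]))
# ...but close to the camera (0.3 m), the same gap shifts pixels dramatically.
near = perspective_project(X, K, R, t=np.array([0.0, 0.0, 0.3]))

print(abs(far[0, 0] - far[1, 0]))    # sub-pixel horizontal disparity
print(abs(near[0, 0] - near[1, 0]))  # disparity of many pixels
```

An orthographic approximation drops the per-point depth divide entirely, which is why it breaks down in exactly the near-camera regime the paper targets: the two points above would project to the same horizontal pixel regardless of distance.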


Cited By

  • S2TD-Face: Reconstruct a Detailed 3D Face with Controllable Texture from a Single Sketch. Proc. 32nd ACM International Conference on Multimedia, 2024, pp. 6453–6462. DOI: 10.1145/3664647.3681159
  • Multi-Level Pixel-Wise Correspondence Learning for 6DoF Face Pose Estimation. IEEE Transactions on Multimedia, vol. 26, 2024, pp. 9423–9435. DOI: 10.1109/TMM.2024.3391888
  • Hybrid Shape Deformation for Face Reconstruction in Aesthetic Orthodontics. IEEE Transactions on Circuits and Systems for Video Technology, vol. 34, no. 9, 2024, pp. 8107–8121. DOI: 10.1109/TCSVT.2024.3386671
  • Self-supervised learning for fine-grained monocular 3D face reconstruction in the wild. Multimedia Systems, vol. 30, no. 4, 2024. DOI: 10.1007/s00530-024-01436-3


Published In

IEEE Transactions on Image Processing, Volume 32, 2023, 5324 pages

Publisher

IEEE Press


Qualifiers

  • Research-article


