research-article

Toward 3D Face Reconstruction in Perspective Projection: Estimating 6DoF Face Pose From Monocular Image

Published: 01 January 2023

Abstract

In 3D face reconstruction, orthogonal projection has been widely used in place of perspective projection to simplify the fitting process. This approximation works well when the camera is far from the face. However, when the face is very close to the camera or moving along the camera axis, such methods suffer from inaccurate reconstruction and unstable temporal fitting because of the distortion introduced by perspective projection. In this paper, we address single-image 3D face reconstruction under perspective projection. Specifically, we propose a deep neural network, the Perspective Network (PerspNet), which simultaneously reconstructs the 3D face shape in canonical space and learns the correspondence between 2D pixels and 3D points, from which the 6DoF (6 Degrees of Freedom) face pose representing the perspective projection can be estimated. In addition, we contribute the large-scale ARKitFace dataset, comprising 902,724 2D facial images with ground-truth 3D face meshes and annotated 6DoF pose parameters, to enable training and evaluation of 3D face reconstruction under perspective projection. Experimental results show that our approach outperforms current state-of-the-art methods by a significant margin. The code and data are available at https://github.com/cbsropenproject/6dof_face.
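The close-range distortion the abstract refers to follows directly from the pinhole camera model: under perspective projection, pixel positions depend on depth, so a fixed depth gap on the face shifts pixels far more when the face is near the camera. The following minimal numpy sketch illustrates this; the intrinsics, pose, and point coordinates are illustrative values chosen for the example, not taken from the paper or the ARKitFace dataset.

```python
import numpy as np

def perspective_project(X, K, R, t):
    """Project Nx3 points in canonical space to 2D pixels under a 6DoF pose (R, t)."""
    Xc = X @ R.T + t              # canonical space -> camera coordinates
    x = Xc @ K.T                  # apply camera intrinsics
    return x[:, :2] / x[:, 2:3]   # perspective divide by depth

# Toy intrinsics (focal length 800 px, principal point at image center) and identity rotation.
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])
R = np.eye(3)

# Two face points with the same x-offset but 5 cm apart in depth
# (roughly a nose-tip vs. cheek depth gap, in meters).
X = np.array([[0.05, 0.0, 0.00],
              [0.05, 0.0, 0.05]])

# Far from the camera (2 m), the depth gap barely changes the projection...
far = perspective_project(X, K, R, t=np.array([0.0, 0.0, 2.0]))
# ...but close to the camera (0.3 m), the same gap shifts pixels dramatically.
near = perspective_project(X, K, R, t=np.array([0.0, 0.0, 0.3]))

print(abs(far[0, 0] - far[1, 0]))    # sub-pixel horizontal disparity
print(abs(near[0, 0] - near[1, 0]))  # disparity of many pixels
```

An orthographic approximation drops the per-point depth divide entirely, which is why it breaks down in exactly the near-camera regime the paper targets: the two points above would project to the same horizontal pixel regardless of distance.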


Cited By

  • S2TD-Face: Reconstruct a Detailed 3D Face with Controllable Texture from a Single Sketch. Proc. 32nd ACM International Conference on Multimedia, 2024, pp. 6453–6462. DOI: 10.1145/3664647.3681159
  • Multi-Level Pixel-Wise Correspondence Learning for 6DoF Face Pose Estimation. IEEE Transactions on Multimedia, vol. 26, 2024, pp. 9423–9435. DOI: 10.1109/TMM.2024.3391888
  • Hybrid Shape Deformation for Face Reconstruction in Aesthetic Orthodontics. IEEE Transactions on Circuits and Systems for Video Technology, vol. 34, no. 9, 2024, pp. 8107–8121. DOI: 10.1109/TCSVT.2024.3386671
  • Self-supervised learning for fine-grained monocular 3D face reconstruction in the wild. Multimedia Systems, vol. 30, no. 4, 2024. DOI: 10.1007/s00530-024-01436-3


Published In

IEEE Transactions on Image Processing, Volume 32, 2023, 5324 pages

Publisher

IEEE Press


Qualifiers

  • Research-article


