research-article

Self‐supervised non‐rigid structure from motion with improved training of Wasserstein GANs

Authors:

Xiangyang Peng,

Mingfeng Jiang

IET Computer Vision, Volume 17, Issue 4

Pages 404 - 414

https://doi.org/10.1049/cvi2.12175

Published: 06 February 2023

Abstract

This study proposes a self‐supervised method for reconstructing 3D limbic structures from 2D landmarks extracted from a single view. The reconstructed 3D structure is projected along a random orthogonal direction and reconstructed again, and the resulting self‐consistency loss is minimised. Training is therefore self‐supervised through geometric self‐consistency in the reconstruction–projection–reconstruction process. The self‐supervised network, called SS‐Graphformer, consists mainly of graph convolution and Transformer encoders. Adding a discriminator turns SS‐Graphformer into the generator of a Wasserstein Generative Adversarial Network with Gradient Penalty, which improves the accuracy of the reconstruction. Experiments demonstrate that adding the 2D structure discriminator significantly improves reconstruction accuracy.
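The reconstruction–projection–reconstruction cycle described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the zero-depth `toy_lifter` (a stand-in for SS‐Graphformer), the orthographic projection that drops the depth axis, and the choice to compare the two reconstructions in 3D are all assumptions made for illustration.

```python
import numpy as np


def random_rotation(rng):
    """Draw a random 3x3 rotation from the QR factor of a Gaussian matrix."""
    q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
    if np.linalg.det(q) < 0:
        q[:, 0] = -q[:, 0]  # flip one column so that det(q) = +1
    return q


def project(points_3d):
    """Orthographic projection (assumed): keep x and y, drop depth."""
    return points_3d[:, :2]


def toy_lifter(points_2d):
    """Hypothetical lifter that predicts zero depth for every landmark."""
    depth = np.zeros((points_2d.shape[0], 1))
    return np.hstack([points_2d, depth])


def self_consistency_loss(landmarks_2d, lifter, rng):
    """Mean squared 3D distance between a rotated reconstruction
    and the reconstruction obtained from its own 2D projection."""
    shape_3d = lifter(landmarks_2d)        # reconstruction
    rotation = random_rotation(rng)
    rotated = shape_3d @ rotation.T        # random new viewpoint
    reprojected = project(rotated)         # projection
    relifted = lifter(reprojected)         # reconstruction again
    return float(np.mean(np.sum((relifted - rotated) ** 2, axis=1)))


rng = np.random.default_rng(0)
landmarks = rng.standard_normal((5, 2))    # five synthetic 2D landmarks
print(self_consistency_loss(landmarks, toy_lifter, rng))
```

A lifter that recovers depth consistently across viewpoints drives this value toward zero, which is why the quantity can serve as a training signal without 3D ground truth.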

Graphical Abstract

We present SS‐Graphformer, a graph convolution and Transformer‐based method for 3D structure reconstruction from 2D landmarks. In addition, geometric self‐consistency is used to achieve self‐supervision; when combined with the 2D structure discriminator, the accuracy of the reconstruction can be improved. Extensive experiments show that our model achieves state‐of‐the‐art performance on two popular data sets.
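For orientation, the gradient‐penalty objective from the "Improved training of Wasserstein GANs" formulation named in the title is usually written as below. This is the standard form, not a formula taken from this article: the symbols are notation chosen here, with $D$ the discriminator (critic), $P_r$ the distribution of real 2D structures, $P_g$ the distribution of generated ones, and $\lambda$ the penalty weight. How the article samples its inputs and sets $\lambda$ is not stated in the abstract.

```latex
\begin{aligned}
L_D &= \mathbb{E}_{\tilde{x} \sim P_g}\big[D(\tilde{x})\big]
     - \mathbb{E}_{x \sim P_r}\big[D(x)\big]
     + \lambda\, \mathbb{E}_{\hat{x} \sim P_{\hat{x}}}
       \Big[\big(\lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1\big)^2\Big], \\
L_G &= -\,\mathbb{E}_{\tilde{x} \sim P_g}\big[D(\tilde{x})\big], \\
\hat{x} &= \epsilon\, x + (1 - \epsilon)\, \tilde{x},
\qquad \epsilon \sim U[0, 1].
\end{aligned}
```

The penalty term pushes the gradient norm of the critic toward 1 at points interpolated between real and generated samples, which is a soft way of enforcing the 1‐Lipschitz constraint.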

Published In

IET Computer Vision Volume 17, Issue 4

June 2023

129 pages

EISSN:1751-9640

DOI:10.1049/cvi2.v17.4

© 2023 The Authors. IET Computer Vision published by John Wiley & Sons Ltd on behalf of The Institution of Engineering and Technology.

This is an open access article under the terms of the Creative Commons Attribution‐NonCommercial License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited and is not used for commercial purposes.

Publisher

John Wiley & Sons, Inc.

United States
