DOI: 10.1007/978-3-031-25063-7_13
Article

Third Time’s the Charm? Image and Video Editing with StyleGAN3

Published: 16 February 2023

Abstract

StyleGAN is arguably one of the most intriguing and well-studied generative models, demonstrating impressive performance in image generation, inversion, and manipulation. In this work, we explore the recent StyleGAN3 architecture, compare it to its predecessor, and investigate its unique advantages as well as its drawbacks. In particular, we demonstrate that while StyleGAN3 can be trained on unaligned data, one can still use aligned data for training without hindering the ability to generate unaligned imagery. Next, our analysis of the disentanglement of the different latent spaces of StyleGAN3 indicates that the commonly used W/W+ spaces are more entangled than their StyleGAN2 counterparts, underscoring the benefits of using the StyleSpace for fine-grained editing. Considering image inversion, we observe that existing encoder-based techniques struggle when trained on unaligned data. We therefore propose an encoding scheme that is trained solely on aligned data yet can still invert unaligned images. Finally, we introduce a novel video inversion and editing workflow that leverages the capabilities of a fine-tuned StyleGAN3 generator to reduce texture sticking and expand the field of view of the edited video.
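The latent-space edits discussed in the abstract reduce to simple vector arithmetic: a coarse W/W+ edit moves the whole latent code along a semantic direction, while a StyleSpace edit perturbs a single style channel. A minimal sketch in plain Python (the latent dimensionality, direction, and channel index here are illustrative stand-ins, not values from the paper):

```python
import math

def edit_latent(w, direction, alpha):
    """W/W+-style edit: move the whole latent code a distance
    alpha along a unit-normalized semantic direction."""
    norm = math.sqrt(sum(d * d for d in direction))
    return [wi + alpha * di / norm for wi, di in zip(w, direction)]

def edit_stylespace(s, channel, delta):
    """StyleSpace-style edit: perturb a single style channel,
    leaving all other channels untouched (finer-grained)."""
    s_edited = list(s)
    s_edited[channel] += delta
    return s_edited

# Toy 4-d latents for illustration only.
w = [0.0, 0.0, 0.0, 0.0]
smile_direction = [0.0, 2.0, 0.0, 0.0]
w_edited = edit_latent(w, smile_direction, alpha=3.0)   # -> [0.0, 3.0, 0.0, 0.0]

s = [1.0, 1.0, 1.0, 1.0]
s_edited = edit_stylespace(s, channel=2, delta=0.5)     # -> [1.0, 1.0, 1.5, 1.0]
```

In the actual pipelines, the latent code would come from an encoder or optimization-based inversion and the edited code would be fed back through the StyleGAN3 generator; only the editing arithmetic is sketched here.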


Cited By

  • (2024) Empirical comparison of evolutionary approaches for searching the latent space of Generative Adversarial Networks for the human face generation problem. In: Proceedings of the Genetic and Evolutionary Computation Conference Companion, pp. 1631–1639. DOI: 10.1145/3638530.3664147. Online publication date: 14 July 2024


Published In

Computer Vision – ECCV 2022 Workshops: Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part II
Oct 2022
788 pages
ISBN: 978-3-031-25062-0
DOI: 10.1007/978-3-031-25063-7

Publisher

Springer-Verlag, Berlin, Heidelberg


Author Tags

  1. Generative Adversarial Networks
  2. Image and video editing
