Transformer-based high-fidelity StyleGAN inversion for face image editing

Published: 02 November 2023
DOI: 10.1145/3617695.3617701

Abstract

Many recent methods exploit pre-trained GAN generators for image editing. To apply these methods to real images, the image must first be inverted into the generator's latent space before editing can proceed. Existing GAN inversion methods struggle to provide both high-fidelity reconstruction and strong editability for face images: the most common approach, inversion into the W+ space, captures image details well but sacrifices editability. To address this problem, this paper proposes a Transformer-based StyleGAN inversion model that achieves high-quality editing of real face images. The latent code produced by the StyleGAN2 mapping network serves as the initial query vector, and image features extracted by a CNN encoder serve as the keys and values. A multi-head cross-attention module updates the query vector several times, and the final query vector is used as the latent code of the inverted image and fed into the generator. Experimental results show that the method provides high-fidelity inversion for face images, and that edited images effectively preserve identity while allowing precise control over attributes.
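
To make the described pipeline concrete, below is a minimal PyTorch sketch of the cross-attention refinement. The class and parameter names, dimensions, number of refinement steps, and the stand-in CNN backbone are all illustrative assumptions, not the paper's actual implementation. Only the overall structure (mapping-network code as the initial query, CNN features as keys and values, repeated multi-head cross-attention updates producing a W+ latent code) follows the abstract.

    # Minimal sketch of the Transformer-based inversion encoder (PyTorch).
    # All names, sizes, and the CNN backbone are illustrative assumptions.
    import torch
    import torch.nn as nn

    class CrossAttentionInverter(nn.Module):
        def __init__(self, dim=512, n_heads=8, n_iters=4):
            super().__init__()
            # Stand-in CNN backbone; the paper's encoder extracts image
            # features that act as attention keys and values.
            self.cnn = nn.Sequential(
                nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(64, dim, 3, stride=2, padding=1), nn.ReLU(),
            )
            self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
            self.ffn = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(),
                                     nn.Linear(dim, dim))
            self.norm1 = nn.LayerNorm(dim)
            self.norm2 = nn.LayerNorm(dim)
            self.n_iters = n_iters

        def forward(self, image, w_init):
            # w_init: (B, n_styles, dim) codes from the StyleGAN2 mapping
            # network, used as the initial query vectors.
            feats = self.cnn(image)                  # (B, dim, H/4, W/4)
            kv = feats.flatten(2).transpose(1, 2)    # (B, H*W/16, dim)
            q = w_init
            for _ in range(self.n_iters):
                # Queries attend to the image features (cross-attention),
                # then pass through a feed-forward block, with residual
                # connections and layer normalization.
                upd, _ = self.attn(q, kv, kv)
                q = self.norm1(q + upd)
                q = self.norm2(q + self.ffn(q))
            # The refined queries serve as the W+ latent code that is fed
            # into the (frozen) StyleGAN2 generator.
            return q

    if __name__ == "__main__":
        model = CrossAttentionInverter()
        img = torch.randn(1, 3, 256, 256)
        w0 = torch.randn(1, 18, 512)   # e.g. mapping-network w, tiled per layer
        print(model(img, w0).shape)    # torch.Size([1, 18, 512])

Tiling a single mapping-network w code across the 18 per-layer style inputs is one plausible reading of "initial query vector"; the paper may instead learn a separate initial query per style layer.
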

Cited By

• (2024) EditScribe: Non-Visual Image Editing with Natural Language Verification Loops. Proceedings of the 26th International ACM SIGACCESS Conference on Computers and Accessibility, 10.1145/3663548.3675599, pp. 1-19. Online publication date: 27-Oct-2024.

Index Terms

  1. Transformer-based high-fidelity StyleGAN inversion for face image editing
        Index terms have been assigned to the content through auto-classification.

        Recommendations

        Comments

        Please enable JavaScript to view thecomments powered by Disqus.

Published In

BDIOT '23: Proceedings of the 2023 7th International Conference on Big Data and Internet of Things
August 2023, 232 pages
ISBN: 9798400708015
DOI: 10.1145/3617695

Publisher

Association for Computing Machinery, New York, NY, United States

Qualifiers

• Research-article
• Research
• Refereed limited

Funding Sources

• The Science and Technology Research Project of Chongqing Education Commission

Conference

BDIOT 2023

Acceptance Rates

Overall acceptance rate: 75 of 136 submissions, 55%
