Transformer-based high-fidelity StyleGAN inversion for face image editing

Published: 02 November 2023
DOI: 10.1145/3617695.3617701

Abstract

Many recent methods exploit pre-trained GAN generators for image editing. To apply these methods to real images, the image must first be inverted into the generator's latent space before editing can proceed. Existing GAN inversion methods struggle to provide both high-fidelity reconstruction and strong editability for face images: the most common approach, inversion into the W+ space, captures image details well but sacrifices editability. To address this problem, this paper proposes a Transformer-based StyleGAN inversion model that achieves high-quality editing of real face images. The latent code produced by the StyleGAN2 mapping network serves as the initial query vector, and image features extracted by a CNN encoder serve as the keys and values. A multi-head cross-attention module updates the query vector several times, and the final query vector is used as the latent code of the inverted image and fed into the generator. Experimental results show that the method provides high-fidelity inversion for face images, and that edited images effectively preserve identity while allowing precise control over attributes.
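
To make the described pipeline concrete, below is a minimal PyTorch sketch of the cross-attention refinement. The class and parameter names, dimensions, number of refinement steps, and the stand-in CNN backbone are all illustrative assumptions, not the paper's actual implementation. Only the overall structure (mapping-network code as the initial query, CNN features as keys and values, repeated multi-head cross-attention updates producing a W+ latent code) follows the abstract.

    # Minimal sketch of the Transformer-based inversion encoder (PyTorch).
    # All names, sizes, and the CNN backbone are illustrative assumptions.
    import torch
    import torch.nn as nn

    class CrossAttentionInverter(nn.Module):
        def __init__(self, dim=512, n_heads=8, n_iters=4):
            super().__init__()
            # Stand-in CNN backbone; the paper's encoder extracts image
            # features that act as attention keys and values.
            self.cnn = nn.Sequential(
                nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(64, dim, 3, stride=2, padding=1), nn.ReLU(),
            )
            self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
            self.ffn = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(),
                                     nn.Linear(dim, dim))
            self.norm1 = nn.LayerNorm(dim)
            self.norm2 = nn.LayerNorm(dim)
            self.n_iters = n_iters

        def forward(self, image, w_init):
            # w_init: (B, n_styles, dim) codes from the StyleGAN2 mapping
            # network, used as the initial query vectors.
            feats = self.cnn(image)                  # (B, dim, H/4, W/4)
            kv = feats.flatten(2).transpose(1, 2)    # (B, H*W/16, dim)
            q = w_init
            for _ in range(self.n_iters):
                # Queries attend to the image features (cross-attention),
                # then pass through a feed-forward block, with residual
                # connections and layer normalization.
                upd, _ = self.attn(q, kv, kv)
                q = self.norm1(q + upd)
                q = self.norm2(q + self.ffn(q))
            # The refined queries serve as the W+ latent code that is fed
            # into the (frozen) StyleGAN2 generator.
            return q

    if __name__ == "__main__":
        model = CrossAttentionInverter()
        img = torch.randn(1, 3, 256, 256)
        w0 = torch.randn(1, 18, 512)   # e.g. mapping-network w, tiled per layer
        print(model(img, w0).shape)    # torch.Size([1, 18, 512])

Tiling a single mapping-network w code across the 18 per-layer style inputs is one plausible reading of "initial query vector"; the paper may instead learn a separate initial query per style layer.
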

Cited By

• (2024) EditScribe: Non-Visual Image Editing with Natural Language Verification Loops. Proceedings of the 26th International ACM SIGACCESS Conference on Computers and Accessibility, 10.1145/3663548.3675599, pp. 1-19. Online publication date: 27-Oct-2024.

Index Terms

  1. Transformer-based high-fidelity StyleGAN inversion for face image editing
        Index terms have been assigned to the content through auto-classification.

        Recommendations

        Comments

        Please enable JavaScript to view thecomments powered by Disqus.

Published In

BDIOT '23: Proceedings of the 2023 7th International Conference on Big Data and Internet of Things
August 2023, 232 pages
ISBN: 9798400708015
DOI: 10.1145/3617695

Publisher

Association for Computing Machinery, New York, NY, United States

Qualifiers

• Research-article
• Research
• Refereed limited

Funding Sources

• The Science and Technology Research Project of Chongqing Education Commission

Conference

BDIOT 2023

Acceptance Rates

Overall acceptance rate: 75 of 136 submissions, 55%
