Computer Science > Computer Vision and Pattern Recognition

arXiv:2210.04506 (cs)

[Submitted on 10 Oct 2022]

Title: Bridging CLIP and StyleGAN through Latent Alignment for Image Editing

Authors: Wanfeng Zheng, Qiang Li, Xiaoyan Guo, Pengfei Wan, Zhongyuan Wang

Abstract: Text-driven image manipulation has advanced since the vision-language model CLIP was proposed. Previous work has adopted CLIP to design text-image consistency-based objectives for this task. However, these methods require either test-time optimization or image feature cluster analysis to obtain a single-mode manipulation direction. In this paper, we achieve inference-time optimization-free mining of diverse manipulation directions by bridging CLIP and StyleGAN through Latent Alignment (CSLA). More specifically, our contribution consists of three parts: 1) a data-free training strategy for training latent mappers that bridge the latent spaces of CLIP and StyleGAN; 2) temporal relative consistency, proposed to make the mapping more precise by addressing the knowledge distribution bias problem among different latent spaces; 3) adaptive style mixing, proposed to refine the mapped latent in s space. With this mapping scheme, we can achieve GAN inversion, text-to-image generation, and text-driven image manipulation. Qualitative and quantitative comparisons demonstrate the effectiveness of our method.

Comments:	20 pages, 23 figures
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2210.04506 [cs.CV]
	(or arXiv:2210.04506v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2210.04506

Submission history

From: Qiang Li Capasso
[v1] Mon, 10 Oct 2022 09:17:35 UTC (9,166 KB)
