Computer Science > Computer Vision and Pattern Recognition

arXiv:2305.16322 (cs)

[Submitted on 25 May 2023 (v1), last revised 29 Oct 2023 (this version, v3)]

Title:Uni-ControlNet: All-in-One Control to Text-to-Image Diffusion Models

Authors:Shihao Zhao, Dongdong Chen, Yen-Chun Chen, Jianmin Bao, Shaozhe Hao, Lu Yuan, Kwan-Yee K. Wong

Abstract:Text-to-Image diffusion models have made tremendous progress over the past two years, enabling the generation of highly realistic images based on open-domain text descriptions. However, despite their success, text descriptions often struggle to adequately convey detailed controls, even when composed of long and complex texts. Moreover, recent studies have also shown that these models face challenges in understanding such complex texts and generating the corresponding images. Therefore, there is a growing need to enable more control modes beyond text description. In this paper, we introduce Uni-ControlNet, a unified framework that allows for the simultaneous utilization of different local controls (e.g., edge maps, depth maps, segmentation masks) and global controls (e.g., CLIP image embeddings) in a flexible and composable manner within one single model. Unlike existing methods, Uni-ControlNet only requires the fine-tuning of two additional adapters upon frozen pre-trained text-to-image diffusion models, eliminating the huge cost of training from scratch. Moreover, thanks to some dedicated adapter designs, Uni-ControlNet only necessitates a constant number (i.e., 2) of adapters, regardless of the number of local or global controls used. This not only reduces the fine-tuning costs and model size, making it more suitable for real-world deployment, but also facilitates the composability of different conditions. Through both quantitative and qualitative comparisons, Uni-ControlNet demonstrates its superiority over existing methods in terms of controllability, generation quality and composability. Code is available at this https URL.

Comments:	Camera Ready, Code is available at this https URL
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
Cite as:	arXiv:2305.16322 [cs.CV]
	(or arXiv:2305.16322v3 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2305.16322

Submission history

From: Dongdong Chen
[v1] Thu, 25 May 2023 17:59:58 UTC (15,242 KB)
[v2] Wed, 31 May 2023 01:48:23 UTC (15,242 KB)
[v3] Sun, 29 Oct 2023 15:59:24 UTC (15,072 KB)
