Computer Science > Computer Vision and Pattern Recognition

arXiv:2304.02051 (cs)

[Submitted on 4 Apr 2023 (v1), last revised 23 Aug 2023 (this version, v2)]

Title: Multimodal Garment Designer: Human-Centric Latent Diffusion Models for Fashion Image Editing

Authors: Alberto Baldrati, Davide Morelli, Giuseppe Cartella, Marcella Cornia, Marco Bertini, Rita Cucchiara

Abstract: Fashion illustration is used by designers to communicate their vision and to bring a design idea from conceptualization to realization, showing how clothes interact with the human body. In this context, computer vision can be used to improve the fashion design process. Unlike previous works, which mainly focused on the virtual try-on of garments, we propose the task of multimodal-conditioned fashion image editing: guiding the generation of human-centric fashion images with multimodal prompts such as text, human body poses, and garment sketches. We tackle this problem by proposing a new architecture based on latent diffusion models, an approach that had not previously been used in the fashion domain. Given the lack of existing datasets suitable for the task, we also extend two existing fashion datasets, namely Dress Code and VITON-HD, with multimodal annotations collected in a semi-automatic manner. Experimental results on these new datasets demonstrate the effectiveness of our proposal, both in terms of realism and of coherence with the given multimodal inputs. Source code and collected multimodal annotations are publicly available at: this https URL.

Comments:	ICCV 2023
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Multimedia (cs.MM)
Cite as:	arXiv:2304.02051 [cs.CV]
	(or arXiv:2304.02051v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2304.02051

Submission history

From: Marcella Cornia
[v1] Tue, 4 Apr 2023 18:03:04 UTC (30,680 KB)
[v2] Wed, 23 Aug 2023 12:45:27 UTC (33,098 KB)
