Computer Science > Computer Vision and Pattern Recognition

arXiv:2406.05184 (cs)

[Submitted on 7 Jun 2024 (v1), last revised 15 Dec 2024 (this version, v4)]

Title: The Unmet Promise of Synthetic Training Images: Using Retrieved Real Images Performs Better

Authors: Scott Geng, Cheng-Yu Hsieh, Vivek Ramanujan, Matthew Wallingford, Chun-Liang Li, Pang Wei Koh, Ranjay Krishna

Abstract: Generative text-to-image models enable us to synthesize unlimited amounts of images in a controllable manner, spurring many recent efforts to train vision models with synthetic data. However, every synthetic image ultimately originates from the upstream data used to train the generator. Does the intermediate generator provide additional information over directly training on relevant parts of the upstream data? Grounding this question in the setting of image classification, we compare finetuning on task-relevant, targeted synthetic data generated by Stable Diffusion -- a generative model trained on the LAION-2B dataset -- against finetuning on targeted real images retrieved directly from LAION-2B. We show that while synthetic data can benefit some downstream tasks, it is universally matched or outperformed by real data from the simple retrieval baseline. Our analysis suggests that this underperformance is partially due to generator artifacts and inaccurate task-relevant visual details in the synthetic images. Overall, we argue that targeted retrieval is a critical baseline to consider when training with synthetic data -- a baseline that current methods do not yet surpass. We release code, data, and models at this https URL.

Comments:	Correspondence to sgeng at cs dot washington dot edu. RK and PWK equally advised the project
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2406.05184 [cs.CV]
	(or arXiv:2406.05184v4 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2406.05184

Submission history

From: Scott Geng
[v1] Fri, 7 Jun 2024 18:04:21 UTC (29,661 KB)
[v2] Wed, 3 Jul 2024 06:00:50 UTC (29,661 KB)
[v3] Wed, 11 Dec 2024 08:56:37 UTC (29,746 KB)
[v4] Sun, 15 Dec 2024 00:13:20 UTC (29,746 KB)
