DOI: 10.5555/3666122.3668193

Understanding and mitigating copying in diffusion models

Published: 10 December 2023

Abstract

Images generated by diffusion models like Stable Diffusion are increasingly widespread. Recent works and even lawsuits have shown that these models are prone to replicating their training data, unbeknownst to the user. In this paper, we first analyze this memorization problem in text-to-image diffusion models. While it is widely believed that duplicated images in the training set are responsible for content replication at inference time, we observe that the text conditioning of the model plays a similarly important role. In fact, we see in our experiments that data replication often does not happen for unconditional models, while it is common in the text-conditional case. Motivated by our findings, we then propose several techniques for reducing data replication at both training and inference time by randomizing and augmenting image captions in the training set. Code is available at https://github.com/somepago/DCR.
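To make the caption-randomization idea concrete, the sketch below shows one way captions could be perturbed on the fly during training. This is a minimal illustration in the spirit of the abstract, not the authors' released implementation (see the linked repository for that); the function names, probabilities, and toy vocabulary are all hypothetical.

```python
import random

# Toy vocabulary for illustration only; a real implementation would sample
# from a large word list or the text encoder's token vocabulary.
WORDS = ["photo", "image", "picture", "painting", "close-up", "scene"]

def add_random_tokens(caption: str, n: int = 1, vocab=WORDS) -> str:
    """Augment a caption by inserting n random words at random positions."""
    tokens = caption.split()
    for _ in range(n):
        tokens.insert(random.randint(0, len(tokens)), random.choice(vocab))
    return " ".join(tokens)

def random_caption(vocab=WORDS, length: int = 8) -> str:
    """Replace the whole caption with a random word sequence."""
    return " ".join(random.choices(vocab, k=length))

def randomize_caption(caption: str, p_replace: float = 0.1,
                      p_augment: float = 0.3) -> str:
    """Apply at most one perturbation per training step, chosen stochastically."""
    r = random.random()
    if r < p_replace:
        return random_caption()
    if r < p_replace + p_augment:
        return add_random_tokens(caption)
    return caption

# During training, captions would be perturbed each time an image is sampled,
# so the model rarely sees the same (image, caption) pair twice. The names
# below (diffusion_loss, tokenizer) are placeholders for a real pipeline:
#   caption = randomize_caption(caption)
#   loss = diffusion_loss(model, image, tokenizer(caption))
```

The intuition, as the abstract suggests, is that text conditioning acts as a key that can retrieve a memorized training image; perturbing captions weakens that key while largely preserving the caption's semantic content.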


Published In

NIPS '23: Proceedings of the 37th International Conference on Neural Information Processing Systems
December 2023
80772 pages

Publisher: Curran Associates Inc., Red Hook, NY, United States


