DOI: 10.1145/3641519.3657489

Filter-Guided Diffusion for Controllable Image Generation

Published: 13 July 2024

Abstract

Recent advances in diffusion-based generative models have shown incredible promise for zero-shot image-to-image translation and editing. Most of these approaches work by combining or replacing network-specific features used in the generation of new images with those taken from the inversion of a guide image. Methods of this type are considered the current state of the art among training-free approaches, but they have notable limitations: they tend to be costly in runtime and memory, and they often depend on deterministic sampling, which limits variation in generated results. We propose Filter-Guided Diffusion (FGD), an alternative approach that leverages fast filtering operations during the diffusion process to support finer control over the strength and frequencies of guidance, and that works with non-deterministic samplers to produce greater variety. Thanks to its efficiency, FGD can be sampled over multiple seeds and hyperparameters in less time than a single run of other state-of-the-art methods, producing superior results on structural and semantic metrics. We conduct extensive quantitative and qualitative experiments to evaluate the performance of FGD on translation tasks, and we also demonstrate its potential for localized editing when used with masks.
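To make the mechanism the abstract describes concrete, the sketch below shows one plausible reading of filter-based guidance inside a denoising loop. It is not the authors' implementation: `denoise_step`, `guidance_strength`, and the box-blur `low_pass` are illustrative stand-ins for FGD's actual sampler interface and fast filtering operation.

```python
# Hedged sketch of filter-based guidance in a denoising loop.
# NOT the paper's implementation: denoise_step, guidance_strength,
# and the box blur are assumed names standing in for FGD's actual
# sampler hook and fast filter.
import torch
import torch.nn.functional as F

def low_pass(img: torch.Tensor, kernel_size: int = 9) -> torch.Tensor:
    """Box blur as a cheap low-pass filter; FGD's real filter may differ."""
    pad = kernel_size // 2
    img = F.pad(img, (pad, pad, pad, pad), mode="reflect")
    return F.avg_pool2d(img, kernel_size, stride=1)

@torch.no_grad()
def filter_guided_sample(denoise_step, guide, timesteps,
                         guidance_strength=0.5):
    """Pull the low frequencies of each intermediate estimate toward the
    guide image, leaving high frequencies to the model: structure follows
    the guide while fine detail (and variation across seeds) does not.

    denoise_step(x, t) -> x is one step of any, possibly stochastic, sampler.
    """
    x = torch.randn_like(guide)      # stochastic start: each seed differs
    guide_low = low_pass(guide)      # guide's low-frequency band, fixed
    for t in timesteps:              # e.g. reversed(range(T))
        x = denoise_step(x, t)       # ordinary denoising update
        # Swap in the guide's low frequencies, scaled by the guidance
        # strength (0 = unguided, 1 = full low-frequency replacement).
        x = x + guidance_strength * (guide_low - low_pass(x))
    return x
```

Because the only per-step overhead is a cheap filtering pass, sweeping seeds and guidance strengths stays inexpensive, which is the efficiency argument the abstract makes against feature-injection methods.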

Supplemental Material

MP4 File: presentation
PDF File: an appendix and a web gallery of more results
ZIP File: an appendix and a web gallery of more results


Published In

SIGGRAPH '24: ACM SIGGRAPH 2024 Conference Papers
July 2024
1106 pages
ISBN: 9798400705250
DOI: 10.1145/3641519

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. Diffusion Models
  2. Generative Artificial Intelligence
  3. Image Synthesis
  4. Style Transfer

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

SIGGRAPH '24

Acceptance Rates

Overall acceptance rate: 1,822 of 8,601 submissions (21%)

Article Metrics

  • Total citations: 0
  • Total downloads: 451
  • Downloads (last 12 months): 451
  • Downloads (last 6 weeks): 67

Reflects downloads up to 12 Dec 2024
