DOI: 10.1145/3641519.3657489

Filter-Guided Diffusion for Controllable Image Generation

Published: 13 July 2024

Abstract

Recent advances in diffusion-based generative models have shown incredible promise for zero-shot image-to-image translation and editing. Most of these approaches work by combining or replacing network-specific features used in the generation of new images with those taken from the inversion of a guide image. Methods of this type are considered the current state of the art among training-free approaches, but they have notable limitations: they tend to be costly in runtime and memory, and they often depend on deterministic sampling, which limits variation in generated results. We propose Filter-Guided Diffusion (FGD), an alternative approach that leverages fast filtering operations during the diffusion process to support finer control over the strength and frequencies of guidance, and that works with non-deterministic samplers to produce greater variety. Thanks to its efficiency, FGD can be sampled over multiple seeds and hyperparameters in less time than a single run of other state-of-the-art methods, producing superior results on structural and semantic metrics. We conduct extensive quantitative and qualitative experiments to evaluate the performance of FGD on translation tasks, and we also demonstrate its potential for localized editing when used with masks.
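To make the mechanism the abstract describes concrete, the sketch below shows one plausible reading of filter-based guidance inside a denoising loop. It is not the authors' implementation: `denoise_step`, `guidance_strength`, and the box-blur `low_pass` are illustrative stand-ins for FGD's actual sampler interface and fast filtering operation.

```python
# Hedged sketch of filter-based guidance in a denoising loop.
# NOT the paper's implementation: denoise_step, guidance_strength,
# and the box blur are assumed names standing in for FGD's actual
# sampler hook and fast filter.
import torch
import torch.nn.functional as F

def low_pass(img: torch.Tensor, kernel_size: int = 9) -> torch.Tensor:
    """Box blur as a cheap low-pass filter; FGD's real filter may differ."""
    pad = kernel_size // 2
    img = F.pad(img, (pad, pad, pad, pad), mode="reflect")
    return F.avg_pool2d(img, kernel_size, stride=1)

@torch.no_grad()
def filter_guided_sample(denoise_step, guide, timesteps,
                         guidance_strength=0.5):
    """Pull the low frequencies of each intermediate estimate toward the
    guide image, leaving high frequencies to the model: structure follows
    the guide while fine detail (and variation across seeds) does not.

    denoise_step(x, t) -> x is one step of any, possibly stochastic, sampler.
    """
    x = torch.randn_like(guide)      # stochastic start: each seed differs
    guide_low = low_pass(guide)      # guide's low-frequency band, fixed
    for t in timesteps:              # e.g. reversed(range(T))
        x = denoise_step(x, t)       # ordinary denoising update
        # Swap in the guide's low frequencies, scaled by the guidance
        # strength (0 = unguided, 1 = full low-frequency replacement).
        x = x + guidance_strength * (guide_low - low_pass(x))
    return x
```

Because the only per-step overhead is a cheap filtering pass, sweeping seeds and guidance strengths stays inexpensive, which is the efficiency argument the abstract makes against feature-injection methods.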

Supplemental Material

MP4 File: presentation
PDF File: an appendix and a web gallery of more results
ZIP File: an appendix and a web gallery of more results


Published In

SIGGRAPH '24: ACM SIGGRAPH 2024 Conference Papers
July 2024
1106 pages
ISBN: 9798400705250
DOI: 10.1145/3641519

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. Diffusion Models
  2. Generative Artificial Intelligence
  3. Image Synthesis
  4. Style Transfer

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

SIGGRAPH '24

Acceptance Rates

Overall acceptance rate: 1,822 of 8,601 submissions (21%)

Article Metrics

  • Total citations: 0
  • Total downloads: 451
  • Downloads (last 12 months): 451
  • Downloads (last 6 weeks): 67

Reflects downloads up to 12 Dec 2024
