
Image Super-Resolution via Iterative Refinement

Published: 01 April 2023

Abstract

We present SR3, an approach to image Super-Resolution via Repeated Refinement. SR3 adapts denoising diffusion probabilistic models (Ho et al., 2020; Sohl-Dickstein et al., 2015) to image-to-image translation, and performs super-resolution through a stochastic iterative denoising process. Output images are initialized with pure Gaussian noise and iteratively refined by a U-Net architecture trained to denoise at a range of noise levels, conditioned on a low-resolution input image. SR3 exhibits strong performance on super-resolution tasks at different magnification factors, on both faces and natural images. In a human evaluation on a standard 8× face super-resolution task on CelebA-HQ, SR3 achieves a fool rate close to 50%, suggesting photo-realistic outputs, while GAN baselines do not exceed a fool rate of 34%. On a 4× super-resolution task on ImageNet, SR3 outperforms baselines both in human evaluation and in the classification accuracy of a ResNet-50 classifier trained on high-resolution images. We further show the effectiveness of SR3 in cascaded image generation, where a generative model is chained with super-resolution models to synthesize high-resolution images with competitive FID scores on the class-conditional 256×256 ImageNet generation challenge.
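The refinement loop the abstract describes (start from pure Gaussian noise, then repeatedly denoise conditioned on the low-resolution input) follows the standard DDPM ancestral-sampling recipe of Ho et al. (2020). The NumPy sketch below is illustrative only: `sr3_sample`, the `denoise_fn` stand-in for the conditional U-Net, and the linear noise schedule are hypothetical names chosen for this example, not the authors' implementation.

```python
import numpy as np

def sr3_sample(denoise_fn, x_lr, shape, alphas_cumprod, rng):
    """Illustrative DDPM-style sampler in the spirit of SR3.

    denoise_fn(y_t, x_lr, t) stands in for the conditional U-Net:
    it predicts the noise component of y_t given the low-res image x_lr.
    alphas_cumprod[t] is the cumulative product of the noise schedule,
    decreasing in t (more noise at larger t).
    """
    T = len(alphas_cumprod)
    y = rng.standard_normal(shape)  # initialize with pure Gaussian noise
    for t in reversed(range(T)):
        a_bar = alphas_cumprod[t]
        a_bar_prev = alphas_cumprod[t - 1] if t > 0 else 1.0
        alpha_t = a_bar / a_bar_prev
        eps = denoise_fn(y, x_lr, t)  # predicted noise, conditioned on x_lr
        # posterior mean of the reverse step (DDPM update rule)
        y = (y - (1.0 - alpha_t) / np.sqrt(1.0 - a_bar) * eps) / np.sqrt(alpha_t)
        if t > 0:  # add noise on all but the final step
            sigma = np.sqrt(1.0 - alpha_t)
            y = y + sigma * rng.standard_normal(shape)
    return y
```

In the paper the denoiser is a U-Net that sees the noisy image together with the (upsampled) low-resolution input; here any callable with the same signature can be dropped in, which is also how the cascaded-generation pipeline chains models of increasing resolution.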

References

[1]
J. Ho, A. Jain, and P. Abbeel, “Denoising diffusion probabilistic models,” in Proc. 34th Int. Conf. Neural Inf. Process. Syst., 2020, pp. 6840–6851.
[2]
J. Sohl-Dickstein, E. A. Weiss, N. Maheswaranathan, and S. Ganguli, “Deep unsupervised learning using nonequilibrium thermodynamics,” in Proc. 32nd Int. Conf. Mach. Learn., 2015, pp. 2256–2265.
[3]
I. Sutskever, O. Vinyals, and Q. Le, “Sequence to sequence learning with neural networks,” in Proc. 27th Int. Conf. Neural Inf. Process. Syst., 2014, pp. 3104–3112.
[4]
A. Vaswani et al., “Attention is all you need,” in Proc. 31st Int. Conf. Neural Inf. Process. Syst., 2017, pp. 6000–6010.
[5]
A. van den Oord et al., “WaveNet: A generative model for raw audio,” 2016.
[6]
A. van den Oord, N. Kalchbrenner, O. Vinyals, L. Espeholt, A. Graves, and K. Kavukcuoglu, “Conditional image generation with PixelCNN decoders,” in Proc. 30th Int. Conf. Neural Inf. Process. Syst., 2016, pp. 4797–4805.
[7]
D. P. Kingma and M. Welling, “Auto-encoding variational Bayes,” in Proc. Int. Conf. Learn. Representations, 2013.
[8]
A. Vahdat and J. Kautz, “NVAE: A deep hierarchical variational autoencoder,” in Proc. 34th Int. Conf. Neural Inf. Process. Syst., 2020, pp. 19667–19679.
[9]
L. Dinh, J. Sohl-Dickstein, and S. Bengio, “Density estimation using real NVP,” 2016.
[10]
D. P. Kingma and P. Dhariwal, “Glow: Generative flow with invertible 1x1 convolutions,” in Proc. 32nd Int. Conf. Neural Inf. Process. Syst., 2018, pp. 10236–10245.
[11]
I. J. Goodfellow et al., “Generative adversarial networks,” in Proc. 27th Int. Conf. Neural Inf. Process. Syst., 2014, pp. 2672–2680.
[12]
T. Karras, T. Aila, S. Laine, and J. Lehtinen, “Progressive growing of GANs for improved quality, stability, and variation,” in Proc. Int. Conf. Learn. Representations, 2018.
[13]
A. Radford, L. Metz, and S. Chintala, “Unsupervised representation learning with deep convolutional generative adversarial networks,” 2015.
[14]
Y. Chen, Y. Tai, X. Liu, C. Shen, and J. Yang, “FSRNet: End-to-end learning face super-resolution with facial priors,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2018, pp. 2492–2501.
[15]
R. Dahl, M. Norouzi, and J. Shlens, “Pixel recursive super resolution,” in Proc. IEEE Int. Conf. Comput. Vis., 2017, pp. 5449–5458.
[16]
C. Ledig et al., “Photo-realistic single image super-resolution using a generative adversarial network,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2017, pp. 105–114.
[17]
S. Menon, A. Damian, S. Hu, N. Ravi, and C. Rudin, “PULSE: Self-supervised photo upsampling via latent space exploration of generative models,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2020, pp. 2434–2442.
[18]
N. Parmar et al., “Image transformer,” in Proc. 35th Int. Conf. Mach. Learn., 2018, pp. 4055–4064.
[19]
M. Arjovsky, S. Chintala, and L. Bottou, “Wasserstein GAN,” 2017.
[20]
I. Gulrajani, F. Ahmed, M. Arjovsky, V. Dumoulin, and A. Courville, “Improved training of Wasserstein GANs,” 2017.
[21]
L. Metz, B. Poole, D. Pfau, and J. Sohl-Dickstein, “Unrolled generative adversarial networks,” 2016.
[22]
S. Ravuri and O. Vinyals, “Classification accuracy score for conditional generative models,” 2019.
[23]
J. Sohl-Dickstein, E. Weiss, N. Maheswaranathan, and S. Ganguli, “Deep unsupervised learning using nonequilibrium thermodynamics,” in Proc. 32nd Int. Conf. Mach. Learn., 2015, pp. 2256–2265.
[24]
Y. Song and S. Ermon, “Generative modeling by estimating gradients of the data distribution,” in Proc. 33rd Int. Conf. Neural Inf. Process. Syst., 2019, pp. 11918–11930.
[25]
O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in Proc. Int. Conf. Med. Image Comput. Comput.-Assisted Intervention, 2015, pp. 234–241.
[26]
A. Dosovitskiy and T. Brox, “Generating images with perceptual similarity metrics based on deep networks,” in Proc. Adv. Neural Inf. Process. Syst., 2016, pp. 658–666.
[27]
D. Berthelot, P. Milanfar, and I. Goodfellow, “Creating high resolution images with a latent adversarial generator,” 2020.
[28]
Z. Kadkhodaie and E. P. Simoncelli, “Solving linear inverse problems using the prior implicit in a denoiser,” 2021.
[29]
R. Zhang, P. Isola, and A. A. Efros, “Colorful image colorization,” in Proc. Eur. Conf. Comput. Vis., 2016, pp. 649–666.
[30]
X. Wang et al., “ESRGAN: Enhanced super-resolution generative adversarial networks,” in Proc. Eur. Conf. Comput. Vis. Workshops, 2018, pp. 63–79.
[31]
M. S. Sajjadi, B. Schölkopf, and M. Hirsch, “EnhanceNet: Single image super-resolution through automated texture synthesis,” in Proc. IEEE Int. Conf. Comput. Vis., 2017, pp. 4491–4500.
[32]
A. Lugmayr, M. Danelljan, L. Van Gool, and R. Timofte, “SRFlow: Learning the super-resolution space with normalizing flow,” in Proc. Eur. Conf. Comput. Vis., 2020, pp. 715–732.
[33]
N. Chen, Y. Zhang, H. Zen, R. J. Weiss, M. Norouzi, and W. Chan, “WaveGrad: Estimating gradients for waveform generation,” in Proc. Int. Conf. Learn. Representations, 2021.
[34]
C. Saharia et al., “Palette: Image-to-image diffusion models,” 2021.
[35]
A. Hyvärinen and P. Dayan, “Estimation of non-normalized statistical models by score matching,” J. Mach. Learn. Res., vol. 6, no. 4, pp. 695–709, 2005.
[36]
P. Vincent, “A connection between score matching and denoising autoencoders,” Neural Comput., vol. 23, no. 7, pp. 1661–1674, 2011.
[37]
M. Raphan and E. P. Simoncelli, “Least squares estimation without priors or supervision,” Neural Comput., vol. 23, pp. 374–420, 2011.
[38]
S. Saremi, A. Mehrjou, B. Schölkopf, and A. Hyvärinen, “Deep energy estimator networks,” 2018.
[39]
Y. Song and S. Ermon, “Improved techniques for training score-based generative models,” 2020.
[40]
Y. Song, J. Sohl-Dickstein, D. P. Kingma, A. Kumar, S. Ermon, and B. Poole, “Score-based generative modeling through stochastic differential equations,” in Proc. Int. Conf. Learn. Representations, 2021.
[41]
A. Brock, J. Donahue, and K. Simonyan, “Large scale GAN training for high fidelity natural image synthesis,” 2018.
[42]
E. Perez, F. Strub, H. De Vries, V. Dumoulin, and A. Courville, “FiLM: Visual reasoning with a general conditioning layer,” in Proc. 32nd AAAI Conf. Artif. Intell., 2018, pp. 3942–3951.
[43]
A. van den Oord, N. Kalchbrenner, and K. Kavukcuoglu, “Pixel recurrent neural networks,” in Proc. 33rd Int. Conf. Mach. Learn., 2016, pp. 1747–1756.
[44]
T. Salimans, A. Karpathy, X. Chen, and D. P. Kingma, “PixelCNN++: Improving the PixelCNN with discretized logistic mixture likelihood and other modifications,” in Proc. Int. Conf. Learn. Representations, 2017.
[45]
D. Rezende and S. Mohamed, “Variational inference with normalizing flows,” in Proc. 32nd Int. Conf. Mach. Learn., 2015, pp. 1530–1538.
[46]
D. J. Rezende, S. Mohamed, and D. Wierstra, “Stochastic backpropagation and approximate inference in deep generative models,” in Proc. 31st Int. Conf. Mach. Learn., 2014, pp. 1278–1286.
[47]
E. Denton, S. Chintala, A. Szlam, and R. Fergus, “Deep generative image models using a Laplacian pyramid of adversarial networks,” in Proc. 28th Int. Conf. Neural Inf. Process. Syst., 2015, pp. 1486–1494.
[48]
R. Cai et al., “Learning gradient fields for shape generation,” in Proc. Eur. Conf. Comput. Vis., 2020, pp. 364–381.
[49]
C. Dong, C. C. Loy, K. He, and X. Tang, “Learning a deep convolutional network for image super-resolution,” in Proc. Eur. Conf. Comput. Vis., 2014, pp. 184–199.
[50]
J. Kim, J. K. Lee, and K. M. Lee, “Deeply-recursive convolutional network for image super-resolution,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2016, pp. 1637–1645.
[51]
Y. Tai, J. Yang, and X. Liu, “Image super-resolution via deep recursive residual network,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2017, pp. 3147–3155.
[52]
N. Ahn, B. Kang, and K.-A. Sohn, “Image super-resolution via progressive cascading residual network,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2018, pp. 904–908.
[53]
X. Li, C. Chen, S. Zhou, X. Lin, W. Zuo, and L. Zhang, “Blind face restoration via deep multi-scale component dictionaries,” in Proc. Eur. Conf. Comput. Vis., 2020, pp. 399–415.
[54]
Z. Wang, D. Liu, J. Yang, W. Han, and T. Huang, “Deep networks for image super-resolution with sparse prior,” in Proc. IEEE Int. Conf. Comput. Vis., 2015, pp. 370–378.
[55]
C. Dong, C. C. Loy, K. He, and X. Tang, “Image super-resolution using deep convolutional networks,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 38, no. 2, pp. 295–307, Feb. 2016.
[56]
J. Kim, J. K. Lee, and K. M. Lee, “Accurate image super-resolution using very deep convolutional networks,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2016, pp. 1646–1654.
[57]
J. Menick and N. Kalchbrenner, “Generating high fidelity images with subscale pixel networks and multidimensional upscaling,” in Proc. Int. Conf. Learn. Representations, 2019.
[58]
A. van den Oord, N. Kalchbrenner, O. Vinyals, L. Espeholt, A. Graves, and K. Kavukcuoglu, “Conditional image generation with PixelCNN decoders,” in Proc. 30th Int. Conf. Neural Inf. Process. Syst., 2016, pp. 4797–4805.
[59]
L. Yang et al., “HiFaceGAN: Face renovation via collaborative suppression and replenishment,” 2020.
[60]
J. J. Yu, K. G. Derpanis, and M. A. Brubaker, “Wavelet flow: Fast training of high resolution normalizing flows,” 2020.
[61]
T. Karras, S. Laine, and T. Aila, “A style-based generator architecture for generative adversarial networks,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2019, pp. 4396–4405.
[62]
O. Russakovsky et al., “ImageNet large scale visual recognition challenge,” Int. J. Comput. Vis., vol. 115, no. 3, pp. 211–252, 2015.
[63]
Z. Wang, A. Bovik, H. Sheikh, and E. Simoncelli, “Image quality assessment: From error visibility to structural similarity,” IEEE Trans. Image Process., vol. 13, no. 4, pp. 600–612, Apr. 2004.
[64]
Y. Zhang, K. Li, K. Li, L. Wang, B. Zhong, and Y. Fu, “Image super-resolution using very deep residual channel attention networks,” in Proc. Eur. Conf. Comput. Vis., 2018, pp. 286–301.
[65]
K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2016, pp. 770–778.
[66]
E. Pérez-Pellitero, J. Salvador, J. Hidalgo, and B. Rosenhahn, “PSyCo: Manifold span reduction for super resolution,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2016, pp. 1837–1845.
[67]
C. Dong, C. C. Loy, and X. Tang, “Accelerating the super-resolution convolutional neural network,” in Proc. Eur. Conf. Comput. Vis., 2016, pp. 391–407.
[68]
S. Anwar and N. Barnes, “Densely residual Laplacian super-resolution,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 44, no. 3, pp. 1192–1204, Mar. 2022.
[69]
R. K. Mantiuk, A. Tomaszewska, and R. Mantiuk, “Comparison of four subjective methods for image quality assessment,” Comput. Graph. Forum, vol. 31, no. 8, pp. 2478–2491, 2012.
[70]
A. Nichol and P. Dhariwal, “Improved denoising diffusion probabilistic models,” 2021.
[71]
A. Razavi, A. van den Oord, and O. Vinyals, “Generating diverse high-fidelity images with VQ-VAE-2,” 2019.
[72]
J. Ho, C. Saharia, W. Chan, D. J. Fleet, M. Norouzi, and T. Salimans, “Cascaded diffusion models for high fidelity image generation,” J. Mach. Learn. Res., vol. 23, no. 47, pp. 1–33, 2022.
[73]
D. Watson, J. Ho, M. Norouzi, and W. Chan, “Learning to efficiently sample from diffusion probabilistic models,” 2021.
[74]
A. Jolicoeur-Martineau, K. Li, R. Piché-Taillefer, T. Kachman, and I. Mitliagkas, “Gotta go fast when generating data with score-based models,” 2021.
[75]
T. Salimans and J. Ho, “Progressive distillation for fast sampling of diffusion models,” in Proc. Int. Conf. Learn. Representations, 2022.
[76]
J. Song, C. Meng, and S. Ermon, “Denoising diffusion implicit models,” 2020.



Published In

IEEE Transactions on Pattern Analysis and Machine Intelligence, Volume 45, Issue 4
April 2023
1338 pages

Publisher

IEEE Computer Society

United States


Qualifiers

  • Research-article
