
Image Super-Resolution via Iterative Refinement

Published: 01 April 2023

Abstract

We present SR3, an approach to image Super-Resolution via Repeated Refinement. SR3 adapts denoising diffusion probabilistic models (Ho et al., 2020; Sohl-Dickstein et al., 2015) to image-to-image translation, and performs super-resolution through a stochastic iterative denoising process. Output images are initialized with pure Gaussian noise and iteratively refined by a U-Net architecture trained to denoise at a range of noise levels, conditioned on a low-resolution input image. SR3 exhibits strong performance on super-resolution tasks at different magnification factors, on both faces and natural images. In a human evaluation on a standard 8× face super-resolution task on CelebA-HQ, SR3 achieves a fool rate close to 50%, suggesting photo-realistic outputs, while GAN baselines do not exceed a fool rate of 34%. On a 4× super-resolution task on ImageNet, SR3 outperforms baselines both in human evaluation and in the classification accuracy of a ResNet-50 classifier trained on high-resolution images. We further show the effectiveness of SR3 in cascaded image generation, where a generative model is chained with super-resolution models to synthesize high-resolution images with competitive FID scores on the class-conditional 256×256 ImageNet generation challenge.
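The refinement loop the abstract describes (start from pure Gaussian noise, then repeatedly denoise conditioned on the low-resolution input) follows the standard DDPM ancestral-sampling recipe of Ho et al. (2020). The NumPy sketch below is illustrative only: `sr3_sample`, the `denoise_fn` stand-in for the conditional U-Net, and the linear noise schedule are hypothetical names chosen for this example, not the authors' implementation.

```python
import numpy as np

def sr3_sample(denoise_fn, x_lr, shape, alphas_cumprod, rng):
    """Illustrative DDPM-style sampler in the spirit of SR3.

    denoise_fn(y_t, x_lr, t) stands in for the conditional U-Net:
    it predicts the noise component of y_t given the low-res image x_lr.
    alphas_cumprod[t] is the cumulative product of the noise schedule,
    decreasing in t (more noise at larger t).
    """
    T = len(alphas_cumprod)
    y = rng.standard_normal(shape)  # initialize with pure Gaussian noise
    for t in reversed(range(T)):
        a_bar = alphas_cumprod[t]
        a_bar_prev = alphas_cumprod[t - 1] if t > 0 else 1.0
        alpha_t = a_bar / a_bar_prev
        eps = denoise_fn(y, x_lr, t)  # predicted noise, conditioned on x_lr
        # posterior mean of the reverse step (DDPM update rule)
        y = (y - (1.0 - alpha_t) / np.sqrt(1.0 - a_bar) * eps) / np.sqrt(alpha_t)
        if t > 0:  # add noise on all but the final step
            sigma = np.sqrt(1.0 - alpha_t)
            y = y + sigma * rng.standard_normal(shape)
    return y
```

In the paper the denoiser is a U-Net that sees the noisy image together with the (upsampled) low-resolution input; here any callable with the same signature can be dropped in, which is also how the cascaded-generation pipeline chains models of increasing resolution.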

References

[1]
J. Ho, A. Jain, and P. Abbeel, “Denoising diffusion probabilistic models,” in Proc. 34th Int. Conf. Neural Inf. Process. Syst., 2020, pp. 6840–6851.
[2]
J. Sohl-Dickstein, E. A. Weiss, N. Maheswaranathan, and S. Ganguli, “Deep unsupervised learning using nonequilibrium thermodynamics,” in Proc. 32nd Int. Conf. Mach. Learn., 2015, pp. 2256–2265.
[3]
I. Sutskever, O. Vinyals, and Q. Le, “Sequence to sequence learning with neural networks,” in Proc. 27th Int. Conf. Neural Inf. Process. Syst., 2014, pp. 3104–3112.
[4]
A. Vaswani et al., “Attention is all you need,” in Proc. 31st Int. Conf. Neural Inf. Process. Syst., 2017, pp. 6000–6010.
[5]
A. van den Oord et al., “WaveNet: A generative model for raw audio,” 2016.
[6]
A. van den Oord, N. Kalchbrenner, O. Vinyals, L. Espeholt, A. Graves, and K. Kavukcuoglu, “Conditional image generation with PixelCNN decoders,” in Proc. 30th Int. Conf. Neural Inf. Process. Syst., 2016, pp. 4797–4805.
[7]
D. P. Kingma and M. Welling, “Auto-encoding variational Bayes,” in Proc. Int. Conf. Learn. Representations, 2013.
[8]
A. Vahdat and J. Kautz, “NVAE: A deep hierarchical variational autoencoder,” in Proc. 34th Int. Conf. Neural Inf. Process. Syst., 2020, pp. 19667–19679.
[9]
L. Dinh, J. Sohl-Dickstein, and S. Bengio, “Density estimation using real NVP,” 2016.
[10]
D. P. Kingma and P. Dhariwal, “Glow: Generative flow with invertible 1x1 convolutions,” in Proc. 32nd Int. Conf. Neural Inf. Process. Syst., 2018, pp. 10236–10245.
[11]
I. J. Goodfellow et al., “Generative adversarial networks,” in Proc. 27th Int. Conf. Neural Inf. Process. Syst., 2014, pp. 2672–2680.
[12]
T. Karras, T. Aila, S. Laine, and J. Lehtinen, “Progressive growing of GANs for improved quality, stability, and variation,” in Proc. Int. Conf. Learn. Representations, 2018.
[13]
A. Radford, L. Metz, and S. Chintala, “Unsupervised representation learning with deep convolutional generative adversarial networks,” 2015.
[14]
Y. Chen, Y. Tai, X. Liu, C. Shen, and J. Yang, “FSRNet: End-to-end learning face super-resolution with facial priors,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2018, pp. 2492–2501.
[15]
R. Dahl, M. Norouzi, and J. Shlens, “Pixel recursive super resolution,” in Proc. IEEE Int. Conf. Comput. Vis., 2017, pp. 5449–5458.
[16]
C. Ledig et al., “Photo-realistic single image super-resolution using a generative adversarial network,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2017, pp. 105–114.
[17]
S. Menon, A. Damian, S. Hu, N. Ravi, and C. Rudin, “PULSE: Self-supervised photo upsampling via latent space exploration of generative models,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2020, pp. 2434–2442.
[18]
N. Parmar et al., “Image transformer,” in Proc. 35th Int. Conf. Mach. Learn., 2018, pp. 4055–4064.
[19]
M. Arjovsky, S. Chintala, and L. Bottou, “Wasserstein GAN,” 2017.
[20]
I. Gulrajani, F. Ahmed, M. Arjovsky, V. Dumoulin, and A. Courville, “Improved training of Wasserstein GANs,” 2017.
[21]
L. Metz, B. Poole, D. Pfau, and J. Sohl-Dickstein, “Unrolled generative adversarial networks,” 2016.
[22]
S. Ravuri and O. Vinyals, “Classification accuracy score for conditional generative models,” 2019.
[23]
J. Sohl-Dickstein, E. Weiss, N. Maheswaranathan, and S. Ganguli, “Deep unsupervised learning using nonequilibrium thermodynamics,” in Proc. 32nd Int. Conf. Mach. Learn., 2015, pp. 2256–2265.
[24]
Y. Song and S. Ermon, “Generative modeling by estimating gradients of the data distribution,” in Proc. 33rd Int. Conf. Neural Inf. Process. Syst., 2019, pp. 11918–11930.
[25]
O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in Proc. Int. Conf. Med. Image Comput. Comput.-Assisted Intervention, 2015, pp. 234–241.
[26]
A. Dosovitskiy and T. Brox, “Generating images with perceptual similarity metrics based on deep networks,” in Proc. Adv. Neural Inf. Process. Syst., 2016, pp. 658–666.
[27]
D. Berthelot, P. Milanfar, and I. Goodfellow, “Creating high resolution images with a latent adversarial generator,” 2020.
[28]
Z. Kadkhodaie and E. P. Simoncelli, “Solving linear inverse problems using the prior implicit in a denoiser,” 2021.
[29]
R. Zhang, P. Isola, and A. A. Efros, “Colorful image colorization,” in Proc. Eur. Conf. Comput. Vis., 2016, pp. 649–666.
[30]
X. Wang et al., “ESRGAN: Enhanced super-resolution generative adversarial networks,” in Proc. Eur. Conf. Comput. Vis. Workshops, 2018, pp. 63–79.
[31]
M. S. Sajjadi, B. Schölkopf, and M. Hirsch, “EnhanceNet: Single image super-resolution through automated texture synthesis,” in Proc. IEEE Int. Conf. Comput. Vis., 2017, pp. 4491–4500.
[32]
A. Lugmayr, M. Danelljan, L. Van Gool, and R. Timofte, “SRFlow: Learning the super-resolution space with normalizing flow,” in Proc. Eur. Conf. Comput. Vis., 2020, pp. 715–732.
[33]
N. Chen, Y. Zhang, H. Zen, R. J. Weiss, M. Norouzi, and W. Chan, “WaveGrad: Estimating gradients for waveform generation,” in Proc. Int. Conf. Learn. Representations, 2021.
[34]
C. Saharia et al., “Palette: Image-to-image diffusion models,” 2021.
[35]
A. Hyvärinen and P. Dayan, “Estimation of non-normalized statistical models by score matching,” J. Mach. Learn. Res., vol. 6, no. 4, pp. 695–709, 2005.
[36]
P. Vincent, “A connection between score matching and denoising autoencoders,” Neural Comput., vol. 23, no. 7, pp. 1661–1674, 2011.
[37]
M. Raphan and E. P. Simoncelli, “Least squares estimation without priors or supervision,” Neural Comput., vol. 23, pp. 374–420, 2011.
[38]
S. Saremi, A. Mehrjou, B. Schölkopf, and A. Hyvärinen, “Deep energy estimator networks,” 2018.
[39]
Y. Song and S. Ermon, “Improved techniques for training score-based generative models,” 2020.
[40]
Y. Song, J. Sohl-Dickstein, D. P. Kingma, A. Kumar, S. Ermon, and B. Poole, “Score-based generative modeling through stochastic differential equations,” in Proc. Int. Conf. Learn. Representations, 2021.
[41]
A. Brock, J. Donahue, and K. Simonyan, “Large scale GAN training for high fidelity natural image synthesis,” 2018.
[42]
E. Perez, F. Strub, H. De Vries, V. Dumoulin, and A. Courville, “FiLM: Visual reasoning with a general conditioning layer,” in Proc. 32nd AAAI Conf. Artif. Intell., 2018, pp. 3942–3951.
[43]
A. van den Oord, N. Kalchbrenner, and K. Kavukcuoglu, “Pixel recurrent neural networks,” in Proc. 33rd Int. Conf. Mach. Learn., 2016, pp. 1747–1756.
[44]
T. Salimans, A. Karpathy, X. Chen, and D. P. Kingma, “PixelCNN++: Improving the PixelCNN with discretized logistic mixture likelihood and other modifications,” in Proc. Int. Conf. Learn. Representations, 2017.
[45]
D. Rezende and S. Mohamed, “Variational inference with normalizing flows,” in Proc. 32nd Int. Conf. Mach. Learn., 2015, pp. 1530–1538.
[46]
D. J. Rezende, S. Mohamed, and D. Wierstra, “Stochastic backpropagation and approximate inference in deep generative models,” in Proc. 31st Int. Conf. Mach. Learn., 2014, pp. 1278–1286.
[47]
E. Denton, S. Chintala, A. Szlam, and R. Fergus, “Deep generative image models using a Laplacian pyramid of adversarial networks,” in Proc. 28th Int. Conf. Neural Inf. Process. Syst., 2015, pp. 1486–1494.
[48]
R. Cai et al., “Learning gradient fields for shape generation,” in Proc. Eur. Conf. Comput. Vis., 2020, pp. 364–381.
[49]
C. Dong, C. C. Loy, K. He, and X. Tang, “Learning a deep convolutional network for image super-resolution,” in Proc. Eur. Conf. Comput. Vis., 2014, pp. 184–199.
[50]
J. Kim, J. K. Lee, and K. M. Lee, “Deeply-recursive convolutional network for image super-resolution,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2016, pp. 1637–1645.
[51]
Y. Tai, J. Yang, and X. Liu, “Image super-resolution via deep recursive residual network,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2017, pp. 3147–3155.
[52]
N. Ahn, B. Kang, and K.-A. Sohn, “Image super-resolution via progressive cascading residual network,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2018, pp. 904–908.
[53]
X. Li, C. Chen, S. Zhou, X. Lin, W. Zuo, and L. Zhang, “Blind face restoration via deep multi-scale component dictionaries,” in Proc. Eur. Conf. Comput. Vis., 2020, pp. 399–415.
[54]
Z. Wang, D. Liu, J. Yang, W. Han, and T. Huang, “Deep networks for image super-resolution with sparse prior,” in Proc. IEEE Int. Conf. Comput. Vis., 2015, pp. 370–378.
[55]
C. Dong, C. C. Loy, K. He, and X. Tang, “Image super-resolution using deep convolutional networks,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 38, no. 2, pp. 295–307, Feb. 2016.
[56]
J. Kim, J. K. Lee, and K. M. Lee, “Accurate image super-resolution using very deep convolutional networks,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2016, pp. 1646–1654.
[57]
J. Menick and N. Kalchbrenner, “Generating high fidelity images with subscale pixel networks and multidimensional upscaling,” in Proc. Int. Conf. Learn. Representations, 2019.
[58]
A. van den Oord, N. Kalchbrenner, O. Vinyals, L. Espeholt, A. Graves, and K. Kavukcuoglu, “Conditional image generation with PixelCNN decoders,” in Proc. 30th Int. Conf. Neural Inf. Process. Syst., 2016, pp. 4797–4805.
[59]
L. Yang et al., “HiFaceGAN: Face renovation via collaborative suppression and replenishment,” 2020.
[60]
J. J. Yu, K. G. Derpanis, and M. A. Brubaker, “Wavelet flow: Fast training of high resolution normalizing flows,” 2020.
[61]
T. Karras, S. Laine, and T. Aila, “A style-based generator architecture for generative adversarial networks,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2019, pp. 4396–4405.
[62]
O. Russakovsky et al., “ImageNet large scale visual recognition challenge,” Int. J. Comput. Vis., vol. 115, no. 3, pp. 211–252, 2015.
[63]
Z. Wang, A. Bovik, H. Sheikh, and E. Simoncelli, “Image quality assessment: From error visibility to structural similarity,” IEEE Trans. Image Process., vol. 13, no. 4, pp. 600–612, Apr. 2004.
[64]
Y. Zhang, K. Li, K. Li, L. Wang, B. Zhong, and Y. Fu, “Image super-resolution using very deep residual channel attention networks,” in Proc. Eur. Conf. Comput. Vis., 2018, pp. 286–301.
[65]
K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2016, pp. 770–778.
[66]
E. Pérez-Pellitero, J. Salvador, J. Hidalgo, and B. Rosenhahn, “PSyCo: Manifold span reduction for super resolution,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2016, pp. 1837–1845.
[67]
C. Dong, C. C. Loy, and X. Tang, “Accelerating the super-resolution convolutional neural network,” in Proc. Eur. Conf. Comput. Vis., 2016, pp. 391–407.
[68]
S. Anwar and N. Barnes, “Densely residual Laplacian super-resolution,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 44, no. 3, pp. 1192–1204, Mar. 2022.
[69]
R. K. Mantiuk, A. Tomaszewska, and R. Mantiuk, “Comparison of four subjective methods for image quality assessment,” Comput. Graph. Forum, vol. 31, no. 8, pp. 2478–2491, 2012.
[70]
A. Nichol and P. Dhariwal, “Improved denoising diffusion probabilistic models,” 2021.
[71]
A. Razavi, A. van den Oord, and O. Vinyals, “Generating diverse high-fidelity images with VQ-VAE-2,” 2019.
[72]
J. Ho, C. Saharia, W. Chan, D. J. Fleet, M. Norouzi, and T. Salimans, “Cascaded diffusion models for high fidelity image generation,” J. Mach. Learn. Res., vol. 23, no. 47, pp. 1–33, 2022.
[73]
D. Watson, J. Ho, M. Norouzi, and W. Chan, “Learning to efficiently sample from diffusion probabilistic models,” 2021.
[74]
A. Jolicoeur-Martineau, K. Li, R. Piché-Taillefer, T. Kachman, and I. Mitliagkas, “Gotta go fast when generating data with score-based models,” 2021.
[75]
T. Salimans and J. Ho, “Progressive distillation for fast sampling of diffusion models,” in Proc. Int. Conf. Learn. Representations, 2022.
[76]
J. Song, C. Meng, and S. Ermon, “Denoising diffusion implicit models,” 2020.



Published In

IEEE Transactions on Pattern Analysis and Machine Intelligence, Volume 45, Issue 4
April 2023
1338 pages

Publisher

IEEE Computer Society

United States


Qualifiers

  • Research-article
