3587jjh/LSRNA: Latent Space Super-Resolution for Higher-Resolution Image Generation with Diffusion Models (CVPR 2025)
LSRNA

Project Page arXiv

Official code for "Latent Space Super-Resolution for Higher-Resolution Image Generation with Diffusion Models".

[Figure: Teaser]

Abstract: In this paper, we propose LSRNA, a novel framework for higher-resolution (exceeding 1K) image generation with diffusion models that performs super-resolution directly in the latent space. Existing diffusion models struggle to scale beyond their training resolutions, often producing structural distortions or repeated content. Reference-based methods address these issues by upsampling a low-resolution reference to guide higher-resolution generation. However, they face significant challenges: upsampling in latent space often causes manifold deviation, which degrades output quality, while upsampling in RGB space tends to produce overly smoothed outputs. To overcome these limitations, LSRNA combines Latent space Super-Resolution (LSR) for manifold alignment with Region-wise Noise Addition (RNA) to enhance high-frequency details. Our extensive experiments demonstrate that integrating LSRNA into reference-based generation outperforms state-of-the-art reference-based methods across various resolutions and metrics, while showing the critical role of latent space upsampling in preserving detail and sharpness.
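To make "upsampling in latent space" concrete, the sketch below shows the shape arithmetic involved and the kind of naive latent interpolation the abstract says causes manifold deviation. It assumes an SDXL-style VAE (8x spatial downsampling, 4 latent channels) and a reference generated at SDXL's native 1024x1024; the helper names are hypothetical. The nearest-neighbour upsampler is the baseline that a learned LSR module is meant to replace, not LSRNA itself.

```python
import numpy as np

# Assumption: SDXL-style VAE with 8x spatial downsampling and 4 latent channels.
VAE_SCALE = 8
LATENT_CHANNELS = 4


def latent_shape(height: int, width: int) -> tuple[int, int, int]:
    """Return the (channels, h, w) latent shape for an RGB image of the given size."""
    if height % VAE_SCALE or width % VAE_SCALE:
        raise ValueError("image size must be divisible by the VAE scale factor")
    return (LATENT_CHANNELS, height // VAE_SCALE, width // VAE_SCALE)


def naive_latent_upsample(z: np.ndarray, scale: int) -> np.ndarray:
    """Hypothetical baseline: nearest-neighbour upsampling of a (C, H, W) latent.

    Each latent vector is simply copied into a scale x scale block, so no new
    detail is created and the result may leave the VAE's latent manifold.
    """
    return np.repeat(np.repeat(z, scale, axis=1), scale, axis=2)


# Stand-in for a 1024x1024 reference latent (random values, not a real latent).
ref = np.random.default_rng(0).standard_normal(latent_shape(1024, 1024))
up = naive_latent_upsample(ref, 2)
print(ref.shape, up.shape)  # (4, 128, 128) (4, 256, 256)
print(up.shape == latent_shape(2048, 2048))  # True
```

A 2048x2048 target therefore needs a 2x latent upscale from a 1024x1024 reference; a learned LSR model has to fill in that 256x256 grid with plausible latents instead of duplicated ones.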

Environment

conda create -n lsrna python=3.10
conda activate lsrna
pip install -r requirements.txt

Text-to-Image Generation

Note: Although the LSRNA framework is designed to be compatible with any reference-based method, this repo provides example code for LSRNA-DemoFusion, since DemoFusion is a pioneering reference-based approach.
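Reference-based methods such as DemoFusion generally reuse an upsampled reference latent by re-noising it with the standard forward diffusion process, q(z_t | z_0) = N(sqrt(a_t) z_0, (1 - a_t) I), and then denoising at the higher resolution. The sketch below shows only that generic forward step. It assumes the "scaled_linear" beta schedule with beta_start=0.00085 and beta_end=0.012 over 1000 steps, which to our knowledge matches the default SDXL scheduler configuration; it does not reproduce this repo's code, and how `--inversion_depth` maps to a timestep is not assumed here.

```python
import numpy as np


def scaled_linear_alphas_cumprod(num_steps: int = 1000,
                                 beta_start: float = 0.00085,
                                 beta_end: float = 0.012) -> np.ndarray:
    """Cumulative product a_t = prod_{s<=t} (1 - beta_s) for a scaled-linear schedule."""
    # "scaled_linear": interpolate linearly in sqrt(beta), then square.
    betas = np.linspace(beta_start ** 0.5, beta_end ** 0.5, num_steps) ** 2
    return np.cumprod(1.0 - betas)


def q_sample(z0: np.ndarray, t: int, alphas_cumprod: np.ndarray,
             rng: np.random.Generator) -> np.ndarray:
    """Draw z_t ~ q(z_t | z_0) = N(sqrt(a_t) * z_0, (1 - a_t) * I)."""
    a_t = alphas_cumprod[t]
    noise = rng.standard_normal(z0.shape)
    return np.sqrt(a_t) * z0 + np.sqrt(1.0 - a_t) * noise


# Usage: re-noise a stand-in upsampled reference latent to an intermediate step.
alphas_cumprod = scaled_linear_alphas_cumprod()
rng = np.random.default_rng(0)
z_ref = rng.standard_normal((4, 256, 256))  # placeholder, not a real SDXL latent
z_t = q_sample(z_ref, t=600, alphas_cumprod=alphas_cumprod, rng=rng)
```

Larger t keeps less of the reference and leaves more room for the high-resolution denoiser to add detail; smaller t preserves the reference structure more closely.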

CUDA_VISIBLE_DEVICES=0 python main.py \
    --prompt "A well-worn baseball glove and ball sitting on fresh-cut grass." \
    --negative_prompt "blurry, ugly, duplicate, poorly drawn, deformed, mosaic" \
    --height 2048 \
    --width 2048 \
    --seed 0 \
    --lsr_path "lsr/swinir-liif-latent-sdxl.pth" \
    --rna_min_std 0.0 \
    --rna_max_std 1.2 \
    --inversion_depth 30 \
    --save_dir "results" \
    #--low_vram
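If you want to sweep settings from Python rather than editing the shell command, a small wrapper can assemble the same argument list. This is a hypothetical helper, not part of the repo: it only uses the flags shown in the command above, with the same example values as defaults, and leaves launching to `subprocess.run`.

```python
import os
import shlex
import subprocess


def build_lsrna_command(prompt: str, *,
                        negative_prompt: str = "blurry, ugly, duplicate, poorly drawn, deformed, mosaic",
                        height: int = 2048, width: int = 2048, seed: int = 0,
                        lsr_path: str = "lsr/swinir-liif-latent-sdxl.pth",
                        rna_min_std: float = 0.0, rna_max_std: float = 1.2,
                        inversion_depth: int = 30, save_dir: str = "results",
                        low_vram: bool = False) -> list[str]:
    """Return an argv list equivalent to the README's main.py invocation."""
    cmd = [
        "python", "main.py",
        "--prompt", prompt,
        "--negative_prompt", negative_prompt,
        "--height", str(height),
        "--width", str(width),
        "--seed", str(seed),
        "--lsr_path", lsr_path,
        "--rna_min_std", str(rna_min_std),
        "--rna_max_std", str(rna_max_std),
        "--inversion_depth", str(inversion_depth),
        "--save_dir", save_dir,
    ]
    if low_vram:
        cmd.append("--low_vram")  # same as uncommenting the last line above
    return cmd


def run_lsrna(cmd: list[str], gpu: str = "0") -> None:
    """Run the command on one GPU, mirroring CUDA_VISIBLE_DEVICES=0 in the shell."""
    env = {**os.environ, "CUDA_VISIBLE_DEVICES": gpu}
    subprocess.run(cmd, env=env, check=True)


cmd = build_lsrna_command("A well-worn baseball glove and ball sitting on fresh-cut grass.")
print(shlex.join(cmd))  # shell-quoted preview; call run_lsrna(cmd) to execute
```

Returning a list instead of a single string avoids shell-quoting problems with prompts that contain quotes or commas.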

Feel free to tune the RNA hyperparameters (e.g., --rna_max_std) to control the level of detail in the generated images. If you run out of VRAM, enable low-VRAM mode with --low_vram (uncomment the last line of the command above). We also provide a run.sh script for generation.
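One way to picture how `--rna_min_std` and `--rna_max_std` could shape region-wise noise is sketched below. Everything beyond those two flag names is an assumption for illustration: edge strength is measured with a simple finite-difference gradient magnitude on a grayscale reference, averaged over hypothetical square regions, min-max normalized, and mapped linearly to a noise std in [min_std, max_std]. The repo's actual RNA may compute its region scores differently.

```python
import numpy as np


def region_noise_std(gray: np.ndarray, region: int,
                     min_std: float, max_std: float) -> np.ndarray:
    """Hypothetical per-region noise std from edge strength of a (H, W) image."""
    gy, gx = np.gradient(gray.astype(np.float64))  # derivatives along rows, cols
    mag = np.hypot(gx, gy)  # gradient magnitude as a stand-in edge score
    h, w = mag.shape
    if h % region or w % region:
        raise ValueError("image size must be divisible by the region size")
    # Mean edge strength inside each (region x region) block.
    blocks = mag.reshape(h // region, region, w // region, region).mean(axis=(1, 3))
    span = blocks.max() - blocks.min()
    norm = (blocks - blocks.min()) / span if span > 0 else np.zeros_like(blocks)
    # Flat regions get min_std, the most detailed region gets max_std.
    return min_std + (max_std - min_std) * norm


def add_region_noise(z: np.ndarray, std_map: np.ndarray,
                     rng: np.random.Generator) -> np.ndarray:
    """Add Gaussian noise to a (C, H, W) latent using a per-region std map."""
    _, h, w = z.shape
    rh, rw = std_map.shape
    if h % rh or w % rw:
        raise ValueError("latent size must be divisible by the std map size")
    # Expand each region's std to cover its block of latent positions.
    std = np.kron(std_map, np.ones((h // rh, w // rw)))
    return z + std[None] * rng.standard_normal(z.shape)


# Toy usage: left half flat, right half textured; latent is 8x smaller.
rng = np.random.default_rng(0)
gray = np.zeros((256, 256))
gray[:, 128:] = rng.random((256, 128))
std_map = region_noise_std(gray, region=64, min_std=0.0, max_std=1.2)
z = rng.standard_normal((4, 32, 32))
z_noisy = add_region_noise(z, std_map, rng)
```

Under this reading, raising `--rna_max_std` injects more noise into detailed regions (more high-frequency detail after denoising), while `--rna_min_std` sets the floor applied to flat regions.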

Visual Comparison

[Figure: Comparison]

Additional results can be found on the project page.

Citation

@inproceedings{jeong2025latent,
  title={Latent space super-resolution for higher-resolution image generation with diffusion models},
  author={Jeong, Jinho and Han, Sangmin and Kim, Jinwoo and Kim, Seon Joo},
  booktitle={Proceedings of the Computer Vision and Pattern Recognition Conference},
  pages={2355--2365},
  year={2025}
}

Acknowledgement

This repo is based on DemoFusion and LIIF.
