
Text-Aware Image Restoration with Diffusion Models

Jaewon Min1*, Jin Hyeon Kim2*, Paul Hyunbin Cho1, Jaeeun Lee3, Jihye Park4, Minkyu Park4,
Sangpil Kim2†, Hyunhee Park4†, Seungryong Kim1†

1 KAIST AI · 2 Korea University · 3 Yonsei University · 4 Samsung Electronics

* Equal contribution. † Co-corresponding authors.

📢 News

  • 🌈 2025.06.24 — TAIR Demo code released!
  • ❤️ 2025.06.23 — Training code released!
  • 🤗 2025.06.19 — SA-Text and Real-Text datasets released along with the dataset pipeline!
  • 📄 2025.06.12 — arXiv paper released!
  • 🚀 2025.06.01 — Official launch of the repository and project page!

💾 SA-Text Dataset

SA-Text is a newly proposed dataset for the Text-Aware Image Restoration (TAIR) task. It is built from the SA-1B dataset using our dataset pipeline and consists of 100K image-text instance pairs with detailed scene-level annotations. Real-Text is an evaluation dataset for real-world scenarios, constructed from RealSR and DrealSR using the same pipeline.

Dataset Preparation

| Split | Hugging Face 🤗 | Google Drive 📁 |
| --- | --- | --- |
| SA-Text | | |
| Real-Text | | |

Dataset Folder Structure (Google Drive)

  • Each image is paired with one or more text instances with polygon-level annotations.
  • The dataset follows a consistent annotation format, detailed in the dataset pipeline.
  • We recommend using the dataset from Google Drive for testing our code.
sa_text/
├── images/                        # 100K high-quality scene images with text instances
└── restoration_dataset.json       # Annotations

real_text/
├── HQ/                            # High-quality images
├── LQ/                            # Low-quality degraded inputs
└── real_benchmark_dataset.json    # Annotations
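
For a quick look at the annotations, here is a minimal Python sketch. The JSON field names (`image_name`, `instances`, `text`, `polygon`) are assumptions for illustration; check the dataset pipeline for the actual schema.

```python
import json

# Browse a few SA-Text annotations.
# NOTE: the field names below are assumed for illustration; the actual
# schema is documented in the dataset pipeline.
with open("sa_text/restoration_dataset.json") as f:
    annotations = json.load(f)

for entry in annotations[:3]:              # peek at the first few images
    print(entry["image_name"])
    for inst in entry["instances"]:        # one or more text instances per image
        print("  text:", inst["text"])
        print("  polygon points:", len(inst["polygon"]))
```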

⚒️ Training Preparation

Environment

conda create -n tair python=3.10 -y
conda activate tair

Installation

pip install torch==2.2.2 torchvision==0.17.2 torchaudio==2.2.2 --index-url https://download.pytorch.org/whl/cu121
pip install -r requirements.txt
cd detectron2
pip install -e .
cd ../testr    # assuming testr sits alongside detectron2 in the repo root
pip install -e .
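
After installation, a quick sanity check confirms that the CUDA 12.1 PyTorch wheels and the editable detectron2 install are picked up:

```python
import torch
import torchvision
import detectron2

print("torch:", torch.__version__)               # expected: 2.2.2
print("torchvision:", torchvision.__version__)   # expected: 0.17.2
print("CUDA available:", torch.cuda.is_available())
print("detectron2:", detectron2.__version__)
```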

Download Pretrained Weights and Dataset

  1. Run the bash script download_weights.sh to download the pretrained weights for the image restoration module.
    Additionally, download the pretrained text spotting module from this link and place it in the ./weights directory.

  2. Download the SA-Text dataset using the Google Drive link provided above. Once downloaded, unzip the contents and place the folder in your working directory.
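
Before training, it is worth verifying that everything landed where the scripts expect it. A small sketch, assuming the weights directory and dataset folder described above sit in the working directory:

```python
from pathlib import Path

# Paths follow the preparation steps above; adjust if your layout differs.
expected = [
    "weights",                           # pretrained restoration + text spotting weights
    "sa_text/images",                    # SA-Text scene images
    "sa_text/restoration_dataset.json",  # SA-Text annotations
]
for p in expected:
    print(f"{p}: {'found' if Path(p).exists() else 'MISSING'}")
```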


🔥 Training Recipe

Our text-aware restoration model, TeReDiff, comprises two main modules: an image restoration module and a text spotting module. Training is conducted in three stages (see the parameter-freezing sketch after this list):

  • Stage 1: Train only the image restoration module.
  • Stage 2: Train only the text spotting module.
  • Stage 3: Jointly train both modules.
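
Mechanically, the stage schedule amounts to freezing one module's parameters while the other trains. A minimal sketch of that idea, using toy stand-ins rather than the real TeReDiff modules (which are built inside the training scripts):

```python
import torch.nn as nn

def set_trainable(module: nn.Module, flag: bool) -> None:
    # Freeze or unfreeze every parameter of a module.
    for p in module.parameters():
        p.requires_grad = flag

def configure_stage(stage: int, restoration: nn.Module, spotting: nn.Module) -> None:
    # Stage 1: restoration only; Stage 2: spotting only; Stage 3: both.
    set_trainable(restoration, stage in (1, 3))
    set_trainable(spotting, stage in (2, 3))

# Toy stand-ins, just to show the mechanics.
restoration, spotting = nn.Linear(4, 4), nn.Linear(4, 4)
configure_stage(2, restoration, spotting)
print(any(p.requires_grad for p in restoration.parameters()))  # False in Stage 2
```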

Training Script

  • Run the following bash script for Stage 1 training. Its configuration file can be found here; refer to the comments within the configuration file for a detailed explanation of each setting.
bash run_script/train_script/run_train_stage1_terediff.sh
  • Run the following bash script for Stage 2 training. Its configuration file can be found here.
bash run_script/train_script/run_train_stage2_terediff.sh
  • Run the following bash script for Stage 3 training. Its configuration file can be found here.
bash run_script/train_script/run_train_stage3_terediff.sh

🚀 Text-Aware Image Restoration (TAIR) Demo

Demo Script

Download the released checkpoint of our model (TeReDiff) from here, and set the appropriate parameters in the demo configuration file here. Then, run the script below to perform a demo on low-quality images and generate high-quality, text-aware restored outputs. The results will be saved in val_demo_result/ by default.

bash run_script/val_script/run_val_terediff.sh

TAIR Demo Results

Running the demo script above will generate the following restoration results. The visualized images are shown in the order: Low-Quality (LQ) image / Restored image / High-Quality (HQ) Ground Truth image. Note that when the text in the LQ images is severely degraded, the model may fail to accurately restore the textual content due to insufficient visual information.
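
To assemble the same LQ / Restored / HQ comparison strip from saved outputs, here is a short PIL sketch. The file names are placeholders; the demo writes its results under val_demo_result/.

```python
from PIL import Image

# Placeholder file names; substitute the actual demo outputs.
panels = [Image.open(p) for p in ("lq.png", "restored.png", "hq.png")]

# Match heights, then paste the panels side by side: LQ / Restored / HQ.
h = min(im.height for im in panels)
panels = [im.resize((im.width * h // im.height, h)) for im in panels]
strip = Image.new("RGB", (sum(im.width for im in panels), h))
x = 0
for im in panels:
    strip.paste(im, (x, 0))
    x += im.width
strip.save("comparison.png")
```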

Citation

If you find our work useful for your research, please consider citing it :)

@article{min2025text,
  title={Text-Aware Image Restoration with Diffusion Models},
  author={Min, Jaewon and Kim, Jin Hyeon and Cho, Paul Hyunbin and Lee, Jaeeun and Park, Jihye and Park, Minkyu and Kim, Sangpil and Park, Hyunhee and Kim, Seungryong},
  journal={arXiv preprint arXiv:2506.09993},
  year={2025}
}
