
Text-Aware Image Restoration with Diffusion Models

Jaewon Min1*, Jin Hyeon Kim2*, Paul Hyunbin Cho1, Jaeeun Lee3, Jihye Park4, Minkyu Park4,
Sangpil Kim2†, Hyunhee Park4†, Seungryong Kim1†

1 KAIST AI · 2 Korea University · 3 Yonsei University · 4 Samsung Electronics

* Equal contribution. † Co-corresponding authors.

📢 News

  • 🌈 2025.06.24 — TAIR Demo code released!
  • ❤️ 2025.06.23 — Training code released!
  • 🤗 2025.06.19 — SA-Text and Real-Text datasets released along with the dataset pipeline!
  • 📄 2025.06.12 — arXiv paper released!
  • 🚀 2025.06.01 — Official launch of the repository and project page!

💾 SA-Text Dataset

SA-Text is a newly proposed dataset for the Text-Aware Image Restoration (TAIR) task. It is built from the SA-1B dataset using our dataset pipeline and consists of 100K image-text instance pairs with detailed scene-level annotations. Real-Text is an evaluation dataset for real-world scenarios, constructed from RealSR and DrealSR using the same pipeline.

Dataset Preparation

| Split | Hugging Face 🤗 | Google Drive 📁 |
| --- | --- | --- |
| SA-Text | | |
| Real-Text | | |

Dataset Folder Structure (Google Drive)

  • Each image is paired with one or more text instances with polygon-level annotations.
  • The dataset follows a consistent annotation format, detailed in the dataset pipeline.
  • We recommend using the dataset from Google Drive for testing our code.
sa_text/
├── images/                        # 100K high-quality scene images with text instances
└── restoration_dataset.json       # Annotations

real_text/
├── HQ/                            # High-quality images
├── LQ/                            # Low-quality degraded inputs
└── real_benchmark_dataset.json    # Annotations
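
For a quick look at the annotations, here is a minimal Python sketch. The JSON field names (`image_name`, `instances`, `text`, `polygon`) are assumptions for illustration; check the dataset pipeline for the actual schema.

```python
import json

# Browse a few SA-Text annotations.
# NOTE: the field names below are assumed for illustration; the actual
# schema is documented in the dataset pipeline.
with open("sa_text/restoration_dataset.json") as f:
    annotations = json.load(f)

for entry in annotations[:3]:              # peek at the first few images
    print(entry["image_name"])
    for inst in entry["instances"]:        # one or more text instances per image
        print("  text:", inst["text"])
        print("  polygon points:", len(inst["polygon"]))
```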

⚒️ Training Preparation

Environment

conda create -n tair python=3.10 -y
conda activate tair

Installation

pip install torch==2.2.2 torchvision==0.17.2 torchaudio==2.2.2 --index-url https://download.pytorch.org/whl/cu121
pip install -r requirements.txt
cd detectron2
pip install -e .
cd ../testr    # assuming testr sits alongside detectron2 in the repo root
pip install -e .
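
After installation, a quick sanity check confirms that the CUDA 12.1 PyTorch wheels and the editable detectron2 install are picked up:

```python
import torch
import torchvision
import detectron2

print("torch:", torch.__version__)               # expected: 2.2.2
print("torchvision:", torchvision.__version__)   # expected: 0.17.2
print("CUDA available:", torch.cuda.is_available())
print("detectron2:", detectron2.__version__)
```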

Download Pretrained Weights and Dataset

  1. Run the bash script download_weights.sh to download the pretrained weights for the image restoration module.
    Additionally, download the pretrained text spotting module from this link and place it in the ./weights directory.

  2. Download the SA-Text dataset using the Google Drive link provided above. Once downloaded, unzip the contents and place the folder in your working directory.
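
Before training, it is worth verifying that everything landed where the scripts expect it. A small sketch, assuming the weights directory and dataset folder described above sit in the working directory:

```python
from pathlib import Path

# Paths follow the preparation steps above; adjust if your layout differs.
expected = [
    "weights",                           # pretrained restoration + text spotting weights
    "sa_text/images",                    # SA-Text scene images
    "sa_text/restoration_dataset.json",  # SA-Text annotations
]
for p in expected:
    print(f"{p}: {'found' if Path(p).exists() else 'MISSING'}")
```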


🔥 Training Recipe

Our text-aware restoration model, TeReDiff, comprises two main modules: an image restoration module and a text spotting module. Training is conducted in three stages (see the parameter-freezing sketch after this list):

  • Stage 1: Train only the image restoration module.
  • Stage 2: Train only the text spotting module.
  • Stage 3: Jointly train both modules.
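
Mechanically, the stage schedule amounts to freezing one module's parameters while the other trains. A minimal sketch of that idea, using toy stand-ins rather than the real TeReDiff modules (which are built inside the training scripts):

```python
import torch.nn as nn

def set_trainable(module: nn.Module, flag: bool) -> None:
    # Freeze or unfreeze every parameter of a module.
    for p in module.parameters():
        p.requires_grad = flag

def configure_stage(stage: int, restoration: nn.Module, spotting: nn.Module) -> None:
    # Stage 1: restoration only; Stage 2: spotting only; Stage 3: both.
    set_trainable(restoration, stage in (1, 3))
    set_trainable(spotting, stage in (2, 3))

# Toy stand-ins, just to show the mechanics.
restoration, spotting = nn.Linear(4, 4), nn.Linear(4, 4)
configure_stage(2, restoration, spotting)
print(any(p.requires_grad for p in restoration.parameters()))  # False in Stage 2
```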

Training Script

  • Run the following bash script for Stage 1 training. Its configuration file can be found here; refer to the comments within the configuration file for a detailed explanation of each setting.
bash run_script/train_script/run_train_stage1_terediff.sh
  • Run the following bash script for Stage 2 training. Its configuration file can be found here.
bash run_script/train_script/run_train_stage2_terediff.sh
  • Run the following bash script for Stage 3 training. Its configuration file can be found here.
bash run_script/train_script/run_train_stage3_terediff.sh

🚀 Text-Aware Image Restoration (TAIR) Demo

Demo Script

Download the released checkpoint of our model (TeReDiff) from here, and set the appropriate parameters in the demo configuration file here. Then, run the script below to perform a demo on low-quality images and generate high-quality, text-aware restored outputs. The results will be saved in val_demo_result/ by default.

bash run_script/val_script/run_val_terediff.sh

TAIR Demo Results

Running the demo script above will generate the following restoration results. The visualized images are shown in the order: Low-Quality (LQ) image / Restored image / High-Quality (HQ) Ground Truth image. Note that when the text in the LQ images is severely degraded, the model may fail to accurately restore the textual content due to insufficient visual information.
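
To assemble the same LQ / Restored / HQ comparison strip from saved outputs, here is a short PIL sketch. The file names are placeholders; the demo writes its results under val_demo_result/.

```python
from PIL import Image

# Placeholder file names; substitute the actual demo outputs.
panels = [Image.open(p) for p in ("lq.png", "restored.png", "hq.png")]

# Match heights, then paste the panels side by side: LQ / Restored / HQ.
h = min(im.height for im in panels)
panels = [im.resize((im.width * h // im.height, h)) for im in panels]
strip = Image.new("RGB", (sum(im.width for im in panels), h))
x = 0
for im in panels:
    strip.paste(im, (x, 0))
    x += im.width
strip.save("comparison.png")
```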

Citation

If you find our work useful for your research, please consider citing it :)

@article{min2025text,
  title={Text-Aware Image Restoration with Diffusion Models},
  author={Min, Jaewon and Kim, Jin Hyeon and Cho, Paul Hyunbin and Lee, Jaeeun and Park, Jihye and Park, Minkyu and Kim, Sangpil and Park, Hyunhee and Kim, Seungryong},
  journal={arXiv preprint arXiv:2506.09993},
  year={2025}
}
