Flow-GRPO

This is an official implementation of Flow-GRPO: Training Flow Matching Models via Online RL.

🔔 News

[Update] We have released a new GenEval model that keeps image quality close to that of the base model while still achieving the original GenEval score of 95. Feel free to give it a try!

✅ TODO

  • Provide a web demo showcasing a wide range of generation examples for GenEval, OCR, and PickScore. @GongyeLiu is working on this urgently.
  • Provide a web visualization of image evolution during training for all three tasks. @GongyeLiu is working on this urgently.

Model

Task | Model
GenEval | 🤗GenEval
Text Rendering | 🤗Text
Human Preference Alignment | 🤗PickScore

Installation

git clone https://github.com/yifan123/flow_grpo.git
cd flow_grpo
conda create -n flow_grpo python=3.10.16
conda activate flow_grpo
pip install -e .

Reward

The steps above install only the current repository. However, RL training requires different reward models, and some of them depend on older pre-trained models; it is difficult to fit all of them into a single Conda environment without version conflicts. Therefore, drawing inspiration from the ddpo-pytorch implementation, we use a remote-server setup for some rewards.
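As a rough illustration of this remote-reward pattern only (not the actual API of the reward-server in this repo), the sketch below sends generated images to a hypothetical HTTP scoring endpoint and reads back one scalar reward per image; the URL, route, and payload format are assumptions.

# Hedged sketch of a remote reward client. The endpoint, route, and JSON
# schema below are hypothetical; consult reward-server for the real API.
import base64
import io

import requests
from PIL import Image


def to_png_bytes(img: Image.Image) -> bytes:
    buf = io.BytesIO()
    img.save(buf, format="PNG")
    return buf.getvalue()


def remote_rewards(images, prompts, url="http://localhost:8000/score"):
    payload = {
        "images": [base64.b64encode(to_png_bytes(im)).decode("utf-8") for im in images],
        "prompts": prompts,
    }
    resp = requests.post(url, json=payload, timeout=60)
    resp.raise_for_status()
    return resp.json()["rewards"]  # one float per image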

OCR

Please install PaddleOCR and its dependencies:

pip install paddlepaddle-gpu==2.6.2
pip install paddleocr==2.9.1
pip install python-Levenshtein

Then, pre-download the model weights from a Python shell:

from paddleocr import PaddleOCR
# Instantiating PaddleOCR downloads the detection/recognition weights on first run.
ocr = PaddleOCR(use_angle_cls=False, lang="en", use_gpu=False, show_log=False)
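For intuition, an OCR reward of this kind is typically the string similarity between the text recognized in the image and the target text. The helper below is a hedged sketch using the Levenshtein ratio; it illustrates the idea and is not the reward function used in this repo.

# Sketch of an OCR-based reward: recognize text with PaddleOCR and score it
# against the target string with Levenshtein similarity in [0, 1].
import Levenshtein
from paddleocr import PaddleOCR

ocr = PaddleOCR(use_angle_cls=False, lang="en", use_gpu=False, show_log=False)


def ocr_reward(image_path: str, target_text: str) -> float:
    result = ocr.ocr(image_path, cls=False)
    lines = result[0] or []  # one [box, (text, confidence)] entry per detected line
    recognized = " ".join(line[1][0] for line in lines)
    return Levenshtein.ratio(recognized.lower(), target_text.lower())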

GenEval

Please create a new Conda virtual environment and install the corresponding dependencies according to the instructions in reward-server.

Usage

Single-node training:

bash scripts/single_node/main.sh

Multi-node training:

# Master node
bash scripts/multi_node/main.sh
# Other nodes
bash scripts/multi_node/main1.sh
bash scripts/multi_node/main2.sh

Multi Reward Training

For multi-reward settings, you can pass in a dictionary where each key is a reward name and the corresponding value is its weight. For example:

{
    "pickscore": 0.5,
    "ocr": 0.2,
    "aesthetic": 0.3
}

This means the final reward is a weighted sum of the individual rewards.
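To make the combination rule concrete, the sketch below applies such a weight dictionary, assuming each reward model exposes a callable that returns one score per image (the names and signatures are illustrative, not the repository's API).

# Illustrative weighted-sum combination of per-image rewards.
# `reward_fns` maps reward names to callables returning a list of floats;
# these names and signatures are assumptions for the sketch.
from typing import Callable, Dict, List


def combine_rewards(images, prompts,
                    reward_fns: Dict[str, Callable],
                    weights: Dict[str, float]) -> List[float]:
    totals = [0.0] * len(images)
    for name, weight in weights.items():
        scores = reward_fns[name](images, prompts)
        totals = [t + weight * s for t, s in zip(totals, scores)]
    return totals


# e.g. weights = {"pickscore": 0.5, "ocr": 0.2, "aesthetic": 0.3}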

The following reward models are currently supported:

  • GenEval evaluates T2I models on complex compositional prompts.
  • OCR provides an OCR-based reward.
  • PickScore is a general-purpose T2I reward model trained on human preferences.
  • DeQA is a multimodal LLM-based image quality assessment model that measures the impact of distortions and texture damage on perceived quality.
  • ImageReward is a general-purpose T2I reward model capturing text-image alignment, visual fidelity, and safety.
  • QwenVL is an experimental reward model using prompt engineering.
  • Aesthetic is a CLIP-based linear regressor predicting image aesthetic scores.
  • JPEG_Compressibility measures image size as a proxy for quality.
  • UnifiedReward is a state-of-the-art reward model for multimodal understanding and generation, topping the human preference leaderboard.

Important Hyperparameters

You can adjust the hyperparameters in config/dgx.py. An empirical finding is that keeping config.sample.train_batch_size * num_gpu / config.sample.num_image_per_prompt * config.sample.num_batches_per_epoch = 48 works well, i.e., group_number = 48 with group_size (config.sample.num_image_per_prompt) = 24. Additionally, setting config.train.gradient_accumulation_steps = config.sample.num_batches_per_epoch // 2 also yields good performance.
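As a quick sanity check on that relation, a configuration can be validated along the following lines; the concrete values are one example consistent with group_number = 48 and group_size = 24, not a prescribed setting.

# Sanity-check sketch for the empirical hyperparameter relation above.
# The numeric values are illustrative assumptions, not the repo's defaults.
train_batch_size = 12        # config.sample.train_batch_size
num_gpu = 8
num_image_per_prompt = 24    # group_size
num_batches_per_epoch = 12   # config.sample.num_batches_per_epoch

group_number = train_batch_size * num_gpu / num_image_per_prompt * num_batches_per_epoch
assert group_number == 48, group_number

gradient_accumulation_steps = num_batches_per_epoch // 2  # suggested setting
print(group_number, gradient_accumulation_steps)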

Acknowledgement

This repo is based on ddpo-pytorch and diffusers. We thank the authors for their valuable contributions to the AIGC community. Special thanks to Kevin Black for the excellent ddpo-pytorch repo.

Citation

@misc{liu2025flowgrpo,
      title={Flow-GRPO: Training Flow Matching Models via Online RL}, 
      author={Jie Liu and Gongye Liu and Jiajun Liang and Yangguang Li and Jiaheng Liu and Xintao Wang and Pengfei Wan and Di Zhang and Wanli Ouyang},
      year={2025},
      eprint={2505.05470},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2505.05470}, 
}
