Haizhong Zheng1, Yang Zhou1, Brian R. Bartoldson2, Bhavya Kailkhura2,
Fan Lai3, Jiawei Zhao4, Beidi Chen1
1Carnegie Mellon University,
2Lawrence Livermore National Laboratory,
3University of Illinois Urbana-Champaign,
4Meta AI
TL;DR: We propose GRESO, a lightweight pre-rollout filtering method that improves the efficiency of rollout scaling in LLM RL by predicting and skipping low-value prompts.
- [2025.06.03] Blog post released: Act Only When It Pays – GRESO.
- [2025.06.03] Paper preprint available on arXiv.
Figure 1: We train Qwen2.5-Math-1.5B/7B on the DAPO + MATH dataset and evaluate them on five math reasoning benchmarks: MATH500, AMC, Gaokao, Minerva, and Olympiad Bench. Compared to the baseline method (Dynamic Sampling), our approach (GRESO) reduces rollout overhead by up to 2x while achieving comparable training performance, improving the efficiency of rollout scaling.
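To make the contrast with dynamic sampling concrete, here is a minimal, self-contained sketch of the pre-rollout filtering idea. Dynamic sampling rolls out every candidate prompt and only afterwards discards prompts whose rewards have zero variance (all-correct or all-wrong, hence no advantage signal); pre-rollout filtering instead skips likely zero-variance prompts before paying their rollout cost. All names, the skip-probability schedule, and the toy rollout function below are illustrative assumptions, not the actual GRESO implementation (see the paper and the `train-scripts` folder for that).

```python
import random

def collect_effective_batch(prompts, zero_var_streak, rollout_fn,
                            batch_size, p_base=0.5):
    """Toy sketch of GRESO-style pre-rollout filtering (hypothetical names).

    zero_var_streak maps prompt -> number of consecutive past rollouts
    whose rewards had zero variance. The longer the streak, the more
    likely the prompt is skipped *before* rollout. The geometric
    skip-probability schedule here is an assumption for illustration.
    """
    batch = []
    for p in prompts:
        if len(batch) == batch_size:
            break
        streak = zero_var_streak.get(p, 0)
        p_skip = 1.0 - (1.0 - p_base) ** streak  # grows with the streak
        if random.random() < p_skip:
            continue  # skip: predicted low-value, no rollout cost paid
        rewards = rollout_fn(p)
        if len(set(rewards)) > 1:
            # Non-zero reward variance: useful training signal, keep it.
            batch.append((p, rewards))
            zero_var_streak[p] = 0
        else:
            # All rollouts got the same reward: zero advantage, no signal.
            zero_var_streak[p] = streak + 1
    return batch

# Toy usage: "easy" has a long zero-variance history and tends to be
# skipped without rollout; "hard" has no history and is always rolled out.
random.seed(0)
streaks = {"easy": 5}
fake_rollout = lambda p: [1.0, 1.0] if p == "easy" else [1.0, 0.0]
batch = collect_effective_batch(["easy", "hard"], streaks, fake_rollout, 2)
```

Dynamic sampling would have rolled out both prompts here and then discarded "easy"; the pre-rollout filter avoids that wasted generation, which is where the up-to-2x rollout savings come from.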
Our implementation is based on volcengine/verl.
conda create -n greso python==3.11
conda activate greso
# Install verl (clone it outside the project folder)
cd ..
git clone https://github.com/volcengine/verl.git
cd verl
pip3 install -e .
# Install vLLM (pinned to 0.8.2)
pip3 install vllm==0.8.2
# Install flash-attn
pip3 install flash-attn --no-build-isolation
# Misc dependencies
pip install wandb IPython matplotlib ipdb latex2sympy2-extended math-verify torchdata pylatexenc
You can download the dataset using the following command:
# run from the project folder
conda activate greso
export PYTHONPATH="$PYTHONPATH:$(pwd)"
bash train-scripts/generate_dataset.sh
Train Qwen2.5-Math-1.5B with GRESO on 4x H100 GPUs:
bash train-scripts/math_qwen_1_5b_dm_greso.sh
Train Qwen2.5-Math-7B with GRESO on 8x H100 GPUs:
bash train-scripts/math_qwen_7b_dm_greso.sh
See more scripts in the `train-scripts` folder.