Haizhong Zheng1, Yang Zhou1, Brian R. Bartoldson2, Bhavya Kailkhura2,
Fan Lai3, Jiawei Zhao4, Beidi Chen1
1Carnegie Mellon University,
2Lawrence Livermore National Laboratory,
3University of Illinois Urbana-Champaign,
4Meta AI
TL;DR: We propose GRESO, a lightweight pre-rollout filtering method that improves the efficiency of rollout scaling in LLM RL by predicting and skipping low-value prompts.
- [2025.06.03] Blog post released: Act Only When It Pays – GRESO.
- [2025.06.03] Paper preprint available on arXiv.
Figure 1: We train Qwen2.5-Math-1.5B/7B on the DAPO + MATH dataset and evaluate them on five math reasoning benchmarks: MATH500, AMC, Gaokao, Minerva, and Olympiad Bench. Compared to the baseline method (Dynamic Sampling), our approach (GRESO) reduces rollout overhead by up to 2x while achieving comparable training performance, improving the efficiency of rollout scaling.
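To make the contrast with dynamic sampling concrete, here is a minimal, self-contained sketch of the pre-rollout filtering idea. Dynamic sampling rolls out every candidate prompt and only afterwards discards prompts whose rewards have zero variance (all-correct or all-wrong, hence no advantage signal); pre-rollout filtering instead skips likely zero-variance prompts before paying their rollout cost. All names, the skip-probability schedule, and the toy rollout function below are illustrative assumptions, not the actual GRESO implementation (see the paper and the `train-scripts` folder for that).

```python
import random

def collect_effective_batch(prompts, zero_var_streak, rollout_fn,
                            batch_size, p_base=0.5):
    """Toy sketch of GRESO-style pre-rollout filtering (hypothetical names).

    zero_var_streak maps prompt -> number of consecutive past rollouts
    whose rewards had zero variance. The longer the streak, the more
    likely the prompt is skipped *before* rollout. The geometric
    skip-probability schedule here is an assumption for illustration.
    """
    batch = []
    for p in prompts:
        if len(batch) == batch_size:
            break
        streak = zero_var_streak.get(p, 0)
        p_skip = 1.0 - (1.0 - p_base) ** streak  # grows with the streak
        if random.random() < p_skip:
            continue  # skip: predicted low-value, no rollout cost paid
        rewards = rollout_fn(p)
        if len(set(rewards)) > 1:
            # Non-zero reward variance: useful training signal, keep it.
            batch.append((p, rewards))
            zero_var_streak[p] = 0
        else:
            # All rollouts got the same reward: zero advantage, no signal.
            zero_var_streak[p] = streak + 1
    return batch

# Toy usage: "easy" has a long zero-variance history and tends to be
# skipped without rollout; "hard" has no history and is always rolled out.
random.seed(0)
streaks = {"easy": 5}
fake_rollout = lambda p: [1.0, 1.0] if p == "easy" else [1.0, 0.0]
batch = collect_effective_batch(["easy", "hard"], streaks, fake_rollout, 2)
```

Dynamic sampling would have rolled out both prompts here and then discarded "easy"; the pre-rollout filter avoids that wasted generation, which is where the up-to-2x rollout savings come from.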
Our implementation is based on volcengine/verl.
conda create -n greso python==3.11
conda activate greso
# Install verl (clone it outside the project folder)
cd ..
git clone https://github.com/volcengine/verl.git
cd verl
pip3 install -e .
# Install vLLM (pinned to 0.8.2)
pip3 install vllm==0.8.2
# Install flash-attn
pip3 install flash-attn --no-build-isolation
# Misc dependencies
pip install wandb IPython matplotlib ipdb latex2sympy2-extended math-verify torchdata pylatexenc
You can download the dataset using the following command:
# run from the project folder
conda activate greso
export PYTHONPATH="$PYTHONPATH:$(pwd)"
bash train-scripts/generate_dataset.sh
Train Qwen2.5-Math-1.5B with GRESO on 4x H100 GPUs:
bash train-scripts/math_qwen_1_5b_dm_greso.sh
Train Qwen2.5-Math-7B with GRESO on 8x H100 GPUs:
bash train-scripts/math_qwen_7b_dm_greso.sh
See more scripts in the `train-scripts` folder.