Infini-AI-Lab/GRESO

Act Only When It Pays: Efficient Reinforcement Learning for LLM Reasoning via Selective Rollouts

Haizhong Zheng¹, Yang Zhou¹, Brian R. Bartoldson², Bhavya Kailkhura²,
Fan Lai³, Jiawei Zhao⁴, Beidi Chen¹
¹Carnegie Mellon University, ²Lawrence Livermore National Laboratory,
³University of Illinois Urbana-Champaign, ⁴Meta AI

[Paper] | [Blog]

TL;DR: We propose GRESO, a lightweight pre-rollout filtering method that makes rollout scaling in LLM RL more efficient by predicting which prompts are low-value and skipping them before any rollout is generated.
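
To make the idea of pre-rollout filtering concrete, here is a minimal sketch. It is not the GRESO implementation. It assumes (this README does not say so) that a "low-value" prompt is one whose most recent rollouts all received the same reward, so it carries no group-relative learning signal, and that such prompts are skipped with a probability that grows with how many consecutive epochs this has happened. The class name `SelectiveRolloutFilter` and the parameters `base_skip_prob` and `max_skip_prob` are hypothetical.

```python
import random


class SelectiveRolloutFilter:
    """Hypothetical pre-rollout filter (illustrative only, not the GRESO code)."""

    def __init__(self, base_skip_prob=0.5, max_skip_prob=0.95, seed=0):
        self.base_skip_prob = base_skip_prob  # assumed per-epoch skip rate
        self.max_skip_prob = max_skip_prob    # cap, so skipped prompts are still revisited
        self.rng = random.Random(seed)
        # prompt_id -> consecutive epochs in which all rewards were identical
        self.streak = {}

    def skip_probability(self, prompt_id):
        """Return the chance of skipping this prompt before rollout."""
        n = self.streak.get(prompt_id, 0)
        if n == 0:
            return 0.0
        # Grows as 1 - (1 - p)^n, then is capped at max_skip_prob.
        return min(1.0 - (1.0 - self.base_skip_prob) ** n, self.max_skip_prob)

    def should_skip(self, prompt_id):
        """Sample the skip decision for one prompt."""
        return self.rng.random() < self.skip_probability(prompt_id)

    def update(self, prompt_id, rewards):
        """Record the rewards observed for a prompt that was actually rolled out."""
        if len(set(rewards)) <= 1:
            self.streak[prompt_id] = self.streak.get(prompt_id, 0) + 1
        else:
            self.streak[prompt_id] = 0  # informative again: reset the streak
```

In a training loop, `should_skip` would be called on each candidate prompt before generation, and `update` after rewards are computed for the prompts that were kept. The cap keeps a small chance of re-examining prompts that looked uninformative earlier, since the model changes during training.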

GRESO Overview

Figure 1: We train Qwen2.5-Math-1.5B/7B on the DAPO + MATH dataset and evaluate them on five math reasoning benchmarks: MATH500, AMC, Gaokao, Minerva, and Olympiad Bench. Compared to the baseline method (Dynamic Sampling), our approach (GRESO) reduces rollout overhead by up to 2x while achieving comparable training performance, improving the efficiency of rollout scaling.

Getting Started

Our implementation is based on volcengine/verl.

1. Environment Setup

conda create -n greso python==3.11
conda activate greso

# Install verl from source.
# Leave the project folder first so verl is cloned next to it, not inside it.
cd ..
git clone https://github.com/volcengine/verl.git
cd verl
pip3 install -e .

# Install vLLM (pinned to version 0.8.2)
pip3 install vllm==0.8.2

# Install flash-attn
pip3 install flash-attn --no-build-isolation

# Install miscellaneous dependencies
pip3 install wandb IPython matplotlib ipdb latex2sympy2-extended math-verify torchdata pylatexenc
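
After running the commands above, a quick check along the following lines can confirm that the key packages landed in the `greso` environment. This is a sketch, not part of the repository. The helper name `installed_versions` is hypothetical, and the package list is copied from the pip commands above (it assumes the distribution names match what pip was asked to install).

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as they appear in the install commands above.
PACKAGES = ["verl", "vllm", "flash-attn", "wandb", "math-verify", "torchdata", "pylatexenc"]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions(PACKAGES).items():
        print(f"{name:<14} {ver or 'NOT INSTALLED'}")
```

Any package reported as `NOT INSTALLED` points to the install step that needs to be rerun. The vLLM line should show 0.8.2 if the pinned install succeeded.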

2. Download & Preprocess Data

Download and preprocess the dataset with the following commands:

# Run from the project folder
conda activate greso
export PYTHONPATH="$PYTHONPATH:$(pwd)"

bash train-scripts/generate_dataset.sh

3. Training

Train Qwen2.5-Math-1.5B with GRESO on 4×H100:

bash train-scripts/math_qwen_1_5b_dm_greso.sh

Train Qwen2.5-Math-7B with GRESO on 8×H100:

bash train-scripts/math_qwen_7b_dm_greso.sh

See the train-scripts folder for more scripts.
