Dev by haizhongzheng · Pull Request #1 · Infini-AI-Lab/GRESO · GitHub

Merged · 12 commits · Jun 4, 2025
129 changes: 129 additions & 0 deletions .gitignore
@@ -0,0 +1,129 @@
data-data
data-model
data-task
core

**/*.pt
**/checkpoints
**/wget-log
**/_build/
**/*.ckpt
**/outputs
**/*.tar.gz
**/playground
**/wandb

# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class
dataset/*
tensorflow/my_graph/*
.idea/
# C extensions
*.so

# Distribution / packaging
.Python
env/
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
*.egg-info/
.installed.cfg
*.egg

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*,cover
.hypothesis/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
target/

# IPython Notebook
.ipynb_checkpoints

# pyenv
.python-version

# celery beat schedule file
celerybeat-schedule

# dotenv
.env

# virtualenv
venv/
ENV/

# Spyder project settings
.spyderproject

# Rope project settings
.ropeproject

# vscode
.vscode

# Mac
.DS_Store

# output logs
tests/e2e/toy_examples/deepspeed/synchronous/output.txt

# vim
*.swp

# ckpt
*.lock

# data
*.parquet


# local logs
logs
log
97 changes: 96 additions & 1 deletion README.md
@@ -1 +1,96 @@
# GRESO
<div align="center">
<h1> Act Only When It Pays: Efficient Reinforcement Learning for LLM Reasoning via Selective Rollouts
</h1>

Haizhong Zheng<sup>1</sup>, Yang Zhou<sup>1</sup>, Brian R. Bartoldson<sup>2</sup>, Bhavya Kailkhura<sup>2</sup>,
<br>
Fan Lai<sup>3</sup>, Jiawei Zhao<sup>4</sup>, Beidi Chen<sup>1</sup>
<br>
<sup>1</sup>Carnegie Mellon University,
<sup>2</sup>Lawrence Livermore National Laboratory,
<br>
<sup>3</sup>University of Illinois Urbana-Champaign,
<sup>4</sup>Meta AI

<div align="center">
[<a href="">Paper</a>] | [<a href="https://infini-ai-lab.github.io/GRESO/">Blog</a>]
</div>
<br>

<!-- ---------- -->
**TL;DR**
We propose GRESO, a lightweight pre-rollout filtering method that improves the efficiency of *rollout scaling* in LLM RL by predicting and skipping low-value prompts.
</div>



## 🗞️ News

- **[2025.06.03]** Blog post released: [Act Only When It Pays – GRESO](https://infini-ai-lab.github.io/GRESO/)
- **[2025.06.03]** Paper preprint available on [arXiv]()

<!-- ---------- -->
<p align="center">
<img src="static/scaling.jpg" alt="GRESO Overview" style="width:90%;"/>
</p>

<p align="center"><i>
Figure 1: We train Qwen2.5-Math-1.5B/7B on the DAPO + MATH dataset and evaluate them on five math reasoning benchmarks: MATH500, AMC, Gaokao, Minerva, and Olympiad Bench. Compared to the baseline method (Dynamic Sampling), our approach (GRESO) reduces rollout overhead by up to 2x while achieving comparable training performance, improving the efficiency of rollout scaling.
</i></p>
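The pre-rollout filtering idea above can be sketched roughly as follows. This is a conceptual illustration, not the actual GRESO implementation: the heuristic shown (probabilistically skipping prompts whose most recent rollouts all received identical rewards, and hence contribute zero advantage) and all function names are assumptions based on the TL;DR, not code from this repository.

```python
import random

def is_zero_variance(rewards):
    """A prompt whose rollouts all received the same reward yields zero
    advantage in GRPO-style training, so it contributes no gradient."""
    return len(set(rewards)) <= 1

def select_prompts(prompts, history, skip_prob=0.8):
    """Pre-rollout filtering sketch: probabilistically skip prompts whose
    most recent rollout batch was zero-variance (predicted low-value)."""
    selected = []
    for p in prompts:
        past = history.get(p)  # rewards from this prompt's last rollout batch
        if past is not None and is_zero_variance(past) and random.random() < skip_prob:
            continue  # predicted low-value: skip before spending rollout compute
        selected.append(p)
    return selected
```

The probabilistic skip (rather than a hard filter) leaves a chance to revisit previously uninformative prompts, since their value can change as the policy improves.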

<!-- ------- -->

## Getting Started
Our implementation is based on [volcengine/verl](https://github.com/volcengine/verl).

### 1. Environment Setup



```bash
conda create -n greso python==3.11
conda activate greso

# Install verl
# exit the project folder to install verl
cd ..
git clone https://github.com/volcengine/verl.git
cd verl
pip3 install -e .

# Install vLLM (pinned to a tested version)
pip3 install vllm==0.8.2

# Install flash-attn
pip3 install flash-attn --no-build-isolation

## misc
pip install wandb IPython matplotlib ipdb latex2sympy2-extended math-verify torchdata pylatexenc
```

### 2. Download & Preprocess Data

You can download the dataset using the following command:

```bash
# run from the project root folder
conda activate greso
export PYTHONPATH="$PYTHONPATH:$(pwd)"

bash train-scripts/generate_dataset.sh
```

### 3. Training

Train Qwen2.5-Math-1.5B with GRESO on 4xH100:
```bash
bash train-scripts/math_qwen_1_5b_dm_greso.sh
```

Train Qwen2.5-Math-7B with GRESO on 8xH100:
```bash
bash train-scripts/math_qwen_7b_dm_greso.sh
```

See more scripts in the `train-scripts` folder.
9 changes: 9 additions & 0 deletions docker/Dockerfile.megatron
@@ -0,0 +1,9 @@
FROM verlai/verl:vemlp-th2.4.0-cu124-vllm0.6.3-ray2.10-te1.7-v0.0.3

RUN pip install git+https://github.com/NVIDIA/TransformerEngine.git@stable

RUN cd /opt/nvidia && git clone --single-branch --branch core_r0.11.0 https://github.com/NVIDIA/Megatron-LM.git Megatron-LM

# only config pip index with https://pypi.tuna.tsinghua.edu.cn/simple if needed
# unset for now
RUN cd /opt/nvidia/Megatron-LM && pip3 install --no-deps -e .
47 changes: 47 additions & 0 deletions docker/Dockerfile.ngc.vllm
@@ -0,0 +1,47 @@
# docker buildx build --platform linux/x86_64 -t "verlai/verl:ngc-th2.4.0-cu124-vllm0.6.3-ray2.4-te1.7-v0.0.6" -f docker/Dockerfile.ngc.vllm . --builder cloud-verlai-verl-builder --progress=plain --push
FROM nvcr.io/nvidia/pytorch:24.05-py3

# uninstall nv-pytorch fork
RUN pip3 uninstall pytorch-quantization \
pytorch-triton \
torch \
torch-tensorrt \
torchvision \
xgboost transformer_engine flash_attn \
apex megatron-core -y

RUN pip3 install torch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 --index-url https://download.pytorch.org/whl/cu124

# =============== Megatron dependencies (optional) =================
# install apex, set MAX_JOBS to avoid OOMs
RUN MAX_JOBS=4 pip3 install -v --disable-pip-version-check --no-cache-dir --no-build-isolation \
--config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext" \
git+https://github.com/NVIDIA/apex
# =============== End of Megatron dependencies (optional) =================

RUN pip3 install --no-cache-dir \
accelerate \
codetiming \
datasets \
dill \
hydra-core \
numpy \
'pandas' \
'peft' \
'pyarrow>=15.0.0' \
'pybind11' \
'pylatexenc' \
'ray>=2.10' \
'tensordict<0.6' \
'transformers' \
'vllm==0.6.3.post1' \
'wandb'

# full dependencies
RUN pip3 install pytest yapf py-spy pyext liger-kernel

# =============== Megatron dependencies (optional) =================
# install Transformer Engine, which requires FA 2.5.8. Do it in a separate step for docker cache
RUN MAX_JOBS=4 NINJA_FLAGS="-j4" pip3 install flash-attn==2.5.8 --no-cache-dir --no-build-isolation
RUN MAX_JOBS=1 NINJA_FLAGS="-j1" TE_BUILD_WITH_NINJA=0 pip3 install git+https://github.com/eric-haibin-lin/TransformerEngine.git@v1.7.0
# =============== End of Megatron dependencies (optional) =================
45 changes: 45 additions & 0 deletions docker/Dockerfile.rocm
@@ -0,0 +1,45 @@
# Build the docker in the repo dir:
# docker build -f docker/Dockerfile.rocm -t verl-rocm:03.04.2015 .
# docker images # you can find your built docker


FROM rocm/vllm:rocm6.2_mi300_ubuntu20.04_py3.9_vllm_0.6.4

# Set working directory
# WORKDIR $PWD/app

# Set environment variables
ENV PYTORCH_ROCM_ARCH="gfx90a;gfx942"

# Install vllm
RUN pip uninstall -y vllm && \
rm -rf vllm && \
git clone -b v0.6.3 https://github.com/vllm-project/vllm.git && \
cd vllm && \
MAX_JOBS=$(nproc) python3 setup.py install && \
cd .. && \
rm -rf vllm

# Copy the entire project directory
COPY . .

# Install dependencies
RUN pip install "tensordict<0.6" --no-deps && \
pip install accelerate \
codetiming \
datasets \
dill \
hydra-core \
liger-kernel \
numpy \
pandas \
peft \
"pyarrow>=15.0.0" \
pylatexenc \
"ray[data,train,tune,serve]" \
torchdata \
transformers \
wandb \
orjson \
pybind11 && \
pip install -e . --no-deps
41 changes: 41 additions & 0 deletions docker/Dockerfile.vemlp.vllm.te
@@ -0,0 +1,41 @@
# docker buildx build --platform linux/x86_64 -t "verlai/verl:$TAG" -f docker/$FILE .

# the image on docker.io is an alias for the one in the veturbo registry
# FROM vemlp-cn-beijing.cr.volces.com/veturbo/pytorch:2.4-cu124
FROM docker.io/haibinlin/verl:v0.0.5-th2.4.0-cu124-base

# only config pip index with https://pypi.tuna.tsinghua.edu.cn/simple if needed
# unset for now
RUN pip3 config unset global.index-url

# transformers 4.47.0 contains the following bug:
# AttributeError: 'Gemma2Attention' object has no attribute '_flash_attn_uses_top_left_mask'
RUN pip3 install --no-cache-dir \
torch==2.4.0 \
accelerate \
codetiming \
dill \
hydra-core \
numpy \
pybind11 \
tensordict \
"transformers <= 4.46.0"

RUN pip3 install --no-cache-dir flash-attn==2.7.0.post2 --no-build-isolation

# vllm depends on ray, and veRL does not support ray > 2.37
RUN pip3 install --no-cache-dir vllm==0.6.3 ray==2.10

# install apex
RUN MAX_JOBS=4 pip3 install -v --disable-pip-version-check --no-cache-dir --no-build-isolation \
--config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext" \
git+https://github.com/NVIDIA/apex

# install Transformer Engine
# - flash-attn pinned to 2.5.3 by TransformerEngine, switch to eric-haibin-lin/TransformerEngine.git@v1.7.0 to relax version req
# - install with: MAX_JOBS=1 NINJA_FLAGS="-j1" TE_BUILD_WITH_NINJA=0 to avoid OOM
# - cudnn is required by TransformerEngine
# RUN CUDNN_PATH=/opt/conda/lib/python3.11/site-packages/nvidia/cudnn \
# pip3 install git+https://github.com/eric-haibin-lin/TransformerEngine.git@v1.7.0
RUN MAX_JOBS=1 NINJA_FLAGS="-j1" pip3 install flash-attn==2.5.3 --no-cache-dir --no-build-isolation
RUN MAX_JOBS=1 NINJA_FLAGS="-j1" pip3 install git+https://github.com/NVIDIA/TransformerEngine.git@v1.7