Dev by haizhongzheng · Pull Request #1 · Infini-AI-Lab/GRESO · GitHub

Merged · 12 commits · Jun 4, 2025
129 changes: 129 additions & 0 deletions .gitignore
@@ -0,0 +1,129 @@
data-data
data-model
data-task
core

**/*.pt
**/checkpoints
**/wget-log
**/_build/
**/*.ckpt
**/outputs
**/*.tar.gz
**/playground
**/wandb

# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class
dataset/*
tensorflow/my_graph/*
.idea/
# C extensions
*.so

# Distribution / packaging
.Python
env/
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
*.egg-info/
.installed.cfg
*.egg

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*,cover
.hypothesis/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
target/

# IPython Notebook
.ipynb_checkpoints

# pyenv
.python-version

# celery beat schedule file
celerybeat-schedule

# dotenv
.env

# virtualenv
venv/
ENV/

# Spyder project settings
.spyderproject

# Rope project settings
.ropeproject

# vscode
.vscode

# Mac
.DS_Store

# output logs
tests/e2e/toy_examples/deepspeed/synchronous/output.txt

# vim
*.swp

# ckpt
*.lock

# data
*.parquet


# local logs
logs
log
97 changes: 96 additions & 1 deletion README.md
@@ -1 +1,96 @@
# GRESO
<div align="center">
<h1> Act Only When It Pays: Efficient Reinforcement Learning for LLM Reasoning via Selective Rollouts
</h1>

Haizhong Zheng<sup>1</sup>, Yang Zhou<sup>1</sup>, Brian R. Bartoldson<sup>2</sup>, Bhavya Kailkhura<sup>2</sup>,
<br>
Fan Lai<sup>3</sup>, Jiawei Zhao<sup>4</sup>, Beidi Chen<sup>1</sup>
<br>
<sup>1</sup>Carnegie Mellon University,
<sup>2</sup>Lawrence Livermore National Laboratory,
<br>
<sup>3</sup>University of Illinois Urbana-Champaign,
<sup>4</sup>Meta AI

<div align="center">
[<a href="">Paper</a>] | [<a href="https://infini-ai-lab.github.io/GRESO/">Blog</a>]
</div>
<br>

<!-- ---------- -->
**TL;DR**
We propose GRESO, a lightweight pre-rollout filtering method that improves the efficiency of *rollout scaling* in LLM RL by predicting and skipping low-value prompts.
</div>



## 🗞️ News

- **[2025.06.03]** Blog post released: [Act Only When It Pays – GRESO](https://infini-ai-lab.github.io/GRESO/)
- **[2025.06.03]** Paper preprint available on [arXiv]()

<!-- ---------- -->
<p align="center">
<img src="static/scaling.jpg" alt="GRESO Overview" style="width:90%;"/>
</p>

<p align="center"><i>
Figure 1: We train Qwen2.5-Math-1.5B/7B on the DAPO + MATH dataset and evaluate them on five math reasoning benchmarks: MATH500, AMC, Gaokao, Minerva, and Olympiad Bench. Compared to the baseline method (Dynamic Sampling), our approach (GRESO) reduces rollout overhead by up to 2x while achieving comparable training performance, improving the efficiency of rollout scaling.
</i></p>
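The pre-rollout filtering idea above can be sketched roughly as follows. This is a conceptual illustration, not the actual GRESO implementation: the heuristic shown (probabilistically skipping prompts whose most recent rollouts all received identical rewards, and hence contribute zero advantage) and all function names are assumptions based on the TL;DR, not code from this repository.

```python
import random

def is_zero_variance(rewards):
    """A prompt whose rollouts all received the same reward yields zero
    advantage in GRPO-style training, so it contributes no gradient."""
    return len(set(rewards)) <= 1

def select_prompts(prompts, history, skip_prob=0.8):
    """Pre-rollout filtering sketch: probabilistically skip prompts whose
    most recent rollout batch was zero-variance (predicted low-value)."""
    selected = []
    for p in prompts:
        past = history.get(p)  # rewards from this prompt's last rollout batch
        if past is not None and is_zero_variance(past) and random.random() < skip_prob:
            continue  # predicted low-value: skip before spending rollout compute
        selected.append(p)
    return selected
```

The probabilistic skip (rather than a hard filter) leaves a chance to revisit previously uninformative prompts, since their value can change as the policy improves.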

<!-- ------- -->

## Getting Started
Our implementation is based on [volcengine/verl](https://github.com/volcengine/verl).

### 1. Environment Setup



```bash
conda create -n greso python==3.11
conda activate greso

# Install verl
# exit the project folder to install verl
cd ..
git clone https://github.com/volcengine/verl.git
cd verl
pip3 install -e .

# Install vLLM (pinned to a tested version)
pip3 install vllm==0.8.2

# Install flash-attn
pip3 install flash-attn --no-build-isolation

## misc
pip install wandb IPython matplotlib ipdb latex2sympy2-extended math-verify torchdata pylatexenc
```

### 2. Download & Preprocess Data

You can download the dataset using the following command:

```bash
# run from the project root folder
conda activate greso
export PYTHONPATH="$PYTHONPATH:$(pwd)"

bash train-scripts/generate_dataset.sh
```

### 3. Training

Train Qwen2.5-Math-1.5B with GRESO on 4xH100:
```bash
bash train-scripts/math_qwen_1_5b_dm_greso.sh
```

Train Qwen2.5-Math-7B with GRESO on 8xH100:
```bash
bash train-scripts/math_qwen_7b_dm_greso.sh
```

See more scripts in the `train-scripts` folder.
9 changes: 9 additions & 0 deletions docker/Dockerfile.megatron
@@ -0,0 +1,9 @@
FROM verlai/verl:vemlp-th2.4.0-cu124-vllm0.6.3-ray2.10-te1.7-v0.0.3

RUN pip install git+https://github.com/NVIDIA/TransformerEngine.git@stable

RUN cd /opt/nvidia && git clone --single-branch --branch core_r0.11.0 https://github.com/NVIDIA/Megatron-LM.git Megatron-LM

# only config pip index with https://pypi.tuna.tsinghua.edu.cn/simple if needed
# unset for now
RUN cd /opt/nvidia/Megatron-LM && pip3 install --no-deps -e .
47 changes: 47 additions & 0 deletions docker/Dockerfile.ngc.vllm
@@ -0,0 +1,47 @@
# docker buildx build --platform linux/x86_64 -t "verlai/verl:ngc-th2.4.0-cu124-vllm0.6.3-ray2.4-te1.7-v0.0.6" -f docker/Dockerfile.ngc.vllm . --builder cloud-verlai-verl-builder --progress=plain --push
FROM nvcr.io/nvidia/pytorch:24.05-py3

# uninstall nv-pytorch fork
RUN pip3 uninstall pytorch-quantization \
pytorch-triton \
torch \
torch-tensorrt \
torchvision \
xgboost transformer_engine flash_attn \
apex megatron-core -y

RUN pip3 install torch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 --index-url https://download.pytorch.org/whl/cu124

# =============== Megatron dependencies (optional) =================
# install apex, set MAX_JOBS to avoid OOMs
RUN MAX_JOBS=4 pip3 install -v --disable-pip-version-check --no-cache-dir --no-build-isolation \
--config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext" \
git+https://github.com/NVIDIA/apex
# =============== End of Megatron dependencies (optional) =================

RUN pip3 install --no-cache-dir \
accelerate \
codetiming \
datasets \
dill \
hydra-core \
numpy \
'pandas' \
'peft' \
'pyarrow>=15.0.0' \
'pybind11' \
'pylatexenc' \
'ray>=2.10' \
'tensordict<0.6' \
'transformers' \
'vllm==0.6.3.post1' \
'wandb'

# full dependencies
RUN pip3 install pytest yapf py-spy pyext liger-kernel

# =============== Megatron dependencies (optional) =================
# install Transformer Engine, which requires FA 2.5.8. Do it in a separate step for docker cache
RUN MAX_JOBS=4 NINJA_FLAGS="-j4" pip3 install flash-attn==2.5.8 --no-cache-dir --no-build-isolation
RUN MAX_JOBS=1 NINJA_FLAGS="-j1" TE_BUILD_WITH_NINJA=0 pip3 install git+https://github.com/eric-haibin-lin/TransformerEngine.git@v1.7.0
# =============== End of Megatron dependencies (optional) =================
45 changes: 45 additions & 0 deletions docker/Dockerfile.rocm
@@ -0,0 +1,45 @@
# Build the docker in the repo dir:
# docker build -f docker/Dockerfile.rocm -t verl-rocm:03.04.2015 .
# docker images # you can find your built docker


FROM rocm/vllm:rocm6.2_mi300_ubuntu20.04_py3.9_vllm_0.6.4

# Set working directory
# WORKDIR $PWD/app

# Set environment variables
ENV PYTORCH_ROCM_ARCH="gfx90a;gfx942"

# Install vllm
RUN pip uninstall -y vllm && \
rm -rf vllm && \
git clone -b v0.6.3 https://github.com/vllm-project/vllm.git && \
cd vllm && \
MAX_JOBS=$(nproc) python3 setup.py install && \
cd .. && \
rm -rf vllm

# Copy the entire project directory
COPY . .

# Install dependencies
RUN pip install "tensordict<0.6" --no-deps && \
pip install accelerate \
codetiming \
datasets \
dill \
hydra-core \
liger-kernel \
numpy \
pandas \
peft \
"pyarrow>=15.0.0" \
pylatexenc \
"ray[data,train,tune,serve]" \
torchdata \
transformers \
wandb \
orjson \
pybind11 && \
pip install -e . --no-deps
41 changes: 41 additions & 0 deletions docker/Dockerfile.vemlp.vllm.te
@@ -0,0 +1,41 @@
# docker buildx build --platform linux/x86_64 -t "verlai/verl:$TAG" -f docker/$FILE .

# the image on docker.io is an alias for the one in the veturbo registry
# FROM vemlp-cn-beijing.cr.volces.com/veturbo/pytorch:2.4-cu124
FROM docker.io/haibinlin/verl:v0.0.5-th2.4.0-cu124-base

# only config pip index with https://pypi.tuna.tsinghua.edu.cn/simple if needed
# unset for now
RUN pip3 config unset global.index-url

# transformers 4.47.0 contains the following bug:
# AttributeError: 'Gemma2Attention' object has no attribute '_flash_attn_uses_top_left_mask'
RUN pip3 install --no-cache-dir \
torch==2.4.0 \
accelerate \
codetiming \
dill \
hydra-core \
numpy \
pybind11 \
tensordict \
"transformers <= 4.46.0"

RUN pip3 install --no-cache-dir flash-attn==2.7.0.post2 --no-build-isolation

# vllm depends on ray, and veRL does not support ray > 2.37
RUN pip3 install --no-cache-dir vllm==0.6.3 ray==2.10

# install apex
RUN MAX_JOBS=4 pip3 install -v --disable-pip-version-check --no-cache-dir --no-build-isolation \
--config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext" \
git+https://github.com/NVIDIA/apex

# install Transformer Engine
# - flash-attn pinned to 2.5.3 by TransformerEngine, switch to eric-haibin-lin/TransformerEngine.git@v1.7.0 to relax version req
# - install with: MAX_JOBS=1 NINJA_FLAGS="-j1" TE_BUILD_WITH_NINJA=0 to avoid OOM
# - cudnn is required by TransformerEngine
# RUN CUDNN_PATH=/opt/conda/lib/python3.11/site-packages/nvidia/cudnn \
# pip3 install git+https://github.com/eric-haibin-lin/TransformerEngine.git@v1.7.0
RUN MAX_JOBS=1 NINJA_FLAGS="-j1" pip3 install flash-attn==2.5.3 --no-cache-dir --no-build-isolation
RUN MAX_JOBS=1 NINJA_FLAGS="-j1" pip3 install git+https://github.com/NVIDIA/TransformerEngine.git@v1.7