A scalable asynchronous reinforcement learning implementation with in-flight weight updates. Designed to maximize GPU utilization while staying as on-policy as possible.
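To make "in-flight weight updates" concrete: the idea is that inference workers pick up fresh weights between decoding steps instead of draining all in-progress generations before each update. Below is a minimal, self-contained sketch of that pattern using plain Python threads; it illustrates the concept only and is not PipelineRL's actual implementation (which swaps real model weights, not a version counter).

```python
import itertools
import threading
import time

# Toy stand-in for published model weights: just a version counter.
weights_version = 0
lock = threading.Lock()

def trainer():
    """Pretend trainer: publishes a new weights version every 'step'."""
    global weights_version
    for step in itertools.count(1):
        time.sleep(0.5)             # stand-in for one optimizer step
        with lock:
            weights_version = step  # a real system would broadcast tensors

def inference_worker(num_tokens=20):
    """Pretend generator: checks for fresh weights between decode steps."""
    local_version = 0
    for token in range(num_tokens):
        with lock:
            if weights_version != local_version:
                local_version = weights_version  # hot-swap mid-sequence
                print(f"token {token}: swapped in weights v{local_version}")
        time.sleep(0.1)             # stand-in for decoding one token

threading.Thread(target=trainer, daemon=True).start()
inference_worker()
```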
Clone the repository and change into the `pipelinerl` directory:

```bash
git clone git@github.com:ServiceNow/PipelineRL.git
cd pipelinerl
```
Create the conda environment with all dependencies:

```bash
conda create -n pipeline-rl -y python=3.11
conda run --no-capture-output -n pipeline-rl pip install torch==2.5.1 --index-url https://download.pytorch.org/whl/cu121
conda run --no-capture-output -n pipeline-rl pip install -r requirements.txt --no-build-isolation
```
By default, Pipeline-RL uses the file system as the medium for streaming the generated data to the trainer processes. This works on one node, but the files can get quite large. To use Redis instead, install the Redis server in the same conda environment:

```bash
conda install redis-server==7.4.0 -c conda-forge
```
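For intuition, here is a minimal sketch of the streaming pattern itself, written against `redis-py` Redis streams. The stream and field names are made up for illustration; this is not PipelineRL's actual API:

```python
# Sketch only: an actor appends rollout records to a Redis stream and a
# trainer reads them back. Names ("rollouts", "payload") are hypothetical.
import json
import redis

r = redis.Redis(host="localhost", port=6379)

# Producer side (actor): append one rollout record to the stream.
rollout = {"prompt_id": 0, "tokens": [1, 2, 3], "reward": 1.0}
r.xadd("rollouts", {"payload": json.dumps(rollout)})

# Consumer side (trainer): block until new records arrive, then decode.
last_id = "0"
entries = r.xread({"rollouts": last_id}, count=16, block=1000)
for _stream, records in entries:
    for record_id, fields in records:
        data = json.loads(fields[b"payload"])
        last_id = record_id  # resume from here on the next read
```

Unlike files on a shared disk, a stream like this keeps memory bounded on the server and lets multiple trainer processes consume new data as it arrives.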
First, activate the conda environment:

```bash
conda activate pipeline-rl
```
Single node with 8 H100 GPUs:

```bash
python -m pipelinerl.launch output_dir=results/base1
```
If you only have 4 H100 GPUs:

```bash
python -m pipelinerl.launch --config-name base_4gpu output_dir=results/base1
```
To use Redis instead of the filesystem for data streaming:

```bash
python -m pipelinerl.launch streams=redis output_dir=results/base1
```
Multi-node: coming soon.
Some key hyperparameters:

- `attempts`: the number of attempts per prompt / reasoning problem
- `finetune.seq_length`: the maximum number of tokens per micro-batch, if you are using `finetune.seq_packing=true` (the default)
- `finetune.train_batch_size`:
  - if `finetune.seq_packing=false`, the number of samples in each micro-batch
  - if `finetune.seq_packing=true`, see the explanation of `finetune.gradient_accumulation_passes` below
- `finetune.gradient_accumulation_passes`:
  - if `finetune.seq_packing=false`, the total number of micro-batches per batch for all training workers
  - if `finetune.seq_packing=true`, multiply this number by `finetune.train_batch_size` to get the total batch size per optimizer step for all training workers (see the worked example after this list)
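A quick worked example of the batch-size arithmetic above. The config names are the ones just listed; the numbers are made up for illustration:

```python
# Hypothetical values for illustration only.
train_batch_size = 4                # finetune.train_batch_size
gradient_accumulation_passes = 8    # finetune.gradient_accumulation_passes

# With finetune.seq_packing=true, the total batch size per optimizer step
# across all training workers is the product of the two:
total_batch_size = train_batch_size * gradient_accumulation_passes
print(total_batch_size)  # 32 sequences per optimizer step

# With finetune.seq_packing=false, gradient_accumulation_passes is instead
# the total number of micro-batches per batch for all training workers,
# and each micro-batch holds train_batch_size samples.
```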