🚀 Quick start • 🌰 Examples • 🍲 Recipes • 📚 Docs
A lightweight RL framework with PyTorch-like interfaces.
- FSDP2 and FSDP support for training.
- vLLM support for inference.
- Ray support for resource management.
- Easy to learn and use: most interfaces are kept the same as PyTorch, with the parallel engine working seamlessly behind the scenes.
- Recipes that reproduce SOTA results with a single self-contained Python script.
```shell
pip install pyrlite
```
Advanced installation options
We recommend using conda to manage the computation environment.
- Create a conda environment:

```shell
conda create -n rlite python=3.12
conda activate rlite
```
- Install common dependencies:

```shell
# Install vLLM
pip install vllm accelerate
# Flash Attention 2 (set MAX_JOBS to roughly your number of CPU cores)
MAX_JOBS=64 pip install flash-attn --no-build-isolation
# Install FlashInfer for faster inference
pip install flashinfer-python==0.2.2.post1 -i https://flashinfer.ai/whl/cu124/torch2.6
```
- Install `rlite` from source:

```shell
git clone https://github.com/rlite-project/RLite.git
cd RLite; pip install -e .
```
We use recipes as examples for reproducing SOTA RL methods.

Featured recipes
In RLite, users mainly work with Engines: handlers that take input from the main process, organize tasks, and send them to the Workers. An Engine may have multiple Executors, each holding a full set of model weights. Both Engines and Executors reside in the main process. Workers are the units that actually perform the computation, with each Worker corresponding to a GPU. Conversely, a single GPU can be associated with multiple Workers, which share the GPU in a time-multiplexed manner.
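The layout above can be sketched with plain Python objects. This is an illustration only, not the actual RLite API: the class and method names are invented here just to show how tasks flow from an Engine through Executors to Workers, and how several Workers can share one GPU.

```python
class Worker:
    """Performs the actual computation; bound to one GPU id."""
    def __init__(self, gpu_id: int):
        self.gpu_id = gpu_id

    def run(self, task):
        # A real worker would launch GPU computation here.
        return f"task {task!r} ran on GPU {self.gpu_id}"


class Executor:
    """Holds one full replica of the model weights and a set of workers."""
    def __init__(self, workers):
        self.workers = workers


class Engine:
    """Lives in the main process; organizes tasks and dispatches to workers."""
    def __init__(self, executors):
        self.executors = executors

    def dispatch(self, tasks):
        workers = [w for ex in self.executors for w in ex.workers]
        # Round-robin tasks over workers; note that several workers may be
        # bound to the same GPU and time-multiplex it.
        return [workers[i % len(workers)].run(t) for i, t in enumerate(tasks)]


# Two workers sharing GPU 0, plus one worker on GPU 1.
engine = Engine([Executor([Worker(0), Worker(0)]), Executor([Worker(1)])])
results = engine.dispatch(["a", "b", "c"])
```

Here the main process only ever talks to `engine`; the workers do the per-GPU work, mirroring the division of labor described above.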
RLite provides minimal interfaces that are
- easy to learn: most interfaces resemble the behavior of PyTorch.
- super flexible: interfaces are independent and can be used separately. This allows inference without training (e.g. evaluation tasks) or training without inference (e.g. SFT and DPO).
- super powerful: combined, the interfaces allow reproduction of SOTA RL results.
- highly extensible: the interfaces allow extensions for features such as other training/inference backends, streaming generation for multi-turn use cases, and asynchronous workers that overlap time-consuming operations.
Developer's guide
Write code that you would like to read again.
We use pre-commit and git cz to sanitize commits. You can run pre-commit before git cz to avoid repeatedly re-entering commit messages.
```shell
pip install pre-commit
# Install pre-commit hooks
pre-commit install
pre-commit install --hook-type commit-msg
# Install this emoji-style tool
sudo npm install -g git-cz --no-audit --verbose --registry=https://registry.npmmirror.com
# Install rlite
pip install -e ".[dev]"
```
- Maximum line length is 99 characters for code and 79 characters for comments and documentation.
- Write unit tests for atomic capabilities to ensure that `pytest` passes without errors.
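For example, a unit test for one atomic capability might look like the following. The `flatten` helper is hypothetical, invented here purely to illustrate the shape of such a test; it is not part of RLite.

```python
# tests/test_utils.py -- illustrative only; `flatten` is a hypothetical helper.

def flatten(nested):
    """Flatten one level of nesting in a list of lists."""
    return [item for sub in nested for item in sub]


def test_flatten():
    # One small, focused assertion per atomic behavior.
    assert flatten([[1, 2], [3]]) == [1, 2, 3]
    assert flatten([]) == []
```

Keeping each test focused on a single small capability makes failures easy to localize when `pytest` runs in CI.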
Run pre-commit to automatically lint the code:

```shell
pre-commit run --all-files
```
```shell
# Only run tests
pytest
# Run tests and output a test coverage report
pytest --cov=rlite
```