PolyRL

PolyRL is a polymorphic reinforcement learning (RL) framework for large language models (LLMs). Its poly-architecture disaggregates the rollout and update stages and maps each stage onto the heterogeneous hardware that suits it best, maximizing cost efficiency. Through elastic control, PolyRL dynamically scales resources as workloads change. Our goal is a portable, affordable fine-tuning platform accessible to everyone.

Installation

Please refer to INSTALL.md for installation instructions.

Usage

Please refer to examples/scripts/README.md for usage instructions.

Roadmap (subject to change)

Release v0: Basic disaggregated RL system

Goal:

  1. Run rollout instances on dynamically allocated, independent hardware.
  2. Stream rollout results to the training engine through an asynchronous, efficient manager process.
  3. Update weights over an aggregated TCP/RDMA channel.
  • Rollout
    • Decouple rollout and update
      • The training engine sends generation requests to the SGLang server via its API.
      • Rank zero sends all generation requests and skips local generation.
      • Wrap streamed rollout results into an iterator (see the iterator sketch after this list).
    • Management of multiple rollout instances
      • Rollout manager in Rust
        • Rollout instances register themselves via an API request.
        • Relay requests from the training engine to rollout instances.
        • Interface for algorithm-driven request scheduling.
    • Dynamic in-and-out of rollout instances
      • New rollout instances can register during runtime.
      • Shut down the connection when a rollout instance goes down.
  • Training
    • From batch processing to stream processing
      • Align micro-batch sizes across rollout, actor, and critic.
      • Compute reward, KL, and advantage per micro-batch.
      • Run critic and actor forward/backward per micro-batch and update per mini-batch (see the streaming-training sketch after this list).
      • Align runtime metric collection with micro-batch processing.
  • Weight transfer
    • Aggregated TCP/RDMA interface
      • Integrate the Mooncake transfer engine.
      • A weight transfer agent for each instance.
    • Model weight gather and re-shard
      • Rank zero gathers weights from FSDP ranks and calls the agent to transfer them (see the weight-gather sketch after this list).
      • Support weight re-sharding for rollout instances with TP > 1.
    • Compression interface
      • Weight encode/decode interface to support compressed weight updates.
      • Asynchronous full-weight update interface.
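
To make the "wrap streamed rollout results into an iterator" item concrete, here is a minimal sketch. It assumes a manager (or SGLang-compatible server) reachable over HTTP at a /generate endpoint; the endpoint name, payload fields, and micro-batching policy are illustrative assumptions, not PolyRL's actual interface.

```python
# Sketch: wrap streamed rollout results into an iterator.
# Assumption: the rollout manager relays each request to an SGLang-style
# /generate endpoint; all names and fields here are illustrative.
import queue
import threading

import requests


def stream_rollouts(prompts, manager_url, sampling_params, micro_batch_size):
    """Yield rollout results in micro-batches as they complete."""
    results = queue.Queue()

    def _worker(prompt):
        # One request per sample; the manager relays it to a rollout instance.
        try:
            resp = requests.post(
                f"{manager_url}/generate",
                json={"text": prompt, "sampling_params": sampling_params},
                timeout=600,
            )
            resp.raise_for_status()
            results.put({"prompt": prompt, "completion": resp.json()["text"]})
        except Exception as exc:  # surface failures instead of blocking the consumer
            results.put({"prompt": prompt, "error": str(exc)})

    threads = [threading.Thread(target=_worker, args=(p,)) for p in prompts]
    for t in threads:
        t.start()

    batch = []
    for _ in prompts:
        batch.append(results.get())          # block until the next sample finishes
        if len(batch) == micro_batch_size:   # stream a full micro-batch to training
            yield batch
            batch = []
    if batch:                                # flush the remainder
        yield batch
    for t in threads:
        t.join()
```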
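
The "from batch processing to stream processing" items roughly amount to the streaming-training loop below: compute reward, KL, and advantage per micro-batch as it arrives, accumulate actor and critic gradients per micro-batch, and step the optimizers once per mini-batch. The `algo` object and its methods are hypothetical placeholders for the algorithm-specific computations, not PolyRL APIs.

```python
# Sketch: stream-processing training loop.
# Assumptions: PyTorch-style actor/critic modules and optimizers, and a
# hypothetical `algo` object that provides reward/KL/advantage and loss methods.
def train_on_stream(rollout_iter, algo, actor, critic, actor_opt, critic_opt,
                    micro_batches_per_mini_batch: int):
    """Consume streamed micro-batches; step the optimizers once per mini-batch."""
    for i, micro_batch in enumerate(rollout_iter, start=1):
        # Reward, KL, and advantage are computed per micro-batch as it arrives.
        rewards = algo.compute_reward(micro_batch)
        kl = algo.compute_kl(actor, micro_batch)
        advantages = algo.compute_advantage(rewards, kl, critic, micro_batch)

        # Forward/backward per micro-batch; gradients accumulate over the mini-batch.
        actor_loss = algo.actor_loss(actor, micro_batch, advantages)
        critic_loss = algo.critic_loss(critic, micro_batch, rewards)
        (actor_loss / micro_batches_per_mini_batch).backward()
        (critic_loss / micro_batches_per_mini_batch).backward()

        if i % micro_batches_per_mini_batch == 0:
            # Parameter update per mini-batch.
            actor_opt.step()
            critic_opt.step()
            actor_opt.zero_grad()
            critic_opt.zero_grad()
```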
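
For the weight-gather step, the sketch below uses PyTorch's FSDP full-state-dict APIs, which are real; `transfer_agent` is a hypothetical stand-in for the planned Mooncake-backed transfer agent, whose interface is not specified here.

```python
# Sketch: rank-zero weight gather before transfer.
# The FSDP calls are real PyTorch APIs; `transfer_agent.send` is hypothetical.
import torch.distributed as dist
from torch.distributed.fsdp import (
    FullStateDictConfig,
    FullyShardedDataParallel as FSDP,
    StateDictType,
)


def push_weights(model: FSDP, transfer_agent, version: int) -> None:
    """Gather full weights on rank zero and hand them to the transfer agent."""
    cfg = FullStateDictConfig(offload_to_cpu=True, rank0_only=True)
    with FSDP.state_dict_type(model, StateDictType.FULL_STATE_DICT, cfg):
        state_dict = model.state_dict()  # full tensors are materialized only on rank 0
    if dist.get_rank() == 0:
        # The agent aggregates TCP/RDMA channels and handles re-sharding
        # for rollout instances with TP > 1.
        transfer_agent.send(state_dict, version=version)
```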

Release v1: Elastic and auto-balanced resource management

Goal:

  1. Maximize the utilization of heterogeneous resources.
  2. Allocate resources adaptively to keep up with growing rollout times.
  3. Improve robustness against dynamically changing resource availability.
  • Intra-stage (rollout) balance
    • Balance the rollout workload, starting with homogeneous rollout instances.
      • Round-robin request assignment (see the round-robin sketch after this list).
      • Per-sample tracking and workload balancing (need to check whether this affects training progress).
      • Decouple the data plane and control plane (replicate requests to all rollout instances and send control messages during rollout).
  • Inter-stage (rollout vs. training) balance
    • Rollout buffer zone
      • Estimate the time gap between the end of an update and the next rollout becoming ready.
      • The training engine rolls out locally before streamed batches arrive.
    • Dynamic rollout instance allocation
      • Add rollout instances at runtime when rollout time grows.
  • Pack sequences in the rollout manager
    • Send rollout prompts to the manager in a batch.
    • The rollout manager decomposes them into per-sample requests and sends them to rollout instances.
    • Manage the order of rollout results and pack them into micro-batches.
    • Dynamic batch size with a lower bound (block if under the bound; return everything buffered when asked; see the buffer sketch after this list).
  • Fault tolerance
    • Handle multiple failure cases
      • Spot instance preemption
      • Failure during weight transfer
      • Failure during rollout
  • Weight compression
    • Quantization plus lossless compression to reduce weight size before transfer (see the compression sketch after this list).
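
The round-robin assignment mentioned under intra-stage balance is small enough to show directly. The actual manager is planned in Rust, so this Python version is purely illustrative.

```python
# Sketch: round-robin assignment of per-sample requests to rollout instances
# (illustrative Python; the real manager is planned in Rust).
from itertools import cycle


def assign_round_robin(requests, instances):
    """Pair each request with a rollout instance in round-robin order."""
    rr = cycle(instances)
    return [(request, next(rr)) for request in requests]


# Example:
# assign_round_robin(["p0", "p1", "p2"], ["inst-a", "inst-b"])
# -> [("p0", "inst-a"), ("p1", "inst-b"), ("p2", "inst-a")]
```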
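
The "dynamic batch size with a lower bound" item can be pictured as a small blocking buffer: the consumer blocks until at least `lower_bound` results are available, then takes everything currently buffered. Class and method names are illustrative, not PolyRL's API.

```python
# Sketch: a result buffer with a dynamic batch size and a lower bound
# (illustrative; names are not PolyRL's actual API).
import threading


class RolloutBuffer:
    """Collects per-sample rollout results for packing into micro-batches."""

    def __init__(self, lower_bound: int):
        self.lower_bound = lower_bound
        self._items = []
        self._cond = threading.Condition()

    def put(self, item) -> None:
        with self._cond:
            self._items.append(item)
            if len(self._items) >= self.lower_bound:
                self._cond.notify_all()

    def get_batch(self) -> list:
        # Block if under the lower bound; return all buffered results when asked.
        with self._cond:
            self._cond.wait_for(lambda: len(self._items) >= self.lower_bound)
            batch, self._items = self._items, []
            return batch
```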
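
One possible shape for the weight compression step is int8 absmax quantization followed by a lossless codec such as zlib. The snippet below is a toy encode/decode round trip to illustrate the interface, not PolyRL's planned scheme.

```python
# Sketch: quantization plus lossless compression for weight transfer
# (toy int8 absmax + zlib codec; illustrative only).
import zlib

import numpy as np


def encode(weight: np.ndarray) -> tuple[bytes, float]:
    """Quantize to int8 with a per-tensor absmax scale, then compress losslessly."""
    scale = float(np.abs(weight).max()) / 127.0 or 1.0
    quantized = np.clip(np.round(weight / scale), -127, 127).astype(np.int8)
    return zlib.compress(quantized.tobytes()), scale


def decode(blob: bytes, scale: float, shape, dtype=np.float32) -> np.ndarray:
    """Decompress and dequantize back to a floating-point tensor."""
    quantized = np.frombuffer(zlib.decompress(blob), dtype=np.int8).reshape(shape)
    return quantized.astype(dtype) * scale
```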

Known issues
