Description
In TRL==0.11.0, we could use the multi-adapter setup to train a PPO model like this:
- $\pi_\text{sft}$ (the SFT model) as the base model
- $\pi_\text{sft} + \text{LoRA}_\text{rm}$ as the reward model
- $\pi_\text{sft} + \text{LoRA}_\text{policy}$ as the policy model
- $\pi_\text{sft} + \text{LoRA}_\text{critic}$ as the value model
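
For reference, a minimal sketch of the 0.11.0-style setup, following the old multi-adapter RL docs (the model and adapter names here are placeholders):

```python
from peft import LoraConfig
from trl import AutoModelForCausalLMWithValueHead

# Placeholder identifiers -- substitute your own SFT model and reward adapter.
base_model_name = "my-org/sft-model"
rm_adapter_id = "my-org/rm-lora-adapter"

# LoRA config for the policy adapter that PPO will train.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)

# A single base model carries both the policy LoRA (via peft_config) and the
# reward LoRA (via reward_adapter); the value head serves the critic role.
model = AutoModelForCausalLMWithValueHead.from_pretrained(
    base_model_name,
    peft_config=lora_config,
    reward_adapter=rm_adapter_id,
)

# During PPO, rewards come from the same base model with the RM adapter active:
# rewards = model.compute_reward_score(input_ids, attention_mask)
```
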
How can I run the same multi-adapter PPO training in v0.16.0?