how to run multi-adapter PPO training in TRL==0.16.1? · Issue #3331 · huggingface/trl · GitHub
How to run multi-adapter PPO training in TRL==0.16.1? #3331
Closed
@dhcode-cpp

Description


In TRL==0.11.0, we could use a multi-adapter setup to train a PPO model, with one shared base model and several LoRA adapters on top of it:

  • $\pi_\text{sft}$: the SFT model, used as the shared base model
  • $\pi_\text{sft} + \text{LoRA}_\text{rm}$: the reward model
  • $\pi_\text{sft} + \text{LoRA}_\text{policy}$: the policy model
  • $\pi_\text{sft} + \text{LoRA}_\text{critic}$: the value model
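For context, the setup above follows the multi-adapter RL pattern from the TRL 0.11 docs: one base model is loaded with `AutoModelForCausalLMWithValueHead`, the policy LoRA is trained, and a separate frozen reward adapter is attached via `reward_adapter` and queried with `compute_reward_score`. A minimal sketch, assuming trl==0.11.0 and peft are installed; model names and adapter paths are hypothetical placeholders:

```python
def build_multi_adapter_ppo_model(sft_model_name, rm_adapter_path):
    """Load one base model carrying both the policy LoRA and a reward adapter.

    sft_model_name: hypothetical SFT base checkpoint (pi_sft)
    rm_adapter_path: hypothetical path to a trained reward-model LoRA
    """
    # Imports deferred so the sketch can be imported without the libraries.
    from peft import LoraConfig
    from trl import AutoModelForCausalLMWithValueHead

    # Policy LoRA (LoRA_policy); the value head added by the wrapper plays
    # the role of the critic on top of the same adapter.
    policy_lora = LoraConfig(
        r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM"
    )

    # One base model in memory: the policy adapter is trainable, and the
    # reward adapter (LoRA_rm) is loaded alongside it, frozen.
    model = AutoModelForCausalLMWithValueHead.from_pretrained(
        sft_model_name,
        peft_config=policy_lora,
        reward_adapter=rm_adapter_path,
    )
    return model


# During the PPO loop, rewards would be computed from the same model by
# switching to the reward adapter, e.g.:
#   scores = model.compute_reward_score(input_ids=..., attention_mask=...)
```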

How can I run multi-adapter PPO training in v0.16.0?

Metadata

Assignees

No one assigned

    Labels

    ❓ question: Seeking clarification or more information
    🏋 PPO: Related to PPO
    🏋 SFT: Related to SFT
