Tags · TideDra/lmm-r1 · GitHub

Tags: TideDra/lmm-r1

v0.7.3a

Merge branch 'dev'

v0.7.2

support entropy loss and enable remove padding in replay buffer and fix kl loss log and fix grpo with ring (#983)

* support entropy loss and enable remove padding in replay buffer and fix kl log mask

* fix

* fix

* fix

* fix

* fix

* fix

* update

* update

* update

* update

* update

* fix

* fix

* update

* update

* update

* update

* update

* fix

v0.7.2.post1

bump version

v0.7.1.post2

bump version

v0.7.1

Support single controller PPO (#972)

* init

* fix

* fix

* fix

* fix

* fix make exp

* fix adv norm

* fix

* fix prepare_datasets

* fix

* fix

* fix

* fix

* fix

* fix shuffle

* support ring

* fix train_ppo_ray

* fix rm

* version

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* update

* update

* update

* bump deps

* fix

* update

* fix

* update

* fix

* fix

* update

* fix

* update

* fix

* fix

* update

* fix

* update

* update

* fix

* update

* fix

* update

* update

* update

* update

* update

* update

* update

* update

* update

* fix

* fix

* fix

* update

* update

* deep compile

* update

* update

* update

* update

* fix

* rename

* fix

* fix

* update

* fix save ckpt

* fix OpenRLHF/OpenRLHF#973

v0.7.1.post1

better loading ckpt print

v0.7.0.post1

fix validation for ring

v0.7.0a

Merge branch 'dev'

v0.7.0

fix rm evaluate steps

v0.6.4

bump version
