Tags: TideDra/lmm-r1
support entropy loss and enable remove padding in replay buffer and fix kl loss log and fix grpo with ring (#983) * support entropy loss and enable remove padding in replay buffer and fix kl log mask * fix * fix * fix * fix * fix * fix * update * update * update * update * update * fix * fix * update * update * update * update * update * fix
Support single controller PPO (#972) * init * fix * fix * fix * fix * fix make exp * fix adv norm * fix * fix prepare_datasets * fix * fix * fix * fix * fix * fix shuffle * support ring * fix train_ppo_ray * fix rm * version * fix * fix * fix * fix * fix * fix * fix * update * update * update * bump deps * fix * update * fix * update * fix * fix * update * fix * update * fix * fix * update * fix * update * update * fix * update * fix * update * update * update * update * update * update * update * update * update * fix * fix * fix * update * update * deep compile * update * update * update * update * fix * rename * fix * fix * update * fix save ckpt * fix OpenRLHF/OpenRLHF#973