v3.4.1
English Version
New Features
- Sequence Parallelism: Supports Ulysses sequence parallelism in the PT/SFT/DPO stages. Compatible with training techniques such as DeepSpeed, packing, flash_attn, and streaming. Refer to the training script here.
- GRPO: Supports custom reward model logic. Includes a built-in example of a generative reward model. Refer to the training script here.
- Megatron-SWIFT: Updated megatron-core to version 0.12.0. Added the max_epochs parameter to stop training and save weights when the epoch reaches max_epochs. Added the wandb parameter to log training metrics.
- Best Practices: Added best practices for quickly training vision-language models from scratch. Refer to the guide here.
- External Contributions: GRPO can now use judge0 to execute generated code. Parameters to freeze or activate can be specified with regular expressions. An initialization strategy can be specified for parameters left uninitialized in the initial model. Thanks to the technical team at China Merchants Bank for these contributions.
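
To illustrate the idea behind freezing/activating parameters via regular expressions, here is a minimal standalone sketch. It is not the ms-swift implementation: the function name `select_trainable`, its arguments, and the precedence rule (a match on the activate pattern overrides a match on the freeze pattern) are all assumptions made for illustration, and the parameter names are made up.

```python
import re
from typing import Dict, Iterable, Optional


def select_trainable(
    param_names: Iterable[str],
    freeze_regex: Optional[str] = None,
    activate_regex: Optional[str] = None,
) -> Dict[str, bool]:
    """Map each parameter name to True (trainable) or False (frozen).

    Hypothetical helper: every parameter starts as trainable, names matching
    `freeze_regex` are frozen, and names matching `activate_regex` are
    re-enabled afterwards (assumed precedence, not taken from ms-swift).
    """
    freeze = re.compile(freeze_regex) if freeze_regex else None
    activate = re.compile(activate_regex) if activate_regex else None

    trainable: Dict[str, bool] = {}
    for name in param_names:
        flag = True
        if freeze is not None and freeze.search(name):
            flag = False
        if activate is not None and activate.search(name):
            flag = True
        trainable[name] = flag
    return trainable


# Illustrative parameter names, not taken from any specific model.
names = [
    "visual.blocks.0.attn.qkv.weight",
    "model.layers.0.self_attn.q_proj.weight",
    "model.layers.1.mlp.up_proj.weight",
    "lm_head.weight",
]

# Freeze every decoder layer, then re-activate only the MLP weights.
flags = select_trainable(
    names,
    freeze_regex=r"^model\.layers\.",
    activate_regex=r"\.mlp\.",
)
```

With a real model, the resulting flags would typically be applied by setting `requires_grad` on each entry of `model.named_parameters()`.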
New Models
- XiaomiMiMo/MiMo-7B-RL Series
- deepseek-ai/DeepSeek-Prover-V2-7B Series
- OpenGVLab/InternVL3-1B-Pretrained Series
What's Changed
- Fix grpo eval when gas > 1 by @hjh0119 in #4057
- support qwen3-moe awq by @Jintao-Huang in #4059
- Support empty think loss scale by @Jintao-Huang in #4065
- fix packing eval streaming by @Jintao-Huang in #4066
- support MiMo-7B by @Jintao-Huang in #4067
- fix padding_side left by @Jintao-Huang in #4069
- feat: add run name support by @firefighter-eric in #4072
- feat: support megatron wandb by @firefighter-eric in #4074
- update docs by @Jintao-Huang in #4078
- Support ulysses for llm/mllm,dpo/sft by @tastelikefeet in #4085
- fix enable_cache by @Jintao-Huang in #4091
- Update liger code by @tastelikefeet in #4095
- support max_epochs by @Jintao-Huang in #4102
- [megatron] Update long text shell by @Jintao-Huang in #4106
- fix requirements by @Jintao-Huang in #4108
- fix enable_cache by @Jintao-Huang in #4109
- fix packing by @Jintao-Huang in #4113
- Fix ulysses eval by @tastelikefeet in #4114
- fix omni aligner by @Jintao-Huang in #4117
- fix sequence_parallel by @Jintao-Huang in #4122
- update qwen3 more models by @Jintao-Huang in #4123
- [grpo] fix labels pop and peftmodel parameter check by @hjh0119 in #4136
- [megatron] support max_epochs by @Jintao-Huang in #4125
- grpo code reward by judge0 by @kevssim in #4140
- Feature freezing/activating parameters via regex by @lincq2000 in #4143
- Support init parameters by @lincq2000 in #4141
- fix ulysses dpo by @tastelikefeet in #4149
- Fix bugs by @Jintao-Huang in #4150
- fix init parameters by @lincq2000 in #4148
- Add sp script by @tastelikefeet in #4154
- Add more evaluation args by @Yunnglin in #4155
- update readme by @Jintao-Huang in #4157
- Support ulysses streaming by @tastelikefeet in #4160
- [megatron]Support packing & CP by @Jintao-Huang in #4163
- support internvl3 pretrain instruct by @Jintao-Huang in #4164
- [grpo] support gen rm by @hjh0119 in #4151
- [grpo] fix multi modal doc by @hjh0119 in #4124
- fix _tp_plan by @Jintao-Huang in #4167
- [doc] VL model training best practice by @hjh0119 in #4168
- fix val_dataset streaming packing by @Jintao-Huang in #4172
- fix kto by @tastelikefeet in #4180
- fix max_length by @Jintao-Huang in #4178
- fix loss_scale by @Jintao-Huang in #4183
- support deepseek_prover_v2 by @Jintao-Huang in #4184
- update docs by @Jintao-Huang in #4189
New Contributors
- @firefighter-eric made their first contribution in #4072
- @kevssim made their first contribution in #4140
- @lincq2000 made their first contribution in #4143
Full Changelog: v3.4.0...v3.4.1