v3.4.1

@hjh0119

New Features

Sequence Parallelism: Ulysses sequence parallelism is now supported in the PT, SFT, and DPO stages. It is compatible with DeepSpeed, packing, flash_attn, and streaming. Refer to the training script here.
GRPO: Custom reward model logic is now supported, including a built-in example of a generative reward model. Refer to the training script here.
Megatron-SWIFT: Updated megatron-core to 0.12.0. Added the max_epochs parameter, which stops training and saves the weights once the epoch count reaches max_epochs. Added wandb parameters for logging training metrics.
Best Practices: Added a best-practice guide for quickly training a vision-language model from scratch. Refer to the guide here.
External Contributions: GRPO can now use judge0 to execute generated code; freeze/activate parameters can be specified with regular expressions; and an initialization strategy can be specified for uninitialized parameters in the initialized model. Thanks to the technical team at China Merchants Bank for these contributions.
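To make the Ulysses sequence parallelism feature more concrete, here is a minimal pure-Python simulation of its core data movement, assuming the standard Ulysses scheme: before attention, an all-to-all turns sequence-sharded activations (each rank holds S/P tokens with all H heads) into head-sharded ones (each rank holds all S tokens with H/P heads), and a second all-to-all reverses this afterwards. This is not ms-swift's implementation; real training would exchange GPU tensors through a collective such as `torch.distributed.all_to_all`, and the function names below are hypothetical.

```python
from typing import List

# One rank's activations: shard[token][head] -> value (a string label here).
Shard = List[List[str]]


def seq_to_head_all_to_all(shards: List[Shard]) -> List[Shard]:
    """Sequence-sharded -> head-sharded.

    Input: rank r holds S/P consecutive tokens with all H heads.
    Output: rank r holds all S tokens but only heads [r*H/P, (r+1)*H/P).
    """
    world = len(shards)
    num_heads = len(shards[0][0])
    assert num_heads % world == 0, "attention heads must split evenly across ranks"
    per_rank = num_heads // world
    out: List[Shard] = []
    for dst in range(world):
        lo, hi = dst * per_rank, (dst + 1) * per_rank
        # Gather every rank's token chunk in rank order, keeping only dst's heads.
        out.append([token[lo:hi] for src in range(world) for token in shards[src]])
    return out


def head_to_seq_all_to_all(shards: List[Shard]) -> List[Shard]:
    """Inverse exchange after attention: head-sharded -> sequence-sharded."""
    world = len(shards)
    seq_len = len(shards[0])
    assert seq_len % world == 0, "sequence length must split evenly across ranks"
    chunk = seq_len // world
    out: List[Shard] = []
    for dst in range(world):
        local = []
        for s in range(dst * chunk, (dst + 1) * chunk):
            # Re-concatenate the head groups held by each rank for token s.
            row: List[str] = []
            for src in range(world):
                row.extend(shards[src][s])
            local.append(row)
        out.append(local)
    return out


if __name__ == "__main__":
    world, seq_len, heads = 2, 4, 4
    chunk = seq_len // world
    shards = [
        [[f"t{s}h{h}" for h in range(heads)] for s in range(r * chunk, (r + 1) * chunk)]
        for r in range(world)
    ]
    head_sharded = seq_to_head_all_to_all(shards)
    print(head_sharded[1][3])  # ['t3h2', 't3h3']
    assert head_to_seq_all_to_all(head_sharded) == shards
```

Because each rank sees the full sequence for its subset of heads, ordinary attention kernels (for example flash_attn) can run unchanged between the two exchanges, which is why the head count must divide evenly by the sequence-parallel size.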
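The GRPO note mentions a generative reward model, where a judge model writes a text verdict and a numeric reward is parsed from it. The sketch below illustrates that idea under stated assumptions: the judge prompt, the `Score: <n>` reply format, the 0 to 10 scale, and all function names are hypothetical and do not reproduce the example shipped with the release. It also shows the group-relative normalization GRPO applies to the rewards of one prompt's completions; whether population or sample standard deviation is used varies between implementations, and population standard deviation is assumed here.

```python
import re
import statistics
from typing import Callable, List

# Hypothetical judge prompt; the template shipped with the release may differ.
JUDGE_TEMPLATE = (
    "Rate the answer to the question on a scale from 0 to 10.\n"
    "Question: {question}\n"
    "Answer: {answer}\n"
    "Reply with 'Score: <number>'."
)


def generative_rm_rewards(question: str, completions: List[str],
                          judge: Callable[[str], str]) -> List[float]:
    """Score each completion by asking a judge model and parsing its reply."""
    rewards = []
    for completion in completions:
        reply = judge(JUDGE_TEMPLATE.format(question=question, answer=completion))
        match = re.search(r"Score:\s*(\d+(?:\.\d+)?)", reply)
        score = float(match.group(1)) if match else 0.0  # unparseable reply -> 0
        rewards.append(min(max(score, 0.0), 10.0) / 10.0)  # clamp, scale to [0, 1]
    return rewards


def group_advantages(rewards: List[float], eps: float = 1e-4) -> List[float]:
    """GRPO-style advantage: normalize rewards within one prompt's group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    # Stub judge standing in for a real LLM call.
    def fake_judge(prompt: str) -> str:
        return "Score: 9" if "Answer: 4" in prompt else "I cannot rate this."

    rewards = generative_rm_rewards("What is 2 + 2?", ["4", "5"], fake_judge)
    print(rewards)  # [0.9, 0.0]
    print(group_advantages(rewards))
```

Treating an unparseable verdict as zero reward is a design choice made here for simplicity; a real setup might retry the judge or drop the sample instead.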
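Megatron-style training is driven by an iteration budget (`train_iters`), so the new max_epochs parameter acts as an additional stopping condition. The following is a minimal sketch of that control flow, assuming training stops and saves as soon as the completed-epoch count reaches max_epochs, and otherwise runs until the iteration budget is spent. The `train` function and `save_checkpoint` callback are hypothetical stand-ins, not Megatron-SWIFT code.

```python
from typing import Callable, Optional


def train(num_batches_per_epoch: int, train_iters: int, max_epochs: Optional[int],
          save_checkpoint: Callable[[int, int], None]) -> int:
    """Iteration-based loop with an optional epoch cap; returns iterations run."""
    assert num_batches_per_epoch > 0, "an epoch must contain at least one batch"
    iteration, epoch = 0, 0
    while iteration < train_iters:
        for _ in range(num_batches_per_epoch):
            if iteration >= train_iters:
                break
            iteration += 1  # stand-in for one forward/backward/optimizer step
        else:
            # for-else: reached only when the whole epoch ran without the break.
            epoch += 1
            if max_epochs is not None and epoch >= max_epochs:
                save_checkpoint(iteration, epoch)  # stop early and save weights
                return iteration
    save_checkpoint(iteration, epoch)  # normal end: train_iters reached
    return iteration


if __name__ == "__main__":
    saved = []
    print(train(10, 1000, 3, lambda it, ep: saved.append((it, ep))))  # 30
    print(saved)  # [(30, 3)]
```

With 10 batches per epoch and max_epochs=3, training ends after 30 iterations even though the iteration budget is 1000.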
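Selecting freeze/activate parameters by regular expression amounts to matching patterns against parameter names. The sketch below shows one way this can work; the function and argument names are hypothetical (the actual ms-swift argument names are not given in these notes), and it assumes `re.search` semantics and that an activate match overrides a freeze match.

```python
import re
from typing import Dict, Iterable, Optional


def resolve_trainable(param_names: Iterable[str],
                      freeze_regex: Optional[str] = None,
                      activate_regex: Optional[str] = None) -> Dict[str, bool]:
    """Map each parameter name to whether it should receive gradients."""
    freeze_pat = re.compile(freeze_regex) if freeze_regex else None
    activate_pat = re.compile(activate_regex) if activate_regex else None
    trainable: Dict[str, bool] = {}
    for name in param_names:
        flag = True
        if freeze_pat is not None and freeze_pat.search(name):
            flag = False
        # Assumed precedence: an activate match overrides a freeze match.
        if activate_pat is not None and activate_pat.search(name):
            flag = True
        trainable[name] = flag
    return trainable


if __name__ == "__main__":
    names = [
        "model.embed_tokens.weight",
        "model.layers.0.self_attn.q_proj.weight",
        "model.layers.0.mlp.up_proj.weight",
        "lm_head.weight",
    ]
    # Freeze every decoder layer, then re-activate only the attention projections.
    flags = resolve_trainable(names, freeze_regex=r"^model\.layers\.\d+\.",
                              activate_regex=r"\.self_attn\.")
    for name, flag in flags.items():
        print(f"{name}: {'trainable' if flag else 'frozen'}")
```

In a PyTorch model, the resulting flags would typically be applied by iterating over `model.named_parameters()` and calling `param.requires_grad_(flag)`.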
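Specifying an initialization strategy for uninitialized parameters means that weights missing from the loaded checkpoint (for example a newly added head) get a chosen initializer while pretrained weights stay untouched. This sketch represents tensors as flat Python lists to stay self-contained; the strategy names `"zero"` and `"normal"` (mean 0, std 0.02) and the function name are hypothetical, not the options ms-swift actually exposes.

```python
import random
from typing import Dict, List, Set


def init_missing_params(param_sizes: Dict[str, int], loaded: Set[str],
                        strategy: str, seed: int = 0) -> Dict[str, List[float]]:
    """Create values only for parameters absent from the loaded checkpoint."""
    rng = random.Random(seed)
    initialized: Dict[str, List[float]] = {}
    for name, size in param_sizes.items():
        if name in loaded:
            continue  # keep pretrained weights untouched
        if strategy == "zero":
            initialized[name] = [0.0] * size
        elif strategy == "normal":
            initialized[name] = [rng.gauss(0.0, 0.02) for _ in range(size)]
        else:
            raise ValueError(f"unknown init strategy: {strategy!r}")
    return initialized


if __name__ == "__main__":
    sizes = {"encoder.weight": 4, "new_head.weight": 3}
    print(init_missing_params(sizes, loaded={"encoder.weight"}, strategy="zero"))
    # {'new_head.weight': [0.0, 0.0, 0.0]}
```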

New Models

XiaomiMiMo/MiMo-7B-RL Series
deepseek-ai/DeepSeek-Prover-V2-7B Series
OpenGVLab/InternVL3-1B-Pretrained Series

What's Changed

Fix grpo eval when gas > 1 by @hjh0119 in #4057
support qwen3-moe awq by @Jintao-Huang in #4059
Support empty think loss scale by @Jintao-Huang in #4065
fix packing eval streaming by @Jintao-Huang in #4066
support MiMo-7B by @Jintao-Huang in #4067
fix padding_side left by @Jintao-Huang in #4069
feat: add run name support by @firefighter-eric in #4072
feat: support megatron wandb by @firefighter-eric in #4074
update docs by @Jintao-Huang in #4078
Support ulysses for llm/mllm,dpo/sft by @tastelikefeet in #4085
fix enable_cache by @Jintao-Huang in #4091
Update liger code by @tastelikefeet in #4095
support max_epochs by @Jintao-Huang in #4102
[megatron] Update long text shell by @Jintao-Huang in #4106
fix requirements by @Jintao-Huang in #4108
fix enable_cache by @Jintao-Huang in #4109
fix packing by @Jintao-Huang in #4113
Fix ulysses eval by @tastelikefeet in #4114
fix omni aligner by @Jintao-Huang in #4117
fix sequence_parallel by @Jintao-Huang in #4122
update qwen3 more models by @Jintao-Huang in #4123
[grpo] fix labels pop and peftmodel parameter check by @hjh0119 in #4136
[megatron] support max_epochs by @Jintao-Huang in #4125
grpo code reward by judge0 by @kevssim in #4140
Feature freezing/activating parameters via regex by @lincq2000 in #4143
Support init parameters by @lincq2000 in #4141
fix ulysses dpo by @tastelikefeet in #4149
Fix bugs by @Jintao-Huang in #4150
fix init parameters by @lincq2000 in #4148
Add sp script by @tastelikefeet in #4154
Add more evaluation args by @Yunnglin in #4155
update readme by @Jintao-Huang in #4157
Support ulysses streaming by @tastelikefeet in #4160
[megatron]Support packing & CP by @Jintao-Huang in #4163
support internvl3 pretrain instruct by @Jintao-Huang in #4164
[grpo] support gen rm by @hjh0119 in #4151
[grpo] fix multi modal doc by @hjh0119 in #4124
fix _tp_plan by @Jintao-Huang in #4167
[doc] VL model training best practice by @hjh0119 in #4168
fix val_dataset streaming packing by @Jintao-Huang in #4172
fix kto by @tastelikefeet in #4180
fix max_length by @Jintao-Huang in #4178
fix loss_scale by @Jintao-Huang in #4183
support deepseek_prover_v2 by @Jintao-Huang in #4184
update docs by @Jintao-Huang in #4189

New Contributors

@firefighter-eric made their first contribution in #4072
@kevssim made their first contribution in #4140
@lincq2000 made their first contribution in #4143

Full Changelog: v3.4.0...v3.4.1
