Insights: huggingface/trl
Overview
1 Release published by 1 person
- v0.19.0, published Jun 21, 2025
16 Pull requests merged by 7 people
- ⬆️ Bump dev version (#3626, merged Jun 20, 2025)
- Release: v0.19 (#3625, merged Jun 20, 2025)
- 🧰 [SFT] Tool support (#3597, merged Jun 20, 2025)
- 🔍 Add test to verify chat template consistency (#3624, merged Jun 20, 2025)
- ⚔️ Fix bf16/fp16 config conflict issue (#3598, merged Jun 20, 2025)
- 📜 Add `chat_template_path` parameter to `SFTConfig` (#3599, merged Jun 20, 2025; usage sketch after this list)
- 🧬 Add `generation_kwargs` as a property of `GRPOConfig` to support additional generation arguments (#3617, merged Jun 20, 2025)
- [GRPO] Fix prompt truncation (`max_prompt_length`) with vLLM (#3601, merged Jun 20, 2025)
- ⭐ Add `vllm_gpu_memory_utilization` recommendation script (#3554, merged Jun 19, 2025)
- 🎁 Put the reward computation in a separate function (#3620, merged Jun 19, 2025)
- 🤵♂️ SFT on assistant messages only (#3586, merged Jun 19, 2025)
- 🦘 Skip no-op ChatML conversion for datasets already in ChatML format (#3594, merged Jun 19, 2025)
- 📚 SFTTrainer support for chat template kwargs (#3609, merged Jun 19, 2025)
- 🔖 Fix: ensure user-provided `labels` are retained in `self._signature_columns` (#3589, merged Jun 19, 2025)
- 👔 Apply doc-builder style (#3615, merged Jun 19, 2025)
- 🏛️ Fix CI and Iterative SFT (#3614, merged Jun 19, 2025)
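For the SFT-related addition in #3599 above, a minimal usage sketch follows. The parameter name `chat_template_path` comes from the PR title; the model names, the dataset, and the commented-out `assistant_only_loss` flag attributed to #3586 are assumptions made for illustration, not details taken from this page.

```python
# Hedged sketch of the new SFTConfig option from v0.19 (#3599).
# Model names and dataset are illustrative assumptions.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Any conversational dataset with a "messages" column should work here.
dataset = load_dataset("trl-lib/Capybara", split="train")

config = SFTConfig(
    output_dir="sft-demo",
    # Assumed usage: a template file path or repo id whose chat template is applied (#3599).
    chat_template_path="Qwen/Qwen2.5-0.5B-Instruct",
    # assistant_only_loss=True,  # assumed flag from #3586; needs a template with generation markers
)

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-0.5B",
    args=config,
    train_dataset=dataset,
)
trainer.train()
```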
1 Pull request opened by 1 person
- Feature: Add SGLang support for GRPO Trainer (#3627, opened Jun 21, 2025)
3 Issues closed by 2 people
- add `generation_kwargs` to `GRPOTrainer`, so people have more control when training (#3562, closed Jun 20, 2025; usage sketch after this list)
- Repeated calculation of ref_per_token_logps in grpo_trainer (#3621, closed Jun 19, 2025)
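Issue #3562 above was resolved by PR #3617, which adds a `generation_kwargs` field to `GRPOConfig`. The sketch below shows how extra sampling arguments might be passed through it; the model, dataset, reward function, and the specific kwargs are illustrative assumptions, not details taken from this page.

```python
# Hedged sketch of GRPOConfig.generation_kwargs (#3617, closing #3562).
# Everything except the field name itself is an illustrative assumption.
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# Prompt-only dataset with a "prompt" column.
dataset = load_dataset("trl-lib/tldr", split="train")

def reward_len(completions, **kwargs):
    # Toy reward: prefer completions close to 200 characters.
    return [-abs(200 - len(c)) for c in completions]

config = GRPOConfig(
    output_dir="grpo-demo",
    # Extra sampling arguments forwarded to the generation backend (assumed behavior).
    generation_kwargs={"min_p": 0.1, "repetition_penalty": 1.1},
)

trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",
    reward_funcs=reward_len,
    args=config,
    train_dataset=dataset,
)
trainer.train()
```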
1 Issue opened by 1 person
- GRPO server mode connection error in the middle of training (#3622, opened Jun 20, 2025)
17 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
- `data_utils.apply_chat_template()` does not allow roles like "tool" or "tool_results" in the last message of `"prompt"` (#3529, commented on Jun 19, 2025 • 0 new comments; see the sketch at the end of this list)
- Does PPOTrainer need the same tokenizer for policy and reward? (#3595, commented on Jun 19, 2025 • 0 new comments)
- Running example raises error: The size of tensor a (2) must match the size of tensor b (16) at non-singleton dimension 0 (#3344, commented on Jun 19, 2025 • 0 new comments)
- SFTTrainer crashes when using dataloader workers > 0 with DeepSpeed installed (#3618, commented on Jun 19, 2025 • 0 new comments)
- My code outputs "tensor 'kwargs['input_ids']' size mismatch" while training; the checkpoint is not saved and the log is not printed (#3611, commented on Jun 19, 2025 • 0 new comments)
- NCCL timeout when GRPO training with vLLM (#2923, commented on Jun 20, 2025 • 0 new comments)
- How to convert my multi-turn dialogue dataset? (#3605, commented on Jun 20, 2025 • 0 new comments)
- [question] Best way to have my own reward model backed by rules (#2518, commented on Jun 20, 2025 • 0 new comments)
- Packing sequences for memory efficiency in GRPO and other preference learning implementations (#3549, commented on Jun 20, 2025 • 0 new comments)
- HF Doc Builder style (#3498, commented on Jun 19, 2025 • 0 new comments)
- 🥳 New RLOO (#3533, commented on Jun 19, 2025 • 0 new comments)
- Add entropy-based filtering inside the GRPOTrainer (#3563, commented on Jun 19, 2025 • 0 new comments)
- Chi-square regularized DPO (#3573, commented on Jun 19, 2025 • 0 new comments)
- Check reward shapes in RewardTrainer (#3577, commented on Jun 19, 2025 • 0 new comments)
- Fix: corrected FSDP in GRPO trainer (#3582, commented on Jun 21, 2025 • 0 new comments)
- ClearML logging of visualizations in RewardTrainer evaluation (#3602, commented on Jun 19, 2025 • 0 new comments)
- [WIP] [SFT] SFT doc rewrite (#3619, commented on Jun 19, 2025 • 0 new comments)
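For context on #3529 above, the sketch below shows a typical call to `data_utils.apply_chat_template()` on a prompt/completion example; the tokenizer and the example content are assumptions for illustration. Per the issue, an example whose last `"prompt"` message uses a role such as "tool" would currently be rejected.

```python
# Hedged sketch of trl's apply_chat_template helper discussed in #3529.
# The tokenizer and example content are illustrative assumptions.
from transformers import AutoTokenizer
from trl.data_utils import apply_chat_template

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")

# Prompt/completion example in conversational form.
example = {
    "prompt": [{"role": "user", "content": "What color is the sky?"}],
    "completion": [{"role": "assistant", "content": "It is blue."}],
}

templated = apply_chat_template(example, tokenizer)
print(templated["prompt"])      # templated prompt string
print(templated["completion"])  # templated completion string
```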