Pulse · huggingface/trl · GitHub

June 14, 2025 – June 21, 2025

Overview

32 Active pull requests

25 Active issues

2 Releases published by 1 person

v0.18.2
published Jun 15, 2025
v0.19.0
published Jun 21, 2025

29 Pull requests merged by 15 people

⬆️ Bump dev version
#3626 merged Jun 20, 2025
Release: v0.19
#3625 merged Jun 20, 2025
🧰 [SFT] Tool support
#3597 merged Jun 20, 2025
🔍 Add test to verify chat template consistency
#3624 merged Jun 20, 2025
⚔️ Fix bf16 fp16 config conflict issue
#3598 merged Jun 20, 2025
📜 Add chat_template_path parameter to SFTConfig
#3599 merged Jun 20, 2025
🧬 Add generation_kwargs as a property of GRPOConfig to support additional generation arguments.
#3617 merged Jun 20, 2025
[GRPO] Fix prompt truncation (max_prompt_length) with vLLM.
#3601 merged Jun 20, 2025
⭐ Add vllm_gpu_memory_utilization recommendation script
#3554 merged Jun 19, 2025
🎁 Put the reward computation in a separate function
#3620 merged Jun 19, 2025
🤵‍♂️ SFT on assistant messages only
#3586 merged Jun 19, 2025
🦘 Skip no-op ChatML conversion for datasets already in ChatML format
#3594 merged Jun 19, 2025
📚 SFTTrainer support chat template kwargs
#3609 merged Jun 19, 2025
🔖 Fix: ensure user-provided labels are retained in self._signature_columns
#3589 merged Jun 19, 2025
👔 Apply doc-builder style
#3615 merged Jun 19, 2025
🏛️ Fix CI and Iterative SFT
#3614 merged Jun 19, 2025
🏁 Refactor reference model initialization in GRPOTrainer
#3575 merged Jun 18, 2025
[SFT] Clarify default collator docs
#3606 merged Jun 18, 2025
Change enforce_eager default value in vLLM server.
#3607 merged Jun 18, 2025
Fix Typos in Comments and Improve Clarity in Trainer Modules
#3596 merged Jun 18, 2025
Fix: list-typed tags handling in Trainer::create_model_card
#3613 merged Jun 18, 2025
🗳️ Remove logging_steps parameter from for simpler setup
#3612 merged Jun 18, 2025
♻️ Avoids redundant calculation of ref logps in the new policy update loop
#3600 merged Jun 18, 2025
Fix Typo in Documentation and Notebook; Improve Library Installation Comment
#3593 merged Jun 15, 2025
Fix typos and improve metric descriptions in documentation
#3585 merged Jun 15, 2025
🛡️ Adding trust_remote_code to vllm-serve
#3588 merged Jun 15, 2025
💬 Fix setup_chat_format and add clone_chat_template
#3404 merged Jun 15, 2025
💡 Fix wrong type hint for formatting_func argument in SFTTrainer
#3584 merged Jun 15, 2025
💡 Fix type hints in trainer/utils.py
#3591 merged Jun 15, 2025

3 Pull requests opened by 3 people

ClearML logging of visualization in RewardTrainer evaluation
#3602 opened Jun 16, 2025
[WIP] [SFT] SFT doc rewrite
#3619 opened Jun 18, 2025
Feature: Add SGLang support for GRPO Trainer
#3627 opened Jun 21, 2025

17 Issues closed by 5 people

add generation_kwargs to `GRPOTrainer`, so people have more control when training
#3562 closed Jun 20, 2025
TRL GRPOtrainer truncation problem: The decoder prompt (length 3353) is longer than the maximum model length of 2560.
#3569 closed Jun 20, 2025
Repeated calculation of ref_per_token_logps in grpo_trainer
#3621 closed Jun 19, 2025
Unexpected behavior of `unwrap_model_for_generation`
#3416 closed Jun 19, 2025
GRPO trainer cannot start with zero1 together with bf16
#3359 closed Jun 18, 2025
GRPO_trainer, why deepspeed_zero3 and is_peft_model not compatible?
#3041 closed Jun 18, 2025
vLLM max_model_len should be set as the sum of max_prompt_len and max_completion_length
#3113 closed Jun 18, 2025
Clarification on default data collator for SFTTrainer
#3580 closed Jun 18, 2025
GRPO server mode fails with pydantic error
#3603 closed Jun 18, 2025
Attribute error during model card creation in GRPO
#3610 closed Jun 18, 2025
huggingface_hub.errors.HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': '/disk2/lwc/LLM/Qwen3-32b'. Use `repo_type` argument if needed.
#3604 closed Jun 17, 2025
function call initialization failed, code from https://github.com/huggingface/trl/pull/2455#issue-2729366606
#3564 closed Jun 17, 2025
Qwen3 training support
#3387 closed Jun 17, 2025
keep_end + max_length causes NaNs in trainer_state.json
#3382 closed Jun 17, 2025
how to run multi-adapter PPO training in TRL==0.16.1 ?
#3331 closed Jun 17, 2025
Grpo trainer for VLMs like Qwen 2.5 VL
#3590 closed Jun 17, 2025
Type hint problem in formatting_func argument in SFTTrainer
#3583 closed Jun 15, 2025

8 Issues opened by 8 people

GRPO server mode connection error in the middle of training
#3622 opened Jun 20, 2025
SFTTrainer crashes when using dataloader > 0 with deepspeed installed
#3618 opened Jun 18, 2025
Latest default config of bf16=True breaks test cases when run on CPU.
#3616 opened Jun 18, 2025
my code outputs "tensor 'kwargs['input_ids']' size mismatch" while training. ckpt is not saved and the log is not printed.
#3611 opened Jun 18, 2025
GRPO server mode gets stuck in model update
#3608 opened Jun 17, 2025
How to convert my multiturn dialogue dataset?
#3605 opened Jun 17, 2025
PPOTrainer need same tokenizer for policy and reward?
#3595 opened Jun 15, 2025
trl vllm-serve support more args: e.g. reasoning-parser
#3592 opened Jun 15, 2025

19 Unresolved conversations

Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.

Add entropy based filtering inside the GRPOTrainer.
#3563 commented on Jun 19, 2025 • 3 new comments
Fix: corrected fsdp in GRPO trainer
#3582 commented on Jun 21, 2025 • 0 new comments
Check rewards shapes in RewardTrainer
#3577 commented on Jun 19, 2025 • 0 new comments
Chisquare regularized DPO
#3573 commented on Jun 19, 2025 • 0 new comments
🥳 new rloo
#3533 commented on Jun 19, 2025 • 0 new comments
HF Doc Builder style
#3498 commented on Jun 19, 2025 • 0 new comments
Allow an user to train from a local dataset
#3470 commented on Jun 15, 2025 • 0 new comments
Add async tool-enabled vLLM server for GRPO training via OpenAI-compatible interface
#3469 commented on Jun 17, 2025 • 0 new comments
Support iterable datasets in GRPO
#3226 commented on Jun 18, 2025 • 0 new comments
🚀 Enhance GRPO VLLM server from sync to async and accelerate training
#3182 commented on Jun 17, 2025 • 0 new comments
Packing sequences for memory efficiency in GRPO and other preference learning implementations
#3549 commented on Jun 20, 2025 • 0 new comments
[question] best way to have my own reward model which is backed by rules
#2518 commented on Jun 20, 2025 • 0 new comments
NCCL timeout when GRPO training with vllm
#2923 commented on Jun 20, 2025 • 0 new comments
runing example raise error: The size of tensor a (2) must match the size of tensor b (16) at non-singleton dimension 0
#3344 commented on Jun 19, 2025 • 0 new comments
`data_utils.apply_chat_template()` does not allow roles like "tool" or "tool_results" in the last message of `"prompt"`
#3529 commented on Jun 19, 2025 • 0 new comments
Add Adaptive Entropy Control to GRPOTrainer
#3320 commented on Jun 18, 2025 • 0 new comments
[GRPO] Entropy metric
#3571 commented on Jun 17, 2025 • 0 new comments
Question about the k3 KL estimator implementation
#3556 commented on Jun 17, 2025 • 0 new comments
Completions Only Loss is incompatible with use_liger_kernel set as true
#3484 commented on Jun 17, 2025 • 0 new comments
