Insights: huggingface/trl
Overview
29 Pull requests merged by 15 people
-
⬆️ Bump dev version
#3626 merged
Jun 20, 2025 -
Release: v0.19
#3625 merged
Jun 20, 2025 -
🧰 [SFT] Tool support
#3597 merged
Jun 20, 2025 -
🔍 Add test to verify chat template consistency
#3624 merged
Jun 20, 2025 -
⚔️ Fix bf16 fp16 config conflict issue
#3598 merged
Jun 20, 2025 -
📜 Add `chat_template_path` parameter to `SFTConfig`
#3599 merged
Jun 20, 2025 -
🧬 Add `generation_kwargs` as a property of `GRPOConfig` to support additional generation arguments.
#3617 merged
Jun 20, 2025 -
[GRPO] Fix prompt truncation (`max_prompt_length`) with vLLM.
#3601 merged
Jun 20, 2025 -
⭐ Add `vllm_gpu_memory_utilization` recommendation script
#3554 merged
Jun 19, 2025 -
🎁 Put the reward computation in a separate function
#3620 merged
Jun 19, 2025 -
🤵‍♂️ SFT on assistant messages only
#3586 merged
Jun 19, 2025 -
🦘 Skip no-op ChatML conversion for datasets already in ChatML format
#3594 merged
Jun 19, 2025 -
📚 SFTTrainer support chat template kwargs
#3609 merged
Jun 19, 2025 -
🔖 Fix: ensure user-provided `labels` are retained in `self._signature_columns`
#3589 merged
Jun 19, 2025 -
👔 Apply doc-builder style
#3615 merged
Jun 19, 2025 -
🏛️ Fix CI and Iterative SFT
#3614 merged
Jun 19, 2025 -
🏁 Refactor reference model initialization in GRPOTrainer
#3575 merged
Jun 18, 2025 -
[SFT] Clarify default collator docs
#3606 merged
Jun 18, 2025 -
Change `enforce_eager` default value in vLLM server.
#3607 merged
Jun 18, 2025 -
Fix Typos in Comments and Improve Clarity in Trainer Modules
#3596 merged
Jun 18, 2025 -
Fix: list-typed tags handling in `Trainer::create_model_card`
#3613 merged
Jun 18, 2025 -
🗳️ Remove `logging_steps` parameter from for simpler setup
#3612 merged
Jun 18, 2025 -
♻️ Avoids redundant calculation of ref logps in the new policy update loop
#3600 merged
Jun 18, 2025 -
Fix Typo in Documentation and Notebook; Improve Library Installation Comment
#3593 merged
Jun 15, 2025 -
Fix typos and improve metric descriptions in documentation
#3585 merged
Jun 15, 2025 -
🛡️ Adding trust_remote_code to vllm-serve
#3588 merged
Jun 15, 2025 -
💬 Fix `setup_chat_format` and add `clone_chat_template`
#3404 merged
Jun 15, 2025 -
💡 Fix wrong type hint for formatting_func argument in SFTTrainer
#3584 merged
Jun 15, 2025 -
💡 Fix type hints in trainer/utils.py
#3591 merged
Jun 15, 2025
3 Pull requests opened by 3 people
-
ClearML logging of visualization in RewardTrainer evaluation
#3602 opened
Jun 16, 2025 -
[WIP] [SFT] SFT doc rewrite
#3619 opened
Jun 18, 2025 -
Feature: Add SGLang support for GRPO Trainer
#3627 opened
Jun 21, 2025
17 Issues closed by 5 people
-
add generation_kwargs to `GRPOTrainer`, so people have more control when training
#3562 closed
Jun 20, 2025 -
Repeated calculation of ref_per_token_logps in grpo_trainer
#3621 closed
Jun 19, 2025 -
Unexpected behavior of `unwrap_model_for_generation`
#3416 closed
Jun 19, 2025 -
GRPO trainer cannot start with zero1 together with bf16
#3359 closed
Jun 18, 2025 -
GRPO_trainer, why deepspeed_zero3 and is_peft_model not compatible?
#3041 closed
Jun 18, 2025 -
vLLM max_model_len should be set as the sum of max_prompt_len and max_completion_length
#3113 closed
Jun 18, 2025 -
Clarification on default data collator for SFTTrainer
#3580 closed
Jun 18, 2025 -
GRPO server mode fails with pydantic error
#3603 closed
Jun 18, 2025 -
Attribute error during model card creation in GRPO
#3610 closed
Jun 18, 2025 -
Qwen3 training support
#3387 closed
Jun 17, 2025 -
keep_end + max_length causes NaNs in trainer_state.json
#3382 closed
Jun 17, 2025 -
how to run multi-adapter PPO training in TRL==0.16.1 ?
#3331 closed
Jun 17, 2025 -
Grpo trainer for VLMs like Qwen 2.5 VL
#3590 closed
Jun 17, 2025 -
Type hint problem in formatting_func argument in SFTTrainer
#3583 closed
Jun 15, 2025
8 Issues opened by 8 people
-
GRPO server mode connection error in the middle of training
#3622 opened
Jun 20, 2025 -
SFTTrainer crashes when using dataloader > 0 with deepspeed installed
#3618 opened
Jun 18, 2025 -
Latest default config of bf16=True breaks test cases when run on CPU.
#3616 opened
Jun 18, 2025 -
GRPO server mode gets stuck in model update
#3608 opened
Jun 17, 2025 -
How to convert my multiturn dialogue dataset?
#3605 opened
Jun 17, 2025 -
PPOTrainer need same tokenizer for policy and reward?
#3595 opened
Jun 15, 2025 -
trl vllm-serve support more args: e.g. reasoning-parser
#3592 opened
Jun 15, 2025
19 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
-
Add entropy based filtering inside the GRPOTrainer.
#3563 commented on
Jun 19, 2025 • 3 new comments -
Fix: corrected fsdp in GRPO trainer
#3582 commented on
Jun 21, 2025 • 0 new comments -
Check rewards shapes in RewardTrainer
#3577 commented on
Jun 19, 2025 • 0 new comments -
Chisquare regularized DPO
#3573 commented on
Jun 19, 2025 • 0 new comments -
🥳 new rloo
#3533 commented on
Jun 19, 2025 • 0 new comments -
HF Doc Builder style
#3498 commented on
Jun 19, 2025 • 0 new comments -
Allow an user to train from a local dataset
#3470 commented on
Jun 15, 2025 • 0 new comments -
Add async tool-enabled vLLM server for GRPO training via OpenAI-compatible interface
#3469 commented on
Jun 17, 2025 • 0 new comments -
Support iterable datasets in GRPO
#3226 commented on
Jun 18, 2025 • 0 new comments -
🚀 Enhance GRPO VLLM server from sync to async and accelerate training
#3182 commented on
Jun 17, 2025 • 0 new comments -
Packing sequences for memory efficiency in GRPO and other preference learning implementations
#3549 commented on
Jun 20, 2025 • 0 new comments -
[question] best way to have my own reward model which is backed by rules
#2518 commented on
Jun 20, 2025 • 0 new comments -
NCCL timeout when GRPO training with vllm
#2923 commented on
Jun 20, 2025 • 0 new comments -
runing example raise error: The size of tensor a (2) must match the size of tensor b (16) at non-singleton dimension 0
#3344 commented on
Jun 19, 2025 • 0 new comments -
`data_utils.apply_chat_template()` does not allow roles like "tool" or "tool_results" in the last message of `"prompt"`
#3529 commented on
Jun 19, 2025 • 0 new comments -
Add Adaptive Entropy Control to GRPOTrainer
#3320 commented on
Jun 18, 2025 • 0 new comments -
[GRPO] Entropy metric
#3571 commented on
Jun 17, 2025 • 0 new comments -
Question about the k3 KL estimator implementation
#3556 commented on
Jun 17, 2025 • 0 new comments -
Completions Only Loss is incompatible with use_liger_kernel set as true
#3484 commented on
Jun 17, 2025 • 0 new comments