Insights: huggingface/trl
Overview
1 Release published by 1 person
- v0.19.0, published Jun 21, 2025
16 Pull requests merged by 7 people
- ⬆️ Bump dev version (#3626, merged Jun 20, 2025)
- Release: v0.19 (#3625, merged Jun 20, 2025)
- 🧰 [SFT] Tool support (#3597, merged Jun 20, 2025)
- 🔍 Add test to verify chat template consistency (#3624, merged Jun 20, 2025)
- ⚔️ Fix bf16/fp16 config conflict issue (#3598, merged Jun 20, 2025)
- 📜 Add `chat_template_path` parameter to `SFTConfig` (#3599, merged Jun 20, 2025; usage sketch after this list)
- 🧬 Add `generation_kwargs` as a property of `GRPOConfig` to support additional generation arguments (#3617, merged Jun 20, 2025)
- [GRPO] Fix prompt truncation (`max_prompt_length`) with vLLM (#3601, merged Jun 20, 2025)
- ⭐ Add `vllm_gpu_memory_utilization` recommendation script (#3554, merged Jun 19, 2025)
- 🎁 Put the reward computation in a separate function (#3620, merged Jun 19, 2025)
- 🤵♂️ SFT on assistant messages only (#3586, merged Jun 19, 2025)
- 🦘 Skip no-op ChatML conversion for datasets already in ChatML format (#3594, merged Jun 19, 2025)
- 📚 SFTTrainer support for chat template kwargs (#3609, merged Jun 19, 2025)
- 🔖 Fix: ensure user-provided `labels` are retained in `self._signature_columns` (#3589, merged Jun 19, 2025)
- 👔 Apply doc-builder style (#3615, merged Jun 19, 2025)
- 🏛️ Fix CI and Iterative SFT (#3614, merged Jun 19, 2025)
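For the SFT-related addition in #3599 above, a minimal usage sketch follows. The parameter name `chat_template_path` comes from the PR title; the model names, the dataset, and the commented-out `assistant_only_loss` flag attributed to #3586 are assumptions made for illustration, not details taken from this page.

```python
# Hedged sketch of the new SFTConfig option from v0.19 (#3599).
# Model names and dataset are illustrative assumptions.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Any conversational dataset with a "messages" column should work here.
dataset = load_dataset("trl-lib/Capybara", split="train")

config = SFTConfig(
    output_dir="sft-demo",
    # Assumed usage: a template file path or repo id whose chat template is applied (#3599).
    chat_template_path="Qwen/Qwen2.5-0.5B-Instruct",
    # assistant_only_loss=True,  # assumed flag from #3586; needs a template with generation markers
)

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-0.5B",
    args=config,
    train_dataset=dataset,
)
trainer.train()
```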
1 Pull request opened by 1 person
- Feature: Add SGLang support for GRPO Trainer (#3627, opened Jun 21, 2025)
3 Issues closed by 2 people
- add `generation_kwargs` to `GRPOTrainer`, so people have more control when training (#3562, closed Jun 20, 2025; usage sketch after this list)
- Repeated calculation of ref_per_token_logps in grpo_trainer (#3621, closed Jun 19, 2025)
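Issue #3562 above was resolved by PR #3617, which adds a `generation_kwargs` field to `GRPOConfig`. The sketch below shows how extra sampling arguments might be passed through it; the model, dataset, reward function, and the specific kwargs are illustrative assumptions, not details taken from this page.

```python
# Hedged sketch of GRPOConfig.generation_kwargs (#3617, closing #3562).
# Everything except the field name itself is an illustrative assumption.
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# Prompt-only dataset with a "prompt" column.
dataset = load_dataset("trl-lib/tldr", split="train")

def reward_len(completions, **kwargs):
    # Toy reward: prefer completions close to 200 characters.
    return [-abs(200 - len(c)) for c in completions]

config = GRPOConfig(
    output_dir="grpo-demo",
    # Extra sampling arguments forwarded to the generation backend (assumed behavior).
    generation_kwargs={"min_p": 0.1, "repetition_penalty": 1.1},
)

trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",
    reward_funcs=reward_len,
    args=config,
    train_dataset=dataset,
)
trainer.train()
```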
1 Issue opened by 1 person
- GRPO server mode connection error in the middle of training (#3622, opened Jun 20, 2025)
17 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
- `data_utils.apply_chat_template()` does not allow roles like "tool" or "tool_results" in the last message of `"prompt"` (#3529, commented on Jun 19, 2025 • 0 new comments; see the sketch at the end of this list)
- Does PPOTrainer need the same tokenizer for policy and reward? (#3595, commented on Jun 19, 2025 • 0 new comments)
- Running example raises error: The size of tensor a (2) must match the size of tensor b (16) at non-singleton dimension 0 (#3344, commented on Jun 19, 2025 • 0 new comments)
- SFTTrainer crashes when using dataloader workers > 0 with DeepSpeed installed (#3618, commented on Jun 19, 2025 • 0 new comments)
- My code outputs "tensor 'kwargs['input_ids']' size mismatch" while training; the checkpoint is not saved and the log is not printed (#3611, commented on Jun 19, 2025 • 0 new comments)
- NCCL timeout when GRPO training with vLLM (#2923, commented on Jun 20, 2025 • 0 new comments)
- How to convert my multi-turn dialogue dataset? (#3605, commented on Jun 20, 2025 • 0 new comments)
- [question] Best way to have my own reward model backed by rules (#2518, commented on Jun 20, 2025 • 0 new comments)
- Packing sequences for memory efficiency in GRPO and other preference learning implementations (#3549, commented on Jun 20, 2025 • 0 new comments)
- HF Doc Builder style (#3498, commented on Jun 19, 2025 • 0 new comments)
- 🥳 New RLOO (#3533, commented on Jun 19, 2025 • 0 new comments)
- Add entropy-based filtering inside the GRPOTrainer (#3563, commented on Jun 19, 2025 • 0 new comments)
- Chi-square regularized DPO (#3573, commented on Jun 19, 2025 • 0 new comments)
- Check reward shapes in RewardTrainer (#3577, commented on Jun 19, 2025 • 0 new comments)
- Fix: corrected FSDP in GRPO trainer (#3582, commented on Jun 21, 2025 • 0 new comments)
- ClearML logging of visualizations in RewardTrainer evaluation (#3602, commented on Jun 19, 2025 • 0 new comments)
- [WIP] [SFT] SFT doc rewrite (#3619, commented on Jun 19, 2025 • 0 new comments)
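For context on #3529 above, the sketch below shows a typical call to `data_utils.apply_chat_template()` on a prompt/completion example; the tokenizer and the example content are assumptions for illustration. Per the issue, an example whose last `"prompt"` message uses a role such as "tool" would currently be rejected.

```python
# Hedged sketch of trl's apply_chat_template helper discussed in #3529.
# The tokenizer and example content are illustrative assumptions.
from transformers import AutoTokenizer
from trl.data_utils import apply_chat_template

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")

# Prompt/completion example in conversational form.
example = {
    "prompt": [{"role": "user", "content": "What color is the sky?"}],
    "completion": [{"role": "assistant", "content": "It is blue."}],
}

templated = apply_chat_template(example, tokenizer)
print(templated["prompt"])      # templated prompt string
print(templated["completion"])  # templated completion string
```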