Add support for repetition_penalty in GrpoParams #1654

REDDITARUN · 2025-04-25T19:33:36Z

Description

This PR adds support for the repetition_penalty parameter in the GrpoParams class.

repetition_penalty is a generation parameter commonly used with language models to discourage or encourage the repetition of tokens. It defaults to 1.0 (no penalty). Values greater than 1.0 reduce repetition in the generated text, and values below 1.0 increase it.
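To make the ">1.0 reduces, <1.0 increases" behavior concrete, here is a minimal plain-Python sketch of the CTRL-style rule that Hugging Face transformers applies in its RepetitionPenaltyLogitsProcessor. For any token already present in the sequence, a positive logit is divided by the penalty and a negative logit is multiplied by it. The function name apply_repetition_penalty and the list-based inputs are illustrative assumptions; this is not Oumi or TRL code.

```python
from typing import Iterable, List


def apply_repetition_penalty(
    logits: List[float], generated_ids: Iterable[int], penalty: float
) -> List[float]:
    """Return a copy of `logits` with a CTRL-style repetition penalty applied.

    Hypothetical helper for illustration only. Every token id that already
    appears in `generated_ids` is penalized once: positive logits are divided
    by `penalty`, negative logits are multiplied by it. With penalty > 1.0 both
    cases lower the token's score (less repetition); with penalty < 1.0 they
    raise it (more repetition); penalty == 1.0 leaves the logits unchanged.
    """
    if penalty <= 0:
        raise ValueError("penalty must be strictly positive")
    adjusted = list(logits)
    for token_id in set(generated_ids):  # each previously seen token counts once
        score = adjusted[token_id]
        adjusted[token_id] = score * penalty if score < 0 else score / penalty
    return adjusted


logits = [2.0, -1.0, 0.5, 3.0]
seen = [0, 1, 1]  # tokens 0 and 1 were already generated
print(apply_repetition_penalty(logits, seen, 2.0))  # [1.0, -2.0, 0.5, 3.0]
print(apply_repetition_penalty(logits, seen, 0.5))  # [4.0, -0.5, 0.5, 3.0]
```

Tokens 2 and 3 were never generated, so their logits are untouched in both calls; only the already-seen tokens become less (penalty 2.0) or more (penalty 0.5) likely.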

This change lets users tune repetition behavior during generation through Oumi's generation interface, bringing it in line with Hugging Face TRL's GRPO trainer.
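As a rough illustration of what such a config field might look like, the sketch below uses a hypothetical GrpoParamsSketch dataclass. Only the field name repetition_penalty and its 1.0 default come from this PR; the class name, the positivity check, and the to_trainer_kwargs helper are assumptions made for illustration and are not Oumi's actual GrpoParams implementation.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class GrpoParamsSketch:
    """Hypothetical stand-in for a GRPO params class, reduced to one field.

    Only `repetition_penalty` (default 1.0, i.e. no penalty) is taken from the
    PR description; everything else here is an illustrative assumption.
    """

    repetition_penalty: float = 1.0

    def __post_init__(self) -> None:
        # The penalty divides or multiplies logits, so it must be strictly positive.
        if self.repetition_penalty <= 0:
            raise ValueError(
                f"repetition_penalty must be > 0, got {self.repetition_penalty}"
            )

    def to_trainer_kwargs(self) -> Dict[str, Any]:
        # Hypothetical mapping into keyword arguments for a TRL-style GRPO config.
        return {"repetition_penalty": self.repetition_penalty}


print(GrpoParamsSketch().to_trainer_kwargs())  # {'repetition_penalty': 1.0}
print(GrpoParamsSketch(repetition_penalty=1.2).to_trainer_kwargs())
```

Keeping the default at 1.0 means existing configs that never set the field behave exactly as before, which matches the PR's note that regular training is unaffected.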

Tested that TRL GRPO training works, and also tested that regular training isn't affected.

Related issues

Fixes #1655

Before submitting

This PR only changes documentation. (You can ignore the following checks in that case)
Did you read the Pull Request guidelines in the contributor guide?
Did you link the issue(s) related to this PR in the section above?
Did you add / update tests where needed?

Reviewers

At least one review from a member of oumi-ai/oumi-staff is required.

nikg4 · 2025-04-25T19:55:10Z

@REDDITARUN Thanks for sending this PR! Also, could you please open a feature request for this task and include relevant context on why it's needed?

REDDITARUN · 2025-04-25T20:26:16Z

Thanks @nikg4! Just submitted the feature request here: #1655
Let me know if anything else is needed! 🙌

wizeng23

Thanks for adding this! Please address the one comment, and then it can be merged.

src/oumi/datasets/grpo/rewards/__init__.py

Co-authored-by: William Zeng <10782997+wizeng23@users.noreply.github.com>

REDDITARUN · 2025-04-25T23:08:51Z

Thanks! Reverted the newline. Should be good now 🙌

wizeng23 · 2025-04-27T05:18:46Z

Please make sure that running pre-commit run --all-files --show-diff-on-failure locally doesn't produce errors. There's a linter error about a comment being too long; could you please resolve it?

REDDITARUN · 2025-04-28T06:50:52Z

Got it. I'll fix the linter error. Thanks for the heads up!

REDDITARUN added 2 commits April 25, 2025 15:07

Add support for repetition_penalty in GrpoParams

9a1fe86

Add support for repetition_penalty in GrpoParams

ebdb4a4

nikg4 requested review from nikg4, jgreer013 and wizeng23 April 25, 2025 19:53

wizeng23 approved these changes Apr 25, 2025


src/oumi/datasets/grpo/rewards/__init__.py (outdated, resolved)

Update src/oumi/datasets/grpo/rewards/__init__.py

d5afd90

Co-authored-by: William Zeng <10782997+wizeng23@users.noreply.github.com>

Update grpo_params.py

aee9d4c

wizeng23 merged commit 0665cee into oumi-ai:main May 6, 2025
1 of 2 checks passed
