[llm]update dpo criterion #9620

lugimzzz · 2024-12-11T12:44:52Z

PR types

Performance optimization

PR changes

APIs

Description

Optimize the DPO criterion.

paddle-bot · 2024-12-11T12:45:01Z

Thanks for your contribution!

lugimzzz · 2024-12-11T12:45:12Z

paddlenlp/trl/dpo_criterion.py

@@ -148,16 +148,25 @@ def dpo_logps(
            if self.config.tensor_parallel_degree > 1 and self.config.sequence_parallel:
                labels, sparse_tgt_idx = sequence_parallel_sparse_mask_labels(labels, 0)

-                hidden_states = paddle.take_along_axis(hidden_states, sparse_tgt_idx, axis=0)
+                hidden_states = paddle.gather(hidden_states, sparse_tgt_idx, axis=0)


Use the `paddle.gather` API instead of `paddle.take_along_axis` to improve efficiency.
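To illustrate why the two calls are interchangeable for row selection, here is a minimal sketch. It uses NumPy as a stand-in for Paddle (an assumption made only so the snippet runs without Paddle installed): `np.take` plays the role of `paddle.gather`, and `np.take_along_axis` the role of `paddle.take_along_axis`. The tensor shapes and index values are made up for illustration.

```python
import numpy as np

# Stand-in for hidden_states: 4 token positions, hidden size 3 (made-up values).
hidden_states = np.arange(12, dtype=np.float32).reshape(4, 3)
# Stand-in for sparse_tgt_idx: the positions that carry a target label.
sparse_tgt_idx = np.array([0, 2, 3])

# gather-style selection: a 1-D index picks whole rows along axis 0.
rows_gather = np.take(hidden_states, sparse_tgt_idx, axis=0)

# take_along_axis-style selection: the index needs the same rank as the input,
# so it is expanded to shape (3, 1) and broadcast across the hidden dimension.
rows_take_along = np.take_along_axis(hidden_states, sparse_tgt_idx[:, None], axis=0)

# Both approaches select the same rows.
assert np.array_equal(rows_gather, rows_take_along)
```

The gather form works directly on the 1-D index, whereas the take-along-axis form has to treat the index as a broadcast per-element index tensor. This sketch only shows that the results match; it does not measure the efficiency gain the comment refers to.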

lugimzzz · 2024-12-11T12:45:41Z

paddlenlp/trl/dpo_criterion.py

-                hidden_states = paddle.take_along_axis(hidden_states, sparse_tgt_idx.unsqueeze(-1), axis=0)
-
+                hidden_states = paddle.gather(hidden_states, sparse_tgt_idx, axis=0)
+        elif self.config.use_fused_head_and_loss_fn:


Adds the gradient computation for sequence parallel, which was previously missing.

lugimzzz · 2024-12-11T12:46:02Z

paddlenlp/trl/dpo_criterion.py

-                [(per_token_logps[response_index[1] : response_index[2]]).sum() for response_index in response_indexs],
+                [
+                    (
+                        paddle.gather(


Use gather instead of slicing so that the same code works on both GPU and NPU.
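The following sketch shows that summing a span through an explicit index gather gives the same result as summing a slice. NumPy stands in for Paddle here (an assumption for runnability), the values are made up, and the rows of `response_indexs` are hypothetical; only positions 1 and 2 of each row are used as span bounds, as in the removed line. It is not the code from the PR, whose replacement lines are truncated above.

```python
import numpy as np

# Made-up per-token log-probabilities for a flattened sequence of 6 tokens.
per_token_logps = np.array([-0.1, -0.2, -0.3, -0.4, -0.5, -0.6])
# Hypothetical index rows; entries [1] and [2] bound each response span.
response_indexs = np.array([[0, 1, 3], [0, 4, 6]])

# Slice-based sum, in the style of the removed line.
sums_slice = np.array(
    [per_token_logps[r[1] : r[2]].sum() for r in response_indexs]
)

# Gather-based sum: build the positions explicitly, then index with them.
sums_gather = np.array(
    [np.take(per_token_logps, np.arange(r[1], r[2])).sum() for r in response_indexs]
)

# Both formulations produce the same per-response sums.
assert np.allclose(sums_slice, sums_gather)
```

Expressing the span as an index tensor keeps the operation a plain gather, which is the property the comment relies on for sharing one code path across devices.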

lugimzzz · 2024-12-11T12:47:30Z

paddlenlp/trl/dpo_criterion.py

@@ -194,64 +203,65 @@ def dpo_logps(

        if len(response_indexs.shape) == 3:
            response_indexs = response_indexs[0]
+
+        offset = 1 if self.ignore_eos_token else 0


The offset exists so that RM and DPO can share a single data pipeline; for DPO, `ignore_eos_token` defaults to True.

codecov · 2024-12-11T13:18:44Z

Codecov Report

Attention: Patch coverage is 9.09091% with 10 lines in your changes missing coverage. Please review.

Project coverage is 52.98%. Comparing base (9f237b4) to head (27dc4d2).
Report is 25 commits behind head on develop.

❗ Current head 27dc4d2 differs from pull request most recent head 6fc343f

Please upload reports for the commit 6fc343f to get more accurate results.

Files with missing lines	Patch %	Lines
paddlenlp/trl/dpo_criterion.py	9.09%	10 Missing ⚠️

Additional details and impacted files

@@             Coverage Diff             @@
##           develop    #9620      +/-   ##
===========================================
- Coverage    53.10%   52.98%   -0.13%     
===========================================
  Files          704      708       +4     
  Lines       110967   111168     +201     
===========================================
- Hits         58925    58898      -27     
- Misses       52042    52270     +228


wawltor

LGTM

update dpo criterion

27dc4d2

lugimzzz commented Dec 11, 2024

wawltor previously approved these changes Dec 11, 2024

update dpo criterion

6fc343f

lugimzzz dismissed wawltor’s stale review via 6fc343f December 12, 2024 02:55

wawltor approved these changes Dec 16, 2024

wawltor merged commit f3ba5b3 into PaddlePaddle:develop Dec 16, 2024
9 of 12 checks passed

lugimzzz deleted the gather branch December 16, 2024 08:43
