add fast_rmsnorm #8680

deepllz · 2024-06-28T02:51:41Z

PR types

Performance optimization

PR changes

Others

Description

Adds fast_rms_norm, built on top of fast_ln.
Performance impact:
The rms_norm operator becomes about 2x faster. Model throughput is as follows:

Model	Parallel strategy	Throughput before PR	Throughput after PR
Llama-2 7B	gbs8, sharding8-mbs1-acc1	4454.693	4490.384
Llama-2 13B	gbs8, pp4sharding2-vpp5-mbs1-acc4	2229.921	2252.541
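
For reference, the relative end-to-end gain implied by the table can be computed directly from the figures above. This is a small sketch; the helper name `relative_gain` is illustrative and not part of the PR.

```python
# Throughput figures copied from the table above: (before PR, after PR).
throughput = {
    "Llama-2 7B": (4454.693, 4490.384),
    "Llama-2 13B": (2229.921, 2252.541),
}


def relative_gain(before: float, after: float) -> float:
    """Return the relative throughput change as a fraction of `before`."""
    return (after - before) / before


for model, (before, after) in throughput.items():
    print(f"{model}: {relative_gain(before, after):+.2%}")
# Llama-2 7B: +0.80%
# Llama-2 13B: +1.01%
```

The end-to-end gain is roughly 1% even though the operator itself is about twice as fast, which would be expected if rms_norm accounts for only a small share of each training step.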

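Accuracy impact:
The change leaves the results of fast_ln unchanged.
The test printed the md5sum of this operator's forward and backward outputs, and the values are identical before and after the change, as shown below:

Results before the PR: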

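fast_rms_norm and fused_rms_norm cannot be made to match bit for bit, but this does not affect convergence. Convergence was verified through TE, which itself uses fast_rms_norm; under bf16 precision, switching TE on or off is known not to affect convergence.
The detailed accuracy test results are as follows:

The forward and backward md5sum values do not match, so the tensor values are not exactly identical. The diff shows that the two sides are nearly equal: for an output tensor of shape=[10, 4096], print(paddle.nonzero(output1 - output2)) shows that 462 elements differ, about 1.1% of the total, with differences at the 1e-4 level. The same holds for the backward pass.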
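
The two checks described above, an md5sum fingerprint for bitwise equality and a count of differing elements, can be sketched in plain Python as follows. This is an illustration only: the PR ran the checks on Paddle tensors (using `paddle.nonzero(output1 - output2)`), while the helper names `md5_of_floats` and `diff_stats` and the sample values here are hypothetical.

```python
import hashlib
import struct


def md5_of_floats(values):
    """Hash the raw float32 bytes; equal digests mean bitwise-equal data."""
    raw = struct.pack(f"<{len(values)}f", *values)
    return hashlib.md5(raw).hexdigest()


def diff_stats(a, b):
    """Return (number of differing elements, their fraction, max abs diff)."""
    diffs = [abs(x - y) for x, y in zip(a, b)]
    mismatched = sum(1 for d in diffs if d != 0.0)
    return mismatched, mismatched / len(diffs), max(diffs)


# Hypothetical outputs of two kernels: identical except for one element.
out1 = [0.5, -1.25, 2.0, 0.75]
out2 = [0.5, -1.25, 2.0001, 0.75]

print(md5_of_floats(out1) == md5_of_floats(out2))  # False: not bitwise equal
count, fraction, max_diff = diff_stats(out1, out2)
print(count, fraction)  # 1 0.25
```

With the numbers reported above, 462 differing elements out of 10 × 4096 = 40960 gives about 1.1%.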

End-to-end impact:
With identical inputs and parameter initialization:


Looking only at the first loss, the absolute error is 1e-3 and the relative error is around 1e-5.

paddle-bot · 2024-06-28T02:51:45Z

Thanks for your contribution!

codecov · 2024-06-28T03:22:20Z

Codecov Report

Attention: Patch coverage is 22.22222% with 7 lines in your changes missing coverage. Please review.

Project coverage is 55.74%. Comparing base (c574d6d) to head (0a7af50).
Report is 222 commits behind head on develop.

Files with missing lines	Patch %	Lines
paddlenlp/transformers/llama/fusion_ops.py	25.00%	6 Missing ⚠️
paddlenlp/transformers/llama/modeling.py	0.00%	1 Missing ⚠️

Additional details and impacted files

@@           Coverage Diff            @@
##           develop    #8680   +/-   ##
========================================
  Coverage    55.74%   55.74%           
========================================
  Files          623      623           
  Lines        97454    97457    +3     
========================================
+ Hits         54323    54331    +8     
+ Misses       43131    43126    -5

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

ZHUI · 2024-07-01T03:10:22Z

Please show the accuracy test results in the PR.

ZHUI

LGTM

deepllz force-pushed the fast_rmsnorm branch from b0577ab to 943ad01 Compare June 28, 2024 02:54

add fast_rmsnorm

943ad01

Merge remote-tracking branch 'upstream/develop' into fast_rmsnorm

12f8e95

deepllz closed this Jul 1, 2024

deepllz reopened this Jul 1, 2024

deepllz added 3 commits July 1, 2024 19:50

fix forward bug

c75a56f

fix backward bug

90aaa0e

Merge remote-tracking branch 'upstream/develop' into fast_rmsnorm

0a7af50

DesmonDay self-requested a review July 3, 2024 04:57

ZHUI approved these changes Jul 4, 2024

ZHUI merged commit fd01043 into PaddlePaddle:develop Jul 4, 2024
8 of 11 checks passed
