[Feature] Support float8 dtype storage and deepseek v3 with fp8 inference. #9906
Conversation
Thanks for your contribution!
Codecov Report
Attention: Patch coverage is
Additional details and impacted files

@@            Coverage Diff            @@
##           develop    #9906    +/-   ##
===========================================
- Coverage    51.08%   51.07%   -0.01%
===========================================
  Files          745      748       +3
  Lines       119274   119536     +262
===========================================
+ Hits         60927    61055     +128
- Misses       58347    58481     +134
return paddle.to_tensor(tensor)

class EextendDtypeNumpySafe(unittest.TestCase):
Extend (the class name `EextendDtypeNumpySafe` looks like a typo for `ExtendDtypeNumpySafe`)
is_bf16 = str(tensor.dtype) in ["uint16", "bfloat16"]
tensor = paddle.Tensor.__call__(tensor, zero_copy=True)
lora_A_tensor = paddle.Tensor.__call__(lora_A_tensor, zero_copy=True)
lora_B_tensor = paddle.Tensor.__call__(lora_B_tensor, zero_copy=True)
if self.is_cpu and is_bf16:
What is the reason for replacing the `__call__` function here?
`paddle.Tensor` does not support initializing an FP8 tensor, so this interface is used temporarily as a workaround.
@@ -0,0 +1,226 @@
# Copyright (c) 2025 PaddlePaddle Authors. All Rights Reserved.
Should the copyright header here also credit DeepSeek?
This could be renamed: kernel.py -> fp8_kernel.py
Will address this in the next PR.
from .configuration import DeepseekV2Config
from .fp8_linear import Linear
Can this be imported directly here? It looks like it replaces Linear entirely.
Yes, Linear is fully replaced; this runtime requires the full replacement. A follow-up could defer the replacement until a DeepSeek model is actually being loaded.
@@ -628,36 +635,43 @@ def __init__(self, config: DeepseekV2Config, hidden_size=None, intermediate_size
self.hidden_size = config.hidden_size if hidden_size is None else hidden_size
self.intermediate_size = config.intermediate_size if intermediate_size is None else intermediate_size

def linear_dtype_gaurd():
Has loading of FP8 parameters already been adapted in the from_pretrained interface?
Yes, the parameters are initialized directly so the FP8 weights are loaded as-is.
LGTM
Before submitting
Add test cases into the tests folder. If there are codecov issues, please add test cases first.

PR types
New features
PR changes
Others
Description
Support float8 dtype storage.
Available FP8 models: deepseek-ai/DeepSeek-V3-FP8, deepseek-ai/DeepSeek-R1-FP8
For FP8,
For BFLOAT16
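To make the float8 storage concrete, here is an illustrative decoder for the E4M3 format (1 sign bit, 4 exponent bits, 3 mantissa bits, bias 7) that FP8 checkpoints such as DeepSeek-V3-FP8 use for weights. This is a from-scratch sketch of the E4M3FN encoding, not PaddlePaddle code.

```python
def decode_e4m3(byte: int) -> float:
    """Decode one E4M3FN byte into a Python float."""
    sign = -1.0 if byte & 0x80 else 1.0
    exp = (byte >> 3) & 0x0F          # 4 exponent bits
    mant = byte & 0x07                # 3 mantissa bits
    if exp == 0x0F and mant == 0x07:  # E4M3FN has no infinities;
        return float("nan")           # only this code point is NaN
    if exp == 0:                      # subnormal: 2^-6 * (mant / 8)
        return sign * (mant / 8.0) * 2.0 ** -6
    return sign * (1.0 + mant / 8.0) * 2.0 ** (exp - 7)

print(decode_e4m3(0x38))  # 1.0
print(decode_e4m3(0x40))  # 2.0
print(decode_e4m3(0x7E))  # 448.0, the E4M3FN maximum
```

The narrow range (max 448) is why FP8 weight storage is normally paired with per-tensor or per-block scaling factors at inference time.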