Stop forcing approximation of math.erf with --iree-codegen-gpu-native-math-precision #20074
Conversation
I benchmarked this and confirm it helps with perf.
Summary of debugging of the CI failures so far: the difference in IR without/with this PR is exactly as expected: it causes us to use the native `math.erf` lowering instead of the polynomial approximation. There is however one major difference between the two implementations:
So I actually think that the code with this PR is more accurate. It is a little surprising that the numerical difference e2e is so large, but we need an assessment of the actual accuracy metric, not those direct output-activation comparisons.
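To make the kind of deviation being discussed concrete, here is a small standalone C++ sketch comparing the exact erf against a classic polynomial fit (the Abramowitz & Stegun 7.1.26 formula, used here purely as an illustrative stand-in and not necessarily the approximation IREE emits). It sweeps a range of inputs and reports the maximum absolute deviation, which for this fit is on the order of 1e-7.

```cpp
// Illustrative only: compares std::erf against a classic polynomial fit
// (Abramowitz & Stegun 7.1.26), not IREE's actual approximation.
#include <algorithm>
#include <cmath>
#include <cstdio>

static float erfApprox(float x) {
  const float sign = x < 0.0f ? -1.0f : 1.0f;
  x = std::fabs(x);
  const float p = 0.3275911f;
  const float a1 = 0.254829592f, a2 = -0.284496736f, a3 = 1.421413741f,
              a4 = -1.453152027f, a5 = 1.061405429f;
  const float t = 1.0f / (1.0f + p * x);
  // Horner evaluation of a1*t + a2*t^2 + ... + a5*t^5.
  const float poly = ((((a5 * t + a4) * t + a3) * t + a2) * t + a1) * t;
  return sign * (1.0f - poly * std::exp(-x * x));
}

int main() {
  double maxAbsErr = 0.0;
  for (int i = -4000; i <= 4000; ++i) {
    const float x = i * 1e-3f;  // sweep [-4, 4], where erf is non-trivial
    maxAbsErr = std::max(
        maxAbsErr, std::fabs((double)erfApprox(x) - std::erf((double)x)));
  }
  std::printf("max |approx - erf| = %g\n", maxAbsErr);
  return 0;
}
```

A per-element difference of that size can still compound into visible end-to-end output differences in a deep model, which is why the comment above argues for judging against an actual accuracy metric rather than raw activation comparisons.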
I verified the numerics with this patch by comparing the image we generate from SDXL and it looks good. We need to update the golden values for SDXL in CI with this update.
This was mostly used on ROCm and WebGPU. On ROCm, over the past month we have made this essentially the default behavior, so the flag is no longer needed. The last thing required before we can drop usage of the flag on ROCm is #20074. On WebGPU, this PR makes this the default behavior as on ROCm, so this PR also drops the flag. The WGSL spec issue gpuweb/gpuweb#5109 mentioned in the comment explains the problem. It has also been discussed in the past as #11321. Signed-off-by: Benoit Jacob <jacob.benoit.1@gmail.com>
@nithinsubbiah despite seeing some numerical deviations, we are still producing good enough CLIP/FID scores with this patch -- any reason we don't update the golden output to what we produce with this patch?
(Edited): Looks like artifacts are still disjointed. I'll reproduce locally and update the golden output.
@nithinsubbiah, @monorimet: the MI250 error is a known flake: #20358.
when --iree-codegen-gpu-native-math-precision is passed. Signed-off-by: Benoit Jacob <jacob.benoit.1@gmail.com> Signed-off-by: nithinsubbiah <nithinsubbiah@gmail.com>
…t_unet.py Signed-off-by: nithinsubbiah <nithinsubbiah@gmail.com>
Thank you @nithinsubbiah!!
The implementation of the `--iree-codegen-gpu-native-math-precision` flag had a longstanding bug where both branches (for that option being true or false) were performing approximation of the `math.erf` function. Recent refactorings preserved that behavior bug-for-bug.

Unfortunately, that meant that users passing `--iree-codegen-gpu-native-math-precision` were not getting faster `math.erf`, even on ROCm where we enabled the native call by default.
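For readers who want the shape of the bug spelled out, here is a hypothetical, self-contained C++ sketch of what "both branches performing the approximation" means. All names here are illustrative; this is not the actual IREE pass code.

```cpp
// Hypothetical sketch of the bug shape described above; names are
// illustrative and this is not the actual IREE pass code.
#include <cstdio>
#include <string>

struct PatternSet {
  std::string patterns;
};

// Stand-in for the lowering-pattern population controlled by the flag.
void populateMathPatterns(PatternSet &set, bool gpuNativeMathPrecision) {
  if (gpuNativeMathPrecision) {
    // Intended: leave math.erf alone so it lowers to the native device call.
    // Bug: the polynomial-approximation pattern was registered here too,
    // so the flag made no difference for math.erf.
    set.patterns += "erf-polynomial-approximation;";
  } else {
    set.patterns += "erf-polynomial-approximation;";
  }
}

int main() {
  PatternSet nativePath, approxPath;
  populateMathPatterns(nativePath, /*gpuNativeMathPrecision=*/true);
  populateMathPatterns(approxPath, /*gpuNativeMathPrecision=*/false);
  // With the bug, both configurations end up with identical erf handling.
  std::printf("native : %s\napprox : %s\n", nativePath.patterns.c_str(),
              approxPath.patterns.c_str());
  return 0;
}
```

The fix described in this PR amounts to making the true branch actually keep `math.erf` intact so that the native lowering can take effect.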