Stop forcing approximation of `math.erf` with `--iree-codegen-gpu-native-math-precision` by bjacob · Pull Request #20074 · iree-org/iree · GitHub

Stop forcing approximation of math.erf with --iree-codegen-gpu-native-math-precision #20074


Merged — 6 commits merged into iree-org:main on Apr 10, 2025

Conversation

bjacob
Contributor
@bjacob bjacob commented Feb 24, 2025

The implementation of the --iree-codegen-gpu-native-math-precision flag had a longstanding bug: both branches (for the option being true or false) performed the polynomial approximation of the math.erf function.

Recent refactorings preserved that behavior bug-for-bug.

Unfortunately, that meant that users passing --iree-codegen-gpu-native-math-precision were not getting faster math.erf, even on ROCm where we enabled the native call by default.
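Schematically, the bug meant the flag had no effect on how math.erf was lowered. A minimal Python sketch of the broken branch logic (hypothetical names, not IREE's actual lowering code):

```python
# Hypothetical sketch of the bug (not IREE's actual code): both branches of
# the flag check lowered math.erf to the polynomial approximation, so the
# flag never actually selected the native device-library call.
def lower_erf(gpu_native_math_precision: bool) -> str:
    if gpu_native_math_precision:
        # Bug: this branch should have selected the native call.
        return "polynomial_approximation"
    return "polynomial_approximation"

def lower_erf_fixed(gpu_native_math_precision: bool) -> str:
    if gpu_native_math_precision:
        return "native_call"  # e.g. __ocml_erf_f16 on ROCm
    return "polynomial_approximation"
```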

@bjacob bjacob marked this pull request as ready for review February 24, 2025 16:57
@bjacob bjacob requested a review from hanhanW as a code owner February 24, 2025 16:57
@bjacob bjacob requested review from kuhar and Groverkss February 24, 2025 16:57
Member
@kuhar kuhar left a comment


I benchmarked this and confirm it helps with perf

@bjacob
Contributor Author
bjacob commented Feb 24, 2025

Summary of debugging of the CI failures so far:

The difference in IR without/with this PR is exactly as expected: it causes us to use the __ocml_erf_f16 function instead of the polynomial approximation. https://gist.github.com/bjacob/247e46bc587f2c5e089fe67d8897fb49

The implementation of __ocml_erf_f16 is also embedded in the above IR diff. It uses a different approximation, but one that is just as accurate. Both are accurate to < 1e-7, so that shouldn't explain a numerical difference as large as the one we are seeing here.
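As a point of reference for that kind of accuracy claim, a quick check can be done with the classic Abramowitz & Stegun 7.1.26 polynomial (used here as a stand-in for the approximations being compared; its documented max absolute error is about 1.5e-7, not IREE's or OCML's exact polynomial):

```python
import math

# Abramowitz & Stegun 7.1.26: a classic polynomial approximation of erf,
# used as a stand-in for the kind of approximation discussed above.
P = 0.3275911
A = (0.254829592, -0.284496736, 1.421413741, -1.453152027, 1.061405429)

def erf_approx(x: float) -> float:
    """erf(x) for x >= 0 via A&S 7.1.26; documented max abs error ~1.5e-7."""
    t = 1.0 / (1.0 + P * x)
    s = 0.0
    for a in reversed(A):  # Horner evaluation of a1*t + a2*t^2 + ... + a5*t^5
        s = s * t + a
    return 1.0 - s * t * math.exp(-x * x)

# Sweep [0, 4] and compare against the reference math.erf.
max_err = max(abs(erf_approx(i / 1000.0) - math.erf(i / 1000.0))
              for i in range(0, 4001))
assert max_err < 2e-7
```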

There is however one major difference:

  • Without this PR, we are performing the approximation in f16.
  • With this PR, the __ocml_erf_f16 function we are calling upcasts to f32, performs the approximation in f32, and casts the result back down to f16. It's all in the above IR diff.

So I think the code with this PR is actually more accurate.

It is a little surprising that the end-to-end numerical difference is so large, but we need an assessment against the actual accuracy metric, not those direct output-activation comparisons.
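The compute-in-f16 versus upcast-to-f32 difference described above can be reproduced outside IREE. A sketch using NumPy's float16 (which rounds after each operation) and the same stand-in A&S 7.1.26 polynomial — illustrative only, not IREE's or OCML's actual kernels:

```python
import math
import numpy as np

# Stand-in approximation (Abramowitz & Stegun 7.1.26), evaluated with every
# intermediate rounded to a given dtype, mimicking "compute entirely in f16"
# vs "upcast to f32, compute, round only the final result back to f16".
P = 0.3275911
A = (0.254829592, -0.284496736, 1.421413741, -1.453152027, 1.061405429)

def erf_approx(x: float, dtype) -> float:
    x = dtype(x)
    t = dtype(1) / (dtype(1) + dtype(P) * x)
    s = dtype(0)
    for a in reversed(A):  # Horner evaluation, rounded to dtype at each step
        s = s * t + dtype(a)
    return float(dtype(1) - s * t * np.exp(-x * x))

xs = [i / 100.0 for i in range(1, 301)]
ref = [math.erf(x) for x in xs]

# Entire evaluation in f16 (the pre-PR style in this example).
err_f16 = max(abs(erf_approx(x, np.float16) - r) for x, r in zip(xs, ref))
# Evaluate in f32, round only the final result to f16 (__ocml_erf_f16 style).
err_upcast = max(abs(float(np.float16(erf_approx(x, np.float32))) - r)
                 for x, r in zip(xs, ref))

# Computing wide and rounding once should be the more accurate path.
assert err_upcast < err_f16
```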

@nithinsubbiah
Collaborator

I verified the numerics with this patch by comparing the image we generate from SDXL and it looks good. We need to update the golden values for SDXL in CI with this update.

bjacob added a commit that referenced this pull request Mar 18, 2025
This was mostly used on ROCm and WebGPU.

On ROCm, over the past month we have made this be essentially the
default behavior, so the flag is essentially not needed anymore. The
last thing required before we can drop usage of the flag on ROCm is
#20074.

On WebGPU, this PR makes this be the default behavior like on ROCm, so
this PR also drops the flag. The WGSL spec issue
gpuweb/gpuweb#5109 mentioned in the comment
explains the problem. It has also been discussed in the past as
#11321.

Signed-off-by: Benoit Jacob <jacob.benoit.1@gmail.com>
Eliasj42 pushed a commit that referenced this pull request Mar 24, 2025
@monorimet
Collaborator
monorimet commented Apr 1, 2025

@nithinsubbiah despite seeing some numerical deviations, we are still producing good enough CLIP/FID scores with this patch -- any reason we don't update the golden output to what we produce with this patch?

@monorimet
Collaborator
monorimet commented Apr 2, 2025

(Edited): Looks like artifacts are still disjointed. I'll reproduce locally and update the golden output.

@bjacob
Contributor Author
bjacob commented Apr 9, 2025

@nithinsubbiah , @monorimet , The MI250 error is a known flake: #20358.

@nithinsubbiah nithinsubbiah force-pushed the drop-erf-bug-for-bug branch 2 times, most recently from 3568f24 to 825600e on April 10, 2025 03:28
bjacob and others added 6 commits April 10, 2025 03:29
when --iree-codegen-gpu-native-math-precision is passed.

Signed-off-by: Benoit Jacob <jacob.benoit.1@gmail.com>
Signed-off-by: nithinsubbiah <nithinsubbiah@gmail.com>
Signed-off-by: nithinsubbiah <nithinsubbiah@gmail.com>
Signed-off-by: nithinsubbiah <nithinsubbiah@gmail.com>
Signed-off-by: nithinsubbiah <nithinsubbiah@gmail.com>
Signed-off-by: nithinsubbiah <nithinsubbiah@gmail.com>
…t_unet.py

Signed-off-by: nithinsubbiah <nithinsubbiah@gmail.com>
@nithinsubbiah nithinsubbiah enabled auto-merge (squash) April 10, 2025 03:58
@nithinsubbiah nithinsubbiah merged commit e72c3bf into iree-org:main Apr 10, 2025
72 of 78 checks passed
@bjacob
Contributor Author
bjacob commented Apr 10, 2025

Thank you @nithinsubbiah !!

bjacob added a commit that referenced this pull request Apr 11, 2025
…ent removal. (#20523)

#20074 removed the last significant
semantic impact of this flag on ROCm, making it effectively a NOP on
ROCm. This PR makes it a NOP everywhere and warns of imminent removal.

Signed-off-by: Benoit Jacob <jacob.benoit.1@gmail.com>
5 participants