Stop forcing approximation of `math.erf` with `--iree-codegen-gpu-native-math-precision` by bjacob · Pull Request #20074 · iree-org/iree · GitHub

Stop forcing approximation of math.erf with --iree-codegen-gpu-native-math-precision #20074


Merged — 6 commits merged into iree-org:main on Apr 10, 2025

Conversation

bjacob
Contributor
@bjacob bjacob commented Feb 24, 2025

The implementation of the --iree-codegen-gpu-native-math-precision flag had a longstanding bug: both branches (for the option being true or false) performed the polynomial approximation of the math.erf function.

Recent refactorings preserved that behavior bug-for-bug.

Unfortunately, that meant that users passing --iree-codegen-gpu-native-math-precision were not getting faster math.erf, even on ROCm where we enabled the native call by default.
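Schematically, the bug meant the flag had no effect on how math.erf was lowered. A minimal Python sketch of the broken branch logic (hypothetical names, not IREE's actual lowering code):

```python
# Hypothetical sketch of the bug (not IREE's actual code): both branches of
# the flag check lowered math.erf to the polynomial approximation, so the
# flag never actually selected the native device-library call.
def lower_erf(gpu_native_math_precision: bool) -> str:
    if gpu_native_math_precision:
        # Bug: this branch should have selected the native call.
        return "polynomial_approximation"
    return "polynomial_approximation"

def lower_erf_fixed(gpu_native_math_precision: bool) -> str:
    if gpu_native_math_precision:
        return "native_call"  # e.g. __ocml_erf_f16 on ROCm
    return "polynomial_approximation"
```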

@bjacob bjacob marked this pull request as ready for review February 24, 2025 16:57
@bjacob bjacob requested a review from hanhanW as a code owner February 24, 2025 16:57
@bjacob bjacob requested review from kuhar and Groverkss February 24, 2025 16:57
Member
@kuhar kuhar left a comment


I benchmarked this and confirm it helps with perf

@bjacob
Contributor Author
bjacob commented Feb 24, 2025

Summary of debugging of the CI failures so far:

The difference in IR without/with this PR is exactly as expected: it causes us to use the __ocml_erf_f16 function instead of the polynomial approximation. https://gist.github.com/bjacob/247e46bc587f2c5e089fe67d8897fb49

The implementation of __ocml_erf_f16 is also embedded in the above IR diff. It uses a different approximation, but one that is just as accurate. Both are accurate to < 1e-7, so that shouldn't explain a numerical difference as large as the one we are seeing here.
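As a point of reference for that kind of accuracy claim, a quick check can be done with the classic Abramowitz & Stegun 7.1.26 polynomial (used here as a stand-in for the approximations being compared; its documented max absolute error is about 1.5e-7, not IREE's or OCML's exact polynomial):

```python
import math

# Abramowitz & Stegun 7.1.26: a classic polynomial approximation of erf,
# used as a stand-in for the kind of approximation discussed above.
P = 0.3275911
A = (0.254829592, -0.284496736, 1.421413741, -1.453152027, 1.061405429)

def erf_approx(x: float) -> float:
    """erf(x) for x >= 0 via A&S 7.1.26; documented max abs error ~1.5e-7."""
    t = 1.0 / (1.0 + P * x)
    s = 0.0
    for a in reversed(A):  # Horner evaluation of a1*t + a2*t^2 + ... + a5*t^5
        s = s * t + a
    return 1.0 - s * t * math.exp(-x * x)

# Sweep [0, 4] and compare against the reference math.erf.
max_err = max(abs(erf_approx(i / 1000.0) - math.erf(i / 1000.0))
              for i in range(0, 4001))
assert max_err < 2e-7
```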

There is however one major difference:

  • Without this PR, we are performing the approximation in f16.
  • With this PR, the __ocml_erf_f16 function we are calling upcasts to f32, performs the approximation in f32, and casts the result back down to f16. It's all in the above IR diff.

So I think the code with this PR is actually more accurate.

It is a little surprising that the end-to-end numerical difference is so large, but we need an assessment against the actual accuracy metric, not those direct output-activation comparisons.
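The compute-in-f16 versus upcast-to-f32 difference described above can be reproduced outside IREE. A sketch using NumPy's float16 (which rounds after each operation) and the same stand-in A&S 7.1.26 polynomial — illustrative only, not IREE's or OCML's actual kernels:

```python
import math
import numpy as np

# Stand-in approximation (Abramowitz & Stegun 7.1.26), evaluated with every
# intermediate rounded to a given dtype, mimicking "compute entirely in f16"
# vs "upcast to f32, compute, round only the final result back to f16".
P = 0.3275911
A = (0.254829592, -0.284496736, 1.421413741, -1.453152027, 1.061405429)

def erf_approx(x: float, dtype) -> float:
    x = dtype(x)
    t = dtype(1) / (dtype(1) + dtype(P) * x)
    s = dtype(0)
    for a in reversed(A):  # Horner evaluation, rounded to dtype at each step
        s = s * t + dtype(a)
    return float(dtype(1) - s * t * np.exp(-x * x))

xs = [i / 100.0 for i in range(1, 301)]
ref = [math.erf(x) for x in xs]

# Entire evaluation in f16 (the pre-PR style in this example).
err_f16 = max(abs(erf_approx(x, np.float16) - r) for x, r in zip(xs, ref))
# Evaluate in f32, round only the final result to f16 (__ocml_erf_f16 style).
err_upcast = max(abs(float(np.float16(erf_approx(x, np.float32))) - r)
                 for x, r in zip(xs, ref))

# Computing wide and rounding once should be the more accurate path.
assert err_upcast < err_f16
```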

@nithinsubbiah
Collaborator

I verified the numerics with this patch by comparing the image we generate from SDXL and it looks good. We need to update the golden values for SDXL in CI with this update.

bjacob added a commit that referenced this pull request Mar 18, 2025
This was mostly used on ROCm and WebGPU.

On ROCm, over the past month we have made this be essentially the
default behavior, so the flag is essentially not needed anymore. The
last thing required before we can drop usage of the flag on ROCm is
#20074.

On WebGPU, this PR makes this be the default behavior like on ROCm, so
this PR also drops the flag. The WGSL spec issue
gpuweb/gpuweb#5109 mentioned in the comment
explains the problem. It has also been discussed in the past as
#11321.

Signed-off-by: Benoit Jacob <jacob.benoit.1@gmail.com>
Eliasj42 pushed a commit that referenced this pull request Mar 24, 2025
@monorimet
Collaborator
monorimet commented Apr 1, 2025

@nithinsubbiah despite seeing some numerical deviations, we are still producing good enough CLIP/FID scores with this patch -- any reason we don't update the golden output to what we produce with this patch?

@monorimet
Collaborator
monorimet commented Apr 2, 2025

(Edited): Looks like artifacts are still disjointed. I'll reproduce locally and update the golden output.

@bjacob
Contributor Author
bjacob commented Apr 9, 2025

@nithinsubbiah , @monorimet , The MI250 error is a known flake: #20358.

@nithinsubbiah nithinsubbiah force-pushed the drop-erf-bug-for-bug branch 2 times, most recently from 3568f24 to 825600e on April 10, 2025 03:28
bjacob and others added 6 commits April 10, 2025 03:29
when --iree-codegen-gpu-native-math-precision is passed.

Signed-off-by: Benoit Jacob <jacob.benoit.1@gmail.com>
Signed-off-by: nithinsubbiah <nithinsubbiah@gmail.com>
Signed-off-by: nithinsubbiah <nithinsubbiah@gmail.com>
Signed-off-by: nithinsubbiah <nithinsubbiah@gmail.com>
Signed-off-by: nithinsubbiah <nithinsubbiah@gmail.com>
Signed-off-by: nithinsubbiah <nithinsubbiah@gmail.com>
…t_unet.py

Signed-off-by: nithinsubbiah <nithinsubbiah@gmail.com>
@nithinsubbiah nithinsubbiah enabled auto-merge (squash) April 10, 2025 03:58
@nithinsubbiah nithinsubbiah merged commit e72c3bf into iree-org:main Apr 10, 2025
72 of 78 checks passed
@bjacob
Contributor Author
bjacob commented Apr 10, 2025

Thank you @nithinsubbiah !!

bjacob added a commit that referenced this pull request Apr 11, 2025
…ent removal. (#20523)

#20074 removed the last significant
semantic impact of this flag on ROCm, making it effectively a NOP on
ROCm. This PR makes it a NOP everywhere and warns of imminent removal.

Signed-off-by: Benoit Jacob <jacob.benoit.1@gmail.com>
5 participants