[AMDGPU] Do not rewrite or approximate math functions on ROCm #19970

bjacob · 2025-02-12T03:16:16Z

On ROCm, we want to use the device library for all math functions.

This expands on #19969, which only concerned math.erf.

We only leave one category of rewrites enabled: the operand casts to f32. The ROCm device library internally performs the same for many math functions, so dropping the casts on our side would not necessarily make any difference.

The effect on accuracy is reflected in the updates to the JSON files in this PR, which control the tolerances in the e2e accuracy tests for math ops. The tolerances are also tightened for LLVM-CPU to provide an accurate basis for comparison. The resulting diff between the two JSON files shows how much more accurate the ROCm device library functions are than the MLIR polynomial approximation that we are using on LLVM-CPU:
https://gist.github.com/bjacob/98549902957e8171373ffceed5611411#file-a-diff
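To illustrate how an accuracy gap like this translates into a test tolerance, here is a minimal sketch that measures the worst-case absolute error of two implementations of exp over a sampled interval. Everything in it is an assumption made for illustration: `expTaylor5` is a plain degree-5 Taylor polynomial, not the polynomial MLIR actually uses, and `maxAbsError` is a hypothetical helper, not part of the IREE test suite.

```cpp
#include <algorithm>
#include <cmath>
#include <functional>

// Illustrative only: degree-5 Taylor polynomial for exp(x) in Horner form.
// This is NOT the polynomial used by MLIR's approximation pass.
float expTaylor5(float x) {
  return 1.0f +
         x * (1.0f +
              x * (0.5f +
                   x * (1.0f / 6 + x * (1.0f / 24 + x * (1.0f / 120)))));
}

// Largest absolute error of `candidate` against a double-precision
// `reference`, sampled at steps + 1 evenly spaced points in [lo, hi].
double maxAbsError(const std::function<float(float)> &candidate,
                   const std::function<double(double)> &reference,
                   double lo, double hi, int steps) {
  double worst = 0.0;
  for (int i = 0; i <= steps; ++i) {
    // Feed both functions the same f32 input value.
    float x = static_cast<float>(lo + (hi - lo) * i / steps);
    double err = std::abs(static_cast<double>(candidate(x)) - reference(x));
    worst = std::max(worst, err);
  }
  return worst;
}
```

A test tolerance for a given implementation has to be at least as large as the error measured this way, so a more accurate implementation allows a tighter tolerance.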

Note that the zero tolerances on many f16 test cases on ROCm, which effectively request exactness, are explained by the upcast from f16 to f32: the loss of precision in the math approximation happens in f32 and is typically small enough not to change the final f16 value after rounding.
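A rough way to quantify this, assuming IEEE binary16 and binary32 formats, inputs and results in the normal range, and an f32 result whose error is at most $k$ f32 ULPs (the value of $k$ is an assumption, not something stated in this PR):

```latex
\begin{align*}
\mathrm{ulp}_{16}(x) &= 2^{e-10}, \qquad
\mathrm{ulp}_{32}(x) = 2^{e-23}
  \qquad \text{for } 2^{e} \le |x| < 2^{e+1} \\
\frac{\mathrm{ulp}_{32}(x)}{\mathrm{ulp}_{16}(x)} &= 2^{-13} \\
k \,\mathrm{ulp}_{32}(x) &= k \cdot 2^{-13}\, \mathrm{ulp}_{16}(x)
\end{align*}
```

Under these assumptions, the rounded f16 result can differ from the correctly rounded one only when the exact value lies within $k \cdot 2^{-13}$ f16 ULPs of a midpoint between two adjacent f16 values. For small $k$ that is a thin sliver of the possible results, which is consistent with many test cases passing at tolerance 0.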

MaheshRavishankar · 2025-02-12T03:23:50Z

If CI passes, that's a good indication of e2e correctness for now, I think.

MaheshRavishankar · 2025-02-12T04:37:40Z

Interesting. It has compilation failures

compiler/src/iree/compiler/Codegen/Common/test/math_transform.mlir

bjacob · 2025-02-21T21:44:28Z

For a moment I felt that I was very close to resolving this with llvm/llvm-project#128203. That PR does solve the CI issues observed here, except that I have to add all the remaining math ops to scalarization, which is not desirable.

If and when we revive this, we have good reasons at this point to handle the necessary vector-flattening downstream.

bjacob · 2025-02-27T21:02:20Z

Good news, this should be unblocked by llvm/llvm-project#128915. Retrying now.

MaheshRavishankar

This is fine, but we maybe need some end-to-end tests to make sure it compiles as a whole. Do we have unit tests for these already in tests/e2e?

bjacob · 2025-03-06T02:43:01Z

> This is fine, but we maybe need some end-to-end tests to make sure it compiles as a whole. Do we have unit tests for these already in tests/e2e?

@MaheshRavishankar : we didn't have e2e tests for math ops outside of a few cases. Coming in #20169.

This is thought to be needed before going ahead with a batch of math ops codegen changes (#19970 (review)) and this also discovered a few bugs: #20163 #20164 #20165.

Signed-off-by: Benoit Jacob <jacob.benoit.1@gmail.com>

qedawkins · 2025-03-11T18:49:54Z

compiler/src/iree/compiler/Codegen/Common/MathTransformPass.cpp

  if (clNativeMathPrecision) { // Legacy.
    if (name == math::Exp2Op::getOperationName() ||
        name == math::RoundEvenOp::getOperationName()) {
      return false;
    }
  }
+  if (isROCMBackend(target)) {


Ignore me if this discussion already happened, but isROCMBackend in a common pass looks like bad form to me. The better way is a pass option that is set by each backend.

What if we consider that this pass is likely in the future to look into much finer target details, such as specific architecture, available instructions, relative performance of instructions? Reading this from the target attribute is easy.

Also a more meta point: since this pass is a codegen pass anyway, meaning that it is always just a HALExecutableTargetAttr::lookup(op) call away from having a target attribute to look at, what is the benefit of passing information through pass options?

A specific benefit of target attributes for testing is that, in tests, we can set that on a per-op basis, while pass options require the machinery of multi-run lit tests with different check prefixes.

I see two main reasons to avoid hard coding backends like this:

1. Managing dependencies (although looking at isROCMBackend, it's just checking a string)
2. Downstream users have no control

Because as you noted here the space of potential "preferences" when it comes to implementations of these functions is unbounded, we don't want to expose control over every potential pattern as a pass option. That will be a pain to maintain. Checking based on the target backend sounds ok, but hard coding a string for a particular backend we know normally prefers one set of implementations is not scalable. If we acknowledge this is a temporary state and something to be improved, I won't block (although others might).

soooo maybe add a TODO :P

I see. It did occur to me that checking for "rocm" backend is usually a smell that a magic string is used to avoid thinking about what we really want.

But in this case, the most important dimension is not an actual hardware trait. It is the availability of the ROCm device library. That makes "is the backend rocm" unusually close to the mark. WDYT?

Hmmm that is a good point, but there still seems to be a layering violation with a Common pass string matching backend names. The string matching is better done in LLVMGPU or better yet ROCMTargetBackend (as the target backend is what really guarantees the library availability by way of its serialization implementation). That is probably a change with larger scope than this PR though.

I thought about that too, but then I thought that was fine: that this pass is under Common/ doesn't have to mean that it doesn't look into backend specifics; it only means that it exploits an opportunity to share code across backends. While the lines of code inside the if (rocm) { ... } are not currently shared across backends, the rest of the file is, and even those lines could be if the if condition changed in the future to something like if (hasDeviceLibrary(target)) { ... }.

The main thing to be careful of is dependencies in that case. All Common passes are built together and linked by all codegen backends, so if we end up pulling in AMDGPUDialect or NVVMDialect to one of these passes that would pull it in everywhere. String matching is more benign in one sense, but not as strong a guarantee that it's actually correct as calling directly into the underlying TargetBackend impl. Either way something we can leave for later.
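The capability-based check floated in this thread could look roughly like the sketch below. It is a self-contained illustration, not IREE code: `TargetInfo`, `MathRewriteKind`, and `isRewriteEnabled` are hypothetical names, and `hasDeviceLibrary` is only the name suggested in the discussion, not an existing function. The string comparison inside it is the same kind of check the thread calls a temporary state.

```cpp
#include <string>

// Hypothetical stand-in for the executable target attribute.
struct TargetInfo {
  std::string backend;  // e.g. "rocm" or "llvm-cpu"
};

// Hypothetical capability query. It still matches a backend name here;
// the point is that callers depend on the capability, not on the string.
bool hasDeviceLibrary(const TargetInfo &target) {
  return target.backend == "rocm";
}

// Hypothetical categories of math transformations.
enum class MathRewriteKind { Approximation, Rewrite, F32Cast };

// With a device library available, keep only the operand casts to f32, as
// described in the PR. Other backends are modeled as enabling everything,
// which simplifies whatever the real pass decides for them.
bool isRewriteEnabled(const TargetInfo &target, MathRewriteKind kind) {
  if (hasDeviceLibrary(target)) {
    return kind == MathRewriteKind::F32Cast;
  }
  return true;
}
```

Keeping the backend-specific knowledge behind one predicate means that a later move of that predicate into the target backend would not change the callers.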

qedawkins

(Approve)

qedawkins · 2025-03-11T19:39:26Z

The main thing to be careful of is dependencies in that case. All Common passes are built together and linked by all codegen backends, so if we end up pulling in AMDGPUDialect or NVVMDialect to one of these passes that would pull it in everywhere. String matching is more benign in one sense, but not as strong a guarantee that it's actually correct as calling directly into the underlying TargetBackend impl. Either way something we can leave for later.

#20215) Reverts #19970 due to failures on rdna3: https://github.com/iree-org/iree/actions/runs/13796654893/job/38590777349

```
test_pow_types_int64_int64::model.mlir::model.mlir::gpu_rocm_rdna3 _
EXEC @test_pow_types_int64_int64
[FAILED] result[0]: element at index 1 (31) does not match the expected (32)
expected: 3xi64=1 32 729
actual: 3xi64=1 31 729
```

This is the re-landing of #19970, which was rolled back due to an ONNX test failure that we want to accept because its root cause is a torch-mlir bug: llvm/torch-mlir#4091. This reverts commit 00e8873.

Signed-off-by: Benoit Jacob <jacob.benoit.1@gmail.com>
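The thread does not spell out the mechanism behind the 31-versus-32 result. One plausible way such an off-by-one can arise, assumed here rather than confirmed by the linked issue, is an integer pow that is computed in floating point and converted back with truncation. The sketch below only demonstrates that conversion effect; the helper names are hypothetical.

```cpp
#include <cmath>
#include <cstdint>

// Truncating conversion: drops the fractional part.
int64_t toIntTruncating(float value) { return static_cast<int64_t>(value); }

// Round-to-nearest conversion.
int64_t toIntRounding(float value) { return std::llround(value); }

// Stand-in for a floating-point pow result that lands one f32 ULP below
// the exact value 32. This models an accurate but not exact pow; it is
// not the output of any specific library.
float slightlyLowResult() { return std::nextafter(32.0f, 0.0f); }
```

With these helpers, `toIntTruncating(slightlyLowResult())` yields 31 while `toIntRounding(slightlyLowResult())` yields 32, so a result that is off by a single ULP is enough to change the integer output under truncation.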

bjacob requested review from lialan and MaheshRavishankar February 12, 2025 03:17

bjacob force-pushed the no-approx-at-all-on-rocm branch from 4ae6121 to 35e4441 Compare February 12, 2025 03:18

MaheshRavishankar approved these changes Feb 12, 2025

lialan reviewed Feb 13, 2025

compiler/src/iree/compiler/Codegen/Common/test/math_transform.mlir

bjacob force-pushed the no-approx-at-all-on-rocm branch from 35e4441 to 86c508b Compare February 20, 2025 15:25

bjacob force-pushed the no-approx-at-all-on-rocm branch from 86c508b to 2d14e5e Compare February 27, 2025 21:00

bjacob marked this pull request as ready for review February 27, 2025 21:05

bjacob requested a review from hanhanW as a code owner February 27, 2025 21:05

MaheshRavishankar approved these changes Feb 27, 2025

bjacob mentioned this pull request Mar 6, 2025

E2E tests for math ops. #20169

Merged

MaheshRavishankar approved these changes Mar 6, 2025

hanhanW approved these changes Mar 6, 2025

bjacob added 2 commits March 11, 2025 12:43

no-approx-on-rocm

1883cca

Signed-off-by: Benoit Jacob <jacob.benoit.1@gmail.com>

update-json

d820621

Signed-off-by: Benoit Jacob <jacob.benoit.1@gmail.com>

bjacob force-pushed the no-approx-at-all-on-rocm branch from 2d14e5e to d820621 Compare March 11, 2025 18:14

bjacob requested review from lialan, MaheshRavishankar and hanhanW March 11, 2025 18:30

qedawkins reviewed Mar 11, 2025

bjacob requested a review from qedawkins March 11, 2025 19:08

bjacob merged commit 674d713 into iree-org:main Mar 11, 2025
40 of 43 checks passed

qedawkins reviewed Mar 11, 2025

qedawkins mentioned this pull request Mar 11, 2025

[Codegen] Port AMDGPU device lib implementations to MLIR rewrites #20213

Open

qedawkins added a commit that referenced this pull request Mar 11, 2025

Revert "[AMDGPU] Do not rewrite or approximate math functions on ROCm (…

3e7dfb1

…#19970)" This reverts commit 674d713.

qedawkins mentioned this pull request Mar 11, 2025

Revert "[AMDGPU] Do not rewrite or approximate math functions on ROCm" #20215

Merged

bjacob mentioned this pull request Mar 12, 2025

[AMDGPU] Do not rewrite or approximate math functions on ROCm #20222

Merged

ScottTodd mentioned this pull request Mar 14, 2025

Release tracker - 3.3.0 #19960

Closed

6 tasks
