-
Notifications
You must be signed in to change notification settings - Fork 703
[AMDGPU] Do not rewrite or approximate math functions on ROCm #19970
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
4ae6121
to
35e4441
Compare
If CI passes that's a good indication for e2e correctness for now I think |
Interesting. It has compilation failures |
35e4441
to
86c508b
Compare
For a moment I felt that I was very close to resolving this with llvm/llvm-project#128203. That PR does solve the CI issues observed here, except that I have to add all the remaining math ops to scalarization, which is not desirable. If and when we revive this, we have good reasons at this point to handle the necessary vector-flattening downstream. |
86c508b
to
2d14e5e
Compare
Good news, this should be unblocked by llvm/llvm-project#128915. Retrying now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is fine, but we maybe need some end-to-end tests to make sure it compiles as a whole. Do we have unit tests for these already in tests/e2e
@MaheshRavishankar : we didn't have e2e tests for math ops outside of a few cases. Coming in #20169. |
This is thought to be needed before going ahead with a batch of math ops codegen changes (#19970 (review)) and this also discovered a few bugs: #20163 #20164 #20165. Signed-off-by: Benoit Jacob <jacob.benoit.1@gmail.com>
Signed-off-by: Benoit Jacob <jacob.benoit.1@gmail.com>
Signed-off-by: Benoit Jacob <jacob.benoit.1@gmail.com>
2d14e5e
to
d820621
Compare
if (clNativeMathPrecision) { // Legacy. | ||
if (name == math::Exp2Op::getOperationName() || | ||
name == math::RoundEvenOp::getOperationName()) { | ||
return false; | ||
} | ||
} | ||
if (isROCMBackend(target)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ignore me if this discussion already happened, but isROCMBackend
in a common pass looks like bad form to me. The better way is a pass option that is set by each backend.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if we consider that this pass is likely in the future to look into much finer target details, such as specific architecture, available instructions, relative performance of instructions? Reading this from the target attribute is easy.
Also a more meta point: since this pass is a codegen pass anyway, meaning that it is always just a HALExecutableTargetAttr::lookup(op)
call away from having a target atttribute to look at, what is the benefit of passing information through pass options?
A specific benefit of target attributes for testing is that, in tests, we can set that on a per-op basis, while pass options require the machinery of multi-run lit tests with different check prefixes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see two main reasons to avoid hard coding backends like this:
- Managing dependencies (although looking at
isROCMBackend
, it's just checking a string) - Downstream users have no control
Because as you noted here the space of potential "preferences" when it comes to implementations of these functions is unbounded, we don't want to expose control over every potential pattern as a pass option. That will be a pain to maintain. Checking based on the target backend sounds ok, but hard coding a string for a particular backend we know normally prefers one set of implementations is not scalable. If we acknowledge this is a temporary state and something to be improved, I won't block (although others might).
soooo maybe add a TODO :P
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. It did occur to me that checking for "rocm" backend is usually a smell that a magic string is used to avoid thinking about what we really want.
But in this case, the most important dimension is not an actual hardware trait. It is the availability of the ROCm device library. That makes "is the backend rocm" unusually close to the mark. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm that is a good point, but there still seems to be a layering violation with a Common pass string matching backend names. The string matching is better done in LLVMGPU or better yet ROCMTargetBackend (as the target backend is what really guarantees the library availability by way of its serialization implementation). That is probably a change with larger scope than this PR though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about that too, but then I thought that was fine: that this pass is under Common/ doesn't have to mean that it doesn't look into backend specifics; it only means that it exploits an opportunity to share code across backends. While the lines of code inside the if (rocm) {... }
are not currently shared across backends, the rest of the file is, and even those lines could be if the if
condition changed in the future to something like if (hasDeviceLibrary(target)) { ... }
,
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The main thing to be careful of is dependencies in that case. All Common passes are built together and linked by all codegen backends, so if we end up pulling in AMDGPUDialect or NVVMDialect to one of these passes that would pull it in everywhere. String matching is more benign in one sense, but not as strong a guarantee that it's actually correct as calling directly into the underlying TargetBackend impl. Either way something we can leave for later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Approve)
if (clNativeMathPrecision) { // Legacy. | ||
if (name == math::Exp2Op::getOperationName() || | ||
name == math::RoundEvenOp::getOperationName()) { | ||
return false; | ||
} | ||
} | ||
if (isROCMBackend(target)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The main thing to be careful of is dependencies in that case. All Common passes are built together and linked by all codegen backends, so if we end up pulling in AMDGPUDialect or NVVMDialect to one of these passes that would pull it in everywhere. String matching is more benign in one sense, but not as strong a guarantee that it's actually correct as calling directly into the underlying TargetBackend impl. Either way something we can leave for later.
#20215) Reverts #19970 due to failures on rdna3: https://github.com/iree-org/iree/actions/runs/13796654893/job/38590777349 ``` test_pow_types_int64_int64::model.mlir::model.mlir::gpu_rocm_rdna3 _ EXEC @test_pow_types_int64_int64 [FAILED] result[0]: element at index 1 (31) does not match the expected (32) expected: 3xi64=1 32 729 actual: 3xi64=1 31 729 ```
This is the re-landing iree-org#19970 which was rolled back due to a ONNX test failure which we want to accept as its root cause is a torch-mlir bug: llvm/torch-mlir#4091 This reverts commit 00e8873. Signed-off-by: Benoit Jacob <jacob.benoit.1@gmail.com>
This is the re-landing of #19970 which was rolled back due to a ONNX test failure which we want to accept as its root cause is a torch-mlir bug: llvm/torch-mlir#4091 This reverts commit 00e8873. Signed-off-by: Benoit Jacob <jacob.benoit.1@gmail.com>
On ROCm, we want to use the device library for all math functions.
This expands on #19969, which only concerned
math.erf
.We only leave one category of rewrites enabled: the operand casts to f32. The ROCm device library internally performs the same for many math functions, so dropping the casts on our side would not necessarily make any difference.
The effect on accuracy is reflected in the updates to the JSON files in this PR, controlling the tolerances in the e2e accuracy tests for math ops. The tolerances are tightened also for LLVM-CPU to provide an accurate basis for comparison. The resulting diff between the two JSON files shows how the ROCm device library functions are much more accurate than the MLIR polynomial approximation which we are using on LLVM-CPU:
https://gist.github.com/bjacob/98549902957e8171373ffceed5611411#file-a-diff
Note that the tolerances 0 on many f16 testcases on ROCm, effectively requesting exactness, are explained by the fact that since we upcast from f16 to f32, the loss of precision in the math approximation happens in f32 and is typically small enough to not result in a change of the final f16 value after rounding.