Define an amdgpu.scaling_mfma wrapper #20616
krzysz00 pushed a commit to llvm/llvm-project that referenced this issue on May 2, 2025:
Create a wrapper around the new scaled MFMAs that operate on specific element types and tile sizes. See [Issue](iree-org/iree#20616). --------- Signed-off-by: Muzammiluddin Syed <muzasyed@amd.com>
Closed via llvm/llvm-project#137498
Overview
We need a wrapper around the new scaled MFMAs that operate on fp4 (f4E2M1FN), fp6 (f6E2M3FN and f6E3M2FN) and fp8 (f8E4M3FN and f8E5M2) types using either M=N=16, K=128 or M=N=32, K=64 as their tile size.
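As a rough sanity check on what "these types and these tile sizes" implies per lane, here is a small sketch. It assumes a 64-lane wave with each M x K, K x N, and M x N tile spread evenly across the lanes; `is_supported` and `per_lane_fragment` are hypothetical names, not existing MLIR APIs. A and B get independent type checks because they have separate type selectors.

```python
from itertools import product

# Element types named in this issue, with their bit widths.
ELEMENT_BITS = {
    "f8E4M3FN": 8,
    "f8E5M2": 8,
    "f6E2M3FN": 6,
    "f6E3M2FN": 6,
    "f4E2M1FN": 4,
}

# (M, N, K) tile sizes named in this issue.
TILES = [(16, 16, 128), (32, 32, 64)]

WAVE_SIZE = 64  # assumption: wave64, tiles distributed evenly over all lanes


def is_supported(m, n, k, a_type, b_type):
    """Hypothetical verifier check: a supported tile and fp4/fp6/fp8 operands.
    A and B are checked independently since each has its own type selector."""
    return (m, n, k) in TILES and a_type in ELEMENT_BITS and b_type in ELEMENT_BITS


def per_lane_fragment(m, n, k, elem_type):
    """Return (A elements per lane, A bits per lane, f32 accumulators per lane)."""
    a_elems = m * k // WAVE_SIZE
    return a_elems, a_elems * ELEMENT_BITS[elem_type], m * n // WAVE_SIZE


for (m, n, k), t in product(TILES, ELEMENT_BITS):
    elems, bits, acc = per_lane_fragment(m, n, k, t)
    print(f"{m}x{n}x{k} {t}: {elems} elems = {bits} bits "
          f"({bits // 32} x i32), {acc} x f32 acc")
```

Under these assumptions both tile shapes put 32 A elements in each lane, so the per-lane operand width depends only on the element type (256, 192, or 128 bits).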
These intrinsics follow the same pattern as the other dense MFMAs (see chapter 7 of the MI-300 manual for the full details) and, in that manual's terms, have a block size of 1.
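To make "block size of 1" concrete, the sketch below is a plain reference model of one dense MFMA with a single block: one D = A x B + C over the whole tile, with no independent sub-blocks. It ignores scaling, lane layout, and rounding, and `mfma_reference` is a hypothetical helper, not part of any library.

```python
def mfma_reference(a, b, c):
    """Reference semantics of one dense MFMA with a block count of 1:
    a single D = A x B + C over an M x K by K x N tile.

    a: M rows of K values, b: K rows of N values, c: M rows of N values.
    """
    m, k, n = len(a), len(b), len(b[0])
    if not all(len(row) == k for row in a):
        raise ValueError("A must be M x K")
    if len(c) != m or not all(len(row) == n for row in c):
        raise ValueError("C must be M x N")
    return [
        [c[i][j] + sum(a[i][p] * b[p][j] for p in range(k)) for j in range(n)]
        for i in range(m)
    ]
```

For example, `mfma_reference([[1, 2], [3, 4]], [[5, 6], [7, 8]], [[1, 1], [1, 1]])` gives `[[20, 23], [44, 51]]`.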
Over in https://github.com/llvm/llvm-project/blob/main/mlir/lib/Conversion/AMDGPUToROCDL/AMDGPUToROCDL.cpp#L800-L834, we already have the switch-case that maps a tile size and element types to an intrinsic name and the cbsz and blgp arguments. It is used around https://github.com/llvm/llvm-project/blob/main/mlir/lib/Conversion/AMDGPUToROCDL/AMDGPUToROCDL.cpp#L937 to implement unscaled fp4/fp6/fp8 MFMAs of shape 16x16x128 and 32x32x64 by setting the scale to a constant 0 (that is how the backend developers chose to represent those operations). Similarly, the handling that converts MFMA arguments to the types LLVM expects is already set up for these scaled MFMAs, so you can reuse it in the lowering.
We want a wrapper around these MFMAs when they're used with a scale, so that we can start building up higher-level operations.
(Also, the ROCDL intrinsics should probably be fixed to make all their immarg operands attributes, but that's not critical in this PR.)
Notable differences from amdgpu.mfma
While we want to take some design cues from amdgpu.mfma, amdgpu.scaled_mfma will need some differences:
- No abid, cbsz, or blgp attributes. For the scaled MFMA, CBSZ and BLGP carry the type codes (they are the type selectors for matrices A and B, respectively), and ABID is used internally to indicate scale/no scale. That is, these MFMAs don't support broadcasting.
- Each scale is an i32, which is, morally, a <4 x i8>. There are separate scales for A and B, and each comes with an opsel argument (which, at the MLIR level, has to be an attribute) that selects which byte of the i32 should be used.
Going with the MLIR-level preference for not lying about types, the scales should be passed in as <4 x i8> arguments with a selector for which byte gets used. We should probably also provide the ergonomic convenience of accepting an i8 scale and turning it into a <4 x i8> on the user's behalf, in case they've already loaded just one scale.
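The relationship between the <4 x i8> scale, the i32 the hardware sees, and opsel can be sketched as below. Two assumptions are labeled in the comments: vector element i lands in byte i of the i32 (what a bitcast gives on a little-endian target), and opsel value i selects byte i. `widen_scalar_scale` is a hypothetical helper illustrating the proposed i8 convenience. Zero-filling the unused bytes is an arbitrary choice here; only the selected byte matters.

```python
import struct


def pack_scales(scales):
    """Pack a <4 x i8> scale vector into the i32 the hardware consumes.
    Assumes little-endian byte order, so vector element i lands in byte i."""
    if len(scales) != 4 or not all(0 <= s <= 0xFF for s in scales):
        raise ValueError("expected four unsigned byte values")
    return struct.unpack("<I", bytes(scales))[0]


def selected_scale(packed, opsel):
    """Byte picked by an opsel value in [0, 3] (assumed: 0 = least significant)."""
    if not 0 <= opsel <= 3:
        raise ValueError("opsel must be in [0, 3]")
    return (packed >> (8 * opsel)) & 0xFF


def widen_scalar_scale(scale):
    """Hypothetical convenience for an i8 scale: place it in element 0 of a
    <4 x i8> (other bytes zeroed) and use opsel 0 so that byte is selected."""
    return [scale, 0, 0, 0], 0
```

For example, packing `[0x7F, 0x80, 0x81, 0x82]` yields `0x8281807F`, and opsel 2 selects `0x81`.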