Define an amdgpu.scaling_mfma wrapper #20616

Closed

krzysz00 opened this issue Apr 23, 2025 · 1 comment

@krzysz00 (Contributor)

Overview

We need a wrapper around the new scaled MFMAs that operate on fp4 (f4E2M1FN), fp6 (f6E2M3FN and f6E3M2FN), and fp8 (f8E4M3FN and f8E5M2) types, using either M=N=16, K=128 or M=N=32, K=64 as their tile size.

These intrinsics follow the same pattern as the other dense MFMAs (see chapter 7 of the MI-300 manual if you'd like all the details) and, if you're reading that material, note that they only have a block size of 1.

Over in https://github.com/llvm/llvm-project/blob/main/mlir/lib/Conversion/AMDGPUToROCDL/AMDGPUToROCDL.cpp#L800-L834, we already have the switch-case that maps a tile size and element types to an intrinsic name and to the cbsz and blgp arguments. That mapping is used around https://github.com/llvm/llvm-project/blob/main/mlir/lib/Conversion/AMDGPUToROCDL/AMDGPUToROCDL.cpp#L937 to implement the unscaled fp4/fp6/fp8 MFMAs that are 16x16x128 and 32x32x64 by setting the scale to a constant 0 (because that's how the compiler folks decided to represent those operations).
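For orientation, here is a rough sketch of what that unscaled path boils down to at the ROCDL level. Treat the op name (rocdl.mfma.scale.f32.16x16x128.f8f6f4), the operand order, and the type-code values as assumptions to double-check against AMDGPUToROCDL.cpp, not as the exact lowering:

```mlir
// Sketch only: a 16x16x128 f8E4M3FN x f8E4M3FN MFMA with no scales.
// A/B arrive packed as vector<8xi32>, the accumulator as vector<4xf32>;
// cbsz/blgp carry the element-type codes for A and B, and the scale
// operands plus their opsel byte selectors are all tied to 0.
func.func @unscaled_fp8_mfma(%a: vector<8xi32>, %b: vector<8xi32>,
                             %acc: vector<4xf32>) -> vector<4xf32> {
  %cbsz = llvm.mlir.constant(0 : i32) : i32  // assumed type code for f8E4M3FN (A)
  %blgp = llvm.mlir.constant(0 : i32) : i32  // assumed type code for f8E4M3FN (B)
  %zero = llvm.mlir.constant(0 : i32) : i32  // opsel and scale, all zero
  %d = rocdl.mfma.scale.f32.16x16x128.f8f6f4 %a, %b, %acc, %cbsz, %blgp,
         %zero, %zero, %zero, %zero
         : (vector<8xi32>, vector<8xi32>, vector<4xf32>,
            i32, i32, i32, i32, i32, i32) -> vector<4xf32>
  return %d : vector<4xf32>
}
```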

(Similarly, the "make my MFMA argument the type LLVM expects" handling is already set up for these scaled MFMAs, so you can reuse it in the lowering)

We want a wrapper around these MFMAs when they're used with a scale, so that we can start building up higher-level operations.

(Also, the ROCDL intrinsics should probably be fixed to make all their immargs attributes, but that's not critical in this PR)

Notable differences from amdgpu.mfma

While we want to take some design cues from amdgpu.mfma, amdgpu.scaled_mfma will need some differences.

  1. There's no need to have the abid, cbsz, or blgp attributes. For the scaled MFMAs, CBSZ and BLGP are used as the type codes, and ABID is used internally to select between the scaled and unscaled forms. (CBSZ and BLGP are the type selectors for matrices A and B, respectively.) That is, these MFMAs don't support broadcasting.
  2. All the scaled MFMAs have block/batch/B == 1. As best I can tell, the MFMA operators are moving away from that design space, so we can just omit that attribute. (blocks is for things like the 32x32x1 MFMA that does multiple multiplications in one op.)
  3. These scaled MFMAs take a scale. At the LLVM level, the scale argument is an i32 that is, morally, a <4 x i8>. There are separate scales for A and B, and each comes with an opsel argument (which, at the MLIR level, has to be an attribute) that selects which byte of the i32 should be used.

Going with the MLIR-level preference for not lying about types, the scales should be passed in as <4 x i8> arguments with a selector for which byte gets used. We should probably also include the ergonomic convenience of accepting an i8 scale and turning it into a <4 x i8> on the user's behalf, in case they've already loaded just one scale (see the sketch below).
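To make that concrete, here is a hypothetical sketch of the surface syntax such a wrapper could have. The op and attribute names (amdgpu.scaled_mfma, scalesIdxA/scalesIdxB) and the assembly format are illustrative only; the op that actually landed may spell things differently:

```mlir
// Hypothetical wrapper usage for a 16x16x128 fp4 scaled MFMA.
// Each scale is a <4 x i8>, and scalesIdxA/scalesIdxB pick which byte of
// the corresponding scale feeds the hardware's opsel field.
func.func @scaled_fp4_mfma(%a: vector<32xf4E2M1FN>, %b: vector<32xf4E2M1FN>,
                           %acc: vector<4xf32>,
                           %scaleA: vector<4xi8>, %scaleB: vector<4xi8>)
    -> vector<4xf32> {
  %d = amdgpu.scaled_mfma %a * %b + %acc, scales(%scaleA, %scaleB)
         { m = 16 : i32, n = 16 : i32, k = 128 : i32,
           scalesIdxA = 0 : i32, scalesIdxB = 1 : i32 }
         : vector<32xf4E2M1FN>, vector<32xf4E2M1FN>, vector<4xf32>,
           vector<4xi8>, vector<4xi8>
  return %d : vector<4xf32>
}
```

The i8 convenience form would then just be the builder splatting the single scale into a <4 x i8> and defaulting the byte selector to 0.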

krzysz00 pushed a commit to llvm/llvm-project that referenced this issue May 2, 2025
Create a wrapper around the new scaled MFMAs that operate on specific
element types and tile sizes.

See [Issue](iree-org/iree#20616).

---------

Signed-off-by: Muzammiluddin Syed <muzasyed@amd.com>
@krzysz00 (Contributor Author) commented May 2, 2025

Closed via llvm/llvm-project#137498

krzysz00 closed this as completed May 2, 2025