Define a `amdgpu.scaling_mfma` wrapper #20616

krzysz00 · 2025-04-23T17:54:40Z

Overview

We need a wrapper around the new scaled MFMAs that operate on fp4 (f4E2M1FN), fp6 (f6E2M3FN and f6E3M2FN) and fp8 (f8E4M3FN and f8E5M2) types using either M=N=16, K=128 or M=N=32, K=64 as their tile size.

These intrinsics follow the same pattern as other dense MFMAs (see chapter 7 of the MI-300 manual for all the details, if you'd like) and, in that manual's terms, only have a block size of 1.
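As a rough sanity check on the two tile sizes, the per-lane operand sizes can be worked out as below. This is a sketch, not taken from the manual: it assumes a 64-lane wavefront and that, with a block size of 1, each operand's elements are divided evenly across the lanes. The names `WAVE_SIZE`, `per_lane_counts`, and `packed_bits` are made up for illustration.

```python
# Assumption: 64 lanes per wavefront, elements split evenly across lanes.
WAVE_SIZE = 64

# Bit widths of the element types named in this issue.
ELEMENT_BITS = {
    "f4E2M1FN": 4,
    "f6E2M3FN": 6,
    "f6E3M2FN": 6,
    "f8E4M3FN": 8,
    "f8E5M2": 8,
}


def per_lane_counts(m, n, k):
    """Return (A elements, B elements, accumulator elements) held by each lane."""
    a_elems = m * k // WAVE_SIZE
    b_elems = k * n // WAVE_SIZE
    acc_elems = m * n // WAVE_SIZE
    return a_elems, b_elems, acc_elems


def packed_bits(elem_type, count):
    """Number of bits needed to hold `count` densely packed elements of `elem_type`."""
    return ELEMENT_BITS[elem_type] * count


for tile in [(16, 16, 128), (32, 32, 64)]:
    a, b, acc = per_lane_counts(*tile)
    print(tile, "A/lane:", a, "B/lane:", b, "acc/lane:", acc)
```

Under these assumptions both tile sizes give each lane the same number of A and B elements, and only the accumulator size differs.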

Over in https://github.com/llvm/llvm-project/blob/main/mlir/lib/Conversion/AMDGPUToROCDL/AMDGPUToROCDL.cpp#L800-L834, we already have the switch-case for going from tile size and element types to an intrinsic name and the cbsz and blgp arguments. This is used around https://github.com/llvm/llvm-project/blob/main/mlir/lib/Conversion/AMDGPUToROCDL/AMDGPUToROCDL.cpp#L937 to implement the unscaled fp4/fp6/fp8 MFMAs that are 16x16x128 and 32x32x64 by setting the scale to a constant 0 (because that's how the compiler people decided to represent those operations).

(Similarly, the "make my MFMA argument the type LLVM expects" handling is already set up for these scaled MFMAs, so you can reuse it in the lowering.)
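The shape of that lookup (tile size and element types in, intrinsic name plus cbsz and blgp out) can be sketched roughly as follows. This is an illustration, not a copy of the linked C++: the numeric type codes and the intrinsic names in the tables are assumptions, and the linked switch-case is the authority on both. `select_scaled_mfma` and `ScaledMfmaSelection` are hypothetical names.

```python
from typing import NamedTuple, Optional

# Assumed type-select codes; check the linked switch-case for the real values.
TYPE_SELECT_CODE = {
    "f8E4M3FN": 0,
    "f8E5M2": 1,
    "f6E2M3FN": 2,
    "f6E3M2FN": 3,
    "f4E2M1FN": 4,
}

# Assumed intrinsic names, keyed by (M, N, K).
INTRINSIC_BY_TILE = {
    (16, 16, 128): "rocdl.mfma.scale.f32.16x16x128.f8f6f4",
    (32, 32, 64): "rocdl.mfma.scale.f32.32x32x64.f8f6f4",
}


class ScaledMfmaSelection(NamedTuple):
    intrinsic: str
    cbsz: int  # type selector for matrix A
    blgp: int  # type selector for matrix B


def select_scaled_mfma(m, n, k, a_type, b_type) -> Optional[ScaledMfmaSelection]:
    """Return the intrinsic and type selectors, or None if the combination is unsupported."""
    intrinsic = INTRINSIC_BY_TILE.get((m, n, k))
    if intrinsic is None:
        return None
    if a_type not in TYPE_SELECT_CODE or b_type not in TYPE_SELECT_CODE:
        return None
    return ScaledMfmaSelection(
        intrinsic, TYPE_SELECT_CODE[a_type], TYPE_SELECT_CODE[b_type]
    )


print(select_scaled_mfma(16, 16, 128, "f8E4M3FN", "f4E2M1FN"))
```

Returning `None` for unsupported combinations mirrors the usual "no match, fall through to other lowerings" pattern; the real code may signal failure differently.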

We want a wrapper around these MFMAs when they're used with a scale, so that we can start building up higher-level operations.

(Also, the ROCDL intrinsics should probably be fixed to make all their immargs attributes, but that's not critical for this PR.)

Notable differences from `amdgpu.mfma`

While we want to take some design cues from amdgpu.mfma, amdgpu.scaled_mfma will need some differences.

- There's no need to have the abid, cbsz, or blgp attributes. For the scaled MFMA, CBSZ and BLGP are used for the type code (they are the type selectors for matrix A and B respectively), and ABID is used internally for scale/no scale. That is, these MFMAs don't support broadcasting.
- All the scaled MFMAs have block/batch/B == 1. As best I can tell, the MFMA operators are moving away from that design space, so we can just omit that attribute. (`blocks` is for things like the 32x32x1 MFMA, which does multiple multiplications in one op.)
- These scaled MFMAs take a scale. At the LLVM level, the scale argument is an i32, which is, morally, a <4 x i8>. There are separate scales for A and B, and they both come with an opsel argument (which, at the MLIR level, has to be an attribute) that selects which byte of the i32 should be used.

Going with the MLIR-level preference for not lying about types, the scales should be passed in as <4 x i8> arguments with a selector for which byte gets used. We should probably include the ergonomic convenience of allowing an i8 scale and turning it into a <4 x i8> on the user's behalf, in case they've already loaded just one scale.
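To make the relationship between the <4 x i8> scale, the i32 the intrinsic sees, and the opsel byte selector concrete, here is a small bit-level sketch. It assumes element 0 of the vector lands in the lowest byte of the i32 and that opsel N picks byte N; neither is confirmed by this issue. It also assumes one possible way of widening a lone i8 (put it in element 0 and use opsel 0). All function names are hypothetical.

```python
def pack_scales(scales):
    """Pack four i8 scales into one i32 bit pattern (element 0 in the lowest byte, assumed)."""
    if len(scales) != 4:
        raise ValueError("expected exactly four i8 scales")
    word = 0
    for i, scale in enumerate(scales):
        word |= (scale & 0xFF) << (8 * i)
    return word


def select_scale(word, opsel):
    """Return the byte of the i32 that the given opsel picks (opsel N -> byte N, assumed)."""
    if not 0 <= opsel < 4:
        raise ValueError("opsel must be in [0, 4)")
    return (word >> (8 * opsel)) & 0xFF


def widen_scalar_scale(scale):
    """Turn a single i8 scale into a (<4 x i8>, opsel) pair.

    Assumption: the scalar goes in element 0, the rest are zero, and opsel is 0.
    """
    return [scale & 0xFF, 0, 0, 0], 0


word = pack_scales([0x7F, 0x80, 0x01, 0xFF])
print(hex(word), hex(select_scale(word, 1)))
```

With a helper like `widen_scalar_scale`, a user who has loaded only one scale never has to build the vector or pick an opsel by hand.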


Create a wrapper around the new scaled MFMAs that operate on specific element types and tile sizes. See [Issue](iree-org/iree#20616). --------- Signed-off-by: Muzammiluddin Syed <muzasyed@amd.com>

krzysz00 · 2025-05-02T23:07:41Z

Closed via llvm/llvm-project#137498

krzysz00 assigned Muzammiluddin-Syed-ECE Apr 23, 2025

Muzammiluddin-Syed-ECE mentioned this issue Apr 27, 2025

[mlir][amdgpu] Define an amdgpu.scaling_mfma wrapper llvm/llvm-project#137498

Merged

krzysz00 closed this as completed May 2, 2025
