[NFC][AMDGPU] Refactor getSingleSubgroupLayout() for MFMAs #20561
Conversation
Currently, all the MFMA intrinsics have the same two accumulator layouts copy-pasted (depending on whether they have a 16x16 or 32x32 output). These have been factored out into constants, which makes the code cleaner and lets more MFMAs be added without further copy-pasting.

Similarly, the layout for the left- and right-hand side inputs to an MFMA (at least for the non-blocked ones, which are all we use) follows a simple pattern that is a function of the K dimension and of whether the non-reduction dimension is 16 or 32. These patterns have been factored out into small lambdas to make things clearer and prevent excessive copy-paste.
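For a concrete picture, here is a minimal, self-contained sketch of the refactoring described above. The struct definition, the constant names kMFMAAcc16x16 and kMFMAAcc32x32, and most of the layout numbers are assumptions made for illustration; only the lambda signatures, asserts, and element expressions come from the snippets quoted in the review below.

#include <array>
#include <cassert>
#include <cstdint>
#include <cstdio>

// Assumed simplified stand-in for IREE's MMASingleSubgroupLayout.
struct MMASingleSubgroupLayout {
  std::array<int64_t, 2> outer;
  std::array<int64_t, 2> thread;
  std::array<int64_t, 2> tstrides;
  std::array<int64_t, 2> element;
};

// Accumulator layouts shared by the 16x16 and 32x32 MFMAs, factored out once
// instead of being copy-pasted per intrinsic (exact values are illustrative).
static const MMASingleSubgroupLayout kMFMAAcc16x16 = {
    /*outer=*/{1, 1}, /*thread=*/{4, 16}, /*tstrides=*/{16, 1},
    /*element=*/{4, 1}};
static const MMASingleSubgroupLayout kMFMAAcc32x32 = {
    /*outer=*/{4, 1}, /*thread=*/{2, 32}, /*tstrides=*/{32, 1},
    /*element=*/{4, 1}};

// LHS/RHS layouts as a function of K, mirroring the lambdas quoted in the
// review below; only the element values are taken from the diff, the rest is
// a guess at the non-blocked MFMA pattern.
MMASingleSubgroupLayout mfmaLhs16xK(int64_t k) {
  assert(k % 4 == 0 && "doesn't support blocked MFMAs");
  return {/*outer=*/{1, 1}, /*thread=*/{16, 4}, /*tstrides=*/{1, 16},
          /*element=*/{1, k / 4}};
}

MMASingleSubgroupLayout mfmaRhsKx16(int64_t k) {
  assert(k % 4 == 0 && "doesn't support blocked MFMAs");
  return {/*outer=*/{1, 1}, /*thread=*/{4, 16}, /*tstrides=*/{16, 1},
          /*element=*/{k / 4, 1}};
}

int main() {
  // For a 16x16x16 intrinsic, each of the 4 K-threads holds 16 / 4 = 4
  // contiguous LHS elements.
  MMASingleSubgroupLayout lhs = mfmaLhs16xK(16);
  std::printf("LHS element tile: %lld x %lld\n",
              static_cast<long long>(lhs.element[0]),
              static_cast<long long>(lhs.element[1]));
  return 0;
}

The idea is that getSingleSubgroupLayout() can then return one of the shared accumulator constants for the Acc fragment and call the appropriate lambda for the Lhs/Rhs fragments, instead of spelling each layout out inline.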
compiler/src/iree/compiler/Codegen/Dialect/GPU/IR/IREEGPUAttrs.cpp
/*element=*/{1, k / 4}};
};
auto mfmaRhsKx16 = [](int64_t k) -> MMASingleSubgroupLayout {
assert(k % 4 == 0 && "doesn't support blockef MFMAs");
typo: blockef
/*element=*/{1, k / 2}};
};
auto mfmaRhsKx32 = [](int64_t k) -> MMASingleSubgroupLayout {
assert(k % 2 == 0 && "doesn't support blockef MFMAs");
same
@@ -162,83 +162,96 @@ getUnsupportedMNKShape(MMAIntrinsic intrinsic) {

MMASingleSubgroupLayout getSingleSubgroupLayout(MMAIntrinsic intrinsic,
                                                MMAFragment fragment) {
  auto mfmaLhs16xK = [](int64_t k) -> MMASingleSubgroupLayout {
Wouldn't it be useful if this didn't have a hidden 4 in the implementation that the name mfmaLhs16xK doesn't convey? How about mfmaLhs16x4xE(int e), where the caller passes for e what is currently k / 4? That would also remove the need for the assert.
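For clarity, a sketch of what that suggestion might look like, reusing the MMASingleSubgroupLayout stand-in from the sketch above. The name mfmaLhs16x4xE and its body are the reviewer's hypothetical, not code from the PR.

// Hypothetical alternative: take the per-thread element count e (= k / 4)
// directly, so there is no hidden division and no assert is needed.
MMASingleSubgroupLayout mfmaLhs16x4xE(int64_t e) {
  return {/*outer=*/{1, 1}, /*thread=*/{16, 4}, /*tstrides=*/{1, 16},
          /*element=*/{1, e}};
}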
The assert is there because there are some wonderful old intrinsics like mfma_f32_32x32x1_f32, which perform batched matmul over multiple "blocks". I figured that just using the K dimension would make it easiest to spot-check that the value being passed is correct.
The fact that there's a division by 4 is an implementation detail: logically, you're always passing K elements to the MFMA, and the layout tells you how they're distributed.
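A quick spot-check of that argument, using the lambda sketch from earlier (the intrinsic names are real MFMA intrinsics; the arithmetic is illustrative):

// mfma_f32_16x16x16_f16: K = 16, non-blocked, so each of the 4 K-threads
// holds 16 / 4 = 4 contiguous LHS elements.
MMASingleSubgroupLayout lhs = mfmaLhs16xK(16);  // element = {1, 4}

// mfma_f32_32x32x1_f32 (the blocked intrinsic mentioned above) has K = 1 per
// block; neither 1 % 4 == 0 nor 1 % 2 == 0 holds, so the corresponding assert
// fires instead of silently producing a bogus layout.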