8000 [NFC][AMDGPU] Refactor getSingleSubgroupLayout() for MFMAs by krzysz00 · Pull Request #20561 · iree-org/iree · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

[NFC][AMDGPU] Refactor getSingleSubgroupLayout() for MFMAs #20561

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Apr 16, 2025

Conversation

krzysz00
Copy link
Contributor

Currently, all the MFMA intrinsics had the same two accumulaor layouts copy-pasted (depending on whether they had a 16x16 or 32x32 output). These have been factored out into constants, which'll make the code cleaer and allow more MFMAs to not need to copy paste.

Similarly, the layout for the left- and right-hand side inputs to an MFMA (at least for the non-blocked ones, which are all we use) follows a simple pattern that's a function of the K dimension and whether the not-reduction dimension is 16 or 32. These patterns have been factored out into small lambdas to make things clearer and prevent excessive copy-paste.

Currently, all the MFMA intrinsics had the same two accumulaor layouts
copy-pasted (depending on whether they had a 16x16 or 32x32 output).
These have been factored out into constants, which'll make the code
cleaer and allow more MFMAs to not need to copy paste.

Similarly, the layout for the left- and right-hand side inputs to an
MFMA (at least for the non-blocked ones, which are all we use) follows
a simple pattern that's a function of the K dimension and whether the
not-reduction dimension is 16 or 32. These patterns have been factored
out into small lambdas to make things clearer and prevent excessive
copy-paste.
@krzysz00 krzysz00 requested a review from bjacob April 16, 2025 18:32
/*element=*/{1, k / 4}};
};
auto mfmaRhsKx16 = [](int64_t k) -> MMASingleSubgroupLayout {
assert(k % 4 == 0 && "doesn't support blockef MFMAs");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

10000

typo: blockef

/*element=*/{1, k / 2}};
};
auto mfmaRhsKx32 = [](int64_t k) -> MMASingleSubgroupLayout {
assert(k % 2 == 0 && "doesn't support blockef MFMAs");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

@@ -162,83 +162,96 @@ getUnsupportedMNKShape(MMAIntrinsic intrinsic) {

MMASingleSubgroupLayout getSingleSubgroupLayout(MMAIntrinsic intrinsic,
MMAFragment fragment) {
auto mfmaLhs16xK = [](int64_t k) -> MMASingleSubgroupLayout {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it be useful if this didn't have a hidden 4 in the implementation that the name mfmaLhs16xk doesn't convey?

How about mfmaLhs16x4xE(int e) where the caller passes for e what is currently k / 4 ? That would also remove the need for the assert.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The assert is there because there are some wonderful old intrinsics like mfma_f32_32x32x1_f32 which perform batched matmul over multiple "blocks"

I figured that just using the K dimension would make it easiest to spot-check that the value is passed is correct

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fact that there's a division by 4 is an implementation detail - logically, you're always passing K elements to the MFMA, and the layout tells you how they're distributed

@bjacob bjacob self-requested a review April 16, 2025 19:51
@krzysz00 krzysz00 enabled auto-merge (squash) April 16, 2025 20:18
@krzysz00 krzysz00 merged commit 04359f2 into main Apr 16, 2025
43 of 44 checks passed
@krzysz00 krzysz00 deleted the users/krzysz00/refactor-mma-size-getter branch April 16, 2025 20:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants
0