8000 [GPU] Update the cache info for padding resolver in cloneWithSimplifiedConfig. by hanhanW · Pull Request #20371 · iree-org/iree · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

[GPU] Update the cache info for padding resolver in cloneWithSimplifiedConfig. #20371

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

hanhanW
Copy link
Contributor
@hanhanW hanhanW commented Mar 25, 2025

The revision creates an empty GPUPadLayoutAttr in the first place, and materialize the cache config in the interface method. The cache config is only used in encoding specialization and materialization. We do not need such information, which is sort of duplicated config, in any other lowering. Thus, we do not need to hardcode the cache config in the first place. They can be resolved by the new utility (i.e., getL1CacheInfo) when we need it.

Copy link
Member
@kuhar kuhar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this code motion makes sense in the current form. We shouldn't move arch-specific information outside of KnownTargets. What's the issue with keeping the code as-is?

We do not need such information, which is sort of duplicated config, in any other lowering.

gpu_pad_layout is the only place where this information is materialized, so I'm not sure where any duplication happens?

Comment on lines 413 to 420
// GPUPadLayoutAttr is only enabled for CDNA2 and CDNA3 for the time being.
// TODO(kuhar): Enable for other HIP targets.
if (!llvm::is_contained({"gfx90a", "gfx942"}, gpuTarget.getArch())) {
return IREE::Encoding::IdentityEncodingAttr::get(ctx);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should move this code here -- I'd rather have it in KnownTargets together with all the other arch-specific information. We could materialize it in the target chip attribute if necessary, but I don't think nothing else needs it

@hanhanW
Copy link
Contributor Author
hanhanW commented Mar 25, 2025

I don't think this code motion makes sense in the current form. We shouldn't move arch-specific information outside of KnownTargets. What's the issue with keeping the code as-is?

The arch-specific information is described in KnownTargets, and how we use the information can be moved to other places IMO. Like other data-tiling resolvers, we have the cpu_features on CPU side and available intrinsic on GPU side. They all look at target configuration to filter out needed information during specialization and materialization.

I honestly don't know how we figure the cache line bytes and cache sets as constants, and I think they are inferable from the GPUTargetAttr? Otherwise, they should be encoded in GPUTargetAttr and the GPUPadLayoutAttr can get the information from it.

(I'm reworking the MaterializeEncodingIntoPadding pass, and the end goal is to merge it into the generic pass. How we set up the attribute today is a little different from the data-tiling one, which motivates me to raise the PR. I can probably use getHIPTargetEncodingLayoutAttr method, though it exposes more GPU specific to the pass at the moment. We might be able to cut the dep in the future.)

@kuhar
Copy link
Member
kuhar commented Mar 25, 2025

I honestly don't know how we figure the cache line bytes and cache sets as constants, and I think they are inferable from the GPUTargetAttr? Otherwise, they should be encoded in GPUTargetAttr and the GPUPadLayoutAttr can get the information from it.

These depend solely on the gpu architecture and the value will be different across gfx versions. This information could go inside the GPUTargetAttr, but nothing else it the compiler needs it, so it seems excessive to me preserve it throughout the whole compilation pipeline.

@hanhanW
Copy link
Contributor Author
hanhanW commented Mar 25, 2025

This information could go inside the GPUTargetAttr, but nothing else it the compiler needs it, so it seems excessive to me preserve it throughout the whole compilation pipeline.

I see, it makes sense to me. FYI that I gave it a shot in my other PR. It ends up with making the pass depend on GPU/TargetUtils/KnownTargets.h. In other setup, we only depend on CPU attributes and GPU attributes.

@kuhar
Copy link
Member
kuhar commented Mar 26, 2025

It ends up with making the pass depend on GPU/TargetUtils/KnownTargets.h. In other setup, we only depend on CPU attributes and GPU attributes.

I think that's not to bad since we already depend on GPU/IR in this file? Alternatively, we can make a helper function that given a gpu target attr returns std::optional<L1CacheInfo> with the implementation in KnownTargets -- this way, the details would still stay in KnownTargets but not increase the IR size. WDYT?

@hanhanW hanhanW marked this pull request as draft March 28, 2025 16:35
@hanhanW
Copy link
Contributor Author
hanhanW commented Mar 28, 2025

Alternatively, we can make a helper function that given a gpu target attr returns std::optional with the implementation in KnownTargets -- this way, the details would still stay in KnownTargets but not increase the IR size. WDYT?

It sounds better to me. I think it is good to have a place to describe target details, and there are utilities that help us retrieve the information.

hanhanW added 2 commits March 31, 2025 15:22
The revision creates a generic GPUPadLayoutAttr in the first place, and
materialize the cache config in the interface method. The cache config
is only used in encoding specialization and materialization. We do not
need such information, which is sort of duplicated config, in any other
lowering. Thus, we do not need to hardcode the cache config in the first
place. They can be resolved when we need it.

Signed-off-by: hanhanW <hanhan0912@gmail.com>
Signed-off-by: hanhanW <hanhan0912@gmail.com>
@hanhanW hanhanW force-pushed the move-config-setup-to-interface-method branch from 0e9a36e to 1524ada Compare March 31, 2025 22:23
@hanhanW hanhanW changed the title [GPU] Move the padding cache config to cloneWithSimplifiedConfig. [GPU] Update the cache info for padding resolver in cloneWithSimplifiedConfig. Mar 31, 2025
@hanhanW hanhanW marked this pull request as ready for review March 31, 2025 22:25
@hanhanW hanhanW requested a review from kuhar March 31, 2025 22:25
Signed-off-by: hanhanW <hanhan0912@gmail.com>
Copy link
Member
@kuhar kuhar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Signed-off-by: hanhanW <hanhan0912@gmail.com>
@hanhanW hanhanW enabled auto-merge (squash) April 3, 2025 15:35
@hanhanW hanhanW merged commit b268762 into iree-org:main Apr 3, 2025
43 checks passed
@hanhanW hanhanW deleted the move-config-setup-to-interface-method branch April 3, 2025 16:10
bangtianliu pushed a commit to bangtianliu/iree that referenced this pull request Apr 8, 2025
…edConfig. (iree-org#20371)

The revision creates an empty GPUPadLayoutAttr in the first place, and
materialize the cache config in the interface method. The cache config
is only used in encoding specialization and materialization. We do not
need such information, which is sort of duplicated config, in any other
lowering. Thus, we do not need to hardcode the cache config in the first
place. They can be resolved by the new utility (i.e., `getL1CacheInfo`)
when we need it.

---------

Signed-off-by: hanhanW <hanhan0912@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants
0