Fix kernel_dot_part2 for smaller BLOCKSIZE by pbauman · Pull Request #87 · ROCm/rocHPCG · GitHub

Fix kernel_dot_part2 for smaller BLOCKSIZE #87

Open
pbauman wants to merge 4 commits into base: develop

pbauman (Collaborator) commented May 28, 2025

Builds on #86 and is my primary motivation for moving to C++17: being able to use if constexpr. This is not the most general solution, but it does fix the existing use cases in the dot product code; I did not scour the rest of the code for similar instances. The commit message lays out the issue in detail.

If we don't think this (together with rocPRIM deprecating C++14) is enough impetus to move to C++17, I can cook up a different solution to this problem that uses only C++14.

Tested the fix on the offending setup, as well as existing (functioning) setups.

pbauman added 3 commits May 27, 2025 18:29
This kernel is called with both BLOCKSIZE 1024 and 256. For the
256 case, the first two if-statements are trivially true, but then
we access entries in LDS outside of the allocated LDS space. We
leverage the constexpr-if idiom in C++17 to determine at compile
time if we need the additional checks at larger BLOCKSIZE.
So hopefully we don't make the same mistake again.
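
To make the BLOCKSIZE issue described in the first commit message concrete, here is a minimal host-side sketch of the constexpr-if idiom. It is not the rocHPCG kernel or the actual diff from this PR: `reduce_step` and `block_sum` are hypothetical names, a `std::array` stands in for the LDS buffer, and a plain loop stands in for the threads of a block. The point is only that reduction steps needed for larger block sizes are discarded at compile time, so a 256-entry buffer is never read at offsets 256 or 512.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical host-side stand-in for one reduction step: every "thread"
// with tid < stride adds the entry `stride` slots above it, which is what
// the device code would do between two barriers.
template <std::size_t BLOCKSIZE>
void reduce_step(std::array<double, BLOCKSIZE>& lds, std::size_t stride)
{
    // The highest index read is 2 * stride - 1, so it must fit in the buffer.
    assert(2 * stride <= BLOCKSIZE);
    for(std::size_t tid = 0; tid < stride; ++tid)
    {
        lds[tid] += lds[tid + stride];
    }
}

// Sums all BLOCKSIZE entries with a tree reduction (works on a copy).
template <std::size_t BLOCKSIZE>
double block_sum(std::array<double, BLOCKSIZE> lds)
{
    static_assert(BLOCKSIZE >= 128 && (BLOCKSIZE & (BLOCKSIZE - 1)) == 0,
                  "sketch assumes a power-of-two BLOCKSIZE of at least 128");

    // Steps that only make sense for larger blocks are removed at compile
    // time, so BLOCKSIZE == 256 never touches lds[tid + 512] or lds[tid + 256].
    if constexpr(BLOCKSIZE > 512) { reduce_step(lds, 512); }
    if constexpr(BLOCKSIZE > 256) { reduce_step(lds, 256); }
    if constexpr(BLOCKSIZE > 128) { reduce_step(lds, 128); }

    // Remaining steps are valid for every supported BLOCKSIZE.
    for(std::size_t stride = 64; stride > 0; stride /= 2)
    {
        reduce_step(lds, stride);
    }
    return lds[0];
}
```

Calling `block_sum` on a `std::array<double, 256>` and on a `std::array<double, 1024>` exercises both instantiations. A runtime check on the thread index alone, such as `tid < 512`, would not help in the 256 case, because every thread of a 256-wide block passes it; the guard has to depend on BLOCKSIZE itself.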