[GPU] Enable vector distribute on reduction operations by default #20751
Every kernel, huh?
Apologies, I was just testing on the CI.
(that's cool - in the future, please note that everyone watching the repository sees these and they are enshrined in the repo forever - write something :)
47491d1 to a7939a8
LGTM
-- Set only reduction tile sizes on the parallel operation (the workgroup tile sizes are dominated by the reduction operation).
-- The bitwidth now selects max(operands_bitwidth) for an operation; this is just to match warp distribution numerics.
Also, update the wgp tile sizes
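For readers skimming the description, the max(operands_bitwidth) rule can be sketched roughly as below. This is only an illustration, not the code in this PR: `select_bitwidth` and the example widths are hypothetical names and values chosen for the sketch.

```python
# Hypothetical sketch of the "max(operands_bitwidth)" rule from the PR
# description. The function name and inputs are assumptions, not IREE code.
def select_bitwidth(operand_bitwidths):
    """Return the widest operand bit width of an operation."""
    if not operand_bitwidths:
        raise ValueError("operation has no operands with a known bit width")
    return max(operand_bitwidths)


# Illustrative mixed-precision case: two 16-bit operands and one 32-bit operand.
mixed = select_bitwidth([16, 16, 32])
```

With this rule a mixed-precision operation is treated as being as wide as its widest operand, rather than depending on which operand happens to be inspected.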
going to try a rerun, but fyi this may have caused a regression:
Sure, if it persists, I'll revert.
Looks like it's right on the edge - subsequent merges on main are failing: https://github.com/iree-org/iree/actions/runs/15007458464/job/42169770312. You may need to up the tolerance (or figure out why it regressed).
The latest one is passing though: https://github.com/iree-org/iree/actions/runs/15007816465/job/42171017802
This PR also goes through: https://github.com/iree-org/iree/actions/runs/15007320793/job/42172666060. The CI seems to be flaky.
Oh! I missed it. Yes, it's right on the edge.
Actually looks like only VAE regressed. Would be good to triage that, but almost everything else got better. Maybe we should change the golden time for the others as well.
yeah, if timings went down then lower the golden times so we don't backslide!
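To make the golden-time discussion concrete, here is a minimal sketch of a tolerance check, assuming a simple relative tolerance band around the golden time. `check_time`, the 5% default, and the millisecond values are hypothetical and are not taken from the IREE benchmark suite.

```python
# Hypothetical sketch of a golden-time check with a relative tolerance.
# The function name, the 5% default, and the sample values are assumptions.
def check_time(measured_ms, golden_ms, tolerance=0.05):
    """Classify a benchmark time against its golden value."""
    upper = golden_ms * (1 + tolerance)
    lower = golden_ms * (1 - tolerance)
    if measured_ms > upper:
        # Slower than the band allows: the CI job would fail.
        return "regressed"
    if measured_ms < lower:
        # Faster than the band: tighten the golden time to avoid backsliding.
        return "improved: lower the golden time"
    return "ok"


print(check_time(104.0, 100.0))  # inside the band
print(check_time(106.0, 100.0))  # above the band
print(check_time(90.0, 100.0))   # below the band
```

A measurement sitting right on the upper bound flips between passing and failing from run to run, which matches the flakiness seen above. Raising the tolerance hides that, while lowering the golden time after an improvement keeps later regressions visible.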