[GPU] Enable vector distribute on reduction operations by default #20751

pashu123 · 2025-05-07T23:55:26Z

-- Set only reduction tile sizes on the parallel operation (the
workgroup tile sizes are dominated by the reduction operation).
-- The bitwidth now selects max(operands_bitwidth) for an operation;
this is just to match warp distribution numerics.
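
The bitwidth rule in the second point can be illustrated with a small sketch. This is not the code in KernelConfig.cpp; `max_operand_bitwidth` is a hypothetical helper, and it assumes the operand element bitwidths are already available as plain integers.

```python
def max_operand_bitwidth(operand_bitwidths):
    """Return the widest element bitwidth among an operation's operands.

    Hypothetical helper for illustration only; the real selection lives in
    the compiler's C++ kernel configuration logic.
    """
    widths = list(operand_bitwidths)
    if not widths:
        raise ValueError("operation has no operands")
    return max(widths)


# Example: an operation with an f16 input and an f32 accumulator is
# configured as a 32-bit operation under this rule.
mixed_precision = max_operand_bitwidth([16, 32])
```

Under this rule a mixed-precision operation is sized by its widest operand rather than its narrowest.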

benvanik · 2025-05-08T00:48:20Z

Every kernel, huh?
(Please provide useful descriptions on PRs ;)

pashu123 · 2025-05-08T01:35:29Z

> Every kernel, huh? (Please provide useful descriptions on PRs ;)

Apologies, I was just testing on the CI.

benvanik · 2025-05-08T02:04:55Z

(that's cool - in the future, please note that everyone watching the repository sees these and they are enshrined in the repo forever - write something :)

compiler/src/iree/compiler/Codegen/LLVMGPU/KernelConfig.cpp

Groverkss

LGTM


benvanik · 2025-05-13T22:28:40Z

going to try a rerun, but fyi this may have caused a regression:
https://github.com/iree-org/iree/actions/runs/15007320793/job/42169379980

ERROR iree-test-suites/sharktank_models/benchmarks/model_benchmark_run.py::sdxl :: vae_rocm - check 69.94073186302559 <= 68.2: sdxl vae benchmark time should not regress more than a factor of 1.1
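
The failing check above compares a measured time against a limit of 68.2, described as the golden time scaled by a factor of 1.1. A minimal sketch of that kind of check follows. It is not the code in model_benchmark_run.py: `within_tolerance` is a hypothetical name, and the golden time of 62.0 is an assumption inferred from 68.2 / 1.1, since the thread does not state it.

```python
def within_tolerance(measured, golden, factor=1.1):
    """Return True when the measured time is at most golden * factor."""
    return measured <= golden * factor


# Assumed golden time, inferred from 68.2 / 1.1 (not stated in the thread).
ASSUMED_GOLDEN = 62.0

# Measured value taken from the CI error message above.
measured_vae = 69.94073186302559

passes = within_tolerance(measured_vae, ASSUMED_GOLDEN)
overshoot = measured_vae / (ASSUMED_GOLDEN * 1.1)  # roughly 1.03, about 3% over the limit
```

With these numbers the measurement sits only a few percent above the limit, so ordinary run-to-run noise could move it to either side of the threshold.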

pashu123 · 2025-05-13T23:07:24Z

> going to try a rerun, but fyi this may have caused a regression: https://github.com/iree-org/iree/actions/runs/15007320793/job/42169379980
> ERROR iree-test-suites/sharktank_models/benchmarks/model_benchmark_run.py::sdxl :: vae_rocm - check 69.94073186302559 <= 68.2: sdxl vae benchmark time should not regress more than a factor of 1.1

Sure, if it persists, I'll revert.

benvanik · 2025-05-13T23:13:06Z

Looks like it's right on the edge - subsequent merges on main are failing: https://github.com/iree-org/iree/actions/runs/15007458464/job/42169770312

you may need to up the tolerance (or figure out why it regressed)

pashu123 · 2025-05-13T23:20:43Z

> Looks like it's right on the edge - subsequent merges on main are failing: https://github.com/iree-org/iree/actions/runs/15007458464/job/42169770312
>
> you may need to up the tolerance (or figure out why it regressed)

The latest one is passing though: https://github.com/iree-org/iree/actions/runs/15007816465/job/42171017802

pashu123 · 2025-05-13T23:24:55Z

> > Looks like it's right on the edge - subsequent merges on main are failing: https://github.com/iree-org/iree/actions/runs/15007458464/job/42169770312
> > you may need to up the tolerance (or figure out why it regressed)
>
> The latest one is passing though: https://github.com/iree-org/iree/actions/runs/15007816465/job/42171017802

The run for this PR also passes: https://github.com/iree-org/iree/actions/runs/15007320793/job/42172666060. The CI looks flaky.

pashu123 · 2025-05-13T23:26:00Z

Oh! I missed it. Yes, it's right on the edge.

MaheshRavishankar · 2025-05-14T01:27:19Z

Actually, it looks like only VAE regressed. It would be good to triage that, but almost everything else got better. Maybe we should change the golden times for the others as well.

benvanik · 2025-05-14T01:30:54Z

yeah, if timings went down then lower them so we don't backslide!

pashu123 changed the title ~~Every kernel~~ [GPU] Enable vector distribute pipeline by default. May 8, 2025

pashu123 force-pushed the every_kernel branch 4 times, most recently from 47491d1 to a7939a8 Compare May 9, 2025 19:55

pashu123 changed the title ~~[GPU] Enable vector distribute pipeline by default.~~ [GPU] Enable vector distribute on reduction operations by default May 9, 2025

pashu123 marked this pull request as ready for review May 12, 2025 21:46

pashu123 requested review from MaheshRavishankar, qedawkins, kuhar, Groverkss and hanhanW as code owners May 12, 2025 21:46

Groverkss reviewed May 13, 2025

compiler/src/iree/compiler/Codegen/LLVMGPU/KernelConfig.cpp (resolved)

Groverkss approved these changes May 13, 2025

compiler/src/iree/compiler/Codegen/LLVMGPU/KernelConfig.cpp (outdated, resolved)

pashu123 added 10 commits May 13, 2025 20:59

Update nvvm tests.

e0607ee

Update reduction pipeline cuda tests

aca6262

Update reduction rocm tests

815a60c

Update reduction pipeline softmax tests

d57daf4

Update config_vector distribute reduction

fbc663c

Add the rematerialize parallel ops pass in the common passes

65a2ef1

Also, update the wgp tile sizes

Run softmax decomposition before elementwise fusion

a46545e

Move the fusion inside softmax decomposition

304786e

Address comments

b686a97

pashu123 force-pushed the every_kernel branch from 13296a5 to b686a97 Compare May 13, 2025 20:59

pashu123 merged commit c402b9c into iree-org:main May 13, 2025
41 checks passed

pashu123 mentioned this pull request May 14, 2025

VAE time regressed while enabling vector distribute by default. #20810

Open
