8000 [Stream] Avoid clone on single user in CloneToConsumers by jtuyls · Pull Request #20667 · iree-org/iree · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

[Stream] Avoid clone on single user in CloneToConsumers #20667

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

jtuyls
Copy link
Contributor
@jtuyls jtuyls commented Apr 29, 2025

There's a test case in CI that hits the maxIterationCount limit in CloneToConsumers, see following snippet from link:

2025-04-28T22:30:40.8293436Z /home/esaimana/actions-runner-2/_work/iree/iree/iree-test-suites/sharktank_models/llama3.1/assets/toy_llama_tp2.mlir:3:1: remark: clone to consumers pass failed to reach a fixed point after 32 iterations; ambiguous affinity may be present
2025-04-28T22:30:40.8295022Z module @module {
2025-04-28T22:30:40.8295393Z ^
2025-04-28T22:30:40.8295873Z ------------------------------ Captured log setup ------------------------------
2025-04-28T22:30:40.8299532Z INFO     root:binaries.py:185 Invoke IREE Tool: /home/esaimana/actions-runner-2/_work/iree/iree/venv/lib/python3.11/site-packages/iree/compiler/tools/../_mlir_libs/iree-compile /home/esaimana/actions-runner-2/_work/iree/iree/iree-test-suites/sharktank_models/llama3.1/assets/toy_llama_tp2.mlir --iree-input-type=auto --iree-vm-bytecode-module-output-format=flatbuffer-binary --mlir-print-debuginfo --mlir-print-op-on-diagnostic=false '--iree-hal-target-device=hip[0]' '--iree-hal-target-device=hip[1]' --iree-hip-target=gfx1100

This can be reproduced by running the following command on clone_to_consumers_remark.mlir.txt:

iree-opt --iree-stream-clone-to-consumers clone_to_consumers_remark.mlir

I reduced this further to the following test case:

module {
  util.global private @param_dev0 {stream.affinity = #hal.device.affinity<@__device_0>} : tensor<1xi32>
  util.global private @param_dev1 {stream.affinity = #hal.device.affinity<@__device_1>} : tensor<1xi32>
  util.func private @single_user_multi_device() -> tensor<1xi32> {
    %weight1 = util.global.load immutable @param_dev0 : tensor<1xi32>
    %weight2 = util.global.load immutable @param_dev1 : tensor<1xi32>
    %splat_value = arith.constant 123 : i32
    %splat = flow.tensor.splat %splat_value : tensor<i32>
    %16 = flow.dispatch @test::@test(%weight1, %weight2, %splat) : (tensor<1xi32>, tensor<1xi32>, tensor<i32>) -> tensor<1xi32>
    util.return %16 : tensor<1xi32>
  }
}

I am actually not entirely sure whether a dispatch with params that have different affinity is correct or whether this should never happen? It seems from the remark message that it can happen:

clone to consumers pass failed to reach a fixed point after 32 iterations; ambiguous affinity may be present

If this is correct, adding a hasOneUse check should break up the indefinite cloning of the single producer.

Signed-off-by: Jorn Tuyls <jorn.tuyls@gmail.com>
@jtuyls jtuyls requested a review from benvanik as a code owner April 29, 2025 09:28
@jtuyls
Copy link
Contributor Author
jtuyls commented May 9, 2025

@benvanik Could you help review this?

Copy link
Collaborator
@benvanik benvanik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not cloning when not required seems like a good change! Independently, you are correct that behavior is undefined if you use mixed affinities - if you're producing IR like that you'll want to fix it.

Noted in the review is RE the pass option to print and the test using it - please remove that (debug printing would still be useful, but not outs and not via an option) and have the test check the condition by testing for the presence/absence of IR instead.

Comment on lines +190 to +192
if (printIterations) {
llvm::outs() << "iterationCount: " << iterationCount << "\n";
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These kind of things go under LLVM_DEBUG blocks instead - then a user can pass -debug-> and see the output. This file does declare the DEBUG_TYPE but has no uses yet; you'd add this as:

LLVM_DEBUG(llvm::dbgs() << "[clone-to-consumers] iteration " << iterationCount << "/" << maxIterationCount << "\n");


// Tests that splats with only one use are never cloned.

// CHECK: iterationCount: 0
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't rely on debug prints in tests - instead, check what you're actually testing. Here, if you don't expect the splat to be cloned then test that the splat is not present twice.

Comment on lines -8 to +9
// CHECK: %[[SPLAT_A:.+]] = flow.tensor.splat
// CHECK: %[[SPLAT_B:.+]] = flow.tensor.splat
// CHECK-NEXT: %[[SPLAT_A:.+]] = flow.tensor.splat
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these orderings seem brittle and backwards - why did you do B followed by A?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants
0