[Stream] Avoid clone on single user in CloneToConsumers #20667

jtuyls · 2025-04-29T09:28:43Z

There's a test case in CI that hits the maxIterationCount limit in CloneToConsumers; see the following snippet from the linked CI log:

2025-04-28T22:30:40.8293436Z /home/esaimana/actions-runner-2/_work/iree/iree/iree-test-suites/sharktank_models/llama3.1/assets/toy_llama_tp2.mlir:3:1: remark: clone to consumers pass failed to reach a fixed point after 32 iterations; ambiguous affinity may be present
2025-04-28T22:30:40.8295022Z module @module {
2025-04-28T22:30:40.8295393Z ^
2025-04-28T22:30:40.8295873Z ------------------------------ Captured log setup ------------------------------
2025-04-28T22:30:40.8299532Z INFO     root:binaries.py:185 Invoke IREE Tool: /home/esaimana/actions-runner-2/_work/iree/iree/venv/lib/python3.11/site-packages/iree/compiler/tools/../_mlir_libs/iree-compile /home/esaimana/actions-runner-2/_work/iree/iree/iree-test-suites/sharktank_models/llama3.1/assets/toy_llama_tp2.mlir --iree-input-type=auto --iree-vm-bytecode-module-output-format=flatbuffer-binary --mlir-print-debuginfo --mlir-print-op-on-diagnostic=false '--iree-hal-target-device=hip[0]' '--iree-hal-target-device=hip[1]' --iree-hip-target=gfx1100

This can be reproduced by running the following command on clone_to_consumers_remark.mlir.txt:

iree-opt --iree-stream-clone-to-consumers clone_to_consumers_remark.mlir

I reduced this further to the following test case:

module {
  util.global private @param_dev0 {stream.affinity = #hal.device.affinity<@__device_0>} : tensor<1xi32>
  util.global private @param_dev1 {stream.affinity = #hal.device.affinity<@__device_1>} : tensor<1xi32>
  util.func private @single_user_multi_device() -> tensor<1xi32> {
    %weight1 = util.global.load immutable @param_dev0 : tensor<1xi32>
    %weight2 = util.global.load immutable @param_dev1 : tensor<1xi32>
    %splat_value = arith.constant 123 : i32
    %splat = flow.tensor.splat %splat_value : tensor<i32>
    %16 = flow.dispatch @test::@test(%weight1, %weight2, %splat) : (tensor<1xi32>, tensor<1xi32>, tensor<i32>) -> tensor<1xi32>
    util.return %16 : tensor<1xi32>
  }
}

I'm not sure whether a dispatch whose operands have different affinities is valid or whether this should never happen. The remark message suggests that it can happen:

clone to consumers pass failed to reach a fixed point after 32 iterations; ambiguous affinity may be present

If this is valid, adding a hasOneUse check should stop the unbounded re-cloning of a producer that has only a single user.
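To illustrate the reasoning, here is a minimal standalone sketch of a fixed-point loop with and without a single-use guard. It is a toy model, not the IREE pass: `Producer`, `AffinitySet`, `makeProducer`, and `runToFixedPoint` are hypothetical names, and the model assumes that a clone simply inherits the affinity set of the one consumer it was made for.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins; none of these are IREE or MLIR types.
using AffinitySet = std::set<std::string>;

// A cloneable producer (think: a splat). Each entry in `uses` is the set of
// affinities demanded by one consumer of the produced value.
struct Producer {
  std::vector<AffinitySet> uses;

  bool hasOneUse() const { return uses.size() == 1; }

  // Ambiguous when the consumers, taken together, demand more than one affinity.
  bool hasAmbiguousAffinity() const {
    AffinitySet all;
    for (const AffinitySet &use : uses) all.insert(use.begin(), use.end());
    return all.size() > 1;
  }
};

struct FixedPointResult {
  unsigned iterationCount;    // Iterations that changed something.
  bool converged;             // False if the iteration limit was hit.
  std::size_t producerCount;  // Producers left when the loop stopped.
};

Producer makeProducer(std::vector<AffinitySet> uses) {
  Producer producer;
  producer.uses = std::move(uses);
  return producer;
}

// Clones every ambiguous producer once per use until nothing changes or the
// limit is reached. With `skipSingleUse`, single-use producers are left alone.
FixedPointResult runToFixedPoint(std::vector<Producer> producers,
                                 bool skipSingleUse,
                                 unsigned maxIterationCount = 32) {
  assert(maxIterationCount > 0 && "need at least one iteration");
  for (unsigned iterationCount = 0; iterationCount < maxIterationCount;
       ++iterationCount) {
    bool changed = false;
    std::vector<Producer> next;
    for (const Producer &producer : producers) {
      bool skip = !producer.hasAmbiguousAffinity() ||
                  (skipSingleUse && producer.hasOneUse());
      if (skip) {
        next.push_back(producer);
        continue;
      }
      // One clone per use; each clone serves exactly that one consumer.
      for (const AffinitySet &use : producer.uses) {
        Producer clone;
        clone.uses.push_back(use);
        next.push_back(std::move(clone));
      }
      changed = true;
    }
    producers = std::move(next);
    if (!changed) return {iterationCount, true, producers.size()};
  }
  return {maxIterationCount, false, producers.size()};
}
```

In this model, a producer with one use that demands both `dev0` and `dev1` is re-cloned on every iteration unless `skipSingleUse` is set, because the clone is exactly as ambiguous as the original. A producer with two uses on different devices is unaffected by the guard: it is split once and the loop then settles.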

Signed-off-by: Jorn Tuyls <jorn.tuyls@gmail.com>

jtuyls · 2025-05-09T07:47:30Z

@benvanik Could you help review this?

benvanik

Not cloning when not required seems like a good change! Independently, you are correct that behavior is undefined if you use mixed affinities - if you're producing IR like that you'll want to fix it.

The review comments are about the pass option for printing and the test that uses it - please remove both (debug printing would still be useful, but not via llvm::outs() and not via an option) and have the test check the condition by testing for the presence/absence of IR instead.

benvanik · 2025-05-09T15:23:42Z

compiler/src/iree/compiler/Dialect/Stream/Transforms/CloneToConsumers.cpp

+    if (printIterations) {
+      llvm::outs() << "iterationCount: " << iterationCount << "\n";
+    }


These kinds of things go under LLVM_DEBUG blocks instead - then a user can pass -debug and see the output. This file does declare the DEBUG_TYPE but has no uses yet; you'd add this as:
LLVM_DEBUG(llvm::dbgs() << "[clone-to-consumers] iteration " << iterationCount << "/" << maxIterationCount << "\n");

benvanik · 2025-05-09T15:24:41Z

compiler/src/iree/compiler/Dialect/Stream/Transforms/test/clone_to_consumers.mlir

+
+// Tests that splats with only one use are never cloned.
+
+// CHECK:       iterationCount: 0


Don't rely on debug prints in tests - instead, check what you're actually testing. Here, if you don't expect the splat to be cloned then test that the splat is not present twice.

benvanik · 2025-05-09T15:25:15Z

compiler/src/iree/compiler/Dialect/Stream/Transforms/test/clone_to_consumers.mlir

-  //      CHECK: %[[SPLAT_A:.+]] = flow.tensor.splat
+  //      CHECK: %[[SPLAT_B:.+]] = flow.tensor.splat
+  // CHECK-NEXT: %[[SPLAT_A:.+]] = flow.tensor.splat


These orderings seem brittle and backwards - why did you do B followed by A?

[Stream] Avoid clone on single user in CloneToConsumers

1ba7663

Signed-off-by: Jorn Tuyls <jorn.tuyls@gmail.com>

jtuyls requested a review from benvanik as a code owner April 29, 2025 09:28

benvanik requested changes May 9, 2025
