Do not wrap single unset_encoding op into dispatch region. by hanhanW · Pull Request #20377 · iree-org/iree · GitHub

Do not wrap single unset_encoding op into dispatch region. #20377


Conversation

@hanhanW (Contributor) commented Mar 25, 2025

There are a few pieces to plumb this through the software stack. What the revision does is:

- Do not treat the unset_encoding op as a root op, so it does not get fused into dispatches.
- Add unset_encoding ops to the clonable list, so the CloneProducersIntoDispatchRegions pass can pull them into the consumer dispatch.
- If there is no consumer dispatch, the unset_encoding op is converted to a flow.tensor.encode op (see the sketch after this list).
- Like other tensor encode ops, the op is folded away if an identity layout is recognized. Otherwise, the MaterializeEncodingsPass materializes unique executables for stream.tensor.encode ops and replaces them with dispatches to those executables.
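As a rough sketch of the third and fourth bullets (the assembly syntax and the encoding attribute below are schematic, not copied from this PR's IR):

```mlir
// Before: a trailing unset_encoding with no consumer dispatch.
// #enc stands in for an #iree_encoding.encoding attribute; the textual
// form shown here is approximate.
%before = iree_encoding.unset_encoding %encoded
    : tensor<?x?xf32, #enc> -> tensor<?x?xf32>{%d0, %d1}

// After: rewritten to a flow.tensor.encode op, which folds away when the
// target resolves the encoding to an identity layout, and otherwise is
// materialized into its own executable later in the Stream pipeline.
%after = flow.tensor.encode %encoded
    : tensor<?x?xf32, #enc>{%d0, %d1} -> tensor<?x?xf32>{%d0, %d1}
```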

The [IR dump](https://gist.github.com/hanhanW/581d4ae9cd8ce26e47c723a27192f7d9) shows that there are no additional dispatches when the backend does not support encodings.

To repro, run

```bash
iree-compile \
  --output-format=vm-bytecode \
  --iree-hal-target-backends=vulkan-spirv \
  --iree-global-opt-enable-early-materialization=false \
  --mlir-print-ir-after=iree-stream-materialize-encodings \
  --mlir-print-ir-before=iree-stream-materialize-encodings \
  --mlir-disable-threading \
  matmul_f32.mlir -o /tmp/matmul_f32.vmfb
```
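To spot-check the dispatch count independently of the full IR dump, one possible (untested) variant stops compilation at the Flow phase and greps for dispatch ops; the flags mirror the command above, and --compile-to is a standard iree-compile option:

```bash
# Sketch only: count dispatch ops in the Flow-level IR. The grep pattern is
# a rough heuristic, not an exact match for every dispatch form.
iree-compile \
  --compile-to=flow \
  --iree-hal-target-backends=vulkan-spirv \
  --iree-global-opt-enable-early-materialization=false \
  matmul_f32.mlir | grep -c "flow.dispatch"
```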

MLIR Source:

```mlir
func.func @main(%lhs: tensor<?x?xf32>, %rhs: tensor<?x?xf32>) -> tensor<?x?xf32> {
  %c0 = arith.constant 0 : index
  %c1 = arith.constant 1 : index
  %M = tensor.dim %lhs, %c0 : tensor<?x?xf32>
  %N = tensor.dim %rhs, %c1 : tensor<?x?xf32>
  %cst = arith.constant 0.0 : f32
  %init = tensor.empty(%M, %N) : tensor<?x?xf32>
  %fill = linalg.fill ins(%cst : f32) outs(%init : tensor<?x?xf32>) -> tensor<?x?xf32>
  %op = linalg.matmul
      ins(%lhs, %rhs : tensor<?x?xf32>, tensor<?x?xf32>)
      outs(%fill : tensor<?x?xf32>) -> tensor<?x?xf32>
  return %op : tensor<?x?xf32>
}
```

The revision also deletes two old tests from dispatch_linalg_on_tensors.mlir because the functionality is covered by the individual pass tests and the pipeline tests.

Fixes #20187

@MaheshRavishankar (Contributor) left a comment:

Nice!

@hanhanW hanhanW merged commit 2c0a24c into iree-org:main Mar 26, 2025
43 checks passed
@hanhanW hanhanW deleted the convert-single-unset-encoding-to-flow-tensor-encode branch March 26, 2025 00:50
Development

Successfully merging this pull request may close these issues.

Eliminate set_encoding dispatch if the encoding is dropped