-
Notifications
You must be signed in to change notification settings - Fork 703
[Integrate] Upstream narrow type emulation is breaking iree test #20645 8000
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Updated LIT test from landing llvm/llvm-project#136640 which folds linalg.index when size is unit dim (1). This patch carries revert of llvm/llvm-project#133231. This PR breaks fp_to_subbytes and emulation_subbyte_types on llvm-cpu tests. iree-test-deps. tracking issue in iree-org#20645. llvm/llvm-project#137122. StableHLO and Torch-mlir needs to update their usage of GreedyRewriteConfig to use fluent API. i.e enableRegionSimplification = VS setRegionSimplificationLevel llvm/llvm-project#135970. StableHLO has issue with VHLO_IntegerAttr and APInt not being updated. StableHLO needs to be updated with that PR's change for us to be able to integrate. llvm/llvm-project#121389. Torch-MLIR needs to be updated with that PR's change for us to be able to integrate. Signed-off-by: Stanley Winata <stanley.winata@amd.com>
Updated LIT test from landing llvm/llvm-project#136640 which folds linalg.index when size is unit dim (1). Added chipSet argument into populateGpuToROCDLConversionPatterns based on changes in llvm/llvm-project#137360 This patch carries revert of llvm/llvm-project#133231. This PR breaks fp_to_subbytes and emulation_subbyte_types on llvm-cpu tests. iree-test-deps. tracker issue in #20645. llvm/llvm-project#137122. StableHLO and Torch-mlir needs to update their usage of GreedyRewriteConfig to use fluent API. i.e enableRegionSimplification = VS setRegionSimplificationLevel llvm/llvm-project#135970. StableHLO has issue with VHLO_IntegerAttr and APInt not being updated. StableHLO needs to be updated with that PR's change for us to be able to integrate. llvm/llvm-project#121389. Torch-MLIR needs to be updated with that PR's change for us to be able to integrate. --------- Signed-off-by: Stanley Winata <stanley.winata@amd.com>
So, I think what should be done there is that the slow path should be guarded by a runtime condition as in llvm/llvm-project#135014 But, in addition, the narrow type emulation should be moved before scf-to-cf - heck, before OptimizeIntArithmetic - so that we can take advantage of divisibility analysis |
That is the way to go. Currently the transformation is also unable to handle some other cases, all needs runtime check. |
To clarify, which "other cases"? |
All the cases with a non-constant storing index. |
@raikonenfnu Was the emulation in the case of non-constant/dynamic indexing already working before? I think it was not working as intended even before this problematic patch. |
I do think the patch was fixing a real correctness issue, but didn't do it in a performant way |
This patch carries 4 reverts from #20646: llvm/llvm-project#133231. This PR breaks fp_to_subbytes and emulation_subbyte_types on llvm-cpu tests. iree-test-deps. tracker issue in #20645. llvm/llvm-project#137122. StableHLO and Torch-mlir needs to update their usage of GreedyRewriteConfig to use fluent API. i.e enableRegionSimplification = VS setRegionSimplificationLevel llvm/llvm-project#135970. StableHLO has issue with VHLO_IntegerAttr and APInt not being updated. StableHLO needs to be updated with that PR's change for us to be able to integrate. llvm/llvm-project#121389. Torch-MLIR needs to be updated with that PR's change for us to be able to integrate. Signed-off-by: Ian Wood <ianwood2024@u.northwestern.edu>
Sorry that my patch is causing this issue - I'd like to see if I can help resolve it. To start, I’m trying to better understand the underlying problem. When I lower the reproducer (with my patch reverted), I get the following IR: func.func @_f32_to_i4_1d_dispatch_0_elementwise_8_f32xi4() {
%cst = arith.constant dense<4> : vector<2xi32>
%cst_0 = arith.constant dense<15> : vector<2xi32>
%c4 = arith.constant 4 : index
%c8 = arith.constant 8 : index
%c0 = arith.constant 0 : index
%0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags("ReadOnly|Indirect") : memref<8xf32>
memref.assume_alignment %0, 64 : memref<8xf32>
%1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) flags(Indirect) : memref<4xi8>
memref.assume_alignment %1, 64 : memref<4xi8>
cf.br ^bb1(%c0 : index)
^bb1(%2: index): // 2 preds: ^bb0, ^bb2
%3 = arith.cmpi slt, %2, %c8 : index
cf.cond_br %3, ^bb2, ^bb3
^bb2: // pred: ^bb1
%4 = vector.load %0[%2] : memref<8xf32>, vector<4xf32>
%5 = arith.fptoui %4 : vector<4xf32> to vector<4xi32>
%6 = affine.apply #map()[%2]
%7 = vector.shuffle %5, %5 [0, 2] : vector<4xi32>, vector<4xi32>
%8 = arith.andi %7, %cst_0 : vector<2xi32>
%9 = vector.shuffle %5, %5 [1, 3] : vector<4xi32>, vector<4xi32>
%10 = arith.andi %9, %cst_0 : vector<2xi32>
%11 = arith.shli %10, %cst : vector<2xi32>
%12 = arith.ori %8, %11 : vector<2xi32>
%13 = arith.trunci %12 : vector<2xi32> to vector<2xi8>
vector.store %13, %1[%6] : memref<4xi8>, vector<2xi8>
%14 = arith.addi %2, %c4 : index
cf.br ^bb1(%14 : index)
^bb3: // pred: ^bb1
return
} What stands out to me is that %cst_0 = arith.constant dense<15> : vector<2xi32> ... but it doesn't look correct. So, it feels like the failing test is exercising an incorrect lowering path in upstream MLIR? And my patch effectively disables that. Does this agree with your understanding? |
Carries the same 4 reverts as #20657: - llvm/llvm-project#133231. This PR breaks fp_to_subbytes and emulation_subbyte_types on llvm-cpu tests. iree-test-deps. tracker issue in #20645. - llvm/llvm-project#137122. StableHLO and Torch-mlir needs to update their usage of GreedyRewriteConfig to use fluent API. i.e enableRegionSimplification = VS setRegionSimplificationLevel - llvm/llvm-project#135970. StableHLO has issue with VHLO_IntegerAttr and APInt not being updated. StableHLO needs to be updated with that PR's change for us to be able to integrate. - llvm/llvm-project#121389. Torch-MLIR needs to be updated with that PR's change for us to be able to integrate. Signed-off-by: Ian Wood <ianwood2024@u.northwestern.edu>
Updated LIT test from landing llvm/llvm-project#136640 which folds linalg.index when size is unit dim (1). Added chipSet argument into populateGpuToROCDLConversionPatterns based on changes in llvm/llvm-project#137360 This patch carries revert of llvm/llvm-project#133231. This PR breaks fp_to_subbytes and emulation_subbyte_types on llvm-cpu tests. iree-test-deps. tracker issue in iree-org#20645. llvm/llvm-project#137122. StableHLO and Torch-mlir needs to update their usage of GreedyRewriteConfig to use fluent API. i.e enableRegionSimplification = VS setRegionSimplificationLevel llvm/llvm-project#135970. StableHLO has issue with VHLO_IntegerAttr and APInt not being updated. StableHLO needs to be updated with that PR's change for us to be able to integrate. llvm/llvm-project#121389. Torch-MLIR needs to be updated with that PR's change for us to be able to integrate. --------- Signed-off-by: Stanley Winata <stanley.winata@amd.com>
This patch carries 4 reverts from iree-org#20646: llvm/llvm-project#133231. This PR breaks fp_to_subbytes and emulation_subbyte_types on llvm-cpu tests. iree-test-deps. tracker issue in iree-org#20645. llvm/llvm-project#137122. StableHLO and Torch-mlir needs to update their usage of GreedyRewriteConfig to use fluent API. i.e enableRegionSimplification = VS setRegionSimplificationLevel llvm/llvm-project#135970. StableHLO has issue with VHLO_IntegerAttr and APInt not being updated. StableHLO needs to be updated with that PR's change for us to be able to integrate. llvm/llvm-project#121389. Torch-MLIR needs to be updated with that PR's change for us to be able to integrate. Signed-off-by: Ian Wood <ianwood2024@u.northwestern.edu>
Carries the same 4 reverts as iree-org#20657: - llvm/llvm-project#133231. This PR breaks fp_to_subbytes and emulation_subbyte_types on llvm-cpu tests. iree-test-deps. tracker issue in iree-org#20645. - llvm/llvm-project#137122. StableHLO and Torch-mlir needs to update their usage of GreedyRewriteConfig to use fluent API. i.e enableRegionSimplification = VS setRegionSimplificationLevel - llvm/llvm-project#135970. StableHLO has issue with VHLO_IntegerAttr and APInt not being updated. StableHLO needs to be updated with that PR's change for us to be able to integrate. - llvm/llvm-project#121389. Torch-MLIR needs to be updated with that PR's change for us to be able to integrate. Signed-off-by: Ian Wood <ianwood2024@u.northwestern.edu>
I think I know what's happening. Below is the IR that I move the emulation before scf->cf lowering, which looks easier. It also trims down IREE specific ops. When we do tiling in IREE, the tile sizes are driven by native vector size on CPU. For func.func @main(%arg0: memref<8xf32>, %arg1: memref<8xi4>) {
%c4 = arith.constant 4 : index
%c8 = arith.constant 8 : index
%c0 = arith.constant 0 : index
scf.for %arg2 = %c0 to %c8 step %c4 {
%0 = vector.load %arg0[%arg2] : memref<8xf32>, vector<4xf32>
%1 = arith.fptoui %0 : vector<4xf32> to vector<4xi32>
%2 = arith.trunci %1 : vector<4xi32> to vector<4xi4>
vector.store %2, %arg1[%arg2] : memref<8xi4>, vector<4xi4>
}
return
} The upstream fix is reasonable, and it enables the support for unaligned cases. Here, I think we miss a hint for the pattern. This can either be:
I don't know how to achieve (2) at the moment, maybe ValueRange analysis can help. I only use it for querying upper_bound. Let me take a look if it can check if a value is a multiple of something or not. In IREE, we have iree/compiler/src/iree/compiler/Codegen/Common/test/block_dynamic_dims.mlir Lines 14 to 17 in 5140464
|
Okay, I think upstream may not support this. To fix it properly in IREE, I think we should move some patterns back to IREE and use TensorDynamicDimAnalysis to get the information. We can prioritize the IREE patterns in this case. |
yeah, especially for stuff like this I just assume (hah) upstream doesn't support what we need in conjunction with our better type support and assume ops. really good detective work on this - very tricky interactions! |
@lialan may be busy on other stuff. Assigning it to me for now, and I can find someone or me to work on it. (cc @MaheshRavishankar ) |
We could also get around to adding |
Another thing that can be done upstream is to put the slow and the fast cases in two arms of an if statement and let integer range analysis clear off the slow path. This is what got done for the buffer OOB stuff |
I'd prefer not using this approach atm, because we can get all the needed information from IR. I can be convinced if there are other use cases.
I'm not sure how integrate range analysis works with upstream patterns. It seems like the analysis should only be run once because of efficiency. Then we collect the ops and apply the transform. Also, the analysis is only available in IREE, so I'm not sure how they can be connected. Anyway, I think we have a solid plan, and we can fix it if someone pick this up. |
Agreed RE upstream analysis interaction - I think we could provide interface implementations if upstream uses them that distills the info (I don't think we do). The particular benefit of the |
Integer range analysis lives upstream - the only thing that doesn't is divisibility analysis, and that should be fairly easily upstreamable And even if integer range analysis doesn't work, having the pattern gener9ate
instead of just generating |
What happened?
llvm/llvm-project#133231 seems to be breaking our subbyte emulation tests.
A simple repro is https://gist.github.com/raikonenfnu/1ea07b7e231d8997bfa1c29502df637d
original test https://github.com/iree-org/iree/blob/1a8d229431e62b50eb297e75ca4bf1dba3b67f65/tests/e2e/linalg/fp_to_subbyte.mlir
Based on the above test, currently IREE is generating code that has different trailing dim and "virtually"/fake non constant offset into the store
vector.store %0[%2]
where %2 is actually a constant, but since it is inside a branch region, %c0 becomes %2 which is a region argument and is now a non-constant.The condition above leads our program into this code path where it cannot determine
!foldedNumFrontPadElems
and fails this lowering. as seen in:https://github.com/llvm/llvm-project/blob/2de936b6eb38e7a37224a97c2a22aa79b9dfb9dc/mlir/lib/Dialect/Vector/Transforms/VectorEmulateNarrowType.cpp#L619-L629
IIUC, that PR is somewhat correct, as long as the trailing dim does not match, we may need partial stores.
I don't think I have enough context on the best way to solve this. We can:
Steps to reproduce your issue
wget https://gist.githubusercontent.com/raikonenfnu/1ea07b7e231d8997bfa1c29502df637d/raw/6e3311f6794dd5afeaeac0f82422cbe498d67272/emulate_failure.mlir
iree-opt --iree-codegen-emulate-narrow-type emulate_failure.mlir -o out.mlir
What component(s) does this issue relate to?
No response
Version information
No response
Additional context
No response
The text was updated successfully, but these errors were encountered: