-
Notifications
You must be signed in to change notification settings - Fork 701
Hang in iree-run-module on ONNX dequantizelinear test case #16666
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
try --trace_execution? |
|
feels like a miscompile to me - I have no idea what's going on there (copies and then a load and such) |
That was my feeling too, or a divergent / infinite loop based on the input values (wondering about the ui8 inputs). |
When a test times out the error messages are basically useless (for the entire run, not just the specific test cases), since https://pypi.org/project/pytest-timeout/ terminates the test rather forcefully... but timing out is still better than running forever. For the test cases in #16666 that timed out, I just excluded them from the test job configs.
I no longer see these hangs locally. Weird, since I want to debug an unrelated hang and thought these could be helpful with debugging :/ |
Starting to narrow down the poor failure mode here. Seems like pytest-timeout and pytest-retry are (sometimes?) incompatible with one another. Still not sure why the tests are actually hanging though. |
Progress on #16666. The [pytest-retry](https://pypi.org/project/pytest-retry/) and [pytest-timeout](https://pypi.org/project/pytest-timeout/) plugins initially appeared to be compatible. However, we've been observing tests hanging without ever tripping the timeout. Hangs under [pytest-xdist](https://pypi.org/project/pytest-xdist/) don't log information about individual tests until after the full suite finishes, making such failures impossible to debug directly ([sample logs](https://github.com/iree-org/iree/actions/runs/9038964717/job/24841208573#step:9:43)). Conversely, test failure flakes that would benefit from a retry show which test failed and can be attempted again from the GitHub UI. Tests that hang still need to be added to `skip_run_tests` instead of `expected_run_failures`, since our conftest.py logic is only looking for failures in `iree-compile` or `iree-run-module`, not timeout signals from the test runner. See [these sample logs](https://github.com/iree-org/iree/actions/runs/9069578568/job/24919962287#step:16:4699) for what a timeout failure now looks like. ci-exactly: build_packages,regression_test_amdgpu_rocm,regression_test_cpu,regression_test_nvidiagpu_cuda,regression_test_nvidiagpu_vulkan,test_tensorflow_cpu
…#17384) Progress on iree-org#16666. The [pytest-retry](https://pypi.org/project/pytest-retry/) and [pytest-timeout](https://pypi.org/project/pytest-timeout/) plugins initially appeared to be compatible. However, we've been observing tests hanging without ever tripping the timeout. Hangs under [pytest-xdist](https://pypi.org/project/pytest-xdist/) don't log information about individual tests until after the full suite finishes, making such failures impossible to debug directly ([sample logs](https://github.com/iree-org/iree/actions/runs/9038964717/job/24841208573#step:9:43)). Conversely, test failure flakes that would benefit from a retry show which test failed and can be attempted again from the GitHub UI. Tests that hang still need to be added to `skip_run_tests` instead of `expected_run_failures`, since our conftest.py logic is only looking for failures in `iree-compile` or `iree-run-module`, not timeout signals from the test runner. See [these sample logs](https://github.com/iree-org/iree/actions/runs/9069578568/job/24919962287#step:16:4699) for what a timeout failure now looks like. ci-exactly: build_packages,regression_test_amdgpu_rocm,regression_test_cpu,regression_test_nvidiagpu_cuda,regression_test_nvidiagpu_vulkan,test_tensorflow_cpu
…#17384) Progress on iree-org#16666. The [pytest-retry](https://pypi.org/project/pytest-retry/) and [pytest-timeout](https://pypi.org/project/pytest-timeout/) plugins initially appeared to be compatible. However, we've been observing tests hanging without ever tripping the timeout. Hangs under [pytest-xdist](https://pypi.org/project/pytest-xdist/) don't log information about individual tests until after the full suite finishes, making such failures impossible to debug directly ([sample logs](https://github.com/iree-org/iree/actions/runs/9038964717/job/24841208573#step:9:43)). Conversely, test failure flakes that would benefit from a retry show which test failed and can be attempted again from the GitHub UI. Tests that hang still need to be added to `skip_run_tests` instead of `expected_run_failures`, since our conftest.py logic is only looking for failures in `iree-compile` or `iree-run-module`, not timeout signals from the test runner. See [these sample logs](https://github.com/iree-org/iree/actions/runs/9069578568/job/24919962287#step:16:4699) for what a timeout failure now looks like. ci-exactly: build_packages,regression_test_amdgpu_rocm,regression_test_cpu,regression_test_nvidiagpu_cuda,regression_test_nvidiagpu_vulkan,test_tensorflow_cpu Signed-off-by: Lubo Litchev <lubol@google.com>
I'm seeing
iree-run-module
hang indefinitely with this call stack:Original test case: https://github.com/nod-ai/SHARK-TestSuite/tree/main/iree_tests/onnx/node/generated/test_dequantizelinear
Input MLIR:
After
iree-compile --compile-to=input dequantize_linear.mlir
:--print-ir-after-all
output: https://gist.github.com/ScottTodd/15ea43eae8733806b80d7d9ed1fb4f1aRun command (using the input .npy files from the test case folder):
Running those .npy files through
iree\tools\test\echo_npy.py
produces(first is type ui8 though, so why negative?)
The text was updated successfully, but these errors were encountered: