Tensor cores not utilised when using iree-run-module --device=cuda
#11887
Comments
To use tensor cores you need to pass the right target architecture, as IREE is meant for cross-compilation and doesn't query the target at runtime. On A100 you need to set the target arch accordingly (A100 is sm_80). Currently convolutions don't have a codegen path that uses tensor cores by default; for that to happen you need to set the flags that convert convolutions to matmuls:
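The flags themselves were dropped from this comment when the thread was captured. As a hedged reconstruction, on IREE builds of roughly this vintage the invocation would have looked something like the following; the flag spellings are an assumption, so verify them against `iree-compile --help` on your build:

```shell
# Sketch with assumed flag names for a pre-2023 IREE build.
# --iree-hal-cuda-llvm-target-arch: A100 is compute capability 8.0, i.e. sm_80.
# The flow flags rewrite convolutions as matmuls (img2col) and pad tensors
# to sizes the tensor-core pipeline can handle.
iree-compile model.mlir -o model.vmfb \
  --iree-hal-target-backends=cuda \
  --iree-hal-cuda-llvm-target-arch=sm_80 \
  --iree-flow-enable-conv-img2col-transform \
  --iree-flow-enable-padding-linalg-ops \
  --iree-flow-linalg-ops-padding-size=32
```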
With all those flags set, tensor cores will be used.
I would recommend updating your IREE installation or building from HEAD as well. Your current version is from September 30th, which is quite out of date by this point.
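The exact upgrade command isn't quoted in the thread; assuming the package names IREE published on PyPI at the time, it would be along the lines of:

```shell
# Assumed package names (iree-compiler, iree-runtime); nightly builds are
# served from a separate find-links index described in the IREE docs.
python -m pip install --upgrade iree-compiler iree-runtime
```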
Thanks. We are doing a performance comparison and using the git HEAD isn't ideal. Is there a recommended recent stable/performant commit to use? Also, can the Python packages at https://pypi.org/project/iree-tools-tf/ please be updated?
Thanks. Is there a stability/performance reason these passes aren't enabled by default? The reason I'm asking is that we are doing a performance comparison and we'd like to use a uniform, standard set of flags across all models as much as possible.
The target-arch level will always be required in some fashion to generate code which correctly exploits a hardware generation. The others represent temporary passes that we added while implementing more generic/proper support for various features. Specifically:
We don't like to enable options by default that are partial implementations we are still working to finish properly, and each of these would be subsumed by active projects. There isn't anything wrong with them that we know of, and people who are using this for real work do set them. But they are not general.
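As an aside, to pick the right target-arch value for your own hardware you can read the compute capability off the driver. This assumes a reasonably recent nvidia-smi, since older drivers don't support the compute_cap query field:

```shell
# Prints e.g. "8.0" on an A100, which maps to the target arch sm_80.
nvidia-smi --query-gpu=compute_cap --format=csv,noheader
```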
Thanks for clarifying this. Sounds good.
Hi @ThomasRaoux, I am at
Those flags changed as of yesterday's commit. Sorry for the inconvenience. What you want to use on the latest IREE is:
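The updated flags were also lost in the capture. Around this time IREE moved these rewrites into a preprocessing pass pipeline, so the replacement presumably looked something like the sketch below; the pass names are an assumption and should be checked against `iree-compile --help` on your commit:

```shell
# Sketch: conv-to-matmul (img2col) and padding as preprocessing passes.
iree-compile model.mlir -o model.vmfb \
  --iree-hal-target-backends=cuda \
  --iree-hal-cuda-llvm-target-arch=sm_80 \
  --iree-preprocessing-pass-pipeline="builtin.module(func.func(iree-preprocessing-convert-conv2d-to-img2col,iree-preprocessing-pad-linalg-ops{pad-size=32}))"
```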
Hi @ThomasRaoux, thanks. This is executing now. However, tensor cores are not getting used, nor is there any difference in perf with or without these flags. I am running ResNet50V2 from Keras. I suspect that it is due to the data type of the operands to the convolutions.
We don't have a flag to automatically demote operands from fp32 to fp16. Could you share the MHLO IR?
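For anyone following along, one way to produce that IR is the TensorFlow importer shipped in the iree-tools-tf package. The sketch below assumes a SavedModel export of the Keras model, and the flag names may differ between releases:

```shell
# Assumed invocation of iree-import-tf; it emits MLIR in IREE's
# MHLO/StableHLO input dialect, which can be attached to the issue.
iree-import-tf --tf-import-type=savedmodel_v2 /path/to/resnet50_saved_model \
  -o resnet50_mhlo.mlir
```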
Hi @ThomasRaoux, I am attaching two files: one with the default TensorFlow precision policy and the other with the mixed-float16 precision policy. The one with the mixed-float16 policy fails to lower with the following error:
Hi @ThomasRaoux, is there a way I can still use tensor cores for this IR?
@ThomasRaoux Bumping this up, can you take a look? |
Sorry for missing this issue. We need implicit GEMM support for this to happen (i.e., lowering the convolution directly onto tensor-core matmuls without materializing an im2col buffer).
@ThomasRaoux @mattwalsh Setting as a P2 since we don't yet have the implicit GEMM support for this; please bump up when needed.
Hi all,
Following the instructions here (https://iree-org.github.io/iree/deployment-configurations/gpu-cuda-rocm/), I am trying to run ResNet50 using the IREE command-line tools installed via pip. However, upon profiling the model with Nsight Compute, I see that it is not using tensor cores.
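A minimal version of the compile-and-run flow from that guide looks like the following; the file names, entry function, and input shape are illustrative, and runtime flag spellings (e.g. --module/--function vs. the older --module_file/--entry_function) vary between IREE releases:

```shell
# Illustrative compile-and-run flow for the CUDA backend.
iree-compile resnet50_mhlo.mlir -o resnet50.vmfb --iree-hal-target-backends=cuda
iree-run-module --device=cuda --module=resnet50.vmfb \
  --function=predict --input="1x224x224x3xf32=0"
```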
Is there a flag/env_var that needs to be set to enable tensor cores? Any suggestion would be appreciated.
Thanks
Package versions: