During the 10-minute wait, the following behaviour was observed on the system:
On one of the GPUs, VRAM usage immediately jumps to ~58%.
The remaining GPUs stay at 0% VRAM usage until just before the very end.
GPU utilization never exceeds 1%.
One CPU core stays at 100% usage for most of this time.
Running the same prefill of the model through iree-run-module immediately fills the VRAM on all 8 GPUs to the expected levels, and the cores appear to be actively utilized, whereas the Python script takes 10 minutes just to create the context.
I don't know about the Python side, but a Tracy capture would show what's taking so long. You can definitely capture a trace with iree-run-module (https://iree.dev/developers/performance/profiling-with-tracy/#quickstart) to see what the expected behavior looks like; getting a trace from Python would then let you see what is or isn't happening.
What happened?
Running the attached simplified Python script takes ~10 minutes to execute the iree.runtime.VmContext() call when using the attached zipped DeepSeek VMFB. During the 10-minute wait, the following behaviour was observed on the system:
On one of the GPUs, VRAM usage immediately jumps to ~58%.
The remaining GPUs stay at 0% VRAM usage until just before the very end.
GPU utilization never exceeds 1%.
One CPU core stays at 100% usage for most of this time.
Running the same prefill of the model through iree-run-module immediately fills the VRAM on all 8 GPUs to the expected levels, and the cores appear to be actively utilized, whereas the Python script takes 10 minutes just to create the context.
Steps to reproduce your issue
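The attached script is not reproduced here. As a hedged sketch (not the reporter's script), the following times each step of loading a VMFB separately, so the slow VmContext() call can be confirmed in isolation from instance, device, and module setup. It uses iree.runtime names as we understand the Python bindings (VmInstance, get_device, create_hal_module, VmModule.mmap, VmContext); the "hip://N" device URIs, the use of 8 devices, and passing devices=... to create_hal_module are assumptions, so check them against your installed version and the attached script.

```python
import time
from typing import Any, Callable, Tuple


def timed(label: str, fn: Callable[[], Any]) -> Tuple[Any, float]:
    """Call fn(), print the wall-clock time it took, and return (result, seconds)."""
    start = time.perf_counter()
    result = fn()
    elapsed = time.perf_counter() - start
    print(f"{label}: {elapsed:.2f} s")
    return result, elapsed


def load_and_create_context(vmfb_path: str, driver: str = "hip", num_devices: int = 8):
    """Time each step of loading a multi-device VMFB to isolate the slow one.

    ASSUMPTIONS (hypothetical, for illustration): the 'hip://N' device URIs
    and passing devices=... to create_hal_module; adapt them to how the
    attached script actually creates its devices.
    """
    # Imported lazily so timed() can be used without IREE installed.
    import iree.runtime as ireert

    instance, _ = timed("VmInstance", ireert.VmInstance)
    uris = [f"{driver}://{i}" for i in range(num_devices)]
    devices, _ = timed("get_device x N", lambda: [ireert.get_device(u) for u in uris])
    hal_module, _ = timed(
        "create_hal_module",
        lambda: ireert.create_hal_module(instance, devices=devices),
    )
    vm_module, _ = timed(
        "VmModule.mmap", lambda: ireert.VmModule.mmap(instance, vmfb_path)
    )
    # This is the call reported to take ~10 minutes.
    context, _ = timed(
        "VmContext",
        lambda: ireert.VmContext(instance, modules=[hal_module, vm_module]),
    )
    return context
```

Usage, with a hypothetical path: load_and_create_context("deepseek.vmfb"). If only the "VmContext" line shows the multi-minute delay, that narrows the Tracy capture to context creation.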
What component(s) does this issue relate to?
Runtime
Version information
20250522
Additional context
No response