Slow iree.runtime.VmContext creation through python · Issue #20900 · iree-org/iree

Open
Alex-Vasile opened this issue May 26, 2025 · 1 comment
Labels
performance ⚡ Performance/optimization related work across the compiler and runtime

Comments

@Alex-Vasile (Contributor)

What happened?

Running the attached simplified python script with the attached zipped Deepseek VMFB takes ~10 minutes to execute the iree.runtime.VmContext() call.
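For reference, the context creation in question follows the usual iree.runtime pattern; a minimal sketch (not the actual attached script — a single HIP device and a placeholder path are used for brevity) looks roughly like:

```python
import iree.runtime as ireert

instance = ireert.VmInstance()

# Open a HIP device (the real setup targets an 8x MI300X machine; the device
# selection here is illustrative only).
device = ireert.get_device("hip")
hal_module = ireert.create_hal_module(instance, device)

# Memory-map the compiled Deepseek program (placeholder path).
vm_module = ireert.VmModule.mmap(instance, "/path/to/deepseek.vmfb")

# This is the call that takes ~10 minutes from python.
context = ireert.VmContext(instance, modules=[hal_module, vm_module])
```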

During the 10-minute wait, the following behaviour was observed on the system:

  • VRAM usage on one of the GPUs immediately jumps to ~58%.
  • The rest stay at 0% VRAM usage until right before the very end.
  • GPU utilization is at most 1%.
  • One CPU core stays at 100% usage for most of this time.

Running the same prefill of the model through iree-run-module immediately fills the VRAM on all 8 GPUs to the expected levels, and the CPU cores appear to be actively utilized, whereas the python script takes 10 minutes just to create the context.

Steps to reproduce your issue

  1. Run the attached python script with the provided zipped VMFB on an mi300x-3 machine.
  2. Wait 10 minutes.

What component(s) does this issue relate to?

Runtime

Version information

20250522

Additional context

No response

@Alex-Vasile added the performance ⚡ Performance/optimization related work across the compiler and runtime label on May 26, 2025
@benvanik (Collaborator)

I don't know about the python level, but a capture with Tracy would show what's taking so long. You can definitely capture a trace with iree-run-module (https://iree.dev/developers/performance/profiling-with-tracy/#quickstart) to see what the expected behavior is, and then getting a trace from python would let you see what is/isn't happening.
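For the python side, one way to get such a trace (assuming the installed IREE runtime wheel ships the Tracy-instrumented runtime variant described in the linked profiling guide) is to select it via environment variables before importing iree.runtime:

```python
import os

# Select the Tracy-instrumented runtime variant before importing iree.runtime
# (assumption: the installed package provides this variant, per the
# profiling-with-tracy guide).
os.environ["IREE_PY_RUNTIME"] = "tracy"
# Keep the process alive at exit until the Tracy profiler has collected the trace.
os.environ["TRACY_NO_EXIT"] = "1"

import iree.runtime as ireert  # imported after the environment is configured

# ... run the same VmContext creation as in the repro script, then connect the
# Tracy profiler UI (or iree-tracy-capture) to the running process.
```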
