[Mistral] Performance degradation with VMFB containing prefill functions of multiple batch sizes #20836
Comments
Initial analysis
As part of the DeduplicateExecutables pass, the following dispatch gets changed from
to
At HAL for the prefill_bs_2_8 case, the workgroup sizes are (512, 4, z)
At HAL for prefill_bs_4_8 case, the workgroup sizes are (1024, 2, z)
Verified by disabling DeduplicateExecutables: the performance is recovered, though the VMFB size increases.
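To make the effect described above concrete, here is a toy model of what merging identical executables does to the dispatch sites. It is only a sketch, not IREE's implementation: the executable names and body strings are hypothetical, and "identical" is reduced to plain string equality.

```python
from collections import defaultdict


def deduplicate(executables):
    """Map each executable name to the first name seen with an identical body.

    `executables` is a dict of name -> body string (a stand-in for the real IR).
    """
    canonical_by_body = {}
    remap = {}
    for name, body in executables.items():
        # setdefault keeps the first name registered for this body.
        remap[name] = canonical_by_body.setdefault(body, name)
    return remap


# Hypothetical executables; the bodies do not mention the batch size because
# that dimension is dynamic in this toy model.
executables = {
    "prefill_bs2$dispatch_0": "matmul(?x4096, 4096x4096)",
    "prefill_bs8$dispatch_0": "matmul(?x4096, 4096x4096)",
    "prefill_bs8$dispatch_1": "softmax(?x32x?)",
}

remap = deduplicate(executables)

# Group the original dispatch sites by the executable they now point at.
shared = defaultdict(list)
for name, canonical in remap.items():
    shared[canonical].append(name)

for canonical, users in shared.items():
    print(canonical, "<-", users)
```

In this model the batch size 2 and batch size 8 dispatch sites end up sharing one executable, so any configuration chosen per executable is shared by both, which is consistent with the behavior change seen when the pass is disabled.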
Good find! This looks like a case that specialization should be able to handle - the analysis information derived from the dispatch sites should be present during executable configuration, but AFAIK today that's not really used by codegen. It'd be good to check if what's required to specialize is there (--iree-hal-dump-executable-sources-to= should show it). This same situation would arise if a single input function dispatched with different sizes, and this particular case of globbing things together just happens to definitely show it.
(also, great triage! thanks for digging in!)
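For anyone following along, a minimal sketch of how the suggested check could be scripted. It only assembles an `iree-compile` command line with the `--iree-hal-dump-executable-sources-to=` flag mentioned above. The helper name, the dump directory, and the output file name are hypothetical, and the target-selection flags from the original reproduction command are not shown in this thread, so they have to be passed in via `extra_flags`.

```python
import shlex


def dump_sources_command(mlir_path, dump_dir, output_vmfb, extra_flags=()):
    """Build an iree-compile argv that also dumps executable sources.

    extra_flags: target/device flags copied from your own compile command
    (intentionally not guessed here).
    """
    return [
        "iree-compile",
        mlir_path,
        *extra_flags,
        f"--iree-hal-dump-executable-sources-to={dump_dir}",
        "-o",
        output_vmfb,
    ]


# Hypothetical paths; the input name follows the attachment naming in this issue.
cmd = dump_sources_command(
    "prefill_bs_2_8.mlir", "dump_bs_2_8", "prefill_bs_2_8.vmfb"
)
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

Running this once for the (2,8) input and once for the (4,8) input into separate directories should make it possible to compare the dumped sources for the affected dispatch side by side.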
The following packages must be installed to generate the irpa file:
What happened?
When a single VMFB contains prefill functions for multiple batch sizes (2 and 8), running the prefill function with batch size 8 is slower than running it from a VMFB that contains only the batch size 8 prefill function.
The degradation is not seen with other combinations such as batch sizes (4,8).
prefill_bs_4_8.txt
prefill_bs_2_8.txt
prefill_bs_8.txt
Steps to reproduce your issue
The MLIR files with prefill batch sizes (2,8), (4,8), and (8) are attached to this ticket.
Generate the vmfb using the following command
Run the benchmark for prefill_bs8 (SharkMI300x-3)
With prefill_bs_8.mlir / prefill_bs_4_8.mlir
With prefill_bs_2_8.mlir
What component(s) does this issue relate to?
Compiler
Version information
IREE compiler version 3.5.0rc20250514 @ d63e15e
Additional context
Steps to download model and irpa files are available here.
https://gist.github.com/pravg-amd/1b9f3e3c3abcb6f2c35fdc10a09db09d