Description
🐛 Describe the bug
We've been seeing a bunch of CI failures in TorchAO,
e.g. the autoquant compile test errors here: https://github.com/pytorch/ao/actions/runs/15569349188/job/43841241599
Note that the subsequent commit doesn't hit the error, so the failure is intermittent: https://hud.pytorch.org/pytorch/ao/commit/b6bb7dc240b9083d105b52ee8a0393496cdbc428
What seems to be happening (paste of the error: https://gist.github.com/HDCharles/03903b2612c727c39cd11a47594c66b0) is that a kernel is being selected that is incompatible with the actual input sizes. Quantization and compilation complete without issue; the error is only thrown when the compiled model is run.
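For context, the failing test follows roughly this pattern (a minimal sketch of the standard torchao autoquant + torch.compile flow; the model and input shapes here are placeholders rather than the actual test values):

```python
import torch
import torchao

# Placeholder model and input; the real test uses different shapes/dtypes.
model = torch.nn.Sequential(torch.nn.Linear(64, 64)).cuda().to(torch.bfloat16)
example_input = torch.randn(1, 64, device="cuda", dtype=torch.bfloat16)

# autoquant picks quantized kernels per layer, and torch.compile
# traces/autotunes the result. Both of these steps succeed in CI.
model = torchao.autoquant(torch.compile(model, mode="max-autotune"))

# The error only surfaces here, when the compiled model actually runs.
out = model(example_input)
```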
This error only started showing up as of this weekend. I'm unable to reproduce it locally, but it's showing up in CI a lot.
Any help would be appreciated. I'm not sure if there's an easy way to generate a tlparse or a minifier repro from CI.
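If it's possible to inject settings into the CI run, what I'd try (a sketch using the standard torch.compile debugging knobs; I haven't verified either works in this particular CI setup) is setting TORCH_TRACE=<log_dir> on the test process and running tlparse on the resulting log directory, and/or enabling the minifier:

```python
import torch._dynamo.config

# Ask dynamo to dump a minified repro script when the compiled
# graph fails; equivalent to setting TORCHDYNAMO_REPRO_AFTER="aot".
torch._dynamo.config.repro_after = "aot"
```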
Error logs
https://gist.github.com/HDCharles/03903b2612c727c39cd11a47594c66b0
Versions
see CI settings
cc @jerryzh168 @jianyuh @raghuramank100 @jamesr66a @vkuzo @jgong5 @Xia-Weiwen @leslie-fang-intel @msaroufim @chauhang @penguinwu @voznesenskym @EikanWang @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @aakhundov