Description
🐛 Describe the bug
We've been seeing a bunch of CI failures in TorchAO,
e.g. the autoquant compile test errors here: https://github.com/pytorch/ao/actions/runs/15569349188/job/43841241599
Note that the subsequent commit doesn't hit the error, so the failure is intermittent: https://hud.pytorch.org/pytorch/ao/commit/b6bb7dc240b9083d105b52ee8a0393496cdbc428
What seems to be happening (paste of the error: https://gist.github.com/HDCharles/03903b2612c727c39cd11a47594c66b0) is that a kernel is being selected that is incompatible with the actual input sizes. Quantization and compilation complete without issue; the error is only thrown when the compiled model is run.
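For context, the failing test follows roughly this pattern (a minimal sketch of the standard torchao autoquant + torch.compile flow; the model and input shapes here are placeholders rather than the actual test values):

```python
import torch
import torchao

# Placeholder model and input; the real test uses different shapes/dtypes.
model = torch.nn.Sequential(torch.nn.Linear(64, 64)).cuda().to(torch.bfloat16)
example_input = torch.randn(1, 64, device="cuda", dtype=torch.bfloat16)

# autoquant picks quantized kernels per layer, and torch.compile
# traces/autotunes the result. Both of these steps succeed in CI.
model = torchao.autoquant(torch.compile(model, mode="max-autotune"))

# The error only surfaces here, when the compiled model actually runs.
out = model(example_input)
```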
This error only started showing up as of this weekend. I'm unable to reproduce it locally, but it's showing up in CI a lot.
Any help would be appreciated. I'm not sure if there's an easy way to generate a tlparse or a minifier repro from CI.
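If it's possible to inject settings into the CI run, what I'd try (a sketch using the standard torch.compile debugging knobs; I haven't verified either works in this particular CI setup) is setting TORCH_TRACE=<log_dir> on the test process and running tlparse on the resulting log directory, and/or enabling the minifier:

```python
import torch._dynamo.config

# Ask dynamo to dump a minified repro script when the compiled
# graph fails; equivalent to setting TORCHDYNAMO_REPRO_AFTER="aot".
torch._dynamo.config.repro_after = "aot"
```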
Error logs
https://gist.github.com/HDCharles/03903b2612c727c39cd11a47594c66b0
Versions
see CI settings
cc @jerryzh168 @jianyuh @raghuramank100 @jamesr66a @vkuzo @jgong5 @Xia-Weiwen @leslie-fang-intel @msaroufim @chauhang @penguinwu @voznesenskym @EikanWang @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @aakhundov