Description
Summary
torch._grouped_mm currently only supports compute capability 9.0, but should also support SM100/B200 cards.
Current Behavior
torch._grouped_mm(...)
# Error: torch._grouped_mm is only supported on CUDA devices with compute capability = 9.0
Expected Behavior
torch._grouped_mm should also work on SM100/B200 hardware (compute capability 10.0), rather than only on compute capability 9.0.
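To make the request concrete, here is a minimal sketch of the device gate as reported versus as requested. The helper names (`allowed_today`, `allowed_as_requested`, `local_capability`) are hypothetical and are not part of PyTorch; the real check lives in the CUDA backend and may be structured differently. The only real API used is `torch.cuda.get_device_capability()`, and the sketch assumes B200 reports `(10, 0)`.

```python
# Illustrative only: these helpers are hypothetical and do not exist in PyTorch.

def allowed_today(capability):
    """Mirror the reported behavior: only compute capability 9.0 passes."""
    return tuple(capability) == (9, 0)


def allowed_as_requested(capability):
    """Requested behavior: also accept SM100/B200, assumed to be (10, 0)."""
    return tuple(capability) in {(9, 0), (10, 0)}


def local_capability():
    """Return (major, minor) for the current CUDA device, or None if unavailable."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_capability()


if __name__ == "__main__":
    cap = local_capability()
    if cap is None:
        print("No CUDA device (or torch not installed); nothing to check.")
    else:
        print("capability:", cap)
        print("passes reported gate:", allowed_today(cap))
        print("passes requested gate:", allowed_as_requested(cap))
```

On a B200, the first gate would be expected to reject the device and the second to accept it, which is the difference this issue asks for.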
Environment
- Hardware: SM100/B200 (Blackwell)
- PyTorch: current main