We're currently using a custom JIT system to target CUDA kernels. Would it make sense to explore integrating Triton as a backend, particularly for accelerating complex kernels like GEMM, attention, or fused operations?
Triton now supports both CUDA and ROCm/HIP (AMD) backends, which could open the door to more portable high-performance code. This might also make it easier for users to bring in custom or optimized kernels from the PyTorch ecosystem (where Triton adoption is growing).
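For concreteness, here's a minimal sketch of the kind of kernel users could bring in if Triton were a supported backend: a fused elementwise multiply-add written against Triton's public API. The function names and the `BLOCK_SIZE` value are just illustrative, not anything from this project:

```python
import torch
import triton
import triton.language as tl

@triton.jit
def fused_mul_add_kernel(x_ptr, y_ptr, z_ptr, out_ptr, n_elements,
                         BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one BLOCK_SIZE-wide slice.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    z = tl.load(z_ptr + offsets, mask=mask)
    # Multiply and add are fused into one launch, avoiding an
    # intermediate round trip to global memory.
    tl.store(out_ptr + offsets, x * y + z, mask=mask)

def fused_mul_add(x, y, z):
    out = torch.empty_like(x)
    n = out.numel()
    grid = lambda meta: (triton.cdiv(n, meta["BLOCK_SIZE"]),)
    fused_mul_add_kernel[grid](x, y, z, out, n, BLOCK_SIZE=1024)
    return out
```

The same source compiles for NVIDIA and AMD targets through Triton's own backends, which is what makes the portability angle interesting here.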
(cc @zcbenz) Curious whether this has been considered already, or whether there are challenges that make Triton incompatible with the current JIT system?