We're currently using a custom JIT system to target CUDA kernels. Would it make sense to explore integrating Triton as a backend, particularly for accelerating complex kernels like GEMM, attention, or fused operations?
Triton now supports both CUDA and ROCm/HIP (AMD) backends, which could open the door to more portable high-performance code. This might also make it easier for users to bring in custom or optimized kernels from the PyTorch ecosystem (where Triton adoption is growing).
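For concreteness, here's a minimal sketch of the kind of kernel users could bring in if Triton were a supported backend: a fused elementwise multiply-add written against Triton's public API. The function names and the `BLOCK_SIZE` value are just illustrative, not anything from this project:

```python
import torch
import triton
import triton.language as tl

@triton.jit
def fused_mul_add_kernel(x_ptr, y_ptr, z_ptr, out_ptr, n_elements,
                         BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one BLOCK_SIZE-wide slice.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    z = tl.load(z_ptr + offsets, mask=mask)
    # Multiply and add are fused into one launch, avoiding an
    # intermediate round trip to global memory.
    tl.store(out_ptr + offsets, x * y + z, mask=mask)

def fused_mul_add(x, y, z):
    out = torch.empty_like(x)
    n = out.numel()
    grid = lambda meta: (triton.cdiv(n, meta["BLOCK_SIZE"]),)
    fused_mul_add_kernel[grid](x, y, z, out, n, BLOCK_SIZE=1024)
    return out
```

The same source compiles for NVIDIA and AMD targets through Triton's own backends, which is what makes the portability angle interesting here.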
(cc @zcbenz) Curious whether this has been considered already, or whether there are challenges that make Triton incompatible with the current JIT system?