Quantized Attention that achieves speedups of 2.1-3.1x and 2.7-5.1x compared to FlashAttention2 and xformers, respectively, without losing end-to-end metrics across various models.
Updated Feb 15, 2025 · CUDA
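To make the idea behind that description concrete, here is a minimal PyTorch sketch of INT8-quantized attention: Q and K are symmetrically quantized to INT8 with per-tensor scales, the QKᵀ logits are dequantized, and the rest of attention proceeds in floating point. This is an illustrative sketch only, not the repository's fused kernel; the function names are hypothetical, and a real implementation fuses quantization into one GPU kernel with finer-grained (per-block) scales.

```python
import torch

def int8_quantize(x: torch.Tensor):
    """Symmetric per-tensor INT8 quantization: returns int8 values and a scale."""
    scale = (x.abs().amax() / 127.0).clamp(min=1e-8)
    q = torch.round(x / scale).clamp(-127, 127).to(torch.int8)
    return q, scale

def quantized_attention(q, k, v):
    """Attention with an INT8 QK^T matmul; the PV matmul stays in float."""
    d = q.shape[-1]
    q_i8, q_scale = int8_quantize(q)
    k_i8, k_scale = int8_quantize(k)
    # Simulate the INT8 matmul in float32 for portability; a real kernel
    # runs this on INT8 tensor cores with int32 accumulation.
    logits = q_i8.float() @ k_i8.float().transpose(-1, -2)
    logits = logits * (q_scale * k_scale) / d ** 0.5  # dequantize + 1/sqrt(d)
    attn = torch.softmax(logits, dim=-1)
    return attn @ v

# Usage: (batch, heads, seq_len, head_dim)
q, k, v = (torch.randn(1, 8, 128, 64) for _ in range(3))
out = quantized_attention(q, k, v)
```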
Learnings and experimentation with GPU programming
Building Triton and CUDA kernels side by side to create a GEMM kernel that matches cuBLAS performance.
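As a sketch of what the Triton side of such a project looks like, here is a minimal tiled GEMM kernel modeled on the official Triton matmul tutorial. The block sizes are arbitrary placeholders; a genuinely cuBLAS-competitive kernel would add autotuned tile configurations, program-ID swizzling for L2 reuse, and half-precision inputs on tensor cores.

```python
import torch
import triton
import triton.language as tl

@triton.jit
def gemm_kernel(a_ptr, b_ptr, c_ptr, M, N, K,
                stride_am, stride_ak, stride_bk, stride_bn,
                stride_cm, stride_cn,
                BLOCK_M: tl.constexpr, BLOCK_N: tl.constexpr,
                BLOCK_K: tl.constexpr):
    # Each program instance computes one BLOCK_M x BLOCK_N tile of C = A @ B.
    pid_m = tl.program_id(0)
    pid_n = tl.program_id(1)
    offs_m = pid_m * BLOCK_M + tl.arange(0, BLOCK_M)
    offs_n = pid_n * BLOCK_N + tl.arange(0, BLOCK_N)
    offs_k = tl.arange(0, BLOCK_K)
    a_ptrs = a_ptr + offs_m[:, None] * stride_am + offs_k[None, :] * stride_ak
    b_ptrs = b_ptr + offs_k[:, None] * stride_bk + offs_n[None, :] * stride_bn
    acc = tl.zeros((BLOCK_M, BLOCK_N), dtype=tl.float32)
    for k in range(0, K, BLOCK_K):
        # Masked loads guard the ragged edges of A and B.
        a = tl.load(a_ptrs, mask=(offs_m[:, None] < M) & (offs_k[None, :] + k < K), other=0.0)
        b = tl.load(b_ptrs, mask=(offs_k[:, None] + k < K) & (offs_n[None, :] < N), other=0.0)
        acc += tl.dot(a, b)  # tensor-core matmul on the tile
        a_ptrs += BLOCK_K * stride_ak
        b_ptrs += BLOCK_K * stride_bk
    c_ptrs = c_ptr + offs_m[:, None] * stride_cm + offs_n[None, :] * stride_cn
    c_mask = (offs_m[:, None] < M) & (offs_n[None, :] < N)
    tl.store(c_ptrs, acc, mask=c_mask)

def gemm(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    M, K = a.shape
    K2, N = b.shape
    assert K == K2
    c = torch.empty((M, N), device=a.device, dtype=torch.float32)
    grid = (triton.cdiv(M, 64), triton.cdiv(N, 64))
    gemm_kernel[grid](a, b, c, M, N, K,
                      a.stride(0), a.stride(1), b.stride(0), b.stride(1),
                      c.stride(0), c.stride(1),
                      BLOCK_M=64, BLOCK_N=64, BLOCK_K=32)
    return c

# Usage (requires a CUDA device); fp32 tl.dot uses TF32, hence loose tolerances.
a = torch.randn(512, 512, device="cuda")
b = torch.randn(512, 512, device="cuda")
torch.testing.assert_close(gemm(a, b), a @ b, atol=1e-2, rtol=1e-2)
```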