DongKyu Kang devkade

Stars

7 repositories

Triton kernels for Flux

Python · 20 stars · Updated Jan 1, 2025

Finetune Qwen3, Llama 4, TTS, DeepSeek-R1 & Gemma 3 LLMs 2x faster with 70% less memory! 🦥

Python · 39,134 stars · 3,064 forks · Updated May 21, 2025

A performance library for machine learning applications.

Python · 183 stars · 13 forks · Updated Oct 12, 2023

A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton.

Python · 537 stars · 29 forks · Updated May 16, 2025

Kernl lets you run PyTorch transformer models several times faster on GPU with a single line of code, and is designed to be easily hackable.

Jupyter Notebook · 1,566 stars · 97 forks · Updated Feb 16, 2024

Hackable and optimized Transformers building blocks, supporting a composable construction.

Python · 9,495 stars · 672 forks · Updated May 14, 2025

A curated list of resources for learning and exploring Triton, OpenAI's programming language for writing efficient GPU code.

Python · 349 stars · 22 forks · Updated Mar 10, 2025