Stars
Quick scripts to calculate CLIP text-image similarity
Timestep Embedding Tells: It's Time to Cache for Video Diffusion Model
Finetune Qwen3, Llama 4, TTS, DeepSeek-R1 & Gemma 3 LLMs 2x faster with 70% less memory! 🦥
An open-source End2EndPerception deployment solution based on the vision sparse transformer paradigm.
[ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
KernelBench: Can LLMs Write GPU Kernels? - Benchmark with Torch -> CUDA problems
Efficient Triton Kernels for LLM Training
VideoSys: An easy and efficient system for video generation
Natural Language Processing Tutorial for Deep Learning Researchers
DLRover: An Automatic Distributed Deep Learning System
Fast and memory-efficient exact attention
Modular visual interface for GDB in Python
The Torch-MLIR project aims to provide first-class support from the PyTorch ecosystem to the MLIR ecosystem.
A simple C++11 Thread Pool implementation
A collection of modern C++ libraries, including coro_http, coro_rpc, compile-time reflection, struct_pack, struct_json, struct_xml, struct_pb, easylog, async_simple, etc.
A high performance LLVM-based dynamic binary instrumentation framework
Simple, light-weight and easy-to-use asynchronous components
Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
A curated list of awesome projects related to eBPF.