Stars
FlashMLA: Efficient MLA decoding kernels
FlashInfer: Kernel Library for LLM Serving
Performance instrumentation and tracing for Android, Linux and Chrome
An extremely fast Python package and project manager, written in Rust.
Community maintained hardware plugin for vLLM on Ascend
SGLang is a fast serving framework for large language models and vision language models.
Mac Mouse Fix - Make Your $10 Mouse Better Than an Apple Trackpad!
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
Zero-runtime-dependency package acting as bridge between Node projects and their package managers
An open source trusted cloud native registry project that stores, signs, and scans content.
fanshiqing / grouped_gemm
Forked from tgale96/grouped_gemmPyTorch bindings for CUTLASS grouped GEMM.
Efficient Training (including pre-training and fine-tuning) for Big Models
⚓ A collection of JavaScript tools written in Rust.
Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
PyTorch extensions for high performance and large scale training.
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
Measuring Massive Multitask Language Understanding | ICLR 2021
FastAPI framework, high performance, easy to learn, fast to code, ready for production
Monitor Memory usage of Python code
Code at the speed of thought – Zed is a high-performance, multiplayer code editor from the creators of Atom and Tree-sitter.
💥 Fast State-of-the-Art Tokenizers optimized for Research and Production
Fast and memory-efficient exact attention
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
Extremely fast Vite-compatible web build tool written in Rust
USP: Unified (a.k.a. Hybrid, 2D) Sequence Parallel Attention for Long Context Transformers Model Training and Inference