Stars
MAGI-1: Autoregressive Video Generation at Scale
Distributed Triton for Parallel Systems
A Distributed Attention Towards Linear Scalability for Ultra-Long Context, Heterogeneous Data Training
Official implementation of Acc-SpMM: Accelerating General-purpose Sparse Matrix-Matrix Multiplication with GPU Tensor Cores.
Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation
Open-source implementation for "Helix: Serving Large Language Models over Heterogeneous GPUs and Network via Max-Flow"
An open source, self-hosted implementation of the Tailscale control server
A high-quality, one-stop open-source data extraction tool for converting PDF to Markdown and JSON.
📚 Collection of awesome generation acceleration resources.
The most powerful and modular diffusion model GUI, API, and backend with a graph/nodes interface.
Text- and image-to-video generation: CogVideoX (2024) and CogVideo (ICLR 2023)
SwissArmyTransformer is a flexible and powerful library to develop your own Transformer variants.
A curated list of recent diffusion models for video generation, editing, and various other applications.
Official inference repo for FLUX.1 models
A flexible and efficient training framework for large-scale alignment tasks
FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.
Retrieval and Retrieval-augmented LLMs
FlagPerf is an open-source software platform for benchmarking AI chips.
FlagScale is a large model toolkit based on open-sourced projects.
A collection of memory efficient attention operators implemented in the Triton language.
FlagGems is an operator library for large language models implemented in the Triton Language.
SGLang is a fast serving framework for large language models and vision language models.
Run PyTorch LLMs locally on servers, desktop and mobile
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
USP: Unified (a.k.a. Hybrid, 2D) Sequence Parallel Attention for long-context Transformer model training and inference
BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment.
Create books from markdown files. Like GitBook but implemented in Rust
🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.