Stars
xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism
USP: Unified (a.k.a. Hybrid, 2D) Sequence Parallel Attention for Long-Context Transformer Model Training and Inference
A simple Flash Attention v2 implementation with ROCm (RDNA3 GPU, rocWMMA), mainly used for Stable Diffusion (ComfyUI) in Windows ZLUDA environments.
HunyuanVideo-I2V: A Customizable Image-to-Video Model based on HunyuanVideo
HunyuanVideo: A Systematic Framework for Large Video Generation Models
🤗 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch and FLAX.
Timestep Embedding Tells: It's Time to Cache for Video Diffusion Model
CLIP (Contrastive Language-Image Pretraining): predict the most relevant text snippet given an image
Fast and memory-efficient exact attention
Wan: Open and Advanced Large-Scale Video Generative Models
SGLang is a fast serving framework for large language models and vision language models.
Chain of Experts (CoE) enables communication between experts within Mixture-of-Experts (MoE) models
JundaLi07 / ktransformers
Forked from kvcache-ai/ktransformers
A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations
Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
[ICLR2025] Breaking Throughput-Latency Trade-off for Long Sequences with Speculative Decoding
DeepEP: an efficient expert-parallel communication library
FlashMLA: Efficient MLA decoding kernels
A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations
NVIDIA Linux open GPU with P2P support
Computer science books Recommended by AzatAI. (Education ONLY)
IIMS College AI class of batch 2022
Machine Learning Resources, Practice and Research
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
🦜🔗 Build context-aware reasoning applications
Accelerate inference without tears
REST: Retrieval-Based Speculative Decoding, NAACL 2024
Automatically Discovering Fast Parallelization Strategies for Distributed Deep Neural Network Training