-
sglang Public
Forked from sgl-project/sglang
SGLang is a fast serving framework for large language models and vision language models.
Python Apache License 2.0 Updated Apr 28, 2025 -
Megatron-LM Public
Forked from NVIDIA/Megatron-LM
Ongoing research training transformer models at scale
-
Awesome-Efficient-LLM Public
Forked from horseee/Awesome-Efficient-LLM
A curated list for Efficient Large Language Models
Python Updated Apr 6, 2025 -
Oobleck Public
Forked from SymbioticLab/Oobleck
A resilient distributed training framework
-
gemma_pytorch Public
Forked from google/gemma_pytorch
The official PyTorch implementation of Google's Gemma models
Python Apache License 2.0 Updated Mar 12, 2025 -
Metis Public
Forked from zzhx1/Metis
[ATC '24] Metis: Fast automatic distributed training on heterogeneous GPUs (https://www.usenix.org/conference/atc24/presentation/um)
Python Other Updated Mar 11, 2025 -
DeepEP Public
Forked from deepseek-ai/DeepEP
DeepEP: an efficient expert-parallel communication library
Cuda MIT License Updated Feb 25, 2025 -
Mooncake Public
Forked from kvcache-ai/Mooncake
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
C++ Apache License 2.0 Updated Feb 18, 2025 -
Awesome-ML-SYS-Tutorial Public
Forked from zhaochenyang20/Awesome-ML-SYS-Tutorial
My learning notes/codes for ML SYS.
Python Apache License 2.0 Updated Feb 13, 2025 -
Awesome-LLM-Inference Public
Forked from xlite-dev/Awesome-LLM-Inference
📖 A curated list of Awesome LLM/VLM Inference Papers with codes, such as FlashAttention, PagedAttention, Parallelism, etc. 🎉🎉
GNU General Public License v3.0 Updated Feb 5, 2025 -
flashinfer Public
Forked from flashinfer-ai/flashinfer
FlashInfer: Kernel Library for LLM Serving
Cuda Apache License 2.0 Updated Feb 2, 2025 -
CUDA-Learn-Notes Public
Forked from xlite-dev/LeetCUDA
📚 200+ Tensor/CUDA Cores Kernels, ⚡️flash-attn-mma, ⚡️hgemm with WMMA, MMA and CuTe (98%~100% TFLOPS of cuBLAS/FA2 🎉🎉).
Cuda GNU General Public License v3.0 Updated Jan 23, 2025 -
Awesome-Diffusion-Inference Public
Forked from xlite-dev/Awesome-DiT-Inference
📖 A curated list of Awesome Diffusion Inference Papers with codes, such as Sampling, Caching, Multi-GPUs, etc. 🎉🎉
GNU General Public License v3.0 Updated Jan 16, 2025 -
lightllm Public
Forked from ModelTC/lightllm
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
Python Apache License 2.0 Updated Jan 8, 2025 -
Triton-Puzzles Public
Forked from srush/Triton-Puzzles
Puzzles for learning Triton
Jupyter Notebook Apache License 2.0 Updated Nov 18, 2024 -
long-context-attention Public
Forked from feifeibear/long-context-attention
USP: Unified (a.k.a. Hybrid, 2D) Sequence Parallel Attention for Long Context Transformers Model Training and Inference
Python Apache License 2.0 Updated Nov 1, 2024 -
multi-gpu-programming-models Public
Forked from NVIDIA/multi-gpu-programming-models
Examples demonstrating available options to program multiple GPUs in a single node or a cluster
Cuda BSD 3-Clause "New" or "Revised" License Updated Oct 30, 2024 -
nccl Public
Forked from NVIDIA/nccl
Optimized primitives for collective multi-GPU communication
C++ Other Updated Sep 17, 2024 -
FlashFlex Public
Forked from Relaxed-System-Lab/HexiScale
Accommodating Large Language Model Training over Heterogeneous Environment.
Python Apache License 2.0 Updated Sep 4, 2024 -
apex Public
Forked from NVIDIA/apex
A PyTorch Extension: Tools for easy mixed precision and distributed training in PyTorch
Python BSD 3-Clause "New" or "Revised" License Updated Jun 12, 2024 -
mnist-dits Public
Forked from owenliang/mnist-dits
Diffusion Transformers (DiTs) trained on MNIST dataset
Python Updated Apr 4, 2024 -
SuperScaler Public
Forked from microsoft/SuperScaler
An experimental parallel training platform
Other Updated Mar 25, 2024 -
my_ib_traffic_gen_roce Public
An InfiniBand (IB) traffic generator for RoCE. It basically issues RDMA send/write operations on the same memory.
C Updated Dec 7, 2023 -
UCAS-enroll Public
Forked from amefumi/UCAS-enroll
A Python course enrollment assistant framework.
Python Updated Jun 7, 2023 -
my_vmm Public
A VMM demo written in Rust, using the rust-vmm crates, which provide bindings for KVM.
Rust Updated Mar 31, 2023 -