Stars
Quantized Attention achieves speedups of 2-3x and 3-5x compared to FlashAttention and xformers, without losing end-to-end metrics across language, image, and video models.
wolfecameron / nanoMoE
Forked from karpathy/nanoGPT. An extension of the nanoGPT repository for training small MoE models.
Machine Learning Engineering Open Book
Implement a ChatGPT-like LLM in PyTorch from scratch, step by step
ByteCheckpoint: A Unified Checkpointing Library for LFMs
EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL
A Datacenter Scale Distributed Inference Serving Framework
Minimal reproduction of DeepSeek R1-Zero
Official code repo for the O'Reilly Book - "Hands-On Large Language Models"
Minimalistic large language model 3D-parallelism training
Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization (an illustrative sketch of the BPE merge loop appears after this list).
Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation
PyTorch native quantization and sparsity for training and inference
An Easy-to-use, Scalable and High-performance RLHF Framework based on Ray (PPO & GRPO & REINFORCE++ & LoRA & vLLM & RFT)
DeepEP: an efficient expert-parallel communication library
FlashMLA: Efficient MLA decoding kernels
Official Repo for Open-Reasoner-Zero
Helpful tools and examples for working with flex-attention (a brief usage sketch appears after this list).
Textbook on reinforcement learning from human feedback
verl: Volcano Engine Reinforcement Learning for LLMs
A high-throughput and memory-efficient inference and serving engine for LLMs
Fully open reproduction of DeepSeek-R1
Minimalistic 4D-parallelism distributed training framework for educational purposes
Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM
OpenR: An Open Source Framework for Advanced Reasoning with Large Language Models
Transform datasets at scale. Optimize datasets for fast AI model training.
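
The BPE entry above refers to the algorithm's core merge loop. The sketch below is a minimal, generic illustration of BPE training over UTF-8 bytes; it is not taken from that repository, and the function names (get_pair_counts, merge, train_bpe) are hypothetical.

```python
from collections import Counter

def get_pair_counts(ids):
    """Count occurrences of each adjacent token-id pair."""
    return Counter(zip(ids, ids[1:]))

def merge(ids, pair, new_id):
    """Replace every occurrence of `pair` in `ids` with `new_id`."""
    out, i = [], 0
    while i < len(ids):
        if i < len(ids) - 1 and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out

def train_bpe(text, num_merges):
    """Learn `num_merges` merge rules, starting from raw bytes (ids 0..255)."""
    ids = list(text.encode("utf-8"))
    merges = {}  # (id, id) pair -> new token id
    for n in range(num_merges):
        counts = get_pair_counts(ids)
        if not counts:
            break
        pair = max(counts, key=counts.get)  # most frequent adjacent pair
        new_id = 256 + n                    # new ids start after the byte range
        ids = merge(ids, pair, new_id)
        merges[pair] = new_id
    return merges

merges = train_bpe("low lower lowest", num_merges=5)
```

Each iteration greedily merges the most frequent adjacent pair into a new token id, which is the essence of BPE tokenizer training.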
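For the flex-attention entry, the sketch below shows one way to express a causal mask with PyTorch's flex_attention score_mod hook (assumes PyTorch >= 2.5); it is illustrative only and not code from that repository.

```python
import torch
from torch.nn.attention.flex_attention import flex_attention

def causal(score, b, h, q_idx, kv_idx):
    # Send scores for future positions to -inf before the softmax.
    return torch.where(q_idx >= kv_idx, score, -float("inf"))

# Shapes: (batch, heads, sequence, head_dim)
q = torch.randn(1, 8, 128, 64)
k = torch.randn(1, 8, 128, 64)
v = torch.randn(1, 8, 128, 64)

# In practice flex_attention is typically wrapped in torch.compile on GPU;
# the plain eager call here is the simplest form.
out = flex_attention(q, k, v, score_mod=causal)
```

The score_mod callback receives the raw attention score plus batch, head, query, and key indices, which is what lets attention variants be written as small Python functions.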