- Hangzhou, China
Stars
Train transformer language models with reinforcement learning.
ByteCheckpoint: A Unified Checkpointing Library for LFMs
An Easy-to-use, Scalable and High-performance RLHF Framework based on Ray (PPO & GRPO & REINFORCE++ & vLLM & Ray & Dynamic Sampling & Async Agentic RL)
verl: Volcano Engine Reinforcement Learning for LLMs
🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (including fp8), and easy-to-configure FSDP and DeepSpeed support
🤗 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch and FLAX.
📚 A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc. 🎉
A PyTorch native platform for training generative AI models
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
Qwen3 is the large language model series developed by Qwen team, Alibaba Cloud.
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.
deepspeedai / Megatron-DeepSpeed
Forked from NVIDIA/Megatron-LM
Ongoing research training transformer language models at scale, including: BERT & GPT-2
Emote Portrait Alive: Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions
A QoS-based scheduling system that brings optimal layout and status to workloads such as microservices, web services, big data jobs, AI jobs, etc.
An industrial deep learning framework for high-dimension sparse data
Kubernetes-native Deep Learning Framework
DLRover: An Automatic Distributed Deep Learning System
Policy based networking for cloud native applications
flannel is a network fabric for containers, designed for Kubernetes
gRPC to JSON proxy generator following the gRPC HTTP spec
PyTorch extensions for high performance and large scale training.
Making large AI models cheaper, faster and more accessible
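AISystem mainly refers to AI systems, covering full-stack low-level AI technologies such as AI chips, AI compilers, and AI inference and training frameworks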