Stars
An easy-to-use, scalable, and high-performance RLHF framework built on Ray, supporting PPO, GRPO, REINFORCE++, vLLM integration, dynamic sampling, and async agentic RL.
Implementation for "The Scalability of Simplicity: Empirical Analysis of Vision-Language Learning with a Single Transformer"
🤗 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch and FLAX.
verl: Volcano Engine Reinforcement Learning for LLMs
Qwen3-Coder is the code version of Qwen3, the large language model series developed by the Qwen team at Alibaba Cloud.
DeerFlow is a community-driven Deep Research framework, combining language models with tools like web search, crawling, and Python execution, while contributing back to the open-source community.
A high-throughput and memory-efficient inference and serving engine for LLMs
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models across text, vision, audio, and multimodal tasks, for both inference and training.
🍒 Cherry Studio is a desktop client that supports multiple LLM providers.
One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks
LLaVA-VLA: A Simple Yet Powerful Vision-Language-Action Model [Actively Maintained🔥]
Use PEFT or full-parameter fine-tuning for CPT/SFT/DPO/GRPO on 500+ LLMs (Qwen3, Qwen3-MoE, Llama4, InternLM3, DeepSeek-R1, ...) and 200+ MLLMs (Qwen2.5-VL, Qwen2.5-Omni, Qwen2-Audio, Ovis2, InternVL3, Llava, GLM4…
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
[ICCV2025] TokenBridge: Bridging Continuous and Discrete Tokens for Autoregressive Visual Generation. https://yuqingwang1029.github.io/TokenBridge
An open-source AI agent that brings the power of Gemini directly into your terminal.
Get up and running with Llama 3.3, DeepSeek-R1, Phi-4, Gemma 3, Mistral Small 3.1 and other large language models.
Train transformer language models with reinforcement learning.
A framework for few-shot evaluation of language models.
A collection of LLM papers, blogs, and projects, with a focus on OpenAI o1 🍓 and reasoning techniques.
Efficient Triton Kernels for LLM Training
[CVPR 2025 Best Paper Award] VGGT: Visual Geometry Grounded Transformer
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
Ongoing research training transformer models at scale
LLM UI with advanced features, easy setup, and multiple backend support.
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Tokens (arXiv 2025)