Stars
Code for MetaMorph: Multimodal Understanding and Generation via Instruction Tuning
[NeurIPS 2024] This repo contains evaluation code for the paper "Are We on the Right Way for Evaluating Large Vision-Language Models?"
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
Gemma open-weight LLM library, from Google DeepMind
Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks
(CVPR 2025) From Slow Bidirectional to Fast Autoregressive Video Diffusion Models
Search-R1: An efficient, scalable RL training framework for LLMs that interleave reasoning with search-engine calls, built on veRL
verl: Volcano Engine Reinforcement Learning for LLMs
Anole: An Open, Autoregressive, and Native Multimodal Model for Interleaved Image-Text Generation
MedMax: Mixed-Modal Instruction Tuning for Training Biomedical Assistants
Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR.
Wan: Open and Advanced Large-Scale Video Generative Models
Official Repo for Open-Reasoner-Zero
R1-Onevision, a visual language model capable of deep CoT reasoning.
Solve Visual Understanding with Reinforced VLMs
Fully open reproduction of DeepSeek-R1
Code release for "LLMs can see and hear without any training"
[CVPR 2025 Highlight] Official implementation of "MangaNinja: Line Art Colorization with Precise Reference Following"
The ultimate training toolkit for finetuning diffusion models
ReNeg: Learning Negative Embedding with Reward Guidance
Official implementation of "DepthLab: From Partial to Complete"
HunyuanVideo: A Systematic Framework For Large Video Generation Model
Baking Gaussian Splatting into Diffusion Denoiser for Fast and Scalable Single-stage Image-to-3D Generation