Stars
A SOTA open-source image editing model that aims for performance comparable to closed-source models such as GPT-4o and Gemini 2 Flash.
Arena-Hard-Auto: An automatic LLM benchmark.
Qwen3 is the large language model series developed by the Qwen team, Alibaba Cloud.
Seeing from Another Perspective: Evaluating Multi-View Understanding in MLLMs
Code for "MetaMorph: Multimodal Understanding and Generation via Instruction Tuning"
A high-throughput and memory-efficient inference and serving engine for LLMs
Code release for "PISA Experiments: Exploring Physics Post-Training for Video Diffusion Models by Watching Stuff Drop" (ICML 2025)
verl: Volcano Engine Reinforcement Learning for LLMs
"AutoAgent: Fully-Automated and Zero-Code LLM Agent Framework"
Official PyTorch Implementation of "History-Guided Video Diffusion"
[3DV 2024] official repo of 3DV paper "RoomDesigner: Encoding Anchor-latents for Style-consistent and Shape-compatible Indoor Scene Generation"
Reinforcement Learning / AI Bots in Card (Poker) Games - Blackjack, Leduc, Texas, DouDizhu, Mahjong, UNO.
Implementation of the sparse attention pattern proposed by the DeepSeek team in their "Native Sparse Attention" paper
[ICLR 2025 Spotlight] Official Implementation for ToST (Token Statistics Transformer)
Repo for NAACL 2025 Paper "Unfolding the Headline: Iterative Self-Questioning for News Retrieval and Timeline Summarization"
Implementation of Learning Video Representations without Natural Videos
Janus-Series: Unified Multimodal Understanding and Generation Models
Official implementation of paper: SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
New repo collection for NVIDIA Cosmos: https://github.com/nvidia-cosmos
Empowering Unified MLLM with Multi-granular Visual Generation
FLOPs counter for neural networks in the PyTorch framework
Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 500+ LLMs (Qwen3, Qwen3-MoE, Llama4, InternLM3, DeepSeek-R1, ...) and 200+ MLLMs (Qwen2.5-VL, Qwen2.5-Omni, Qwen2-Audio, Ovis2, InternVL3, Llava, GLM4…
An open-source implementation for fine-tuning the Llama3.2-Vision series by Meta.