Stars
tulip-berkeley / open_clip
Forked from mlfoundations/open_clip. An open source implementation of CLIP (With TULIP Support).
HunyuanVideo-I2V: A Customizable Image-to-Video Model based on HunyuanVideo
Official Implementation of Muddit [Meissonic II]: Liberating Generation Beyond Text-to-Image with a Unified Discrete Diffusion Model.
Co-Reinforcement Learning for Unified Multimodal Understanding and Generation
An official implementation of Flow-GRPO: Training Flow Matching Models via Online RL
Official Repository of Absolute Zero Reasoner
State-of-the-art Image & Video CLIP, Multimodal Large Language Models, and More!
Official repository for "AM-RADIO: Reduce All Domains Into One"
Stanford NLP Python library for benchmarking the utility of LLM interpretability methods
A preview version of CharmBench, a novel multimodal reasoning benchmark.
OpenS2V-Nexus: A Detailed Benchmark and Million-Scale Dataset for Subject-to-Video Generation
MINT-1T: A one trillion token multimodal interleaved dataset.
Official training and inference code for the bitwise tokenizer.
VeriThinker: Learning to Verify Makes Reasoning Model Efficient
[CVPR 2023] VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking
Code for the paper "SILMM: Self-Improving Large Multimodal Models for Compositional Text-to-Image Generation".
Selftok: Discrete Visual Tokens of Autoregression, by Diffusion, and for Reasoning
[CVPR'25 - Rating 555] Official PyTorch implementation of Lumos: Learning Visual Generative Priors without Text
A curated collection of awesome papers on the alignment of diffusion models.
Dimple, the first Discrete Diffusion Multimodal Large Language Model