Stars
ComplexBench-Edit: Benchmarking Complex Instruction-Driven Image Editing via Compositional Dependencies
Draw ALL Your Imagine: A Holistic Benchmark and Agent Framework for Complex Instruction-based Image Generation
[CVPR' 25] Official repo for From Head to Tail: Towards Balanced Representation in Large Vision-Language Models through Adaptive Data Calibration
VideoREPA: Learning Physics for Video Generation through Relational Alignment with Foundation Models
🚀ReVisual-R1 is a 7B open-source multimodal language model that follows a three-stage curriculum—cold-start pre-training, multimodal reinforcement learning, and text-only reinforcement learning—to …
The official code repository for the FullFront benchmark
An instruction-following benchmark for large reasoning models
OpenThinkIMG is an end-to-end open-source framework that empowers LVLMs to think with images.
FlashAttention-2 in Triton for sliding window attention (fwd + bwd pass)
Official Repository of "Learning to Reason under Off-Policy Guidance"
😎 A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, and Beyond
A framework for few-shot evaluation of language models.
Fast and memory-efficient exact attention
Tensors and Dynamic neural networks in Python with strong GPU acceleration
Efficient vision foundation models for high-resolution generation and perception.
Test-time preference optimization (ICML 2025).
🚀 Efficient implementations of state-of-the-art linear attention models in Torch and Triton
[ACL' 25] The official code repository for PRMBench: A Fine-grained and Challenging Benchmark for Process-Level Reward Models.
Scalable RL solution for advanced reasoning of language models
[ICML 2025] M-STAR (Multimodal Self-Evolving TrAining for Reasoning) Project. Diving into Self-Evolving Training for Multimodal Reasoning
🚀 LLaMA-MoE v2: Exploring Sparsity of LLaMA from Perspective of Mixture-of-Experts with Post-Training
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.
ChatGLM2-6B: An Open Bilingual Chat LLM | an open-source bilingual dialogue language model