Stars
GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning.
The official repository for ERNIE 4.5 and ERNIEKit – its industrial-grade development toolkit based on PaddlePaddle.
WorldVLA: Towards Autoregressive Action World Model
An open-source AI agent that brings the power of Gemini directly into your terminal.
A collection of requirements and details that Chinese universities do not state in their admissions materials, yet genuinely affect the quality of campus life.
Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities. arXiv:2408.07666.
An agent benchmark with tasks in a simulated software company.
slime is an LLM post-training framework aimed at scaling RL.
Latest open-source "Thinking with images" (O3/O4-mini) papers, covering training-free, SFT-based, and RL-enhanced methods for fine-grained visual understanding.
Tool-Star: Empowering LLM-brained Multi-Tool Reasoner via Reinforcement Learning
Muon: An optimizer for hidden layers in neural networks
Awesome Unified Multimodal Models
🚀ReVisual-R1 is a 7B open-source multimodal language model that follows a three-stage curriculum—cold-start pre-training, multimodal reinforcement learning, and text-only reinforcement learning—to …
Meta Lingua: a lean, efficient, and easy-to-hack codebase to research LLMs.
The official repository of the dots.llm1 base and instruct models proposed by rednote-hilab.
Kinetics: Rethinking Test-Time Scaling Laws
XiaomiMiMo / lmms-eval
Forked from EvolvingLMMs-Lab/lmms-eval. Accelerating the development of large multimodal models (LMMs) with a one-click evaluation module - lmms-eval.
Resources and paper list for "Thinking with Images for LVLMs". This repository accompanies our survey on how LVLMs can leverage visual information for complex reasoning, planning, and generation.
VoCoT: Unleashing Visually Grounded Multi-Step Reasoning in Large Multi-Modal Models