Stars
AgentCPM-GUI: An on-device GUI agent for operating Android apps, enhancing reasoning ability with reinforcement fine-tuning for efficient task execution.
State-of-the-art Image & Video CLIP, Multimodal Large Language Models, and More!
verl: Volcano Engine Reinforcement Learning for LLMs
A paper list about Token Merge, Reduce, Resample, Drop for MLLMs.
A PyTorch native platform for training generative AI models
Minimalistic 4D-parallelism distributed training framework for educational purposes
A highly optimized LLM inference acceleration engine for Llama and its variants.
Tools for merging pretrained large language models.
Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs
[ICLR 2025 Spotlight] OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text
MINT-1T: A one trillion token multimodal interleaved dataset.
Ongoing research training transformer models at scale
Stable Diffusion web UI
Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks
MiniCPM3-4B: An edge-side LLM that surpasses GPT-3.5-Turbo.
MiniCPM-o 2.6: A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming on Your Phone
The image prompt adapter enables a pretrained text-to-image diffusion model to generate images conditioned on an image prompt.
A state-of-the-art open visual language model (multimodal pretrained model)
The official repo of Qwen (通义千问), the chat and pretrained large language models proposed by Alibaba Cloud.
A series of large language models developed by Baichuan Intelligent Technology
DataComp: In search of the next generation of multimodal datasets
《代码随想录》: a LeetCode problem-solving guide with a recommended order for 200 classic problems, 600k words of detailed illustrated explanations, video analyses of tricky points, 50+ mind maps, and solutions in C++, Java, Python, Go, JavaScript, and more. No more getting lost while learning algorithms! 🔥🔥 Take a look, you'll wish you had found it sooner! 🚀
A 13B large language model developed by Baichuan Intelligent Technology
[ICLR'24 spotlight] Chinese-English bilingual multimodal large model series (chat and paint) built on the CPM foundation models
✨✨Latest Advances on Multimodal Large Language Models
Research Trends in LLM-guided Multimodal Learning.