Stars
MiniCPM4: Ultra-Efficient LLMs on End Devices, achieving a 5x+ speedup on typical end-side chips
verl: Volcano Engine Reinforcement Learning for LLMs
Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning, achieving state-of-the-art performance on 38 out of 60 public benchmarks.
✨✨VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction
Model Context Protocol Servers
A course on aligning smol models.
Solve Visual Understanding with Reinforced VLMs
An open-source solution for full-parameter fine-tuning of DeepSeek-V3/R1 671B, including complete code and scripts from training to inference, as well as some practical experiences and conclusions.…
Awesome Reasoning LLM Tutorial/Survey/Guide
Official repository of 'Visual-RFT: Visual Reinforcement Fine-Tuning' & 'Visual-ARFT: Visual Agentic Reinforcement Fine-Tuning'
[KDD 2024] Team up GBDTs and DNNs: Advancing Efficient and Effective Tabular Prediction with Tree-hybrid MLPs
Scalable RL solution for advanced reasoning of language models
A novel Multimodal Large Language Model (MLLM) architecture, designed to structurally align visual and textual embeddings.
Witness the "aha moment" of a VLM for less than $3.
Fully open reproduction of DeepSeek-R1
Minimal reproduction of DeepSeek R1-Zero
A collection of LLM papers, blogs, and projects, with a focus on OpenAI o1 🍓 and reasoning techniques.
MiniCPM-o 2.6: A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming on Your Phone
An open-source code repository of driving world models, with training, inference, and evaluation tools, plus pretrained checkpoints.
The official repo of MiniMax-Text-01 and MiniMax-VL-01, a large language model and a vision-language model based on Linear Attention
A self-learning tutorial for CUDA high-performance programming.
[Survey] Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding
🔥🔥🔥 Latest Papers, Code, and Datasets on Vid-LLMs.