Starred repositories
An open-source AI agent that brings the power of Gemini directly into your terminal.
Towards a Generative 3D World Engine for Embodied Intelligence
The simplest, fastest repository for training/finetuning small-sized VLMs.
Official implementation of OpenWBT.
GUI-Actor: Coordinate-Free Visual Grounding for GUI Agents
DreamO: A Unified Framework for Image Customization
The official repo for “Dolphin: Document Image Parsing via Heterogeneous Anchor Prompting”, ACL, 2025.
Specifications and tools for 360° video and spatial audio.
[ICML2025] Make LoRA Great Again: Boosting LoRA with Adaptive Singular Values and Mixture-of-Experts Optimization Alignment
Official repository of T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT
Framework for running AI locally on mobile devices and wearables. Hardware-aware C/C++ backend with wrappers for Flutter & React Native. Kotlin & Swift coming soon.
This repository contains the official implementation of "FastVLM: Efficient Vision Encoding for Vision Language Models" - CVPR 2025
Qwen3 is the large language model series developed by Qwen team, Alibaba Cloud.
Awesome Unified Multimodal Models
[CVPR 2025 Highlight] VideoScene: Distilling Video Diffusion Model to Generate 3D Scenes in One Step
Official repo for: SuperEdit - Rectifying and Facilitating Supervision for Instruction-Based Image Editing
Create web-based user interfaces with Python. The nice way.
The world’s fastest framework for building websites.
LoRI: Reducing Cross-Task Interference in Multi-Task Low-Rank Adaptation
PyTorch implementation for the paper titled "SimpleAR: Pushing the Frontier of Autoregressive Visual Generation"
A TTS model capable of generating ultra-realistic dialogue in one pass.
Kimi-Audio, an open-source audio foundation model excelling in audio understanding, generation, and conversation
🍒 Cherry Studio is a desktop client that supports multiple LLM providers.
MCP server for RDS Services via OPENAPI.
[CVPR 2025 Highlight] Real-time dense scene reconstruction with SLAM3R
[CVPR 2025] MatAnyone: Stable Video Matting with Consistent Memory Propagation
Cosmos-Reason1 models understand physical common sense and generate appropriate embodied decisions in natural language through long chain-of-thought reasoning processes.