Stars
Qwen3 is the large language model series developed by the Qwen team, Alibaba Cloud.
State-of-the-art Image & Video CLIP, Multimodal Large Language Models, and More!
An introductory LLM tutorial for developers; the Chinese edition of Andrew Ng's large language model course series
A computer algebra system written in pure Python
Align Anything: Training All-modality Model with Feedback
Train your AI self, amplify you, bridge the world
[Lumina Embodied AI Community] Embodied AI Technical Guide (Embodied-AI-Guide)
🔥 🔥 🔥 A paper list of some recent Computer Vision (CV) works
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal chat model approaching GPT-4o performance
[CVPR 2025] EchoMimicV2: Towards Striking, Simplified, and Semi-Body Human Animation
🦉 OWL: Optimized Workforce Learning for General Multi-Agent Assistance in Real-World Task Automation
Official repository of "Visual-RFT: Visual Reinforcement Fine-Tuning"
"LightRAG: Simple and Fast Retrieval-Augmented Generation"
Source code for the SIGGRAPH 2024 paper "X-Portrait: Expressive Portrait Animation with Hierarchical Motion Attention"
Solve Visual Understanding with Reinforced VLMs
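🎬 Kaka Subtitle Assistant (卡卡字幕助手) | VideoCaptioner - An LLM-based intelligent subtitle assistant covering the full workflow: subtitle generation, sentence segmentation, correction, and subtitle translation. An LLM-powered tool for easy and efficient video subtitling.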
Convert ebooks to audiobooks with chapters and metadata using dynamic AI models and voice cloning. Supports 1,107+ languages!
A simple screen parsing tool toward a pure-vision-based GUI agent
Autonomous coding agent right in your IDE, capable of creating/editing files, executing commands, using the browser, and more with your permission every step of the way.
[ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
[NeurIPS 2024] Depth Anything V2. A More Capable Foundation Model for Monocular Depth Estimation
Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
🕵️‍♂️ Collect a dossier on a person by username from thousands of sites
A multilingual large voice generation model with full-stack support for inference, training, and deployment.