Stars
Rex-Thinker: Grounded Object Referring via Chain-of-Thought Reasoning
An open-source AI agent that brings the power of Gemini directly into your terminal.
[ECCV 2024] This is the official implementation of HRMapNet, maintaining and utilizing a low-cost global rasterized map to enhance online vectorized map perception.
[ICLR 2024] Map Learning with Lane Segment for Autonomous Driving
[RSS 2025] Learning to Act Anywhere with Task-centric Latent Actions
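[AIGC Hands-On Beginner Notes: AIGC Skyscraper] Covers large language models (LLMs), efficient fine-tuning of large models (SFT), retrieval-augmented generation (RAG), agents (Agent), automatic PPT generation, role-playing, text-to-image (Stable Diffusion), optical character recognition (OCR), speech recognition (ASR), speech synthesis (TTS), portrait segmentation (SA), multimodal models (VLM), AI face swapping (Face Swapping), text-to-video (VD), image-to…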
🚀🚀🚀 A collection of awesome public projects about Large Language Models (LLM), Vision Language Models (VLM), Vision Language Action (VLA), AI Generated Content (AIGC), the related Datasets and Applic…
[CVPR 2025] Multiple Object Tracking as ID Prediction
[CVPR 2025 Best Paper Award] VGGT: Visual Geometry Grounded Transformer
Code for the project "MegaSaM: Accurate, Fast and Robust Structure and Motion from Casual Dynamic Videos"
[CVPR 2024] The code for "MGMap: Mask-Guided Learning for Online Vectorized HD Map Construction"
Uni-PrevPredMap: Extending PrevPredMap to a Unified Framework of Prior-Informed Modeling for Online Vectorized HD Map Construction
D-FINE: Redefine Regression Task of DETRs as Fine-grained Distribution Refinement [ICLR 2025 Spotlight]
OpenVision: A Fully-Open, Cost-Effective Family of Advanced Vision Encoders for Multimodal Learning
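Retrieves WeChat account information; reads the database so chat history can be viewed locally and exported to CSV, HTML, and other formats for AI training, auto-reply, and similar uses. Supports retrieving information from multiple accounts and supports all WeChat versions.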
Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning, achieving state-of-the-art performance on 38 out of 60 public benchmarks.
🚀 One-stop solution for creating your digital avatar from chat history 💡 Fine-tune LLMs with your chat logs to capture your unique style, then bind to a chatbot to bring your digital self to life. …
Code for paper "MapTracker: Tracking with Strided Memory Fusion for Consistent Vector HD Mapping", ECCV 2024 (Oral)
Qwen3 is the large language model series developed by Qwen team, Alibaba Cloud.
State-of-the-art Image & Video CLIP, Multimodal Large Language Models, and More!
An introductory LLM tutorial for developers: the Chinese edition of Andrew Ng's large-model course series
A computer algebra system written in pure Python
Align Anything: Training All-modality Model with Feedback
Train your AI self, amplify you, bridge the world
[Lumina Embodied AI Community] Embodied AI Technical Guide (Embodied-AI-Guide)
🔥 🔥 🔥 A paper list of recent Computer Vision (CV) works