Stars
Official implementation of Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence
[RSS 2025] Learning to Act Anywhere with Task-centric Latent Actions
A Collection of Works Related to 3D Object Detection with 4D mmWave Radar
LetsGo: Large-Scale Garage Modeling and Rendering via LiDAR-Assisted Gaussian (Published in SIGGRAPH Asia 2024)
[CVPR 2025 Highlight] MonSter: Marry Monodepth to Stereo Unleashes Power
[CVPR 2025 Best Paper Award] VGGT: Visual Geometry Grounded Transformer
Dragon Lake Parking Dataset by MPC Lab.
[SIGGRAPH'24] 2D Gaussian Splatting for Geometrically Accurate Radiance Fields
[TVCG2024] PGSR: Planar-based Gaussian Splatting for Efficient and High-Fidelity Surface Reconstruction
VLM-RL: A Unified Vision Language Models and Reinforcement Learning Framework for Safe Autonomous Driving
This is a source repository for Multi-Agent Reinforcement Learning for Autonomous Driving research
Repo of "GoalFlow: Goal-Driven Flow Matching for Multimodal Trajectories Generation in End-to-End Autonomous Driving"
Karras et al. (2022) diffusion models for PyTorch
Train transformer language models with reinforcement learning.
Official repository for SlaBins: Fisheye Depth Estimation using Slanted Bins on Road Environments (ICCV 2023)
Adding guardrails to large language models.
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model approaching GPT-4o performance.
Qwen2.5-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
[NeurIPS 2024 Datasets and Benchmarks Track] Closed-Loop E2E-AD Benchmark Enhanced by World Model RL Expert
An Open-source RL System from ByteDance Seed and Tsinghua AIR
verl: Volcano Engine Reinforcement Learning for LLMs