Stars
[RSS'25] This repository is the implementation of "NaVILA: Legged Robot Vision-Language-Action Model for Navigation"
[CVPR 2025 Highlight] Towards Autonomous Micromobility through Scalable Urban Simulation
Enhancing Autonomous Driving Systems with On-Board Deployed Large Language Models
Train transformer language models with reinforcement learning.
A curated list of awesome 3D scene generation papers. (arXiv 2505.05474)
ICCV 2025 | TesserAct: Learning 4D Embodied World Models
[CVPR 2025] The official implementation of "Universal Actions for Enhanced Embodied Foundation Models"
🔥 SpatialVLA: a spatial-enhanced vision-language-action model trained on 1.1 million real robot episodes. Accepted at RSS 2025.
Some conferences' accepted paper lists (including AI, ML, Robotics)
Code for the paper "Empowering Large Language Model Agents through Action Learning"
Create a ros-humble-ros1-bridge package that can be used directly in ROS2 Humble
OpenDriveVLA: Towards End-to-end Autonomous Driving with Large Vision Language Action Model
[CVPR 2023 Best Paper Award] Planning-oriented Autonomous Driving
openpilot is an operating system for robotics. Currently, it upgrades the driver assistance system on 300+ supported cars.
[CVPR 2025] RoomTour3D - Geometry-aware, cheap and automatic data from web videos for embodied navigation
Towards Long-Horizon Vision-Language Navigation: Platform, Benchmark and Method (CVPR-25)
RoboSense: Large-scale Dataset and Benchmark for Egocentric Robot Perception and Navigation in Crowded and Unstructured Environments
The tutorial and docker image to play with Unitree Go2 quadruped robot dog.
[CVPR2025] CityWalker: Learning Embodied Urban Navigation from Web-Scale Videos
Unitree Go2, Unitree G1 support for Nvidia Isaac Lab (Isaac Gym / Isaac Sim)
An open-source framework for Gaussian Splats Compression research
(ACL-2025 main conference) Dolphin: Moving Towards Closed-loop Auto-research through Thinking, Practice, and Feedback
FlashMLA: Efficient MLA decoding kernels
[CVPR 2025] UniScene: Unified Occupancy-centric Driving Scene Generation
RWKV (pronounced RwaKuv) is an RNN with great LLM performance, which can also be directly trained like a GPT transformer (parallelizable). We are at RWKV-7 "Goose". It combines the best of RNNs and transformers.
CLoSD: Closing the Loop between Simulation and Diffusion for multi-task character control