ORION: A Holistic End-to-End Autonomous Driving Framework
by Vision-Language Instructed Action Generation

Haoyu Fu1*, Diankun Zhang2*, Zongchuang Zhao1*, Jianfeng Cui2, Dingkang Liang1†,
Chong Zhang2, Dingyuan Zhang1, Hongwei Xie2†, Bing Wang2, Xiang Bai1

1 Huazhong University of Science & Technology, 2 Xiaomi EV

(*) Equal contribution. (†) Project leader.

Paper PDF Project Page

Abstract

End-to-end (E2E) autonomous driving methods still struggle to make correct decisions in interactive closed-loop evaluation due to limited causal reasoning capability. Current methods attempt to leverage the powerful understanding and reasoning abilities of Vision-Language Models (VLMs) to resolve this dilemma. However, it remains an open problem that few VLM-based E2E methods perform well in closed-loop evaluation, owing to the gap between the semantic reasoning space and the purely numerical trajectory output of the action space. To tackle this issue, we propose ORION, a hOlistic E2E autonomous dRiving framework by vIsion-language instructed actiON generation. ORION uniquely combines a QT-Former to aggregate long-term history context, a Large Language Model (LLM) for driving scenario reasoning, and a generative planner for precise trajectory prediction. ORION further aligns the reasoning space and the action space to implement unified E2E optimization for both visual question-answering (VQA) and planning tasks. Our method achieves an impressive closed-loop performance of 77.74 Driving Score (DS) and 54.62% Success Rate (SR) on the challenging Bench2Drive benchmark, outperforming state-of-the-art (SOTA) methods by a large margin of 14.28 DS and 19.61% SR.

Overview

News

[2025/04/10] ORION inference code and checkpoint released.

[2025/03/26] arXiv paper released.

Currently Supported Features

  • ORION Inference Framework
  • Open-loop Evaluation
  • Close-loop Evaluation
  • ORION Checkpoint
  • Chat-B2D Dataset
  • ORION Training Framework

Getting Started

git clone https://github.com/xiaomi-mlab/Orion.git
cd Orion
conda create -n orion python=3.8 -y
conda activate orion
pip install torch==2.4.1+cu118 torchvision==0.19.1+cu118 torchaudio==2.4.1 --index-url https://download.pytorch.org/whl/cu118
pip install -v -e .
pip install -r requirements.txt
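
As a quick sanity check after installation (a minimal sketch, not part of the official instructions), you can confirm that the CUDA 11.8 build of PyTorch is installed and sees the GPU:

python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"
# expected output along the lines of: 2.4.1+cu118 11.8 True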

Preparation

You can refer to here to prepare the Bench2Drive dataset.

ORION uses the pretrained 2D LLM weights and the vision encoder + projector weights provided by OmniDrive.

cd /path/to/OmniDrive
mkdir ckpts

The vision encoder + projector weights are extracted from ckpts/pretrain_qformer/, which is pretrained using LLaVA data.
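
For reference, a hypothetical layout after the weights are in place (only the pretrain_qformer directory name comes from this README; anything else is an assumption based on the OmniDrive release):

ls ckpts/
# pretrain_qformer/   <- vision encoder + projector weights, pretrained on LLaVA data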

Open-loop evaluation

You can perform an open-loop evaluation of ORION with the following command:

./adzoo/orion/orion_dist_eval.sh adzoo/orion/configs/orion_stage3.py [--PATH_CHECKPOINTS] 1
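
The script follows a config / checkpoint / GPU-count argument pattern. For example, assuming the checkpoint was downloaded to ckpts/orion_stage3.pth (a hypothetical path; substitute your own):

# ckpts/orion_stage3.pth is a placeholder; the trailing 1 is presumably the number of GPUs
./adzoo/orion/orion_dist_eval.sh adzoo/orion/configs/orion_stage3.py ckpts/orion_stage3.pth 1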

You can also perform CoT inference with ORION (this might be quite slow):

./adzoo/orion/orion_dist_eval.sh adzoo/orion/configs/orion_stage3_cot.py [--PATH_CHECKPOINTS] 1

We recommend running ORION inference on an NVIDIA A100 or another GPU with more than 32 GB of memory (inference runs in FP32 by default).

ORION can also perform FP16 inference with almost the same performance; we recommend FP16 inference on a GPU with more than 17 GB of memory:

./adzoo/orion/orion_dist_eval.sh adzoo/orion/configs/orion_stage3_fp16.py [--PATH_CHECKPOINTS] 1

Close-loop evaluation

You can refer to here to clone the Bench2Drive evaluation tools and prepare CARLA for them.

Follow here to use the Bench2Drive evaluation tools.

Note that you should first verify the correctness of the team agent; to do so, set GPU_RANK, TEAM_AGENT, and TEAM_CONFIG in the eval scripts.

For close-loop evaluation, set TEAM_CONFIG as follows:

TEAM_CONFIG=adzoo/orion/configs/orion_stage3_agent.py+[CHECKPOINT_PATH]
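
Putting these together, the relevant variables in a Bench2Drive eval script might look like the sketch below. The GPU_RANK and TEAM_AGENT values are placeholders (only the variable names come from this README); consult the Bench2Drive tools for the exact agent entry point.

# Illustrative placeholder values only
GPU_RANK=0
TEAM_AGENT=<path-to-team-agent>   # set per the Bench2Drive docs
TEAM_CONFIG=adzoo/orion/configs/orion_stage3_agent.py+[CHECKPOINT_PATH]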

Results and Checkpoints

ORION and other baselines

The results for UniAD & VAD are taken from the official results of Bench2DriveZoo.

Method     | L2 (m) @ 2s | Driving Score | Success Rate (%) | Config | Download                 | Eval Json
-----------|-------------|---------------|------------------|--------|--------------------------|----------
UniAD-Tiny | 0.80        | 40.73         | 13.18            | config | Hugging Face/Baidu Cloud | Json
UniAD-Base | 0.73        | 45.81         | 16.36            | config | Hugging Face/Baidu Cloud | Json
VAD        | 0.91        | 42.35         | 15.00            | config | Hugging Face/Baidu Cloud | Json
ORION      | 0.68        | 77.74         | 54.62            | config | Hugging Face             | Json

Qualitative visualization & analysis

We provide visualization videos and qualitative analysis for ORION, compared with TCP-traj, UniAD-Base, and VAD-Base, here.

Citation

If this work is helpful for your research, please consider citing:

@article{fu2025orion,
  title={ORION: A Holistic End-to-End Autonomous Driving Framework by Vision-Language Instructed Action Generation},
  author={Haoyu Fu and Diankun Zhang and Zongchuang Zhao and Jianfeng Cui and Dingkang Liang and Chong Zhang and Dingyuan Zhang and Hongwei Xie and Bing Wang and Xiang Bai},
  journal={arXiv:2503.19755},
  year={2025}
}
