Navigation-Guided Sparse Scene Representation for End-to-End Autonomous Driving

Original title: Does End-to-End Autonomous Driving Really Need Perception Tasks?

Peidong Li, Dixiao Cui

Zhijia Technology, Suzhou, China

News

2025.03.23 Code and checkpoint released. 🚀
2025.02.01 Chinese blog post on SSR available on Zhihu.
2025.01.23 SSR is accepted to ICLR 2025. 🎉
2024.09.30 SSR paper available on arXiv.

Introduction

We introduce SSR, a novel framework that leverages navigation-guided Sparse Scene Representation, achieving state-of-the-art performance at minimal cost. Inspired by how human drivers selectively focus on scene elements based on navigation cues, we find that only a minimal set of tokens from dense BEV features is necessary for effective scene representation in autonomous driving.

Overview

SSR consists of two parts: the purple part, which is used during both training and inference, and the gray part, which is only used during training. In the purple part, the dense BEV feature is first compressed by the Scenes TokenLearner into sparse queries, which are then used for planning via cross-attention. In the gray part, the predicted BEV feature is obtained from the BEV world model. The future BEV feature is then used to supervise the predicted BEV feature, enhancing both the scene representation and the planning decoder.
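The data flow described above can be illustrated with a minimal, framework-free sketch. It is not the repository's implementation: the function names (`tokenize_bev`, `cross_attend`, `l2_loss`), the toy sizes, and the random scoring vectors are all assumptions made for illustration, and the sketch leaves out how navigation commands condition the tokenizer. It only shows the general idea of pooling a dense BEV grid into a few tokens with spatial attention, letting a planning query attend over those tokens, and comparing a predicted BEV feature against a future one with an L2 loss.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def tokenize_bev(bev, token_weights):
    """Compress N dense BEV cells (C-dim each) into S sparse tokens.

    token_weights[s] is a hypothetical C-dim scoring vector for token s.
    Each token is a spatial-softmax-weighted average of all BEV cells.
    """
    channels = len(bev[0])
    tokens = []
    for w in token_weights:
        attn = softmax([dot(cell, w) for cell in bev])  # one weight per cell
        tokens.append(
            [sum(a * cell[c] for a, cell in zip(attn, bev)) for c in range(channels)]
        )
    return tokens


def cross_attend(query, tokens):
    """Single-head cross-attention of one planning query over the tokens."""
    scale = math.sqrt(len(query))
    attn = softmax([dot(query, t) / scale for t in tokens])
    return [sum(a * t[c] for a, t in zip(attn, tokens)) for c in range(len(query))]


def l2_loss(pred, target):
    """Mean squared error between two BEV feature maps of equal shape."""
    diffs = [(p - t) ** 2 for pr, tr in zip(pred, target) for p, t in zip(pr, tr)]
    return sum(diffs) / len(diffs)


random.seed(0)
H, W, C, S = 4, 4, 8, 3  # toy sizes chosen for the example, not the paper's
bev = [[random.gauss(0, 1) for _ in range(C)] for _ in range(H * W)]
token_weights = [[random.gauss(0, 1) for _ in range(C)] for _ in range(S)]
plan_query = [random.gauss(0, 1) for _ in range(C)]

# Used in both training and inference: dense BEV -> sparse tokens -> planning.
tokens = tokenize_bev(bev, token_weights)
plan_feature = cross_attend(plan_query, tokens)

# Training only: a stand-in "predicted" BEV is compared with a stand-in "future" BEV.
future_bev = [[v + 0.1 for v in cell] for cell in bev]
world_model_loss = l2_loss(bev, future_bev)

print(len(bev), "BEV cells ->", len(tokens), "tokens of dim", len(tokens[0]))
```

In this sketch the planning step touches only `S` tokens rather than all `H * W` BEV cells, which is where the savings of a sparse representation would come from.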

Prepare

Train and Test

Train SSR with 8 GPUs

cd /path/to/SSR
conda activate ssr
python -m torch.distributed.run --nproc_per_node=8 --master_port=2333 tools/train.py projects/configs/SSR/SSR_e2e.py --launcher pytorch --deterministic --work-dir path/to/save/outputs

Eval SSR with 1 GPU

cd /path/to/SSR
conda activate ssr
CUDA_VISIBLE_DEVICES=0 python tools/test.py projects/configs/SSR/SSR_e2e.py /path/to/ckpt.pth --launcher none --eval bbox --tmpdir tmp

Results

*After refactoring, the released checkpoint shows minor differences from the results reported in the paper.

Log and checkpoint: Google Drive

UniAD-style metric protocol

Method	L2_MAX (m) 1s	L2_MAX (m) 2s	L2_MAX (m) 3s	L2_MAX (m) Avg.	CR_MAX (%) 1s	CR_MAX (%) 2s	CR_MAX (%) 3s	CR_MAX (%) Avg.
SSR	0.25	0.64	1.33	0.74	0.00	0.08	0.43	0.17

VAD-style metric protocol

Method	L2_AVG (m) 1s	L2_AVG (m) 2s	L2_AVG (m) 3s	L2_AVG (m) Avg.	CR_AVG (%) 1s	CR_AVG (%) 2s	CR_AVG (%) 3s	CR_AVG (%) Avg.
SSR	0.19	0.36	0.62	0.39	0.02	0.03	0.13	0.06

Visualization

We visualize the results of our framework on the nuScenes dataset and the Carla Town05 Long benchmark.

nuScenes

Carla

video_ssr.mp4

Citation

If this work is helpful for your research, please consider citing the following BibTeX entry.

@inproceedings{li2025navigationguidedsparsescenerepresentation,
  title={Navigation-Guided Sparse Scene Representation for End-to-End Autonomous Driving},
  author={Peidong Li and Dixiao Cui},
  booktitle={International Conference on Learning Representations (ICLR)},
  year={2025}
}

License

All code in this repository is under the Apache License 2.0.

Acknowledgement

SSR is based on the following projects: VAD, GenAD, BEV-Planner and TokenLearner. Many thanks for their excellent contributions to the community.
