Mingqian Ji, Jian Yang, Shanshan Zhang ✉
Nanjing University of Science and Technology
✉ Corresponding author
This repository is the official implementation of the paper "DepthFusion: Depth-Aware Hybrid Feature Fusion for LiDAR-Camera 3D Object Detection".
We propose a novel depth encoding strategy to guide multi-modal fusion in 3D object detection. By encoding depth, our method adaptively adjusts the modality weights at both the global and local feature levels, enabling more effective LiDAR-camera fusion across varying depth ranges.
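For intuition, the sketch below shows one way depth-conditioned modality weighting can be written in PyTorch. This is a minimal sketch, not the actual DepthFusion module: the class name, feature shapes, and the weighting network are all illustrative assumptions.

```python
# Minimal sketch of depth-conditioned modality weighting (illustrative only;
# names and shapes are assumptions, not the actual DepthFusion code).
import torch
import torch.nn as nn

class DepthAwareFusion(nn.Module):
    """Fuse LiDAR and camera BEV features with depth-dependent weights."""

    def __init__(self, channels: int):
        super().__init__()
        # Predict a per-location weight for the camera branch from the
        # concatenated features plus a depth encoding channel.
        self.weight_net = nn.Sequential(
            nn.Conv2d(2 * channels + 1, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, 1, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, lidar_feat, cam_feat, depth_map):
        # lidar_feat, cam_feat: (B, C, H, W) BEV features
        # depth_map: (B, 1, H, W) per-cell depth encoding
        w = self.weight_net(torch.cat([lidar_feat, cam_feat, depth_map], dim=1))
        # The camera branch is up- or down-weighted depending on depth;
        # the LiDAR branch takes the complementary weight.
        return w * cam_feat + (1.0 - w) * lidar_feat
```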
- [2025.05.06] - Released the code and model weights for DepthFusion.
- [2025.02.24] - Submitted DepthFusion to IEEE Transactions on Multimedia (TMM).
Config | mAP | NDS | Backbone | Image size | Latency (ms) | FPS | Model |
---|---|---|---|---|---|---|---|
DepthFusion-Light | 69.8 | 73.3 | ResNet18 | 256 | 72.4 | 13.8 | GoogleDrive |
DepthFusion-Base | 71.2 | 74.0 | ResNet50 | 320 | 114.9 | 8.7 | GoogleDrive |
DepthFusion-Large | 72.3 | 74.4 | SwinTiny | 384 | 175.4 | 5.7 | GoogleDrive |
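To load a released checkpoint for quick inspection, something like the following should work with the mmdet3d version listed below. This is a minimal sketch: the paths are placeholders, and `init_model` is the standard mmdet3d 1.0.x API.

```python
# Load a released DepthFusion checkpoint for inspection (paths are placeholders).
from mmdet3d.apis import init_model

config = 'configs/depthfusion/depthfusion-light.py'
checkpoint = 'checkpoints/depthfusion-light.pth'  # downloaded from the table above
model = init_model(config, checkpoint, device='cuda:0')
print(model.__class__.__name__)
```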
step 1. Prepare the environment as in the provided Docker setup and run:
pip install -r requirements.txt
We use the following main environment:
torch 1.10.0+cu111
torchvision 0.11.0+cu111
mmcls 0.25.0
mmcv-full 1.5.3
mmdet 2.25.1
mmdet3d 1.0.0rc4
mmsegmentation 0.25.0
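After installation, a quick sanity check that your environment matches the versions above (a minimal snippet; it only reads the standard `__version__` attributes):

```python
# Verify installed versions against the environment listed above.
import torch, torchvision, mmcv, mmcls, mmdet, mmdet3d, mmseg

for name, mod in [('torch', torch), ('torchvision', torchvision),
                  ('mmcv-full', mmcv), ('mmcls', mmcls), ('mmdet', mmdet),
                  ('mmdet3d', mmdet3d), ('mmsegmentation', mmseg)]:
    print(f'{name}: {mod.__version__}')
print('CUDA available:', torch.cuda.is_available())
```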
step 2. Prepare the DepthFusion repo by running:
git clone https://github.com/Mingqj/DepthFusion.git
cd DepthFusion
pip install -v -e .
step 3. Prepare the nuScenes dataset and create the pkl info files for DepthFusion by running:
python tools/create_data_bevdet.py
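To verify that the info files were generated, you can load one with mmcv. The filename below follows the BEVDet convention and is an assumption; adjust it to whatever the script actually wrote into data/nuscenes:

```python
# Sanity-check the generated info pkl (filename follows the BEVDet convention
# and may differ in this repo; adjust if necessary).
import mmcv

infos = mmcv.load('data/nuscenes/bevdetv2-nuscenes_infos_train.pkl')  # assumed name
print(infos.keys() if isinstance(infos, dict) else len(infos))
```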
step 4. Arrange the folders as follows:
DepthFusion
└── data
└── nuscenes
├── v1.0-trainval
├── sweeps
├── samples
└── gts
# single gpu
python tools/train.py configs/depthfusion/depthfusion-light.py # light version
python tools/train.py configs/depthfusion/depthfusion-base.py # base version
python tools/train.py configs/depthfusion/depthfusion-large.py # large version
# multiple gpu
./tools/dist_train.sh configs/depthfusion/depthfusion-light.py 8 # light version
./tools/dist_train.sh configs/depthfusion/depthfusion-base.py 8 # base version
./tools/dist_train.sh configs/depthfusion/depthfusion-large.py 8 # large version
# single gpu
python tools/test.py configs/depthfusion/depthfusion-light.py $checkpoint --eval mAP # light version
python tools/test.py configs/depthfusion/depthfusion-base.py $checkpoint --eval mAP # base version
python tools/test.py configs/depthfusion/depthfusion-large.py $checkpoint --eval mAP # large version
# multiple gpu
./tools/dist_test.sh configs/depthfusion/depthfusion-light.py $checkpoint 8 --eval mAP # light version
./tools/dist_test.sh configs/depthfusion/depthfusion-base.py $checkpoint 8 --eval mAP # base version
./tools/dist_test.sh configs/depthfusion/depthfusion-large.py $checkpoint 8 --eval mAP # large version
# light version
python tools/test.py configs/depthfusion/depthfusion-light.py $checkpoint --format-only --eval-options jsonfile_prefix=$savepath
python tools/analysis_tools/vis.py $savepath/pts_bbox/results_nusc.json
# base version
python tools/test.py configs/depthfusion/depthfusion-base.py $checkpoint --format-only --eval-options jsonfile_prefix=$savepath
python tools/analysis_tools/vis.py $savepath/pts_bbox/results_nusc.json
# large version
python tools/test.py configs/depthfusion/depthfusion-large.py $checkpoint --format-only --eval-options jsonfile_prefix=$savepath
python tools/analysis_tools/vis.py $savepath/pts_bbox/results_nusc.json
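The exported results_nusc.json follows the standard nuScenes detection submission format (a "meta" dict plus a "results" dict keyed by sample token), so it can also be inspected directly; the path below is a placeholder for your $savepath:

```python
# Inspect the exported detections (standard nuScenes submission format:
# a "meta" dict plus a "results" dict keyed by sample token).
import json

with open('results/pts_bbox/results_nusc.json') as f:  # replace with your $savepath
    submission = json.load(f)

print(submission['meta'])
first_token = next(iter(submission['results']))
print(len(submission['results']), 'samples;',
      len(submission['results'][first_token]), 'boxes in the first sample')
```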
We thank these great works and open-source codebases: MMDetection3D, BEVDet, CenterPoint, Lift-Splat-Shoot, Swin Transformer, BEVFusion, BEVDepth, and nuScenes-C.