[CVPR 2025] Mr. DETR: Instructive Multi-Route Training for Detection Transformers
Chang-Bin Zhang¹, Yujie Zhong², Kai Han¹
¹ The University of Hong Kong
² Meituan Inc.
- [04/25] We release the 🤗 Online Demo of Mr. DETR.
- [04/25] Mr. DETR now supports instance segmentation. We release the code and pre-trained weights.
- [03/25] We release the code and weights of Mr. DETR for object detection. Pre-trained weights are available on Hugging Face.
- [03/25] Mr. DETR is accepted by CVPR 2025.
Demo Video for Street
Demo Video for Dense and Crowded Scene
Model | Config & Weights | Backbone | Query | Epochs | AP | AP50 | AP75 | APs | APm | APl
---|---|---|---|---|---|---|---|---|---|---
Mr. DETR-Deformable | Config & Weights | R50 | 300 | 12 | 49.5 | 67.0 | 53.7 | 32.1 | 52.5 | 64.7 |
Mr. DETR-Deformable | Config & Weights | R50 | 900 | 12 | 50.7 | 68.2 | 55.4 | 33.6 | 54.3 | 64.6 |
Mr. DETR-Deformable | Config & Weights | R50 | 900 | 24 | 51.4 | 69.0 | 56.2 | 34.9 | 54.8 | 66.0 |
Mr. DETR-DINO | Config & Weights | R50 | 900 | 12 | 50.9 | 68.4 | 55.6 | 34.6 | 53.8 | 65.2 |
Mr. DETR-Align | Config & Weights | R50 | 900 | 12 | 51.4 | 68.6 | 55.7 | 33.8 | 54.7 | 66.3 |
Mr. DETR-Align | Config & Weights | R50 | 900 | 24 | 52.3 | 69.5 | 56.7 | 35.2 | 56.0 | 67.0 |
Mr. DETR-Align | Config & Weights | Swin-L | 900 | 12 | 58.4 | 76.3 | 63.9 | 40.8 | 62.8 | 75.3 |
Mr. DETR-Align* | Config & Weights | Swin-L | 900 | 12 | 61.8 | 79.0 | 67.6 | 47.7 | 65.6 | 75.7 |
*: This model is fine-tuned from an Objects365-pretrained model with 5-scale features. Due to limited GPU resources, we pre-trained the Swin-L-based Mr. DETR for only 549K iterations (batch size 16).
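The detection table makes the effect of query count and schedule length easy to quantify. The sketch below recomputes the AP differences from the R50 rows above; only values copied from the table are used, and `ap_gain` is a hypothetical helper, not part of the repository.

```python
# AP values copied from the COCO detection table above (R50 backbone).
# Keys are (model, number of queries, training epochs).
RESULTS = {
    ("Mr. DETR-Deformable", 300, 12): 49.5,
    ("Mr. DETR-Deformable", 900, 12): 50.7,
    ("Mr. DETR-Deformable", 900, 24): 51.4,
    ("Mr. DETR-DINO", 900, 12): 50.9,
    ("Mr. DETR-Align", 900, 12): 51.4,
    ("Mr. DETR-Align", 900, 24): 52.3,
}


def ap_gain(base, other):
    """Return AP(other) - AP(base), rounded to one decimal to hide float noise."""
    return round(RESULTS[other] - RESULTS[base], 1)


if __name__ == "__main__":
    # 300 -> 900 queries at 12 epochs
    print(ap_gain(("Mr. DETR-Deformable", 300, 12), ("Mr. DETR-Deformable", 900, 12)))  # 1.2
    # 12 -> 24 epochs with 900 queries
    print(ap_gain(("Mr. DETR-Deformable", 900, 12), ("Mr. DETR-Deformable", 900, 24)))  # 0.7
    print(ap_gain(("Mr. DETR-Align", 900, 12), ("Mr. DETR-Align", 900, 24)))  # 0.9
```

Rounding to one decimal matches the precision reported in the table.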
Model | Config & Weights | Backbone | Query | Epochs | APbox | APmask
---|---|---|---|---|---|---
Mr. DETR-Deformable-InstanceSeg | Config & Weights | R50 | 300 | 12 | 49.5 | 36.0 |
Mr. DETR-Deformable-InstanceSeg | Config & Weights | R50 | 300 | 24 | 50.3 | 37.6 |
- This repository is built on the detrex framework, so you may also refer to the detrex installation docs.
- Python ≥ 3.7 and PyTorch ≥ 1.10 are required.
- First, clone the Mr. DETR repository and initialize the `detectron2` submodule:
```bash
git clone https://github.com/Visual-AI/Mr.DETR.git
cd Mr.DETR
git submodule init
git submodule update
```
- Second, install `detectron2` and `detrex`:
```bash
pip install -e detectron2
pip install -r requirements.txt
pip install -e .
```
- If you encounter a CUDA runtime compilation error, you may try setting `CUDA_HOME`:
```bash
export CUDA_HOME=<your_cuda_path>
```
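A quick way to check the version requirements above before building. Comparing version strings lexically gets "1.9" versus "1.10" wrong, so this sketch parses the major and minor numbers as integers. `parse_version`, `meets_minimum`, and `check_environment` are hypothetical helpers, not part of the repository.

```python
import re
import sys


def parse_version(text):
    """Extract the leading (major, minor) integers from a string like '1.10.0+cu113'."""
    match = re.match(r"(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return int(match.group(1)), int(match.group(2))


def meets_minimum(text, minimum):
    """True if the version in `text` is at least `minimum`, a (major, minor) tuple."""
    return parse_version(text) >= minimum


def check_environment():
    """Check Python >= 3.7 and PyTorch >= 1.10; return a list of problems (empty if OK)."""
    problems = []
    if sys.version_info[:2] < (3, 7):
        problems.append("Python >= 3.7 is required")
    try:
        import torch  # imported only for this check
    except ImportError:
        problems.append("PyTorch is not installed")
    else:
        if not meets_minimum(torch.__version__, (1, 10)):
            problems.append(f"PyTorch >= 1.10 is required, found {torch.__version__}")
    return problems


if __name__ == "__main__":
    issues = check_environment()
    print("OK" if not issues else "\n".join(issues))
```

Tuple comparison handles the ordering, so (1, 9) correctly sorts below (1, 10).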
- You may start with the COCO 2017 dataset, organized as follows:
```
datasets/
└── coco2017/
    ├── annotations/
    │   ├── instances_train2017.json
    │   └── instances_val2017.json
    ├── train2017/
    │   └── ...
    └── val2017/
        └── ...
```
- Then point `DETECTRON2_DATASETS` to the dataset root:
```bash
export DETECTRON2_DATASETS=<.../datasets/>
```
You may also refer to the detectron2 documentation on built-in datasets.
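Before training, you may want to confirm that your files match the tree above. This is a small sketch; `missing_paths` and `EXPECTED` are hypothetical names, and the expected paths come only from the layout shown here. When the variable is unset it falls back to `datasets`, which is detectron2's default for `DETECTRON2_DATASETS`.

```python
import os
from pathlib import Path

# Relative paths expected under $DETECTRON2_DATASETS, taken from the layout above.
EXPECTED = [
    "coco2017/annotations/instances_train2017.json",
    "coco2017/annotations/instances_val2017.json",
    "coco2017/train2017",
    "coco2017/val2017",
]


def missing_paths(root=None):
    """Return the expected COCO paths that do not exist under `root`.

    `root` defaults to $DETECTRON2_DATASETS, or "datasets" if that is unset.
    """
    base = Path(root or os.environ.get("DETECTRON2_DATASETS", "datasets"))
    return [rel for rel in EXPECTED if not (base / rel).exists()]


if __name__ == "__main__":
    missing = missing_paths()
    print("layout OK" if not missing else "missing:\n  " + "\n  ".join(missing))
```

It only checks existence, not the contents of the annotation files.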
- Visualize an image:
```bash
python demo/demo.py --config-file <config_file> \
  --input assets/000000028449.jpg \
  --output visualized_000000028449.jpg \
  --confidence-threshold 0.5 \
  --opts train.init_checkpoint=<checkpoint_path>
```
- Visualize a video:
```bash
python demo/demo.py --config-file <config_file> \
  --video-input xxx.mp4 \
  --output visualized.mp4 \
  --confidence-threshold 0.5 \
  --opts train.init_checkpoint=<checkpoint_path>
```
- Visualize test results:
```bash
# --input: path to the saved testing results
python tools/visualize_json_results.py --input /path/to/x.json \
  --output dir/ \
  --dataset coco_2017_val
```
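The testing-results JSON is typically written by detectron2's COCO evaluator (for example `coco_instances_results.json`) as a list of detections in the COCO results format. This minimal sketch inspects such a file outside the visualizer and filters by score, similar to what `--confidence-threshold` does in the demo. The `>=` comparison and the helper name `load_detections` are assumptions, not taken from the repository.

```python
import json
from collections import defaultdict


def load_detections(path, score_threshold=0.5):
    """Group COCO-format detections by image_id, keeping those scoring >= score_threshold.

    Each entry is assumed to follow the COCO results format:
    {"image_id": int, "category_id": int, "bbox": [x, y, w, h], "score": float}
    """
    with open(path) as f:
        results = json.load(f)
    per_image = defaultdict(list)
    for det in results:
        if det["score"] >= score_threshold:
            per_image[det["image_id"]].append(det)
    return dict(per_image)
```

For example, `load_detections("/path/to/x.json", 0.5)` returns a dict mapping each image id to its kept detections. Images with no detection above the threshold are omitted.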
- For R50-based models:
```bash
# Train. train.amp.enabled=True enables mixed-precision training;
# model.transformer.encoder.use_checkpoint=True enables gradient checkpointing,
# which saves GPU memory at the cost of speed.
python projects/train_net.py \
  --config-file <config-file> \
  --num-gpus N \
  dataloader.train.total_batch_size=16 \
  train.output_dir=<output_dir> \
  train.amp.enabled=True \
  model.transformer.encoder.use_checkpoint=True

# Build the mean model (scripts are provided for 12- and 24-epoch schedules).
# The mean model is more stable than EMA and improves AP by about 0.1~0.2%.
python projects/modelmean_12ep.py --folder <output_dir>
python projects/modelmean_24ep.py --folder <output_dir>

# Evaluate the mean model.
python projects/train_net.py \
  --config-file <config-file> \
  --num-gpus N \
  --eval-only \
  train.output_dir=<output_dir> \
  train.init_checkpoint=<output_dir>/meanmodel.pth
```
- For Swin-L-based models, set the weight decay to 0.05:
```bash
# Train. train.amp.enabled=True enables mixed-precision training;
# model.transformer.encoder.use_checkpoint=True enables gradient checkpointing,
# which saves GPU memory at the cost of speed.
python projects/mr_detr_align/train_net_swin.py \
  --config-file <config-file> \
  --num-gpus N \
  dataloader.train.total_batch_size=16 \
  train.output_dir=<output_dir> \
  train.amp.enabled=True \
  model.transformer.encoder.use_checkpoint=True

# Build the mean model (scripts are provided for 12- and 24-epoch schedules).
# The mean model is more stable than EMA and improves AP by about 0.1~0.2%.
python projects/modelmean_12ep.py --folder <output_dir>
python projects/modelmean_24ep.py --folder <output_dir>

# Evaluate the mean model.
python projects/train_net.py \
  --config-file <config-file> \
  --num-gpus N \
  --eval-only \
  train.output_dir=<output_dir> \
  train.init_checkpoint=<output_dir>/meanmodel.pth
```
- To evaluate a released checkpoint:
```bash
python projects/train_net.py \
  --config-file <config_file> \
  --eval-only \
  --num-gpus=4 \
  train.init_checkpoint=<checkpoint_path>
```
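The README does not describe how `modelmean_12ep.py` and `modelmean_24ep.py` build the mean model or which checkpoints they combine. A common interpretation of a "mean model" is a uniform weight average over several saved checkpoints, as in stochastic weight averaging. The sketch below illustrates only that idea, with plain lists standing in for tensors and a hypothetical `average_state_dicts` helper; with PyTorch, the same per-parameter mean would be applied to the tensors of the loaded state dicts.

```python
def average_state_dicts(state_dicts):
    """Return the uniform (arithmetic-mean) average of several checkpoints.

    Each state dict maps a parameter name to a flat list of floats, standing in
    for a tensor. All checkpoints must share the same names and lengths.
    """
    if not state_dicts:
        raise ValueError("need at least one checkpoint")
    names = state_dicts[0].keys()
    for sd in state_dicts[1:]:
        if sd.keys() != names:
            raise ValueError("checkpoints have mismatched parameter names")
    n = len(state_dicts)
    averaged = {}
    for name in names:
        tensors = [sd[name] for sd in state_dicts]
        if len({len(t) for t in tensors}) != 1:
            raise ValueError(f"parameter {name!r} has mismatched sizes")
        # Element-wise mean: every checkpoint gets weight 1/n (unlike EMA,
        # which weights recent steps exponentially more).
        averaged[name] = [sum(values) / n for values in zip(*tensors)]
    return averaged


if __name__ == "__main__":
    ckpts = [{"w": [1.0, 3.0]}, {"w": [3.0, 5.0]}]
    print(average_state_dicts(ckpts))  # {'w': [2.0, 4.0]}
```

Equal weighting is what distinguishes this from EMA. Whether the repository's scripts weight checkpoints equally is not stated.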
@inproceedings{zhang2024mr,
title={Mr. DETR: Instructive Multi-Route Training for Detection Transformers},
author={Zhang, Chang-Bin and Zhong, Yujie and Han, Kai},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
year={2025}
}