[CVPR 2025] Mr. DETR: Instructive Multi-Route Training for Detection Transformers

Mr. DETR

[CVPR 2025] Mr. DETR: Instructive Multi-Route Training for Detection Transformers
Chang-Bin Zhang1, Yujie Zhong2, Kai Han1
1 The University of Hong Kong
2 Meituan Inc.

Updates

  • [04/25] We release the 🤗 Online Demo of Mr. DETR.
  • [04/25] Mr. DETR now supports instance segmentation. We release the code and pre-trained weights.
  • [03/25] We release the code and weights of Mr. DETR for object detection. You may find the pre-trained weights on Hugging Face.
  • [03/25] Mr. DETR is accepted to CVPR 2025.

Performance

Demo Video for Street
Demo Video for Dense and Crowded Scene

Method

Model Zoo

Model                                   Backbone  Query  Epochs  AP    AP50  AP75  APs   APm   APl
Mr. DETR-Deformable (Config & Weights)  R50       300    12      49.5  67.0  53.7  32.1  52.5  64.7
Mr. DETR-Deformable (Config & Weights)  R50       900    12      50.7  68.2  55.4  33.6  54.3  64.6
Mr. DETR-Deformable (Config & Weights)  R50       900    24      51.4  69.0  56.2  34.9  54.8  66.0
Mr. DETR-DINO (Config & Weights)        R50       900    12      50.9  68.4  55.6  34.6  53.8  65.2
Mr. DETR-Align (Config & Weights)       R50       900    12      51.4  68.6  55.7  33.8  54.7  66.3
Mr. DETR-Align (Config & Weights)       R50       900    24      52.3  69.5  56.7  35.2  56.0  67.0
Mr. DETR-Align (Config & Weights)       Swin-L    900    12      58.4  76.3  63.9  40.8  62.8  75.3
Mr. DETR-Align* (Config & Weights)      Swin-L    900    12      61.8  79.0  67.6  47.7  65.6  75.7

*: The model is fine-tuned from the Objects365 pre-trained model with 5-scale. Due to limited GPU resources, we pre-trained the Swin-L based Mr. DETR for only 549K iterations (batch size of 16).


Model                                               Backbone  Query  Epochs  APbox  APmask
Mr. DETR-Deformable-InstanceSeg (Config & Weights)  R50       300    12      49.5   36.0
Mr. DETR-Deformable-InstanceSeg (Config & Weights)  R50       300    24      50.3   37.6

Environment Setup

  • This repository is based on the Detrex framework, so you may refer to its installation docs.
  • Python ≥ 3.7 and PyTorch ≥ 1.10 are required.
  • First, clone the Mr. DETR repository and initialize the detectron2 submodule:
git clone https://github.com/Visual-AI/Mr.DETR.git
cd Mr.DETR
git submodule init
git submodule update
  • Second, install detectron2 and detrex
pip install -e detectron2
pip install -r requirements.txt
pip install -e .
  • If you encounter a CUDA runtime compilation error, try setting CUDA_HOME:
export CUDA_HOME=<your_cuda_path>
  • You may start with the COCO 2017 dataset, organized as follows:
datasets/
└── coco2017/
    │
    ├── annotations/
    │   ├── instances_train2017.json
    │   └── instances_val2017.json
    │
    ├── train2017/
    │   └── ...
    │
    └── val2017/
        └── ...
  • Then set the DETECTRON2_DATASETS environment variable to the datasets path:
export DETECTRON2_DATASETS=<.../datasets/>
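
Before training, it can help to confirm that the layout above is in place. The following is a minimal sketch and not part of the repository: the helper name `missing_coco_paths` is hypothetical, and it assumes that `coco2017/` sits directly under the directory named by `DETECTRON2_DATASETS`, as in the tree shown above.

```python
import os
from pathlib import Path

# Paths expected under <DETECTRON2_DATASETS>/coco2017/, taken from the tree above.
EXPECTED = [
    "annotations/instances_train2017.json",
    "annotations/instances_val2017.json",
    "train2017",
    "val2017",
]


def missing_coco_paths(datasets_root):
    """Return the expected COCO 2017 paths that do not exist (hypothetical helper)."""
    coco = Path(datasets_root) / "coco2017"
    return [rel for rel in EXPECTED if not (coco / rel).exists()]


if __name__ == "__main__":
    # Fall back to ./datasets when the variable is unset.
    root = os.environ.get("DETECTRON2_DATASETS", "datasets")
    missing = missing_coco_paths(root)
    if missing:
        print("Missing under", Path(root) / "coco2017", ":", ", ".join(missing))
    else:
        print("COCO 2017 layout looks complete.")
```

An empty list from `missing_coco_paths` means all four entries were found; the check does not validate the contents of the annotation files or image folders.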

API and Demo

You may also refer to the document.

  • Visualize an image:
python demo/demo.py --config-file <config_file> \
                    --input assets/000000028449.jpg \
                    --output visualized_000000028449.jpg \
                    --confidence-threshold 0.5 \
                    --opts train.init_checkpoint=<checkpoint_path> 
  • Visualize a video:
python demo/demo.py --config-file <config_file> \
                    --video-input xxx.mp4 \
                    --output visualized.mp4 \
                    --confidence-threshold 0.5 \
                    --opts train.init_checkpoint=<checkpoint_path> 
  • Visualize test results:
# --input: path to the saved testing results
python tools/visualize_json_results.py --input /path/to/x.json \
                                       --output dir/ \
                                       --dataset coco_2017_val
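
The saved testing results can also be inspected directly before visualizing them. The snippet below is a hedged sketch, assuming the file follows the standard COCO detection results format (a JSON list of records with `image_id`, `category_id`, `bbox`, and `score`). The function names, the 0.5 threshold, and the sample records are illustrative only and are not taken from the repository.

```python
import json
from collections import Counter


def filter_by_score(results, threshold=0.5):
    """Keep only detections whose score is at least `threshold`."""
    return [r for r in results if r["score"] >= threshold]


def detections_per_image(results):
    """Count detections per image_id."""
    return Counter(r["image_id"] for r in results)


# Made-up records in COCO results format; in practice, load the real file with
# json.load(open("/path/to/x.json")).
sample = json.loads("""
[
  {"image_id": 1, "category_id": 1, "bbox": [10.0, 20.0, 30.0, 40.0], "score": 0.9},
  {"image_id": 1, "category_id": 3, "bbox": [15.0, 25.0, 10.0, 10.0], "score": 0.3},
  {"image_id": 2, "category_id": 1, "bbox": [0.0, 0.0, 50.0, 60.0], "score": 0.6}
]
""")

kept = filter_by_score(sample, threshold=0.5)
print(len(kept), "of", len(sample), "detections kept")
print(dict(detections_per_image(kept)))
```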

Train

  • For R50 based models:
# train.amp.enabled=True enables mixed precision training.
# model.transformer.encoder.use_checkpoint=True enables gradient checkpointing,
# which saves GPU memory at the cost of speed.
python projects/train_net.py \
    --config-file <config-file> \
    --num-gpus N \
    dataloader.train.total_batch_size=16 \
    train.output_dir=<output_dir> \
    train.amp.enabled=True \
    model.transformer.encoder.use_checkpoint=True

# Get the mean model, which is more stable than EMA and improves performance by about 0.1~0.2%.
python projects/modelmean_12ep.py --folder <output_dir>
python projects/modelmean_24ep.py --folder <output_dir>

# Evaluate the mean model.
python projects/train_net.py \
    --config-file <config-file> \
    --num-gpus N \
    --eval-only \
    train.output_dir=<output_dir> \
    train.init_checkpoint=<output_dir>/meanmodel.pth
  • For Swin-L based models, set the weight decay to 0.05:
# train.amp.enabled=True enables mixed precision training.
# model.transformer.encoder.use_checkpoint=True enables gradient checkpointing,
# which saves GPU memory at the cost of speed.
python projects/mr_detr_align/train_net_swin.py \
    --config-file <config-file> \
    --num-gpus N \
    dataloader.train.total_batch_size=16 \
    train.output_dir=<output_dir> \
    train.amp.enabled=True \
    model.transformer.encoder.use_checkpoint=True

# Get the mean model, which is more stable than EMA and improves performance by about 0.1~0.2%.
python projects/modelmean_12ep.py --folder <output_dir>
python projects/modelmean_24ep.py --folder <output_dir>

# Evaluate the mean model.
python projects/train_net.py \
    --config-file <config-file> \
    --num-gpus N \
    --eval-only \
    train.output_dir=<output_dir> \
    train.init_checkpoint=<output_dir>/meanmodel.pth
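
The mean-model scripts used above are not reproduced in this README. As a rough illustration of the general idea of averaging checkpoints, here is a minimal sketch under stated assumptions: each checkpoint is a mapping from parameter name to a value that supports `+` and division by a number (plain floats here, tensors in practice), and all checkpoints share the same keys. The function name `average_state_dicts` is hypothetical, and the repository's scripts may select or combine checkpoints differently.

```python
def average_state_dicts(state_dicts):
    """Element-wise mean of several checkpoints (hypothetical helper).

    Assumes every checkpoint maps the same parameter names to values that
    support addition and division by an int (floats, arrays, or tensors).
    """
    if not state_dicts:
        raise ValueError("need at least one checkpoint")
    keys = state_dicts[0].keys()
    for sd in state_dicts[1:]:
        if sd.keys() != keys:
            raise ValueError("checkpoints have different parameter names")

    n = len(state_dicts)
    mean = {}
    for name in keys:
        total = state_dicts[0][name]
        for sd in state_dicts[1:]:
            total = total + sd[name]  # accumulate without modifying the inputs
        mean[name] = total / n
    return mean


# Toy checkpoints with made-up values, standing in for real weights.
ckpts = [
    {"w": 1.0, "b": 0.0},
    {"w": 2.0, "b": 0.5},
    {"w": 3.0, "b": 1.0},
]
print(average_state_dicts(ckpts))  # {'w': 2.0, 'b': 0.5}
```

With real checkpoints, integer-valued entries would become floating point after the division, so such entries may need separate handling.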

Evaluate

python projects/train_net.py \
    --config-file <config_file> \
    --eval-only \
    --num-gpus=4 \
    train.init_checkpoint=<checkpoint_path>

Citation

@inproceedings{zhang2024mr,
  title={Mr. DETR: Instructive Multi-Route Training for Detection Transformers},
  author={Zhang, Chang-Bin and Zhong, Yujie and Han, Kai},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2025}
}
