DeepKD

This repo is the official implementation of DeepKD: A Deeply Decoupled and Denoised Knowledge Distillation Trainer.

DeepKD is a novel knowledge distillation framework that addresses two fundamental challenges in knowledge transfer: (1) the inherent conflict between target-class and non-target-class knowledge flows, and (2) the noise introduced by low-confidence dark knowledge in non-target classes.

Key Features

  • Dual-Level Decoupling: Implements independent momentum updaters for the task-oriented gradient (TOG), target-class gradient (TCG), and non-target-class gradient (NCG) components, based on a theoretical analysis of the gradient signal-to-noise ratio (GSNR); see the momentum sketch after this list.

  • Adaptive Denoising: Introduces a dynamic top-k mask (DTM) mechanism that progressively filters low-confidence logits from both the teacher and student models, following curriculum learning principles; see the masking sketch after this list.

  • Theoretically Grounded: Provides rigorous analysis of loss components and optimization parameters in knowledge distillation, with momentum coefficients optimized based on GSNR characteristics.

  • Versatile Integration: Seamlessly works with existing logit-based distillation approaches while consistently achieving state-of-the-art performance across multiple benchmark datasets.
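
A minimal sketch of the dual-level decoupling idea is given below: one momentum buffer per gradient component (TOG, TCG, NCG), combined only when the weights are updated. The class name, coefficients, and update rule here are illustrative assumptions, not the repository's actual optimizer API.

# Hypothetical sketch of dual-level decoupling: an independent momentum
# buffer per gradient component (TOG / TCG / NCG), combined at update time.
# Names and coefficient values are illustrative assumptions.
import torch

class DecoupledMomentum:
    def __init__(self, params, lr=0.05, mu_tog=0.9, mu_tcg=0.9, mu_ncg=0.99):
        self.params = list(params)
        self.lr = lr
        self.mus = (mu_tog, mu_tcg, mu_ncg)
        # One velocity buffer per parameter per gradient component.
        self.bufs = [[torch.zeros_like(p) for p in self.params] for _ in range(3)]

    @torch.no_grad()
    def step(self, grads_tog, grads_tcg, grads_ncg):
        for i, p in enumerate(self.params):
            update = torch.zeros_like(p)
            for k, grads in enumerate((grads_tog, grads_tcg, grads_ncg)):
                buf = self.bufs[k][i]
                buf.mul_(self.mus[k]).add_(grads[i])  # momentum kept separate per component
                update += buf
            p.add_(update, alpha=-self.lr)

In practice the three gradient groups would come from backpropagating the cross-entropy, target-class, and non-target-class distillation losses separately (e.g., via torch.autograd.grad) before calling step.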

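The denoising side can be pictured with a toy top-k mask whose k is scheduled over training. In practice it would be applied to both teacher and student logits; the linear schedule and the direction in which k changes are assumptions for illustration only, not the paper's exact DTM rule.

# Hypothetical dynamic top-k mask: keep the k most confident teacher logits,
# with k scheduled over training (curriculum-style). Schedule details are assumed.
import torch

def dynamic_topk_mask(teacher_logits, epoch, total_epochs, k_start=None, k_end=5):
    num_classes = teacher_logits.size(-1)
    if k_start is None:
        k_start = num_classes  # start by keeping every class
    frac = epoch / max(total_epochs - 1, 1)
    k = max(int(round(k_start + (k_end - k_start) * frac)), 1)
    topk_idx = teacher_logits.topk(k, dim=-1).indices
    mask = torch.zeros_like(teacher_logits)
    mask.scatter_(-1, topk_idx, 1.0)
    return mask  # 1.0 where a logit is retained in the distillation loss
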
Performance

DeepKD demonstrates superior performance across various benchmarks including CIFAR-100, ImageNet, and MS-COCO, achieving higher GSNR and flatter loss landscapes compared to previous methods like vanilla KD, DKD, and DOT.
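
As background, GSNR is commonly defined per parameter as the squared mean of the gradient divided by its variance across mini-batches. A rough estimator is sketched below; the batching and averaging choices are my own, not the repository's measurement code.

# Rough per-layer GSNR estimate: squared mean of per-batch gradients over
# their variance (eps avoids division by zero). Estimator details are assumed.
import torch

def estimate_gsnr(grad_samples, eps=1e-12):
    # grad_samples: tensor of shape (num_batches, num_params) for one layer.
    mean = grad_samples.mean(dim=0)
    var = grad_samples.var(dim=0, unbiased=False)
    return (mean.pow(2) / (var + eps)).mean()  # scalar summary over parameters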

Framework

(See the framework overview figure in the repository.)

Usage

Installation

Environments:

  • Python ≥ 3.6
  • PyTorch ≥ 1.8.0
  • torchvision ≥ 0.10.0

Install the package:

pip install -r requirements.txt
python setup.py develop
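
To quickly confirm the environment meets these requirements, something like the following can be run first (a simple sanity check of my own, not part of the repository):

# Simple environment sanity check (not part of the repository).
import sys
import torch
import torchvision

assert sys.version_info >= (3, 6), "Python >= 3.6 required"
print("torch:", torch.__version__)              # expect >= 1.8.0
print("torchvision:", torchvision.__version__)  # expect >= 0.10.0
print("CUDA available:", torch.cuda.is_available())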

Training on CIFAR-100

  • Download the cifar_teachers.tar and untar it to ./download_ckpts via tar xvf cifar_teachers.tar.
  1. For KD
# KD
python tools/train.py --cfg configs/cifar100/kd/resnet32x4_resnet8x4.yaml
# KD+Ours
python tools/train.py --cfg configs/cifar100/kd/resnet32x4_resnet8x4.yaml 
  2. For DKD
# DKD
python tools/train.py --cfg configs/cifar100/dkd/resnet32x4_resnet8x4.yaml 
# DKD+Ours
python tools/train.py --cfg configs/cifar100/dkd/resnet32x4_resnet8x4.yaml
  3. For MLKD
# MLKD
python tools/train.py --cfg configs/cifar100/mlkd/resnet32x4_resnet8x4.yaml
# MLKD+Ours
python tools/train.py --cfg configs/cifar100/mlkd/resnet32x4_resnet8x4.yaml
  4. For CRLD
# CRLD
python tools/train.py --cfg configs/cifar100/crld/res32x4_res8x4.yaml
# CRLD+Ours
python tools/train.py --cfg configs/cifar100/crld/res32x4_res8x4.yaml

Training on ImageNet

# KD+Ours
python tools/train.py --cfg configs/imagenet/r34_r18/deepkd_kd.yaml
# DKD+Ours
python tools/train.py --cfg configs/imagenet/r34_r18/deepkd_dkd.yaml
# MLKD+Ours
python tools/train.py --cfg configs/imagenet/r34_r18/deepkd_mlkd.yaml
# CRLD+Ours
python tools/train.py --cfg configs/imagenet/r34_r18/deepkd_crld.yaml 

Object Detection on MS-COCO

Our implementation is built upon the ReviewKD codebase.

Installation

  1. Install Detectron2: follow the official Detectron2 installation instructions.

  2. Dataset Setup:

    • Download the COCO dataset
    • Place the dataset in the datasets/ directory (a typical layout is sketched after this list)
  3. Pretrained Models:

    • Download pretrained weights from ReviewKD releases
    • Place the weights in the pretrained/ directory
    • Note: The provided weights include both teacher models (from Detectron2's pretrained detectors) and student models (ImageNet pretrained weights)
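
For reference, Detectron2 conventionally expects the COCO data laid out roughly as follows; this is the standard Detectron2 convention rather than anything specific to this repository:

datasets/
  coco/
    annotations/
      instances_train2017.json
      instances_val2017.json
    train2017/   # training images
    val2017/     # validation images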

Training Commands

Train different model configurations using the following commands:

# Tea: R-101, Stu: R-18
python train_net.py --config-file configs/DEEPKD/DKD-R18-R101.yaml --num-gpus 4

# Tea: R-101, Stu: R-50
python train_net.py --config-file configs/DEEPKD/DKD-R50-R101.yaml --num-gpus 4

# Tea: R-50, Stu: MV2
python train_net.py --config-file configs/DEEPKD/DKD-MV2-R50.yaml --num-gpus 4

Acknowledgement

We would like to express our sincere gratitude to the open-source projects that contributed to the development of DeepKD, in particular the ReviewKD codebase on which our object detection implementation is built.

Citation

If this repo is helpful for your research, please consider citing the paper:

@misc{huang2025deepkddeeplydecoupleddenoised,
    title={DeepKD: A Deeply Decoupled and Denoised Knowledge Distillation Trainer}, 
    author={Haiduo Huang and Jiangcheng Song and Yadong Zhang and Pengju Ren},
    year={2025},
    eprint={2505.15133},
    archivePrefix={arXiv},
    primaryClass={cs.CV},
    url={https://arxiv.org/abs/2505.15133}, 
}
