This repo is the official implementation of DeepKD: A Deeply Decoupled and Denoised Knowledge Distillation Trainer.
DeepKD is a novel knowledge distillation framework that addresses two fundamental challenges in knowledge transfer: (1) the inherent conflict between target-class and non-target-class knowledge flows, and (2) the noise introduced by low-confidence dark knowledge in non-target classes.
- Dual-Level Decoupling: Implements independent momentum updaters for the task-oriented gradient (TOG), target-class gradient (TCG), and non-target-class gradient (NCG) components, based on a theoretical analysis of the gradient signal-to-noise ratio (GSNR); see the sketch after this list.
- Adaptive Denoising: Introduces a dynamic top-k mask (DTM) mechanism that progressively filters low-confidence logits from both the teacher and the student, following curriculum learning principles.
- Theoretically Grounded: Provides a rigorous analysis of the loss components and optimization parameters in knowledge distillation, with momentum coefficients chosen according to their GSNR characteristics.
- Versatile Integration: Works seamlessly with existing logit-based distillation approaches while consistently achieving state-of-the-art performance across multiple benchmark datasets.
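To make the two mechanisms concrete, here is a minimal, self-contained sketch (not the repository's implementation): it splits the objective into TOG/TCG/NCG flows using a DKD-style target/non-target decomposition, keeps an independent momentum buffer per flow, and applies a dynamic top-k mask to the teacher logits. Function names, the linear k schedule, the exact loss split, and the momentum coefficients are illustrative assumptions; see the configs and the mdistiller-based code for the actual settings.

```python
import torch
import torch.nn.functional as F


def dynamic_topk_mask(logits, epoch, max_epochs, num_classes):
    """Keep only the top-k logits per sample; k grows toward the full class
    count as training progresses (assumed linear curriculum schedule)."""
    k_min = max(2, num_classes // 10)
    k = min(num_classes, int(k_min + (num_classes - k_min) * epoch / max_epochs))
    topk_idx = logits.topk(k, dim=1).indices
    keep = torch.zeros_like(logits).scatter_(1, topk_idx, 1.0).bool()
    return logits.masked_fill(~keep, float("-inf"))


class FlowMomentum:
    """Independent momentum buffer for one gradient flow (TOG / TCG / NCG)."""

    def __init__(self, params, beta):
        self.beta = beta
        self.buffers = [torch.zeros_like(p) for p in params]

    def update(self, grads):
        for buf, g in zip(self.buffers, grads):
            buf.mul_(self.beta).add_(g)
        return self.buffers


def deepkd_step(student, params, x, y, teacher_logits, moms, lr, epoch, max_epochs):
    """One manual SGD step with the three gradient flows decoupled."""
    s_logits = student(x)
    num_classes = s_logits.size(1)
    gt = F.one_hot(y, num_classes).bool()

    # Task-oriented flow (TOG): ordinary cross-entropy with the hard labels.
    tog_loss = F.cross_entropy(s_logits, y)

    # Denoise the teacher with the dynamic top-k mask before distilling
    # (the paper applies DTM to both teacher and student; teacher-only here for brevity).
    t_logits = dynamic_topk_mask(teacher_logits, epoch, max_epochs, num_classes)

    # Target-class flow (TCG): binary (target vs. rest) KL, as in DKD's TCKD.
    t_prob, s_prob = t_logits.softmax(dim=1), s_logits.softmax(dim=1)
    t_bin = torch.stack([(t_prob * gt).sum(1), (t_prob * ~gt).sum(1)], dim=1)
    s_bin = torch.stack([(s_prob * gt).sum(1), (s_prob * ~gt).sum(1)], dim=1)
    tcg_loss = F.kl_div(s_bin.clamp_min(1e-8).log(), t_bin, reduction="batchmean")

    # Non-target-class flow (NCG): KL over the non-target classes only.
    t_nt = t_logits.masked_fill(gt, float("-inf")).softmax(dim=1)
    s_nt = s_logits.masked_fill(gt, float("-inf")).log_softmax(dim=1)
    ncg_loss = F.kl_div(s_nt, t_nt, reduction="batchmean")

    # Each flow keeps its own momentum buffer; they are summed only at the update.
    g_tog = torch.autograd.grad(tog_loss, params, retain_graph=True)
    g_tcg = torch.autograd.grad(tcg_loss, params, retain_graph=True)
    g_ncg = torch.autograd.grad(ncg_loss, params)
    with torch.no_grad():
        for p, m1, m2, m3 in zip(params, moms["tog"].update(g_tog),
                                 moms["tcg"].update(g_tcg), moms["ncg"].update(g_ncg)):
            p -= lr * (m1 + m2 + m3)


# Hypothetical usage; the per-flow momentum coefficients are placeholders,
# not the values derived in the paper:
# params = [p for p in student.parameters() if p.requires_grad]
# moms = {name: FlowMomentum(params, beta)
#         for name, beta in [("tog", 0.9), ("tcg", 0.9), ("ncg", 0.99)]}
```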
DeepKD demonstrates superior performance across various benchmarks including CIFAR-100, ImageNet, and MS-COCO, achieving higher GSNR and flatter loss landscapes compared to previous methods like vanilla KD, DKD, and DOT.
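For reference, the gradient signal-to-noise ratio is commonly defined per parameter as the squared mean of its gradient divided by the gradient's variance over the data (standard definition; the paper's exact notation may differ):

$$
\mathrm{GSNR}(\theta_j) = \frac{\big(\mathbb{E}_{(x,y)}\left[g_j(x,y)\right]\big)^{2}}{\mathrm{Var}_{(x,y)}\left[g_j(x,y)\right]},
\qquad g_j(x,y) = \frac{\partial\, \ell(x,y;\theta)}{\partial \theta_j}
$$

A higher GSNR means the per-sample gradients of a flow agree more strongly, which is the property the decoupled momentum coefficients are tuned around.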
Environments:
- Python ≥ 3.6
- PyTorch ≥ 1.8.0
- torchvision ≥ 0.10.0
Install the package:
pip install -r requirements.txt
python setup.py develop
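As a quick sanity check of the environment (illustrative, not part of the repository):

```python
import torch
import torchvision

print("PyTorch:", torch.__version__)            # needs >= 1.8.0
print("torchvision:", torchvision.__version__)  # needs >= 0.10.0
print("CUDA available:", torch.cuda.is_available())
```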
- Download `cifar_teachers.tar` and untar it to `./download_ckpts` via `tar xvf cifar_teachers.tar`.
- For KD
# KD
python tools/train.py --cfg configs/cifar100/kd/resnet32x4_resnet8x4.yaml
# KD+Ours
python tools/train.py --cfg configs/cifar100/kd/resnet32x4_resnet8x4.yaml
- For DKD
# DKD
python tools/train.py --cfg configs/cifar100/dkd/resnet32x4_resnet8x4.yaml
# DKD+Ours
python tools/train.py --cfg configs/cifar100/dkd/resnet32x4_resnet8x4.yaml
- For MLKD
# MLKD
python tools/train.py --cfg configs/cifar100/mlkd/resnet32x4_resnet8x4.yaml
# MLKD+Ours
python tools/train.py --cfg configs/cifar100/mlkd/resnet32x4_resnet8x4.yaml
- For CRLD
# CRLD
python tools/train.py --cfg configs/cifar100/crld/res32x4_res8x4.yaml
# CRLD+Ours
python tools/train.py --cfg configs/cifar100/crld/res32x4_res8x4.yaml
- Download the dataset at https://image-net.org/ and put it under `./data/imagenet`.
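Optionally, you can sanity-check the dataset layout before launching a run. This assumes the standard ImageFolder structure (`./data/imagenet/train/<class>/...` and `./data/imagenet/val/<class>/...`), which mdistiller-style dataloaders expect:

```python
from torchvision import datasets

# Illustrative check: raises if the expected train/val folders are missing.
train_set = datasets.ImageFolder("./data/imagenet/train")
val_set = datasets.ImageFolder("./data/imagenet/val")
print(f"{len(train_set)} train / {len(val_set)} val images, "
      f"{len(train_set.classes)} classes")
```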
# KD+Ours
python tools/train.py --cfg configs/imagenet/r34_r18/deepkd_kd.yaml
# DKD+Ours
python tools/train.py --cfg configs/imagenet/r34_r18/deepkd_dkd.yaml
# MLKD+Ours
python tools/train.py --cfg configs/imagenet/r34_r18/deepkd_mlkd.yaml
# CRLD+Ours
python tools/train.py --cfg configs/imagenet/r34_r18/deepkd_crld.yaml
Our object detection implementation on MS-COCO is built upon the ReviewKD codebase.
- Install Detectron2:
  - Follow the official installation guide at https://github.com/facebookresearch/detectron2
- Dataset Setup:
  - Download the COCO dataset.
  - Place it in the `datasets/` directory, following Detectron2's expected layout (e.g., `datasets/coco/annotations`, `datasets/coco/train2017`, `datasets/coco/val2017`).
- Pretrained Models:
  - Download the pretrained weights from the ReviewKD releases.
  - Place the weights in the `pretrained/` directory.
  - Note: the provided weights include both teacher models (from Detectron2's pretrained detectors) and student models (ImageNet pretrained weights).
Train different model configurations using the following commands:
# Tea: R-101, Stu: R-18
python train_net.py --config-file configs/DEEPKD/DKD-R18-R101.yaml --num-gpus 4
# Tea: R-101, Stu: R-50
python train_net.py --config-file configs/DEEPKD/DKD-R50-R101.yaml --num-gpus 4
# Tea: R-50, Stu: MV2
python train_net.py --config-file configs/DEEPKD/DKD-MV2-R50.yaml --num-gpus 4
We would like to express our sincere gratitude to the following open-source projects that have contributed to the development of DeepKD:
- mdistiller - A comprehensive knowledge distillation toolkit
- Multi-Level-Logit-Distillation - Implementation of multi-level logit distillation
- logit-standardization-KD - Knowledge distillation with logit standardization
If this repo is helpful for your research, please consider citing the paper:
@misc{huang2025deepkddeeplydecoupleddenoised,
title={DeepKD: A Deeply Decoupled and Denoised Knowledge Distillation Trainer},
author={Haiduo Huang and Jiangcheng Song and Yadong Zhang and Pengju Ren},
year={2025},
eprint={2505.15133},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2505.15133},
}