[CVPR 2025] DPC: Dual-Prompt Collaboration for Tuning Vision-Language Models
Haoyang Li, Liang Wang, Chao Wang, Jing Jiang, Yan Peng and Guodong Long.
Shanghai University, University of Technology Sydney
arXiv link: https://arxiv.org/abs/2503.13443
- NOTE: We are preparing our code repository (mainly rewriting comments to improve readability). We hope to release code in April.
- (12 Apr. 2025) The code of PromptKD+DPC is released.
- (18 Mar. 2025) Our paper is published on arXiv.
- (28 Feb. 2025) Our paper is accepted by CVPR 2025!
Figure 1. Overview of our proposed DPC. In (a) the fine-tuning stage, DPC initializes the parallel prompt P′ from the tuned prompt P obtained by fine-tuning the backbone. The Negative Sampler uses the tuned prompt P as a query to sample hard negatives, then feeds them into the Dynamic Hard Negative Optimizer to enhance base tasks. In (b) the inference stage, DPC decouples base and new tasks by independent weight accumulation on the dual prompts.
The Base-New Trade-off (BNT) problem universally exists during the optimization of CLIP-based prompt tuning, where continuous fine-tuning on base (target) classes leads to a simultaneous decrease of generalization ability on new (unseen) classes. Existing approaches attempt to regulate the prompt tuning process to balance BNT by appending constraints. However, because they are imposed on the same target prompt, these constraints fail to fully avert the mutual exclusivity between the optimization directions for base and new classes.
As a novel solution to this challenge, we propose the plug-and-play Dual-Prompt Collaboration (DPC) framework, the first to decouple the optimization processes of base and new tasks at the prompt level.
Specifically, we clone a learnable parallel prompt based on the backbone prompt, and introduce a variable Weighting-Decoupling framework to independently control the optimization directions of dual prompts specific to base or new tasks, thus avoiding the conflict in generalization. Meanwhile, we propose a Dynamic Hard Negative Optimizer, utilizing dual prompts to construct a more challenging optimization task on base classes for enhancement. Extensive experiments on multiple backbones demonstrate that DPC can significantly improve base performance without introducing any external knowledge beyond the base classes, while maintaining generalization to new classes.
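As a rough illustration of the weighting-decoupling idea, the sketch below mixes a tuned prompt P and a parallel prompt P′ with a task-specific weight. The convex-combination form, the choice of which prompt the weight multiplies, the function name `mix_prompts`, and the toy vectors are all assumptions made for illustration, not the repository's code. The weights 0.2 (base) and 0.0 (new) are the values used in the StanfordCars example in this README.

```python
from typing import List


def mix_prompts(tuned: List[float], parallel: List[float], weight: float) -> List[float]:
    """Return weight * parallel + (1 - weight) * tuned, element-wise.

    Assumption: the weight scales the parallel prompt P' and the remainder
    goes to the tuned prompt P. This is a hypothetical form for illustration.
    """
    if len(tuned) != len(parallel):
        raise ValueError("prompts must have the same length")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return [weight * p_par + (1.0 - weight) * p_tuned
            for p_tuned, p_par in zip(tuned, parallel)]


# Toy 3-dimensional "prompts"; real prompts are learnable context vectors.
tuned_prompt = [1.0, 2.0, 3.0]      # P, from the first-stage backbone
parallel_prompt = [2.0, 0.0, 3.0]   # P', tuned in the second stage

# Separate weights give separate effective prompts for base and new classes.
base_prompt = mix_prompts(tuned_prompt, parallel_prompt, weight=0.2)
new_prompt = mix_prompts(tuned_prompt, parallel_prompt, weight=0.0)

print("base:", base_prompt)
print("new: ", new_prompt)
```

Under this assumed form, a new-class weight of 0.0 leaves the tuned prompt untouched, which matches the stated goal of keeping the backbone's new-class generalization while only the base-class prompt is changed.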
(1) To the best of our knowledge, DPC is the first prompt tuning enhancement strategy that decouples at the prompt level to overcome the BNT problem.
(2) We design a novel Dynamic Hard Negative Optimizer, which significantly enhances the base-class performance of DPC by establishing harder visual-text alignment tasks using dual prompts, achieving new state-of-the-art results.
(3) We introduce plug-and-play and self-contained features to the model, endowing it with outstanding adaptability and transferability while minimizing requirements of external knowledge.
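To make the hard-negative idea in (2) more concrete, here is a minimal sketch of a negative sampler. The description above states only that the tuned prompt is used as a query to sample hard negatives; the top-K-by-similarity rule, the function name `sample_hard_negatives`, and the toy scores below are assumptions for illustration.

```python
from typing import List


def sample_hard_negatives(similarities: List[float], true_class: int, k: int) -> List[int]:
    """Return the k wrong classes the query scores highest, most similar first.

    `similarities[i]` stands in for the image-text similarity of class i
    computed with the tuned prompt (hypothetical input for this sketch).
    """
    candidates = [i for i in range(len(similarities)) if i != true_class]
    candidates.sort(key=lambda i: similarities[i], reverse=True)
    return candidates[:k]


# Toy similarity scores for one image over five base classes.
sims = [0.10, 0.80, 0.75, 0.05, 0.60]
hard_negatives = sample_hard_negatives(sims, true_class=1, k=2)

# The harder alignment task is then restricted to the true class
# plus the classes most easily confused with it.
reduced_label_set = [1] + hard_negatives
print(hard_negatives, reduced_label_set)
```

Restricting the candidate set to confusable classes removes the easy negatives, so the remaining classification problem is harder than the original one over all base classes.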
Results reported below show accuracy for base and new classes on 11 recognition-based datasets. For all 4 backbone models, DPC achieves general base-class performance improvements while fully retaining new class generalization.
To assess the plug-and-play characteristic, we compare DPC with DePT (which decouples base and new tasks at the feature level). Results indicate that decoupling at the prompt level is more thorough, furnishing a broader optimization space for the plug-and-play model.
(Acknowledgement: This part is modified from PromptKD's official repository.)
- Create the environment and install the Dassl.pytorch library. Please follow the instructions detailed in INSTALL.md.
- Download the publicly released pre-trained teacher ViT-L/14 CLIP models of PromptKD.
  Files are publicly available at [Baidu Yun] [TeraBox] [Google Drive]
  (Note that due to cloud space limitations, we only provide a limited number of models in Google Cloud. Sorry.)
  After obtaining the teacher model, unzip these files and place the model in the ./teacher_model folder.
- Download the original ViT-B/16 and ViT-L/14 CLIP model weights from the official OpenAI website. Then place these models in the ./clip folder.
  [ViT-B/16 CLIP] [ViT-L/14 CLIP]
- Download the zip file of DPC-specific annotation files: [Google Drive] [Baidu Yun]
  Then unzip and place these SPLE_XXX.json files in the ./DATA/SPLE_Database folder.
- Prepare the dataset. Please follow the instructions detailed in DATASETS.md.
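Before training, it can help to confirm that the folders from the steps above are in place. The following is a small sketch, not part of the repository: the helper name `missing_dirs` is hypothetical, and the folder names are copied from the list above. Note that the training commands later in this README reference `DATA/SPLE_database` (lowercase "d"), so on a case-sensitive filesystem the folder name should match whatever path you pass to `SPLE.PIC_LIB`.

```python
from pathlib import Path
from typing import Iterable, List

# Folder names as written in the setup steps (relative to the repo root).
EXPECTED_DIRS = ("teacher_model", "clip", "DATA/SPLE_Database")


def missing_dirs(root: Path, expected: Iterable[str] = EXPECTED_DIRS) -> List[str]:
    """Return the expected folders that do not exist under `root`."""
    return [rel for rel in expected if not (root / rel).is_dir()]


# Check the current working directory and report anything that is absent.
for rel in missing_dirs(Path(".")):
    print(f"missing folder: {rel}")
```

Run it from the repository root; no output means all three folders were found.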
Since DPC is a plug-and-play model, it first uses the original backbone model (e.g., CoOp, PromptKD, ...) for the first stage of fine-tuning to get the tuned prompt. Then, in the second stage, fine-tuning with the DPC-related trainer introduces and tunes the parallel prompt.
We will release scripts to automate this process in the future.
Below, we take the base-to-new fine-tuning task of the DPC+PromptKD model on the StanfordCars dataset as an example (epoch=20, weight_for_base=0.2; weight_for_new=0.0):
- Execute fine-tuning on the PromptKD backbone to get the tuned prompt (and other activated parameters):
python train.py --root DATA/stanford_cars --seed 1 --trainer PromptKD --dataset-config-file configs/datasets/stanford_cars.yaml --config-file configs/trainers/PromptKD/vit_b16_c2_ep20_batch32_4+4ctx.yaml --output-dir output/PromptKD/base2new/train_base/stanford_cars/1_PromptKD_baseline/vit_b16_c2_ep20_batch32_4+4ctx/seed1 DATASET.NUM_SHOTS 0 TRAINER.MODAL base2novel TRAINER.PROMPTKD.TEMPERATURE 1.0 TRAINER.PROMPTKD.KD_WEIGHT 1000.0 TEST.SPLIT val
- Based on the PromptKD backbone, continue fine-tuning with DPC on base classes:
python train.py --root DATA/stanford_cars --seed 1 --trainer StackSPLE_PromptKD --dataset-config-file configs/datasets/stanford_cars.yaml --config-file configs/trainers/SPLE/PromptKD/vit_b16_c2_ep20_batch4_4+4ctx.yaml --output-dir output/PromptKD/base2new/train_base/stanford_cars/3_SPLE_converse/vit_b16_c2_ep20_batch4_4+4ctx_con20/seed1 DATASET.NUM_SHOTS 16 SPLE.BACK_CKPT_PATH output/PromptKD/base2new/train_base/stanford_cars/1_PromptKD_baseline/vit_b16_c2_ep20_batch32_4+4ctx/seed1 SPLE.BACK_CKPT_EPOCH 20 SPLE.PIC_LIB DATA/SPLE_database/SPLE_StanfordCars.json SPLE.STACK.MODE converse SPLE.STACK.WEIGHT 0.2 DATASET.SUBSAMPLE_CLASSES base SPLE.STACK.WEIGHT_FOR_NEW 0.0 TRAINER.MODAL base2novel TRAINER.PROMPTKD.TEMPERATURE 1.0 TRAINER.PROMPTKD.KD_WEIGHT 1000.0 TEST.SPLIT val
- Test the new-class generalization of DPC:
python train.py --root DATA/stanford_cars --seed 1 --trainer StackSPLE_PromptKD --dataset-config-file configs/datasets/stanford_cars.yaml --config-file configs/trainers/SPLE/PromptKD/vit_b16_c2_ep20_batch4_4+4ctx.yaml --output-dir output/PromptKD/base2new/train_base/stanford_cars/3_SPLE_converse/vit_b16_c2_ep20_batch4_4+4ctx_con20/seed1 DATASET.NUM_SHOTS 16 SPLE.BACK_CKPT_PATH output/PromptKD/base2new/train_base/stanford_cars/1_PromptKD_baseline/vit_b16_c2_ep20_batch32_4+4ctx/seed1 SPLE.BACK_CKPT_EPOCH 20 SPLE.PIC_LIB DATA/SPLE_database/SPLE_StanfordCars.json SPLE.STACK.MODE converse SPLE.STACK.WEIGHT 0.2 DATASET.SUBSAMPLE_CLASSES new SPLE.STACK.WEIGHT_FOR_NEW 0.0 TRAINER.MODAL base2novel TRAINER.PROMPTKD.TEMPERATURE 1.0 TRAINER.PROMPTKD.KD_WEIGHT 1000.0 TEST.SPLIT test
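The three commands above share most of their arguments, and the second-stage runs must point at the first-stage output directory. Until official automation scripts are available, a helper along the following lines could assemble them consistently. This is only a sketch: every flag, option, and path is copied from the commands above, while the helper names (`common_args`, `backbone_cmd`, `dpc_cmd`, `render`) are hypothetical. It prints the commands and does not execute them.

```python
import shlex
from typing import List

DATASET = "stanford_cars"
SEED = 1
BACKBONE_CFG = "vit_b16_c2_ep20_batch32_4+4ctx"
DPC_CFG = "vit_b16_c2_ep20_batch4_4+4ctx"
OUT_ROOT = f"output/PromptKD/base2new/train_base/{DATASET}"
BACKBONE_OUT = f"{OUT_ROOT}/1_PromptKD_baseline/{BACKBONE_CFG}/seed{SEED}"
DPC_OUT = f"{OUT_ROOT}/3_SPLE_converse/{DPC_CFG}_con20/seed{SEED}"

# Options shared by all three commands.
PROMPTKD_OPTS = [
    "TRAINER.MODAL", "base2novel",
    "TRAINER.PROMPTKD.TEMPERATURE", "1.0",
    "TRAINER.PROMPTKD.KD_WEIGHT", "1000.0",
]


def common_args(trainer: str, config_file: str, output_dir: str) -> List[str]:
    """Arguments that appear at the start of every command."""
    return [
        "python", "train.py",
        "--root", f"DATA/{DATASET}",
        "--seed", str(SEED),
        "--trainer", trainer,
        "--dataset-config-file", f"configs/datasets/{DATASET}.yaml",
        "--config-file", config_file,
        "--output-dir", output_dir,
    ]


def backbone_cmd() -> List[str]:
    """Stage 1: fine-tune the PromptKD backbone."""
    cfg = f"configs/trainers/PromptKD/{BACKBONE_CFG}.yaml"
    return (common_args("PromptKD", cfg, BACKBONE_OUT)
            + ["DATASET.NUM_SHOTS", "0"] + PROMPTKD_OPTS + ["TEST.SPLIT", "val"])


def dpc_cmd(subsample: str, split: str) -> List[str]:
    """Stage 2 (subsample='base') or new-class test (subsample='new')."""
    cfg = f"configs/trainers/SPLE/PromptKD/{DPC_CFG}.yaml"
    return (common_args("StackSPLE_PromptKD", cfg, DPC_OUT) + [
        "DATASET.NUM_SHOTS", "16",
        "SPLE.BACK_CKPT_PATH", BACKBONE_OUT,  # reuse the stage-1 output
        "SPLE.BACK_CKPT_EPOCH", "20",
        "SPLE.PIC_LIB", "DATA/SPLE_database/SPLE_StanfordCars.json",
        "SPLE.STACK.MODE", "converse",
        "SPLE.STACK.WEIGHT", "0.2",
        "DATASET.SUBSAMPLE_CLASSES", subsample,
        "SPLE.STACK.WEIGHT_FOR_NEW", "0.0",
    ] + PROMPTKD_OPTS + ["TEST.SPLIT", split])


def render(cmd: List[str]) -> str:
    """Join an argument list into a shell-safe command line."""
    return " ".join(shlex.quote(token) for token in cmd)


commands = [backbone_cmd(), dpc_cmd("base", "val"), dpc_cmd("new", "test")]
for cmd in commands:
    print(render(cmd))
```

Deriving `SPLE.BACK_CKPT_PATH` from the same variable as the stage-1 `--output-dir` avoids the most likely copy-paste error, a mismatch between the two paths. To run the commands instead of printing them, each argument list could be passed to `subprocess.run(cmd, check=True)`.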
If you have any questions about our DPC model, you can submit an issue on GitHub or contact me by email (haoyang.li-3@student.uts.edu.au).
If you find our paper or repo helpful for your research, please consider citing our paper and giving this repo a star⭐. Thank you!
@article{li2025dpc,
title={DPC: Dual-Prompt Collaboration for Tuning Vision-Language Models},
author={Li, Haoyang and Wang, Liang and Wang, Chao and Jiang, Jing and Peng, Yan and Long, Guodong},
journal={arXiv preprint arXiv:2503.13443},
year={2025}
}
Our code is based on the PromptKD, DePT, PromptSRC, MaPLe and CoOp repositories. We thank the authors for releasing their code.