The official implementation of Low-Rank Few-Shot Adaptation of Vision-Language Models.
Authors: Maxime Zanella, Ismail Ben Ayed.
We present CLIP-LoRA, an easy-to-use few-shot method for Vision-Language Models with fixed hyperparameters for every task and every number of shots. This repository also aims to facilitate the use of Low-Rank Adapters (LoRA) in Vision-Language Models like CLIP.
Figure 1: Low-Rank Adaptation (LoRA) is easy to use and does not create any additional inference latency.
Contents:
- How to run the experiments
- A quick guide on how LoRA is implemented in this repository
- How to support our work (citation)
- Contact for any inquiries
Our code requires an environment with PyTorch installed. If you don't have one, consider creating a Python environment with:
conda create -y --name CLIP-LoRA python=3.10.0
conda activate CLIP-LoRA
Then install PyTorch, for instance with:
pip3 install torch==2.0.1 torchaudio==2.0.2 torchvision==0.15.2
Please follow DATASETS.md to install the datasets.
Execute CLIP-LoRA on the ImageNet dataset with a random seed of 1 by entering the following command:
python main.py --root_path /path/to/your/data --dataset imagenet --seed 1
You can also execute CLIP-LoRA on the 10 other datasets:
python main.py --root_path /path/to/your/data --dataset dataset_name --seed 1
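If you want to run several datasets and seeds in one go, a small helper script can simply loop over the command above. The snippet below is a hypothetical convenience wrapper, not part of the repository; the dataset identifiers are examples only.

import subprocess

# Hypothetical convenience script (not part of the repository) that sweeps the
# documented command line over several datasets and seeds. The dataset names
# below are illustrative; see DATASETS.md for the exact identifiers.
datasets = ["imagenet", "dtd", "eurosat"]
for dataset in datasets:
    for seed in [1, 2, 3]:
        subprocess.run(
            ["python", "main.py",
             "--root_path", "/path/to/your/data",
             "--dataset", dataset,
             "--seed", str(seed)],
            check=True,  # abort the sweep if a run fails
        )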
You can optionally provide a save_path to save the LoRA modules, which can be reloaded easily with the --eval_only argument. The code will automatically check that your trained LoRA matches the corresponding rank, alpha, encoder, params and position to ensure compatibility. The folder will be structured as follows:
/your/save/path
└── backbone
    └── dataset
        └── Xshots
            └── seedY
Here is the command line:
python main.py --root_path /path/to/your/data --dataset dataset_name --seed 1 --save_path /your/save/path --eval_only
The PlainMultiheadAttentionLoRA class in loralib/layers.py extends the standard PyTorch multi-head attention mechanism by incorporating Low-Rank Adaptation (LoRA). This class constructs explicit linear modules for each component of the attention mechanism, namely query (q), key (k), value (v), and output (o), providing a structured and adaptable foundation for your experiments. PlainMultiheadAttentionLoRA takes an existing nn.MultiheadAttention module, replicates its configuration, and integrates LoRA linear modules.
- Parameter Initialization: The initialization process involves copying weights and biases from a pre-existing multi-head attention model. Each LoRA module (q, k, v, o) is adapted based on the specified requirements in the enable_lora list.
- LoRA Integration: The replacement of standard linear layers with LinearLoRA layers introduces low-rank matrices, which are parameterized by the rank of adaptation (r) and the scaling factor (lora_alpha); a minimal sketch of this computation follows the list.
- Forward Pass: The forward_module method manages the attention computation, incorporating optional dropout settings on the LoRA modules.
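For intuition, below is a minimal sketch of what a LoRA-augmented linear layer computes: the pretrained projection is kept frozen, and a trainable low-rank update (a down-projection A followed by an up-projection B, scaled by lora_alpha / r) is added on top. The class name, initialization scheme and default values are illustrative assumptions, not the exact LinearLoRA implementation from loralib/layers.py.

import torch
import torch.nn as nn

class LoRALinearSketch(nn.Module):
    # Illustrative LoRA linear layer: y = base(x) + (lora_alpha / r) * x A^T B^T.
    # Defaults here are arbitrary choices for the sketch.
    def __init__(self, base_linear: nn.Linear, r: int = 2, lora_alpha: int = 1, lora_dropout: float = 0.0):
        super().__init__()
        self.base = base_linear  # frozen pretrained projection
        for p in self.base.parameters():
            p.requires_grad = False
        # Low-rank factors: A projects down to rank r, B projects back up.
        self.lora_A = nn.Parameter(torch.randn(r, base_linear.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base_linear.out_features, r))  # zero init: no change at start
        self.scaling = lora_alpha / r
        self.lora_dropout = nn.Dropout(lora_dropout)  # optional dropout on the LoRA path

    def forward(self, x):
        # Frozen path plus scaled low-rank update; at initialization the update
        # is zero, so the adapted layer matches the pretrained one exactly.
        update = self.lora_dropout(x) @ self.lora_A.t() @ self.lora_B.t()
        return self.base(x) + self.scaling * update

# Example: wrap a 512-dimensional projection and run a dummy batch through it.
base = nn.Linear(512, 512)
lora_layer = LoRALinearSketch(base, r=2, lora_alpha=1)
y = lora_layer(torch.randn(4, 512))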
The following snippet demonstrates how to initialize PlainMultiheadAttentionLoRA with an existing multi-head attention module.
import torch.nn as nn

from loralib.layers import PlainMultiheadAttentionLoRA

# Initialize with an existing MultiheadAttention module
existing_mha = nn.MultiheadAttention(embed_dim=512, num_heads=8)
lora_mha = PlainMultiheadAttentionLoRA(existing_mha, enable_lora=['q', 'k', 'v', 'o'], r=4, lora_alpha=2)
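Once wrapped, the module is meant to be used in place of the original attention layer. The toy forward pass below assumes the wrapper keeps nn.MultiheadAttention's (query, key, value) calling convention, which its construction suggests but which you should verify; the parameter count is a quick sanity check of how many weights remain trainable.

import torch

x = torch.randn(16, 4, 512)  # (sequence length, batch size, embed_dim)
outputs = lora_mha(x, x, x)  # self-attention; (query, key, value) signature assumed

# Quick sanity check: in a typical LoRA setup only the low-rank factors are trainable.
num_trainable = sum(p.numel() for p in lora_mha.parameters() if p.requires_grad)
print(f"Trainable parameters in the wrapped attention module: {num_trainable}")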
Figure 2: Detailed few-shot learning results on the 10 fine-grained datasets and ImageNet with the ViT-B/16 visual backbone. Average performance for the ViT-B/16, ViT-B/32 and ViT-L/14 on the same 11 datasets is reported in the last three plots.
If you find this project useful, please cite it as follows:
@inproceedings{zanella2024low,
title={Low-Rank Few-Shot Adaptation of Vision-Language Models},
author={Zanella, Maxime and Ben Ayed, Ismail},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
pages={1593--1603},
year={2024}
}
For any inquiries, feel free to create an issue or contact us at maxime.zanella@uclouvain.be.
We express our gratitude to the authors of CoOp and Tip-Adapter for their open-source contributions.