The official implementation of Low-Rank Few-Shot Adaptation of Vision-Language Models.
Authors: Maxime Zanella, Ismail Ben Ayed.
We present CLIP-LoRA, an easy-to-use few-shot method for Vision-Language Models with fixed hyperparameters for every task and every number of shots. This repository also aims to facilitate the use of Low-Rank Adapters (LoRA) in Vision-Language Models like CLIP.
Figure 1: Low-Rank Adaptation (LoRA) is easy to use and does not create any additional inference latency.
Contents:
- How to run the experiments
- A quick guide on how LoRA is implemented in this repository
- How to support our work (citation)
- Contact for any inquiries
Our code requires an environment with PyTorch installed. If you don't have one, consider creating a Python environment with:
conda create -y --name CLIP-LoRA python=3.10.0
conda activate CLIP-LoRA
Then install PyTorch, for instance with:
pip3 install torch==2.0.1 torchaudio==2.0.2 torchvision==0.15.2
Please follow DATASETS.md to install the datasets.
Execute CLIP-LoRA on the ImageNet dataset with a random seed of 1 by entering the following command:
python main.py --root_path /path/to/your/data --dataset imagenet --seed 1
You can also execute CLIP-LoRA on the 10 other datasets:
python main.py --root_path /path/to/your/data --dataset dataset_name --seed 1
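If you want to run several datasets and seeds in one go, a small helper script can simply loop over the command above. The snippet below is a hypothetical convenience wrapper, not part of the repository; the dataset identifiers are examples only.

import subprocess

# Hypothetical convenience script (not part of the repository) that sweeps the
# documented command line over several datasets and seeds. The dataset names
# below are illustrative; see DATASETS.md for the exact identifiers.
datasets = ["imagenet", "dtd", "eurosat"]
for dataset in datasets:
    for seed in [1, 2, 3]:
        subprocess.run(
            ["python", "main.py",
             "--root_path", "/path/to/your/data",
             "--dataset", dataset,
             "--seed", str(seed)],
            check=True,  # abort the sweep if a run fails
        )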
You can optionally provide a save_path to save the LoRA modules, which can be reloaded easily with the --eval_only argument. The code will automatically check that your trained LoRA matches the corresponding rank, alpha, encoder, params and position to ensure compatibility. The folder will be structured as follows:
/your/save/path
└── backbone
    └── dataset
        └── Xshots
            └── seedY
Here is the command line:
python main.py --root_path /path/to/your/data --dataset dataset_name --seed 1 --save_path /your/save/path --eval_only
The PlainMultiheadAttentionLoRA class in loralib/layers.py extends the standard PyTorch multi-head attention mechanism by incorporating Low-Rank Adaptation (LoRA). This class constructs explicit linear modules for each component of the attention mechanism, namely query (q), key (k), value (v), and output (o), providing a structured and adaptable foundation for your experiments. PlainMultiheadAttentionLoRA takes an existing nn.MultiheadAttention module, replicates its configuration, and integrates LoRA linear modules.
- Parameter Initialization: The initialization process involves copying weights and biases from a pre-existing multi-head attention model. Each LoRA module (q, k, v, o) is adapted based on the specified requirements in the enable_lora list.
- LoRA Integration: The replacement of standard linear layers with LinearLoRA layers introduces low-rank matrices, which are parameterized by the rank of adaptation (r) and the scaling factor (lora_alpha); a minimal sketch of this computation follows the list.
- Forward Pass: The forward_module method manages the attention computation, incorporating optional dropout settings on the LoRA modules.
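For intuition, below is a minimal sketch of what a LoRA-augmented linear layer computes: the pretrained projection is kept frozen, and a trainable low-rank update (a down-projection A followed by an up-projection B, scaled by lora_alpha / r) is added on top. The class name, initialization scheme and default values are illustrative assumptions, not the exact LinearLoRA implementation from loralib/layers.py.

import torch
import torch.nn as nn

class LoRALinearSketch(nn.Module):
    # Illustrative LoRA linear layer: y = base(x) + (lora_alpha / r) * x A^T B^T.
    # Defaults here are arbitrary choices for the sketch.
    def __init__(self, base_linear: nn.Linear, r: int = 2, lora_alpha: int = 1, lora_dropout: float = 0.0):
        super().__init__()
        self.base = base_linear  # frozen pretrained projection
        for p in self.base.parameters():
            p.requires_grad = False
        # Low-rank factors: A projects down to rank r, B projects back up.
        self.lora_A = nn.Parameter(torch.randn(r, base_linear.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base_linear.out_features, r))  # zero init: no change at start
        self.scaling = lora_alpha / r
        self.lora_dropout = nn.Dropout(lora_dropout)  # optional dropout on the LoRA path

    def forward(self, x):
        # Frozen path plus scaled low-rank update; at initialization the update
        # is zero, so the adapted layer matches the pretrained one exactly.
        update = self.lora_dropout(x) @ self.lora_A.t() @ self.lora_B.t()
        return self.base(x) + self.scaling * update

# Example: wrap a 512-dimensional projection and run a dummy batch through it.
base = nn.Linear(512, 512)
lora_layer = LoRALinearSketch(base, r=2, lora_alpha=1)
y = lora_layer(torch.randn(4, 512))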
The following snippet demonstrates how to initialize PlainMultiheadAttentionLoRA with an existing multi-head attention module.
import torch.nn as nn

from loralib.layers import PlainMultiheadAttentionLoRA

# Initialize with an existing MultiheadAttention module
existing_mha = nn.MultiheadAttention(embed_dim=512, num_heads=8)
lora_mha = PlainMultiheadAttentionLoRA(existing_mha, enable_lora=['q', 'k', 'v', 'o'], r=4, lora_alpha=2)
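Once wrapped, the module is meant to be used in place of the original attention layer. The toy forward pass below assumes the wrapper keeps nn.MultiheadAttention's (query, key, value) calling convention, which its construction suggests but which you should verify; the parameter count is a quick sanity check of how many weights remain trainable.

import torch

x = torch.randn(16, 4, 512)  # (sequence length, batch size, embed_dim)
outputs = lora_mha(x, x, x)  # self-attention; (query, key, value) signature assumed

# Quick sanity check: in a typical LoRA setup only the low-rank factors are trainable.
num_trainable = sum(p.numel() for p in lora_mha.parameters() if p.requires_grad)
print(f"Trainable parameters in the wrapped attention module: {num_trainable}")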
Figure 2: Detailed few-shot learning results on the 10 fine-grained datasets and ImageNet with the ViT-B/16 visual backbone. Average performance for the ViT-B/16, ViT-B/32 and ViT-L/14 on the same 11 datasets is reported in the last three plots.
If you find this project useful, please cite it as follows:
@inproceedings{zanella2024low,
title={Low-Rank Few-Shot Adaptation of Vision-Language Models},
author={Zanella, Maxime and Ben Ayed, Ismail},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
pages={1593--1603},
year={2024}
}
For any inquiries, feel free to create an issue or contact us at maxime.zanella@uclouvain.be.
We express our gratitude to the authors of CoOp and Tip-Adapter for their open-source contributions.