Low-Rank Few-Shot Adaptation of Vision-Language Models [CVPRW 2024]

The official implementation of Low-Rank Few-Shot Adaptation of Vision-Language Models.

Authors: Maxime Zanella, Ismail Ben Ayed.

We present CLIP-LoRA, an easy-to-use few-shot method for Vision-Language Models with fixed hyperparameters for every task and every number of shots. This repository also aims to facilitate the use of Low-Rank Adaptation (LoRA) in Vision-Language Models like CLIP.

Figure 1: Low-Rank Adaptation (LoRA) is easy to use and does not create any additional inference latency.
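One way to see why LoRA adds no inference latency: once training is done, the low-rank update can be folded back into the frozen weight, so the adapted layer is again a plain linear layer. The snippet below is an illustrative sketch of this merging step, not code from this repository; the shapes and the merge_lora name are assumptions.

import torch

# Illustrative only: fold a trained low-rank update into a frozen weight.
# W: (out, in), B: (out, r), A: (r, in) -- the merged weight has the same shape as W,
# so the forward pass costs exactly the same as the original layer.
def merge_lora(W: torch.Tensor, A: torch.Tensor, B: torch.Tensor, r: int, lora_alpha: float) -> torch.Tensor:
    scaling = lora_alpha / r
    return W + scaling * (B @ A)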

Here is how to run the experiments:

  1. Installation
  2. Usage

A quick guide on how LoRA is implemented in this repository:

  1. LoRA in MultiheadAttention

Please consider supporting our work:

  1. Citation

If you have any inquiries:

  1. Contact

Installation

Environment configuration

Our code requires an environment with PyTorch installed. If you don't have one, consider creating a Python environment with:

conda create -y --name CLIP-LoRA python=3.10.0
conda activate CLIP-LoRA

Then install PyTorch, for instance with:

pip3 install torch==2.0.1 torchaudio==2.0.2 torchvision==0.15.2

Datasets installation

Please follow DATASETS.md to install the datasets.

How to execute CLIP-LoRA

Execute CLIP-LoRA on the ImageNet dataset with a random seed of 1 by entering the following command:

python main.py --root_path /path/to/your/data --dataset imagenet --seed 1

You can also execute CLIP-LoRA on the 10 other datasets:

python main.py --root_path /path/to/your/data --dataset dataset_name --seed 1

You can optionally provide a save_path to save the LoRA modules, which can be reloaded easily with the --eval_only argument. The code will automatically check that the trained LoRA matches the corresponding rank, alpha, encoder, params and position to ensure compatibility. The folder will be structured as follows:

/your/save/path
└── backbone
    └── dataset
        └── Xshots
            ├── seedY

Here is the command line:

python main.py --root_path /path/to/your/data --dataset dataset_name --seed 1 --save_path /your/save/path --eval_only 

LoRA in MultiheadAttention

The PlainMultiheadAttentionLoRA class in loralib/layers.py extends the standard PyTorch multi-head attention mechanism by incorporating Low-Rank Adaptation (LoRA). This class constructs explicit linear modules for each component of the attention mechanism—query (q), key (k), value (v), and output (o)—providing a structured and adaptable foundation for your experiments.

Class Overview

PlainMultiheadAttentionLoRA takes an existing nn.MultiheadAttention module, replicates its configuration, and integrates LoRA linear modules.

Key Features

  • Parameter Initialization: The initialization process involves copying weights and biases from a pre-existing multi-head attention model. Each LoRA module (q, k, v, o) is adapted based on the specified requirements in the enable_lora list.
  • LoRA Integration: The replacement of standard linear layers with LinearLoRA layers introduces low-rank matrices, parameterized by the rank of adaptation (r) and the scaling factor (lora_alpha); a minimal sketch of this update follows the list.
  • Forward Pass: The forward_module method manages the attention computation, incorporating optional dropout settings on the LoRA modules.
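For intuition, here is a minimal, self-contained sketch of that low-rank update applied to a single linear projection. It is illustrative only: the class name, initialization and defaults below are assumptions and do not reproduce the exact LinearLoRA implementation in loralib/layers.py.

import torch
import torch.nn as nn

class LoRALinearSketch(nn.Module):
    """Illustrative low-rank adapter around a frozen linear layer (not the repo's LinearLoRA)."""
    def __init__(self, base_linear: nn.Linear, r: int = 2, lora_alpha: float = 1.0):
        super().__init__()
        self.base = base_linear
        # Freeze the pre-trained projection; only the low-rank factors are trained.
        for p in self.base.parameters():
            p.requires_grad = False
        in_f, out_f = base_linear.in_features, base_linear.out_features
        self.lora_A = nn.Parameter(torch.randn(r, in_f) * 0.01)  # down-projection to rank r
        self.lora_B = nn.Parameter(torch.zeros(out_f, r))        # up-projection, zero-initialized
        self.scaling = lora_alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Frozen output plus the scaled low-rank correction x A^T B^T.
        return self.base(x) + self.scaling * (x @ self.lora_A.t() @ self.lora_B.t())

Because lora_B starts at zero, the adapted layer initially reproduces the frozen model exactly, which is the usual LoRA initialization.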

Example Usage

The following snippet demonstrates how to initialize the PlainMultiheadAttentionLoRA with an existing multi-head attention module.

import torch.nn as nn

from loralib.layers import PlainMultiheadAttentionLoRA

# Initialize with an existing MultiheadAttention module
existing_mha = nn.MultiheadAttention(embed_dim=512, num_heads=8)
lora_mha = PlainMultiheadAttentionLoRA(existing_mha, enable_lora=['q', 'k', 'v', 'o'], r=4, lora_alpha=2)
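The wrapped module can then be dropped in place of the original attention layer. The quick shape check below assumes the wrapper keeps the standard (query, key, value) call signature of nn.MultiheadAttention; please verify against loralib/layers.py.

import torch

# Assumed call signature (query, key, value), as in nn.MultiheadAttention.
x = torch.randn(77, 2, 512)   # (sequence, batch, embed_dim) with batch_first=False
out = lora_mha(x, x, x)       # self-attention over the same sequence
out = out[0] if isinstance(out, tuple) else out
print(out.shape)              # expected: torch.Size([77, 2, 512])

# Count trainable parameters; if the wrapper freezes the base weights,
# this should be dominated by the low-rank factors.
trainable = sum(p.numel() for p in lora_mha.parameters() if p.requires_grad)
print(f"trainable parameters: {trainable}")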

Few-shot performance

Figure 2: Detailed few-shot learning results on the 10 fine-grained datasets and ImageNet with the ViT-B/16 visual backbone. Average performance for the ViT-B/16, ViT-B/32 and ViT-L/14 on the same 11 datasets is reported in the last three plots.

Citation

If you find this project useful, please cite it as follows:

@inproceedings{zanella2024low,
  title={Low-Rank Few-Shot Adaptation of Vision-Language Models},
  author={Zanella, Maxime and Ben Ayed, Ismail},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
  pages={1593--1603},
  year={2024}
}

Contact

For any inquiries, feel free to create an issue or contact us at maxime.zanella@uclouvain.be.

Acknowledgement

We express our gratitude to the CoOp and Tip-Adapter authors for their open-source contributions.
