This repository provides a PyTorch implementation of the ICML 2023 paper *On the Generalization of Multi-modal Contrastive Learning* by Qi Zhang\*, Yifei Wang\*, and Yisen Wang.
In this repository, we consider four strategies for leveraging CLIP to improve self-supervised contrastive learning with SimCLR. ImageNet linear-probing results are summarized below.
| Method | Baseline (SimCLR) | AddNewPositive | DropFalsePositive | DropFalseNegative | DropEasyNegative |
|---|---|---|---|---|---|
| Linear accuracy (%) | 61.2 | 67.4 (+6.2) | 61.8 (+0.6) | 61.4 (+0.2) | 62.3 (+1.1) |
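The strategy names suggest the common idea: use CLIP's image embeddings as a semantic similarity signal to re-label positives and negatives inside the SimCLR InfoNCE loss. As a rough illustration, here is a minimal, self-contained sketch of one variant (dropping "easy" negatives). This is not the repository's exact code: the names `info_nce_drop_easy_negative`, `clip_feats`, and `drop_frac` are hypothetical.

```python
import torch
import torch.nn.functional as F

def info_nce_drop_easy_negative(z1, z2, clip_feats, temperature=0.5, drop_frac=0.1):
    """InfoNCE over two augmented views (z1, z2: n x d encoder outputs);
    candidates that CLIP rates least similar to the anchor ("easy"
    negatives) are masked out of the denominator. Illustrative sketch only."""
    n = z1.size(0)
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)   # 2n x d
    sim = z @ z.t() / temperature                        # encoder similarities
    sim.fill_diagonal_(float('-inf'))                    # drop self-pairs

    # Index of each anchor's positive: the other view of the same image.
    pos = torch.arange(2 * n, device=z.device).roll(n)

    # CLIP similarities between the n source images, tiled to 2n x 2n.
    c = F.normalize(clip_feats, dim=1)
    clip_sim = (c @ c.t()).repeat(2, 2)
    clip_sim.fill_diagonal_(float('inf'))                # never drop self-pairs...
    clip_sim[torch.arange(2 * n, device=z.device), pos] = float('inf')  # ...or the positive

    # Mask the drop_frac least CLIP-similar candidates per anchor.
    k = int(drop_frac * (2 * n - 2))
    if k > 0:
        easy = clip_sim.topk(k, dim=1, largest=False).indices
        sim.scatter_(1, easy, float('-inf'))

    return F.cross_entropy(sim, pos)
```

In the repository, the analogous behavior is selected with `--method drop_easy_negative` (see the training commands below); the other three methods modify the positive/negative sets in the same spirit.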
Create a Python environment with the provided config file and miniconda:
```bash
conda env create -f environment.yml
conda activate simclr_pytorch
export IMAGENET_PATH=...   # if you have enough RAM, using /dev/shm usually speeds up data loading
export EXMAN_PATH=...      # a path for logs
```
Install the official CLIP repository and download the official CLIP models:
```bash
pip install ftfy regex tqdm
pip install git+https://github.com/openai/CLIP.git
```
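As a quick sanity check of the install, the sketch below loads a CLIP model and computes a unit-normalized image embedding (the kind of feature the strategies above consume). The model name `ViT-B/32` and the file `example.jpg` are placeholders, not the repository's exact choices.

```python
import clip                  # the package installed above
import torch
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)  # downloads weights on first use

image = preprocess(Image.open("example.jpg")).unsqueeze(0).to(device)
with torch.no_grad():
    feats = model.encode_image(image)                 # 1 x 512 for ViT-B/32
    feats = feats / feats.norm(dim=-1, keepdim=True)  # unit-normalize
```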
Model training consists of two steps: (1) self-supervised encoder pretraining and (2) classifier learning with the encoder representations. Both steps are done with the `train.py` script.
The configs `imagenet_params_epochs*_bs*.yaml` contain the parameters to reproduce the ImageNet results. The pretraining command is:
```bash
python train.py --config configs/imagenet_train_epochs100_bs512.yaml --method <Method>
```
Here `<Method>` is one of `simclr`, `new_positive`, `drop_false_positive`, `drop_false_negative`, or `drop_easy_negative`.
To train a linear classifier on top of the pretrained encoder, run the following command:
```bash
python train.py --config configs/cifar_eval.yaml --encoder_ckpt <path-to-encoder>
```
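`train.py` handles this step end to end; conceptually, a linear evaluation trains a single linear layer on frozen encoder features. The sketch below is illustrative only: `encoder`, `loader`, `feat_dim`, and the hyperparameters are placeholders, not the repository's settings.

```python
import torch
import torch.nn as nn

def linear_probe(encoder, loader, feat_dim=2048, num_classes=1000,
                 epochs=10, device="cuda"):
    """Train a linear classifier on frozen encoder features (sketch)."""
    encoder.eval()                                  # freeze the encoder
    clf = nn.Linear(feat_dim, num_classes).to(device)
    opt = torch.optim.SGD(clf.parameters(), lr=0.1, momentum=0.9)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            with torch.no_grad():                   # no gradients through the encoder
                h = encoder(x)
            loss = loss_fn(clf(h), y)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return clf
```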
If you find our code useful, please cite:

```bibtex
@inproceedings{zhang2023generalization,
  title     = {On the Generalization of Multi-modal Contrastive Learning},
  author    = {Qi Zhang and Yifei Wang and Yisen Wang},
  booktitle = {International Conference on Machine Learning},
  year      = {2023},
}
```
Our code borrows the SimCLR implementation from https://github.com/AndrewAtanov/simclr-pytorch.