🔄️ un2CLIP: Improving CLIP's Visual Detail Capturing Ability via Inverting unCLIP

Yinqi Li¹·², Jiahe Zhao¹·², Hong Chang¹·², Ruibing Hou¹, Shiguang Shan¹·², Xilin Chen¹·²

¹ Institute of Computing Technology, Chinese Academy of Sciences
² University of Chinese Academy of Sciences

unCLIP provides an encode-decode tool for observing which features are disregarded by CLIP: an image is encoded into a CLIP image embedding and then decoded back into pixels by the unCLIP generator, so details missing from the reconstruction are details the embedding never captured.

Our un2CLIP leverages this framework to improve CLIP: by inverting the unCLIP generator, it trains the CLIP image encoder to recapture those disregarded features. The encode-decode roundtrip itself can be inspected as sketched below.
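As a standalone illustration (this is not code from this repository), one way to inspect the roundtrip is with the Stable unCLIP pipeline from diffusers, which decodes CLIP image embeddings back into images; the model ID and file paths are example choices, and the paper's exact unCLIP variant may differ:

import torch
from diffusers import StableUnCLIPImg2ImgPipeline
from diffusers.utils import load_image

# Stable unCLIP conditions its generator on the CLIP image embedding
# of the input image, i.e., it decodes the embedding back into pixels.
pipe = StableUnCLIPImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1-unclip", torch_dtype=torch.float16
).to("cuda")

image = load_image("example.jpg")  # any local or remote image path
# Details missing from this reconstruction are details the CLIP
# embedding disregarded.
reconstruction = pipe(image).images[0]
reconstruction.save("reconstruction.png")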

Installation

Clone this repository and create a conda environment with the following commands:

git clone git@github.com:LiYinqi/un2CLIP.git
cd un2CLIP

conda env create -f environment.yaml
conda activate un2clip

Pretrained Checkpoints

Our models are released on HuggingFace🤗.

| CLIP Model | Resolution | MMVP-VLM (Original) | MMVP-VLM (Ours) | Link |
|---|---|---|---|---|
| OpenAI CLIP ViT-L-14 | 224 | 19.3 | 32.6 | openai_vit_l_14_224.ckpt |
| OpenAI CLIP ViT-L-14 | 336 | 20.0 | 30.4 | openai_vit_l_14_336.ckpt |
| OpenCLIP ViT-H-14 | 224 | 28.9 | 36.3 | openclip_vit_h_14_224.ckpt |
| SigLIP ViT-SO-14 | 384 | 37.0 | 41.5 | siglip_vit_so_14_384.ckpt |

We assume the checkpoints are saved in the ./pretrained_models directory with their original names.
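If you prefer a scripted download, a sketch with huggingface_hub follows; the repository ID is a placeholder to be replaced with the actual HuggingFace model page linked in the table:

from huggingface_hub import hf_hub_download

REPO_ID = "YOUR_HF_REPO_ID"  # placeholder: the un2CLIP HuggingFace repo

for ckpt in [
    "openai_vit_l_14_224.ckpt",
    "openai_vit_l_14_336.ckpt",
    "openclip_vit_h_14_224.ckpt",
    "siglip_vit_so_14_384.ckpt",
]:
    # Save each checkpoint under ./pretrained_models with its original name.
    hf_hub_download(repo_id=REPO_ID, filename=ckpt, local_dir="./pretrained_models")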

MMVP-VLM Evaluation

  1. Download the MMVP-VLM benchmark and place it in a local directory.

  2. Run the evaluation script for each CLIP model by passing the corresponding --un2clip_ckpt_path argument. For example, to evaluate OpenAI CLIP ViT-L-14 at resolution 224, run:

python eval_mmvpvlm.py \
  --benchmark_dir "YOUR_MMVP_VLM_PATH" \
  --un2clip_ckpt_path "./pretrained_models/openai_vit_l_14_224.ckpt"
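To evaluate all four released checkpoints in one sweep, a small driver script along these lines works; it only uses the two arguments shown above together with the checkpoint names from the table:

import subprocess

BENCHMARK_DIR = "YOUR_MMVP_VLM_PATH"  # local copy of the MMVP-VLM benchmark

# Checkpoint names from the table above, saved under ./pretrained_models.
CHECKPOINTS = [
    "openai_vit_l_14_224.ckpt",
    "openai_vit_l_14_336.ckpt",
    "openclip_vit_h_14_224.ckpt",
    "siglip_vit_so_14_384.ckpt",
]

for ckpt in CHECKPOINTS:
    # Run the evaluation script once per fine-tuned CLIP model.
    subprocess.run(
        [
            "python", "eval_mmvpvlm.py",
            "--benchmark_dir", BENCHMARK_DIR,
            "--un2clip_ckpt_path", f"./pretrained_models/{ckpt}",
        ],
        check=True,
    )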

TODO

  • Release model checkpoints.
  • Release training codes.

Citation

If you find this code or project useful, please consider giving a star⭐ or citing:

@article{li2025un2clip,
  title   = {{un$^2$CLIP}: Improving {CLIP}'s Visual Detail Capturing Ability via Inverting {unCLIP}},
  author  = {Yinqi Li and Jiahe Zhao and Hong Chang and Ruibing Hou and Shiguang Shan and Xilin Chen},
  year    = {2025},
  journal = {arXiv preprint arXiv:2505.24517}
}
