[Project Page] | [Slides] | [arXiv] | [Data Repository]
In this research, we evaluate the adversarial robustness of recent large vision-language (generative) models (VLMs) under the most realistic and challenging setting: a black-box threat model with a targeted attack goal.
Our proposed method aims for targeted response generation over large VLMs such as MiniGPT-4, LLaVA, Unidiffuser, BLIP/2, Img2Prompt, etc.
In other words, we mislead the VLMs into saying whatever you want, regardless of the content of the input image query.
- Platform: Linux
- Hardware: A100 PCIe 40G
- lmdb, tqdm
- wandb, torchvision, etc. (a pip install sketch is given below)
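If these Python packages are not already available in your environment, they can be installed with pip. This is only a minimal sketch; the exact versions used in our experiments are not pinned here, and individual victim models may require additional dependencies.

```bash
# Install the core Python dependencies listed above (versions are not pinned in this sketch).
pip install lmdb tqdm wandb torchvision
```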
In our work, we used DALL-E, Midjourney, and Stable Diffusion for target image generation and demonstration. For the large-scale experiments, we apply Stable Diffusion for target image generation. To install Stable Diffusion, we initialize our conda environment following Latent Diffusion Models. A suitable base conda environment named ldm can be created and activated with:
conda env create -f environment.yaml
conda activate ldm
Note that for different victim models, we will follow their official implementations and conda environments.
As discussed in our paper, to achieve a flexible targeted attack, we leverage a pretrained text-to-image model to generate a targeted image given a single caption as the targeted text. Consequently, you can specify the targeted caption for the attack yourself!
We use Stable Diffusion, DALL-E, or Midjourney as the text-to-image generator in our experiments. Here, we use Stable Diffusion for demonstration (thanks for open-sourcing!).
git clone https://github.com/CompVis/stable-diffusion.git
cd stable-diffusion
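Once the ldm environment is active and a Stable Diffusion checkpoint has been downloaded (placed at the default path expected by the repository, or passed explicitly via --ckpt), a single targeted image can be generated from a targeted caption with the repository's txt2img script. The sketch below uses a placeholder caption and output directory; adjust them to your own targeted text.

```bash
# Generate one targeted image from a single targeted caption (caption and outdir are placeholders).
python scripts/txt2img.py \
    --prompt "a photo of a dog playing on the grass" \
    --plms \
    --n_samples 1 \
    --outdir outputs/targeted_images
```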
Then, prepare the full targeted captions from MS-COCO, or download our processed and cleaned version:
https://drive.google.com/file/d/19tT036LBvqYonzI7PfU9qVi3jVGApKrg/view?usp=sharing
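With the caption file prepared, target images for the large-scale experiments can be generated one caption at a time. The loop below is a minimal sketch that assumes one caption per line and uses a hypothetical filename targeted_captions.txt; replace it with the actual name of the file you prepared or downloaded.

```bash
# Read targeted captions (assumed one per line) and generate one target image per caption.
# targeted_captions.txt is a placeholder for the prepared caption file.
while IFS= read -r caption; do
    python scripts/txt2img.py \
        --prompt "$caption" \
        --plms \
        --n_samples 1 \
        --outdir outputs/targeted_images
done < targeted_captions.txt
```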