Beomyoung Kim, Chanyong Shin, Joonhyun Jeong, Hyungsik Jung, Se-Yun Lee, Sewhan Chun, Dong-Hyun Hwang, Joonsang Yu
NAVER Cloud, ImageVision
The recent segmentation foundation model, Segment Anything Model (SAM), exhibits strong zero-shot segmentation capabilities, but it falls short in generating fine-grained precise masks. To address this limitation, we propose a novel zero-shot image matting model, called ZIM, with two key contributions: First, we develop a label converter that transforms segmentation labels into detailed matte labels, constructing the new SA1B-Matte dataset without costly manual annotations. Training SAM with this dataset enables it to generate precise matte masks while maintaining its zero-shot capability. Second, we design the zero-shot matting model equipped with a hierarchical pixel decoder to enhance mask representation, along with a prompt-aware masked attention mechanism to improve performance by enabling the model to focus on regions specified by visual prompts. We evaluate ZIM using the newly introduced MicroMat-3K test set, which contains high-quality micro-level matte labels. Experimental results show that ZIM outperforms existing methods in fine-grained mask generation and zero-shot generalization. Furthermore, we demonstrate the versatility of ZIM in various downstream tasks requiring precise masks, such as image inpainting and 3D NeRF. Our contributions provide a robust foundation for advancing zero-shot matting and its downstream applications across a wide range of computer vision tasks.
- 2024.11.04: official ZIM code update
Install the required packages with the command below:
pip install zim_anything
or
git clone https://github.com/naver-ai/ZIM.git
cd ZIM; pip install -e .
To enable GPU acceleration, please install the onnxruntime-gpu package that matches your environment (CUDA and cuDNN versions), following the instructions in the onnxruntime installation docs.
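For example, on a typical recent CUDA setup the default GPU build can usually be installed with the command below; double-check the version matrix in the onnxruntime docs for your specific CUDA/cuDNN combination:
pip install onnxruntime-gpu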
We provide Gradio demo code in demo/gradio_demo.py. You can run the demo locally with:
python demo/gradio_demo.py
In addition, we provide Gradio demo code in demo/gradio_demo_comparison.py to qualitatively compare ZIM with SAM:
python demo/gradio_demo_comparison.py
After installation, you can use our model in just a few lines, as shown below. ZimPredictor is compatible with SamPredictor and provides the same interface, such as set_image() and predict().
import torch

from zim_anything import zim_model_registry, ZimPredictor

backbone = "vit_l"
ckpt_p = "results/zim_vit_l_2092"

# Load the ZIM model and move it to GPU if available.
model = zim_model_registry[backbone](checkpoint=ckpt_p)
if torch.cuda.is_available():
    model.cuda()

predictor = ZimPredictor(model)
predictor.set_image(<image>)
masks, _, _ = predictor.predict(<input_prompts>)
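As a more concrete end-to-end illustration, the sketch below assumes the SamPredictor-compatible keyword arguments (point_coords, point_labels, box); the image path and prompt coordinates are placeholders:

import numpy as np
import torch
from PIL import Image

from zim_anything import zim_model_registry, ZimPredictor

model = zim_model_registry["vit_l"](checkpoint="results/zim_vit_l_2092")
if torch.cuda.is_available():
    model.cuda()
predictor = ZimPredictor(model)

# Read an RGB image (path is a placeholder).
image = np.array(Image.open("example.jpg").convert("RGB"))
predictor.set_image(image)

# One positive (label 1) and one negative (label 0) point prompt; coordinates are placeholders.
point_coords = np.array([[250, 320], [400, 100]])
point_labels = np.array([1, 0])

# Optional box prompt in [x1, y1, x2, y2] format (placeholder values).
box = np.array([100, 150, 520, 600])

masks, scores, logits = predictor.predict(
    point_coords=point_coords,
    point_labels=point_labels,
    box=box,
)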
We also provide code for generating masks for an entire image and visualizing them:
import torch

from zim_anything import zim_model_registry, ZimAutomaticMaskGenerator
from zim_anything.utils import show_mat_anns

backbone = "vit_l"
ckpt_p = "results/zim_vit_l_2092"

# Load the ZIM model and move it to GPU if available.
model = zim_model_registry[backbone](checkpoint=ckpt_p)
if torch.cuda.is_available():
    model.cuda()

mask_generator = ZimAutomaticMaskGenerator(model)
masks = mask_generator.generate(<image>)       # Automatically generated masks
masks_vis = show_mat_anns(<image>, masks)      # Visualize masks
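Below is a minimal end-to-end sketch using OpenCV for image I/O; the input and output paths are placeholders, and show_mat_anns is assumed to return an RGB visualization array as in the snippet above:

import cv2
import torch

from zim_anything import zim_model_registry, ZimAutomaticMaskGenerator
from zim_anything.utils import show_mat_anns

model = zim_model_registry["vit_l"](checkpoint="results/zim_vit_l_2092")
if torch.cuda.is_available():
    model.cuda()
mask_generator = ZimAutomaticMaskGenerator(model)

# Read an image with OpenCV (BGR) and convert it to RGB (path is a placeholder).
image = cv2.cvtColor(cv2.imread("example.jpg"), cv2.COLOR_BGR2RGB)

masks = mask_generator.generate(image)
vis = show_mat_anns(image, masks)  # assumed to return an RGB uint8 array

# Save the visualization back as BGR (output path is a placeholder).
cv2.imwrite("example_mattes.png", cv2.cvtColor(vis, cv2.COLOR_RGB2BGR))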
Additionally, masks can be generated for images from the command line:
bash script/run_amg.sh
We provide pretrained weights of ZIM.
Model | Link |
---|---|
zim_vit_b | download |
zim_vit_l | download |
We introduce a new test set named MicroMat-3K to evaluate zero-shot interactive matting models. It consists of 3,000 high-resolution images paired with micro-level matte labels, providing a comprehensive benchmark for testing matting models at different levels of detail.
The MicroMat-3K dataset can be downloaded here or from Hugging Face.
The dataset structure should be as follows:
└── /path/to/dataset/MicroMat3K
    ├── img
    │   ├── 0001.png
    ├── matte
    │   ├── coarse
    │   │   ├── 0001.png
    │   └── fine
    │       ├── 0001.png
    ├── prompt
    │   ├── coarse
    │   │   ├── 0001.png
    │   └── fine
    │       ├── 0001.png
    └── seg
        ├── coarse
        │   ├── 0001_01.json
        └── fine
            ├── 0001_01.json
The prompt file configuration is as follows:
{
    "point": [[x1, y1, 1], [x2, y2, 0], ...],  # 1: positive point, 0: negative point
    "bbox": [x1, y1, x2, y2]                   # box in [x_min, y_min, x_max, y_max] format
}
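For reference, here is a minimal sketch of parsing such a prompt file into predictor inputs; the file path is a placeholder, and the keyword arguments assume the SamPredictor-compatible predict() interface:

import json
import numpy as np

# Path is a placeholder; each prompt file follows the structure shown above.
with open("/path/to/prompt.json") as f:
    prompt = json.load(f)

# Split [[x, y, label], ...] into coordinates and positive/negative labels.
points = np.array(prompt["point"], dtype=np.float32)
point_coords, point_labels = points[:, :2], points[:, 2].astype(np.int64)

# Box prompt in [x1, y1, x2, y2] format.
box = np.array(prompt["bbox"], dtype=np.float32)

# These arrays can then be passed to ZimPredictor.predict(), e.g.:
#   masks, _, _ = predictor.predict(point_coords=point_coords, point_labels=point_labels)
#   masks, _, _ = predictor.predict(box=box)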
We provide an evaluation script in script/run_eval.sh, which includes a comparison with SAM. Make sure the dataset structure is prepared as described above.
First, modify data_root in script/run_eval.sh:
...
data_root="/path/to/dataset/"
...
Then, run the evaluation script:
bash script/run_eval.sh
Running the script reports the evaluation results of ZIM and SAM on the MicroMat-3K dataset.
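As a rough illustration of what such an evaluation measures, the sketch below computes two standard matting error metrics, SAD and MSE, between a predicted and a ground-truth alpha matte; it is not necessarily the exact protocol of script/run_eval.sh, and the file paths are placeholders:

import cv2
import numpy as np

def matte_errors(pred_path, gt_path):
    """Return (SAD, MSE) between two alpha mattes stored as 8-bit grayscale PNGs."""
    pred = cv2.imread(pred_path, cv2.IMREAD_GRAYSCALE).astype(np.float64) / 255.0
    gt = cv2.imread(gt_path, cv2.IMREAD_GRAYSCALE).astype(np.float64) / 255.0
    diff = pred - gt
    sad = np.abs(diff).sum() / 1000.0  # SAD is often reported in units of thousands
    mse = float(np.mean(diff ** 2))
    return sad, mse

# Paths are placeholders.
sad, mse = matte_errors("pred/0001.png", "/path/to/dataset/MicroMat3K/matte/fine/0001.png")
print(f"SAD: {sad:.3f}, MSE: {mse:.5f}")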
@article{kim2024zim,
title={ZIM: Zero-Shot Image Matting for Anything},
author={Kim, Beomyoung and Shin, Chanyong and Jeong, Joonhyun and Jung, Hyungsik and Lee, Se-Yun and Chun, Sewhan and Hwang, Dong-Hyun and Yu, Joonsang},
journal={arXiv preprint arXiv:2411.00626},
year={2024}
}
ZIM
Copyright (c) 2024-present NAVER Cloud Corp.
CC BY-NC 4.0 (https://creativecommons.org/licenses/by-nc/4.0/)