Finlay GC Hudson, William AP Smith
University of York
Paper | Project Page | TABE-51 Dataset | TABE-51 Dataset Generation Code
Clone this repository and then update submodules with:
git submodule update --init --recursive
Create a virtual environment in your preferred format; for these instructions, a venv is shown:
python3 -m venv tabe_venv
source tabe_venv/bin/activate
pip install -r requirements.txt
cd third_party/segment-anything-2
pip install -e .
The versions in the requirements.txt file are the ones we tested on, but that is not to say different versions of torch etc. won't work.
This has all been tested on NVIDIA A40 GPUs; at least 32GB of VRAM is required for the full pipeline.
1. Download the Stable Diffusion v1.5 Inpainting model. We use git lfs to do this:
# To install git lfs (on Ubuntu; otherwise follow the git lfs install instructions for your platform)
curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.deb.sh | sudo bash
sudo apt-get install git-lfs
# To get the model
git lfs install
git clone https://huggingface.co/botp/stable-diffusion-v1-5-inpainting
2. Download the CoCoCo checkpoints.
3. Download the SAM2 checkpoint.
Within src/tabe/configs/runtime_config.py, update:
- RuntimeConfig.sam_checkpoint to be the file path of the downloaded SAM2 checkpoint file. Defaults to "checkpoints/sam2/sam2_hiera_large.pt"
Within src/tabe/configs/video_diffusion_config.py, update:
- VideoDiffusionConfig.sd_inpainting_model_path to be the root directory containing the downloaded Stable Diffusion v1.5 Inpainting model. Defaults to "checkpoints/stable-diffusion-v1-5-inpainting"
- VideoDiffusionConfig.cococo_unet_weights to be the directory containing the 4 checkpoint files downloaded in Pretrained Models (2). Defaults to "checkpoints/cococo"
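For reference, here is a minimal sketch of what those edits look like inside the two config files (field names are taken from this README; the exact class layout in the repo may differ):

# src/tabe/configs/runtime_config.py (illustrative sketch; class layout assumed)
class RuntimeConfig:
    sam_checkpoint: str = "checkpoints/sam2/sam2_hiera_large.pt"  # path to the downloaded SAM2 checkpoint

# src/tabe/configs/video_diffusion_config.py (illustrative sketch)
class VideoDiffusionConfig:
    # root directory of the downloaded Stable Diffusion v1.5 Inpainting model
    sd_inpainting_model_path: str = "checkpoints/stable-diffusion-v1-5-inpainting"
    # directory containing the 4 CoCoCo checkpoint files
    cococo_unet_weights: str = "checkpoints/cococo"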
Datasets are expected to be in the following structure:
video_name/
├── frames/ # All frames of the video in numerical order
├── visible_masks/ # Visible masks in numerical order (must include at least the query mask)
├── gt_masks/ # (Optional) Ground truth amodal masks for frames
└── annos.json # (Optional) File with a dict of {"occlusion": [{"level": <occlusion level string, mapped to OcclusionLevel>, "amount": <float amount of occlusion>}]}
An example of this structure is shown in examples/
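To make the optional annos.json format concrete, here is a minimal sketch that writes one. Only the {"occlusion": [{"level": ..., "amount": ...}]} shape comes from the structure above; the path, level strings, and amounts are illustrative placeholders:

import json
from pathlib import Path

# Illustrative video directory; it should follow the dataset structure above
video_dir = Path("my_videos/video_name")
video_dir.mkdir(parents=True, exist_ok=True)

# One entry per frame: "level" is an occlusion-level string (mapped to OcclusionLevel by the
# codebase) and "amount" is a float amount of occlusion. These values are placeholders only.
annos = {
    "occlusion": [
        {"level": "no_occlusion", "amount": 0.0},
        {"level": "partial_occlusion", "amount": 0.4},
        {"level": "full_occlusion", "amount": 1.0},
    ]
}

with open(video_dir / "annos.json", "w") as f:
    json.dump(annos, f, indent=2)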
Update DataConfig.output_root to be the directory where model predictions will be saved (default: outputs).
Trained models will also be saved here if VideoDiffusionTrainingConfig.no_cache is set to False; we default this to True, however, as these models take up a good amount of space!
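As a minimal sketch, those two values look roughly like this (field names are from this README; the file locations and class layout are assumptions):

# Illustrative sketch; file locations and class layout assumed
class DataConfig:
    output_root: str = "outputs"  # directory where model predictions are saved

class VideoDiffusionTrainingConfig:
    # Set to False to also keep the trained models in output_root; they take a good amount of disk space
    no_cache: bool = True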
For running the TABE51 dataset, download the data from here and set:
- RuntimeConfig.dataset to DatasetTypes.TABE51
- DataConfigTABE.data_root to be the root directory of TABE51; it should end with the data directory
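A minimal sketch of those settings (class layout assumed; the data_root path is illustrative):

# src/tabe/configs/runtime_config.py (illustrative sketch; class layout assumed)
class RuntimeConfig:
    dataset = DatasetTypes.TABE51  # DatasetTypes is defined/imported in the repo's configs

class DataConfigTABE:
    data_root: str = "/path/to/TABE51/data"  # illustrative path; must end with the data directory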
For running a custom dataset, set:
- RuntimeConfig.dataset to DatasetTypes.CUSTOM
- DataConfigCustom.data_root to be the directory of the custom dataset, aligned with the Dataset Structure above
- DataConfigCustom.frame_dir_name to be the name of the directory containing the frames. Defaults to frames
- DataConfigCustom.vis_mask_dir_name to be the name of the directory containing the visible (modal) masks. Defaults to visible_masks
Additional optional data configs can be found in src/tabe/configs/runtime_config.py
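A minimal sketch of a custom-dataset configuration, using the default directory names from the structure above (class layout assumed; the data_root path is illustrative):

# src/tabe/configs/runtime_config.py (illustrative sketch; class layout assumed)
class RuntimeConfig:
    dataset = DatasetTypes.CUSTOM  # DatasetTypes is defined/imported in the repo's configs

class DataConfigCustom:
    data_root: str = "/path/to/my_custom_videos"  # illustrative; must follow the Dataset Structure above
    frame_dir_name: str = "frames"  # directory containing the frames
    vis_mask_dir_name: str = "visible_masks"  # directory containing the visible (modal) masks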
We provide a short example video from our TABE51 dataset, to showcase running on a video with a single query mask. To run the example, set:
- RuntimeConfig.dataset to DatasetTypes.CUSTOM
- RuntimeConfig.video_names to tuple(["air_hockey_1"])
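Concretely, those two overrides in src/tabe/configs/runtime_config.py would look roughly like this (class layout assumed):

# src/tabe/configs/runtime_config.py (illustrative sketch; class layout assumed)
class RuntimeConfig:
    dataset = DatasetTypes.CUSTOM
    video_names = tuple(["air_hockey_1"])  # the example video provided in examples/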
Then run:
PYTHONPATH=. python src/runner.py
If you need to run on a specific GPU index, run with:
CUDA_VISIBLE_DEVICES=<GPU_IDX> PYTHONPATH=. python src/runner.py
Once the amodal segmentation masks have been produced, we provide code to evaluate results on our TABE51 dataset.
PYTHONPATH=. python src/eval_tabe51.py
If you utilise our code and/or dataset, please consider citing our paper:
@article{hudson2024track,
title={Track Anything Behind Everything: Zero-Shot Amodal Video Object Segmentation},
author={Hudson, Finlay GC and Smith, William AP},
journal={arXiv preprint arXiv:2411.19210},
year={2024}
}
We welcome any contributions or collaborations on this work. If you find any issues, we will try to help as best we can in the Issues section :)