Finlay GC Hudson, William AP Smith
University of York
Paper | Project Page | TABE Codebase | TABE-51 Dataset - COMING SOON
We present TABE-51 (Track Anything Behind Everything 51), a video amodal segmentation dataset that provides high-quality ground truth labels for occluded objects, eliminating the need for human assumptions about hidden pixels. TABE-51 leverages a compositing approach using real-world video clips: we record an initial clip of an object without occlusion and then overlay a second clip featuring the occluding object. This compositing yields realistic video sequences with natural motion and highly accurate ground truth masks, allowing models to be evaluated on how well they handle occluded objects in authentic scenarios.
Once this repo is cloned locally:
Create a virtual environment in your preferred way; for these instructions, venv will be used:
python3 -m venv tabe51-gen_venv
source tabe51-gen_venv/bin/activate
pip install -r requirements.txt
pip install git+https://github.com/facebookresearch/sam2.git@2b90b9f
Download the SAM2 checkpoint (available from the official SAM2 repository).
Within src/tabe51_gen/configs/video_config.py, the most important configs to set are:
- aspect_ratio (tuple): Defines the aspect ratio of the frames (default: 16:9).
- video_names (tuple): List of video file names to process for a single output; the in-front and behind frames can come from different videos to composite together.
- data_name (str): Name you want to give the output data.
- sampled_fps (float): The frame rate to downsample the video to (default: 15.0).
- data_root_dir (Path): Root directory for storing processed data. To start, it must contain a "videos" folder holding the video(s) to work with.
- in_front_frame_ranges (tuple[list[int, int]]): Defines the start and stop frames for in-front objects. This can include multiple ranges for different clips or repeated ranges for multiple objects.
- behind_frame_range (tuple[int, int]): Defines the frame range for the background object.
- sam_checkpoint (Path): Path to the SAM2 segmentation model checkpoint.
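For illustration, the settings for a single composited scene might look something like this. These values are hypothetical placeholders (file names, paths, and frame ranges are assumptions), and the exact layout of video_config.py is defined in the repo:

```python
# Hypothetical example values for the config fields described above.
from pathlib import Path

aspect_ratio = (16, 9)                                    # output frame aspect ratio
video_names = ("behind_clip.mp4", "in_front_clip.mp4")    # layers can come from different videos
data_name = "example_scene"                               # name given to the generated output
sampled_fps = 15.0                                        # fps to downsample the source video(s) to
data_root_dir = Path("data")                              # must contain a "videos" folder to start
in_front_frame_ranges = ([120, 300],)                     # one [start, stop] range per in-front object
behind_frame_range = (0, 180)                             # [start, stop] for the background object
sam_checkpoint = Path("checkpoints/sam2_hiera_large.pt")  # path to the downloaded SAM2 checkpoint
```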
Note: This method isn't as automated as would be ideal and is still a bit laborious; work will be done to improve this flow!
We showcase an example video within the examples directory.
Run: python split_videos_into_frames.py
Desc: First stage; splits the video into separate frames, saved to config.data_root_dir / config.downsampled_frame_dir.
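For intuition, this stage amounts to reading the video and keeping frames at the configured sampled_fps. A minimal sketch of the idea, assuming OpenCV (not the repo's exact implementation):

```python
# Minimal sketch of downsampled frame extraction (not the repo's exact code).
import cv2
from pathlib import Path

def split_video(video_path: Path, out_dir: Path, sampled_fps: float = 15.0) -> None:
    out_dir.mkdir(parents=True, exist_ok=True)
    cap = cv2.VideoCapture(str(video_path))
    src_fps = cap.get(cv2.CAP_PROP_FPS)
    step = max(1, round(src_fps / sampled_fps))  # keep every `step`-th frame
    idx = saved = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:
            cv2.imwrite(str(out_dir / f"{saved:05d}.png"), frame)
            saved += 1
        idx += 1
    cap.release()
```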
Run: python line_up_frames.py
Desc: Manually line up the frames; the script opens a GUI and saves the aligned frames to config.data_root_dir / config.seperated_frames_dir.
Instructions:
1. Set Initial Frame Ranges
   - Define the background frame range as behind_frame_range.
   - Set the in-front frame ranges as in_front_frame_ranges (a tuple of tuples of however many in-front frames are wanted).
2. Check Frame Alignment
   - When the visualizer loads, use the left and right arrow keys to navigate through frames.
   - The far-right view shows an alpha overlap, helping visualize alignment (see the sketch after this list).
3. Adjust Frame Ranges
   - Close the program, edit the frame ranges, and repeat until the alignment looks correct.
4. Save the Aligned Frames
   - In the image viewer, press 's' to save the aligned frames.
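The alpha-overlap view referenced above is essentially a 50/50 blend of the two candidate frames, so misalignment shows up as ghosting. A minimal sketch of the concept, assuming OpenCV and same-sized frames (not the repo's actual GUI code; file paths are placeholders):

```python
# Sketch of an alpha-overlap check: blend two frames 50/50 so that any
# misalignment between the clips appears as ghosting. Paths are placeholders.
import cv2

behind = cv2.imread("behind/00000.png")
in_front = cv2.imread("in_front/00000.png")
overlap = cv2.addWeighted(behind, 0.5, in_front, 0.5, 0.0)  # frames must match in size
cv2.imshow("alpha overlap", overlap)
cv2.waitKey(0)
```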
Run: python auto_bbox_cropper.py
Desc: We've found that SAM2 performs significantly better when the object of interest is within a cropped area of the full frame; attempting to segment directly from the full image can cause details to be lost. To improve accuracy, this script crops the frames to bounding boxes and saves the cropped frames to config.data_root_dir / config.bbox_clipped_frames_dir (a minimal sketch of the cropping idea follows the instructions below).
Instructions:
The script will first start with the behind clip object.
1. Click points on the object
   - Left click for positive points.
   - Middle click for negative points (not always needed, but can help).
   - Note: If the object is not yet in the scene, press 'q' to skip that frame.
2. Run Segmentation
   - Once points are selected (~5 points is a good baseline, covering all elements of the object), press 's' to segment the item through the video.
3. Check Output
   - If the bounding box segments are good, proceed to Step 4.
   - If the bounding box segments are poor:
     - Delete the outputs.
     - Run auto_bbox_cropper.py again, but this time:
       - Press 'm' once the image loads.
       - Draw a manual bounding box around the object of interest.
       - Press Enter to create the output.
     - Either continue using this manual bounding box method or return to Step 1.
4. Once the behind clip object is completed, move to the in-front object and repeat the above steps until completion.
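As promised above, here is a minimal sketch of the cropping idea: cut out a padded bounding box around the object so SAM2 sees it at a higher relative resolution. The [x0, y0, x1, y1] box format and padding value are assumptions for illustration, not the repo's exact code:

```python
# Sketch: crop a frame to a padded bounding box so SAM2 sees the object
# at higher relative resolution. Box format [x0, y0, x1, y1] is an assumption.
import numpy as np

def crop_to_bbox(frame: np.ndarray, box: tuple[int, int, int, int],
                 pad: int = 20) -> np.ndarray:
    h, w = frame.shape[:2]
    x0, y0, x1, y1 = box
    x0, y0 = max(0, x0 - pad), max(0, y0 - pad)   # clamp padded box to the image
    x1, y1 = min(w, x1 + pad), min(h, y1 + pad)
    return frame[y0:y1, x0:x1]
```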
Run: python segment_cropped_frames.py
Desc: Used to segment the object from these cropped images.
Instructions:
Start with the behind clip object.
1. Click points on the object
   - Same approach as auto_bbox_cropper.py.
   - If the object is not yet in the scene, press 'q' to skip that frame.
2. Run Segmentation
   - Once points are selected, choose one of the following methods:
     - Press 's' to segment the object through the entire video.
     - Press 'm' to segment only the current frame.
   - When to use 'm' instead of 's':
     - Use 'm' when only part of the object is visible, as propagating incomplete data across the whole video may lead to poor segmentation.
     - Use 's' if most or all of the object is visible to segment the full video.
3. Once the behind clip object is completed, move to the in-front object and repeat the above steps until completion.
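Under the hood, the point-prompt-then-propagate workflow maps onto SAM2's video predictor API. A minimal sketch of that pattern is below; the config name, checkpoint path, frame directory, and clicked points are placeholders, and the pinned SAM2 commit may expose add_new_points rather than add_new_points_or_box:

```python
# Sketch of SAM2's point-prompt + propagate pattern (paths/points are placeholders).
import numpy as np
import torch
from sam2.build_sam import build_sam2_video_predictor

predictor = build_sam2_video_predictor("sam2_hiera_l.yaml", "checkpoints/sam2_hiera_large.pt")
state = predictor.init_state(video_path="path/to/cropped_frames")  # directory of JPEG frames

points = np.array([[210, 350], [250, 220]], dtype=np.float32)  # clicked (x, y) pixels
labels = np.array([1, 1], dtype=np.int32)                      # 1 = positive, 0 = negative click

with torch.inference_mode():
    # Older pinned commits may name this add_new_points instead.
    predictor.add_new_points_or_box(state, frame_idx=0, obj_id=1,
                                    points=points, labels=labels)
    # Pressing 's' in the tool corresponds to propagating through the whole clip:
    for frame_idx, obj_ids, mask_logits in predictor.propagate_in_video(state):
        masks = (mask_logits > 0.0).cpu().numpy()  # binary mask per tracked object
```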
Run: python composite_runner.py
Desc: Composites the segmented objects together into our dataset format.
Note: This will run on all top-level names within the config.data_root_dir / config.segmented_frames_dir directory.
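Conceptually, the compositing step overlays the segmented in-front object onto the behind clip's frames, while the behind object's full, unoccluded mask serves as the amodal ground truth. A minimal sketch of that overlay (not the repo's dataset-format code):

```python
# Sketch of the core compositing idea: paste the in-front object over the behind
# frame using its mask; the behind object's full mask is the amodal ground truth.
import numpy as np

def composite(behind_frame: np.ndarray, in_front_frame: np.ndarray,
              in_front_mask: np.ndarray) -> np.ndarray:
    m = (in_front_mask > 0)[..., None]            # HxWx1 boolean mask
    return np.where(m, in_front_frame, behind_frame)
```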
If you utilise our code and/or dataset, please consider citing our paper:
@article{hudson2024track,
  title={Track Anything Behind Everything: Zero-Shot Amodal Video Object Segmentation},
  author={Hudson, Finlay GC and Smith, William AP},
  journal={arXiv preprint arXiv:2411.19210},
  year={2024}
}
We welcome any contributions or collaborations on this work. If you find any issues, we will try to help as best we can in the Issues section :)