3D-Aware Instance Segmentation and Tracking in Egocentric Videos

Yash Bhalgat¹*    Vadim Tschernezki¹,²*    Iro Laina¹    João F. Henriques¹    Andrea Vedaldi¹    Andrew Zisserman¹

¹ Visual Geometry Group, University of Oxford    ² NAVER LABS Europe

* Equal contribution

This repository contains the official implementation of 3D-Aware Instance Segmentation and Tracking in Egocentric Videos.

Our method leverages 3D awareness for robust instance segmentation and tracking in egocentric videos. The approach maintains consistent object identities through occlusions and out-of-view scenarios by integrating scene geometry with instance-level tracking. The figure above shows: (a) input egocentric video frames, (b) DEVA's 2D tracking, which loses object identity after occlusion, and (c) our method, which maintains consistent tracking through these challenging scenarios.

Prerequisites

Before running the code, you'll need to install several external dependencies.

We recommend creating a conda/mamba environment, and then installing other dependencies with the provided requirements.txt.

mamba create -n egoseg3d python=3.8
mamba activate egoseg3d
mamba install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
pip install -r requirements.txt
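
As a quick sanity check (not part of the original instructions), you can confirm that the CUDA build of PyTorch was picked up:

python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"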

Then continue with the installation of the following custom dependencies.

  1. Depth Anything: Required for depth estimation

    git clone https://github.com/LiheYoung/Depth-Anything
    cd Depth-Anything
    git checkout 1e1c8d373ae6383ef6490a5c2eb5ef29fd085993
    cd ..

    Copy scripts/preprocessing/depth_anything_EPIC.py to the root of the cloned repository.

  2. Tracking-Anything-with-DEVA (provided with this repository)

  3. MASA (provided with this repository)

Usage

NOTE: If you want to skip the preprocessing steps below and start with the evaluation instead, you can jump directly to step 5.

After downloading the EPIC-FIELDS datasets, a few preprocessing steps are required before running the tracking pipeline.

1. Mesh Reconstruction and Depth Estimation
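
The commands in this and the following steps use the shell variables $ROOT, $VID and $PID. A minimal sketch of how they might be set, reusing the video ID from the Evaluation section (the data root path is a placeholder; adapt it to your setup):

ROOT=/path/to/epic-fields          # hypothetical data root; adapt to your download location
VID=P01_104                        # example video ID, also used in the Evaluation section
PID=$(echo $VID | cut -d'_' -f1)   # participant ID derived from the video ID, e.g. P01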

First, extract the 3D mesh from the sparse point cloud:

bash scripts_sh/reconstruct_mesh.sh <VID_1> <VID_2> <VID_3> ...
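
For example, for the single video used in the Evaluation section below:

bash scripts_sh/reconstruct_mesh.sh P01_104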

Generate depth maps using Depth Anything:

cd Depth-Anything
python depth_anything_EPIC.py --img-path <images dir> --outdir <output dir>
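
A concrete invocation might look as follows; the image directory follows the layout used by the later DEVA commands, while the output directory name depth_anything is an assumption, so place the maps wherever the alignment step expects them in your setup:

# image directory follows the layout used by the later DEVA commands;
# the output directory name "depth_anything" is an assumption
cd Depth-Anything
python depth_anything_EPIC.py \
    --img-path $ROOT/mesh/$VID/images \
    --outdir $ROOT/$PID/$VID/depth_anything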

Extract and align depth maps:

# Extract mesh depth
python scripts/preprocessing/extract_mesh_depth.py --vid=$VID --root $ROOT

# Align depth maps
python scripts/preprocessing/extract_aligned_depth.py --vid=$VID --root $ROOT
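
If you preprocess several videos, the two scripts can be wrapped in a simple loop (a sketch; extend the list of video IDs as needed):

# run both depth steps for each video; add further video IDs to the list as needed
for VID in P01_104; do
    python scripts/preprocessing/extract_mesh_depth.py --vid=$VID --root $ROOT
    python scripts/preprocessing/extract_aligned_depth.py --vid=$VID --root $ROOT
done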

2. Instance Segmentation and Feature Extraction

Run DEVA for segmentation:

SFACTOR=5
PID=$(echo $VID | cut -d'_' -f1)
python scripts/deva_baseline.py \
    --img_path $ROOT/mesh/$VID/images \
    --output $ROOT/$PID/$VID/segmaps/deva_OWLv2_s$SFACTOR \
    --amp --temporal_setting semionline --prompt "" \
    --DINO_THRESHOLD 0.4 --detector_type owlv2 \
    --subsample_factor=$SFACTOR --classes=$ROOT/visor/${VID}_classes.pt

Extract DINO features:

python scripts/extract_features_DEVA.py \
    --deva_seg_dir $ROOT/$PID/$VID/segmaps/deva_OWLv2_s$SFACTOR \
    --images_dir $ROOT/mesh/$VID/images \
    --output_dir <output directory> \
    --feature_type dinov2
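
For example, keeping the features next to the segmentation maps (the directory features/dinov2_s$SFACTOR is an assumed output location, not a path required by the scripts):

# "features/dinov2_s$SFACTOR" is an assumed output location, not a path required by the scripts
python scripts/extract_features_DEVA.py \
    --deva_seg_dir $ROOT/$PID/$VID/segmaps/deva_OWLv2_s$SFACTOR \
    --images_dir $ROOT/mesh/$VID/images \
    --output_dir $ROOT/$PID/$VID/features/dinov2_s$SFACTOR \
    --feature_type dinov2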

3. VISOR Annotation Extension

Extend VISOR annotations using DEVA:

python scripts/deva_groundtruth.py \
    --img_path /datasets/EPIC-KITCHENS/$VID/ \
    --output /datasets/EPIC-KITCHENS/$VID/visor_DEVA100_segmaps/ \
    --amp --temporal_setting online \
    --gt_dir $ROOT/$PID/$VID/visor_segmaps/ \
    --max_missed_detection_count 100 \
    --prompt "dummy1.dummy2"

python scripts/preprocessing/postprocess_deva_gt.py --vid $VID

4. 3D-Aware Tracking

Run the main tracking pipeline:

python extract_tracks.py \
    --beta_l=${BETAL} --beta_c=${BETAC} \
    --beta_v=${BETAV} --beta_s=${BETAS} \
    --vid=${VID} \
    --exp=tracked-final-bv${BETAV}-bs${BETAS}-bc${BETAC}-bl${BETAL}
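
As a concrete example, the invocation below uses the hyperparameter values for which predictions are provided in the Evaluation section:

# hyperparameter values matching the provided predictions (see Evaluation below)
BETAV=2; BETAS=10; BETAC=10000; BETAL=10; VID=P01_104

python extract_tracks.py \
    --beta_l=${BETAL} --beta_c=${BETAC} \
    --beta_v=${BETAV} --beta_s=${BETAS} \
    --vid=${VID} \
    --exp=tracked-final-bv${BETAV}-bs${BETAS}-bc${BETAC}-bl${BETAL}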

5. Evaluation

To verify the reproducibility of the results, we provide the tracking predictions here. You can download the predictions and evaluate them using the following script.

Evaluate OUR results:

# we provide predictions for the following hyperparameters and video
BETAV=2
BETAS=10
BETAC=10000
BETAL=10
VID=P01_104

python scripts/eval_deva.py \
    --segment_type=tracked-final-bv${BETAV}-bs${BETAS}-bc${BETAC}-bl${BETAL} \
    --gt_type=visor_DEVA100_segmaps \
    --vid=${VID}

To evaluate the DEVA baseline, replace the segment_type with deva_OWLv2_s5.
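
For example, evaluating the baseline on the same video:

python scripts/eval_deva.py \
    --segment_type=deva_OWLv2_s5 \
    --gt_type=visor_DEVA100_segmaps \
    --vid=${VID}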

Citation

If you find this work useful, please cite:

@InProceedings{Bhalgat24b,
  author       = "Yash Bhalgat and Vadim Tschernezki and Iro Laina and Joao F. Henriques and Andrea Vedaldi and Andrew Zisserman",
  title        = "3D-Aware Instance Segmentation and Tracking in Egocentric Videos",
  booktitle    = "Asian Conference on Computer Vision",
  year         = "2024",
  organization = "IEEE",
}

Acknowledgments

This work was funded by EPSRC AIMS CDT EP/S024050/1 and AWS (Y. Bhalgat), NAVER LABS Europe (V. Tschernezki), ERC-CoG UNION 101001212 (A. Vedaldi and I. Laina), EPSRC VisualAI EP/T028572/1 (I. Laina, A. Vedaldi and A. Zisserman), and Royal Academy of Engineering RF\201819\18\163 (J. Henriques).
