
iSegProbe: Probing VFMs and Feature Upsamplers using Interactive Segmentation

Introduction

This repository provides the code for the technical report [Arxiv], and also serves as a standalone suite for probing and evaluating future methods in interactive segmentation (IS).

The iSegProbe repository includes:

  • Pipelines for training and evaluating interactive segmentation models, specifically adapted for probing individual model components (train.py, evaluate.py)
  • Implementations of vision backbones, such as ViT, MaskCLIP, and DINOv2, tailored for the interactive segmentation task (core.model.featurizers)
  • Implementations of multiple feature upsamplers, including LiFT, FeatUp, and LoftUp (core.model.upsamplers)
  • Support for major IS datasets: GrabCut, DAVIS, SBD, Berkeley, COCO+LVIS ... (core.data)
  • Visualization utilities for plotting predictions and features, as well as recreating plots from the report

Contents

  • Installation
  • Evaluation
  • Logging
  • Interactive Demo
  • Plotting Utilities
  • Citation
  • Acknowledgements

Installation

Environment

Developed and tested on Python 3.9, PyTorch 2.4.1, CUDA 12.4, Ubuntu 20.04. To install the required dependencies, run:

pip install -r requirements.txt

Datasets

Download the dataset(s) relevant to your use case and specify the corresponding paths in configs/main_cfg.yaml.

📌 Note: Our experiments were conducted on SBD (train) and GrabCut, DAVIS, Berkeley and SBD (test). However, other datasets are fully supported and can be used with minimal effort.

| Dataset | Description | Download Link |
|---|---|---|
| ADE20k | 22k images with 434k instances (total) | official site |
| OpenImages | 944k images with 2.6M instances (total) | official site |
| MS COCO | 118k images with 1.2M instances (train) | official site |
| LVIS v1.0 | 100k images with 1.2M instances (total) | official site |
| COCO+LVIS* | 99k images with 1.5M instances (train) | original LVIS images + combined annotations |
| SBD | 8498 images with 20172 instances (train), 2857 images with 6671 instances (test) | official site |
| GrabCut | 50 images with one object each (test) | GrabCut.zip (11 MB) |
| Berkeley | 96 images with 100 instances (test) | Berkeley.zip (7 MB) |
| DAVIS | 345 images with one object each (test) | DAVIS.zip (43 MB) |
| Pascal VOC | 1449 images with 3417 instances (validation) | official site |
| COCO_MVal | 800 images with 800 instances (test) | COCO_MVal.zip (127 MB) |

(*) - To prepare COCO+LVIS, first download the original LVIS v1.0 dataset. Then, download and unpack the pre-processed annotations provided by the RITM team, which combine COCO and LVIS. Place the annotations in the same folder as LVIS v1.0.

For an extended list of supported datasets, refer to the SimpleClick dataset collection: [link]
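
Once the paths are filled in, a quick sanity check helps catch typos before training. The following is a minimal sketch, assuming main_cfg.yaml keeps dataset locations under a top-level datasets key; that key name and the flat path layout are assumptions, so adapt them to the actual structure of your config:

from pathlib import Path

from omegaconf import OmegaConf  # available through the Hydra dependency

# Load the main config and report whether each configured dataset path exists.
cfg = OmegaConf.to_container(OmegaConf.load("configs/main_cfg.yaml"), resolve=True)
for name, path in cfg.get("datasets", {}).items():  # the `datasets` key is an assumption
    status = "ok" if Path(str(path)).exists() else "MISSING"
    print(f"{name:12s} {status:8s} {path}")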

Upsamplers

Download the upsampler weights and specify the corresponding paths in configs/main_cfg.yaml.

For additional trained upsamplers, refer to the LoftUp repository: [link]

Evaluation

Evaluation of the vision foundation model (and feature upsampler) involves two separate stages: (1) training the interactive segmentation model, and (2) performing the actual evaluation.

Train Your IS Model

General training configurations are specified in configs/train_cfg.yaml. For a detailed explanation of the parameters, please refer directly to that file. Each training experiment (comprising the IS model, datasets, and other components) should be defined in a separate Python file, which is then referenced from train_cfg.yaml. Examples of such files can be found in the models/ directory.
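
For orientation, here is a heavily trimmed sketch of what such an experiment file might look like. The build_model function, the torch.hub DINOv2 backbone, and the 1x1 probe head are illustrative assumptions rather than the repository's actual interface; consult the files in models/ for the real structure and required entry points.

import torch
import torch.nn as nn

def build_model(cfg):
    # cfg: the Hydra config for this experiment (unused in this sketch).
    # Frozen vision foundation model used as the feature extractor
    # (DINOv2 ViT-S/14 via torch.hub, purely as an example backbone).
    backbone = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14")
    for p in backbone.parameters():
        p.requires_grad_(False)
    # Lightweight head that is actually trained: it maps the (optionally
    # upsampled) 384-dim features plus two click channels to an object mask.
    head = nn.Conv2d(384 + 2, 1, kernel_size=1)
    return nn.ModuleDict({"backbone": backbone, "head": head})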

To launch the training process, you can either modify train_cfg.yaml accordingly and run:

python train.py

Or override specific arguments directly from the CLI using Hydra syntax, for example:

python train.py +exp.name=my_name +exp.model_path=/path/to/my/model

Evaluate Your IS Model

General evaluation configurations are specified in configs/eval_cfg.yaml. For a detailed explanation of the parameters, please refer directly to that file.

To launch the evaluation process, you can either modify eval_cfg.yaml accordingly and run:

python evaluate.py

Or override specific arguments directly from the CLI using Hydra syntax, for example:

python evaluate.py +checkpoint=/path/to/checkpoints +datasets=GrabCut,Berkeley,SBD,DAVIS

Logging

  • Training logs can be visualized using TensorBoard and Weights & Biases.

    TensorBoard

    To enable TensorBoard, locate the folder with the experiment output (this can also be a root folder containing multiple runs) and run:

    tensorboard --logdir=PATH_TO_LOG_DIR --port=6006

    Weights & Biases

    To enable logging to W&B, set wandb.log_wandb=true in train_cfg.yaml.

  • Separate Weights & Biases evaluation logging is available by setting wandb=true in eval_cfg.yaml.

Interactive Demo

To launch the Tkinter-based interactive demo, run:

python demo.py --checkpoint /path/to/ckpts 

Demo Controls:

| Key | Description |
|---|---|
| Left Mouse Button | Place a positive click |
| Right Mouse Button | Place a negative click |
| Scroll Wheel | Zoom the image in and out |
| Right Mouse Button + Move Mouse | Move the image |
| Space | Finish the current object mask |
  • Some test images can be found in the assets/test_imgs folder.
  • For a more detailed description of the demo parameters and functionality, refer to the RITM codebase.

Additional Comments

  • When launching the demo from a remote machine, you may need to have X11 (or XQuartz) installed and running on your local machine with proper X11 forwarding.
  • If the demo exits incorrectly, the process might not terminate properly, leading to the following error on the next launch:
free(): invalid pointer

To resolve this, kill the demo process by running:

pkill -9 -f demo.py 

Plotting Utilities

In eval_cfg.yaml, the vis_preds flag controls whether the model's predictions are visualized, while the save_feats flag controls whether the raw features before and after the upsampler are saved. The saved features can then be visualized with the script core.plots.plot_features.py. Additionally, the script core.plots.plot_iou_vs_clicks.py compares the mean Intersection over Union (mIoU) as a function of the number of clicks.
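
For reference, saved feature maps are commonly visualized by projecting the channel dimension onto three PCA components and rendering them as RGB. The snippet below is a generic, self-contained sketch of that approach, not the repository's plotting script; the file names and the (C, H, W) tensor layout are assumptions, and core.plots.plot_features.py remains the supported way to reproduce the report's figures.

import matplotlib.pyplot as plt
import torch
from sklearn.decomposition import PCA

def pca_rgb(feats):
    # Project a (C, H, W) feature map onto 3 PCA components and rescale to [0, 1].
    c, h, w = feats.shape
    flat = feats.permute(1, 2, 0).reshape(-1, c).cpu().numpy()
    rgb = PCA(n_components=3).fit_transform(flat)
    rgb = (rgb - rgb.min(0)) / (rgb.max(0) - rgb.min(0) + 1e-8)
    return rgb.reshape(h, w, 3)

# Hypothetical file names: point these at wherever `save_feats` wrote the tensors.
low_res = torch.load("feats_before_upsampler.pt")
high_res = torch.load("feats_after_upsampler.pt")

fig, axes = plt.subplots(1, 2, figsize=(8, 4))
for ax, feats, title in zip(axes, (low_res, high_res), ("backbone", "upsampled")):
    ax.imshow(pca_rgb(feats))
    ax.set_title(title)
    ax.axis("off")
fig.savefig("feature_pca.png", bbox_inches="tight")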

Citation

If you find this repository useful, please cite our papers:

@misc{huang2025loftuplearningcoordinatebasedfeature,
      title={LoftUp: Learning a Coordinate-Based Feature Upsampler for Vision Foundation Models}, 
      author={Haiwen Huang and Anpei Chen and Volodymyr Havrylov and Andreas Geiger and Dan Zhang},
      year={2025},
      eprint={2504.14032},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2504.14032}, 
}

@misc{havrylov2025benchmarking,
    title={Benchmarking Feature Upsampling Methods for Vision Foundation Models using Interactive Segmentation},
    author={Volodymyr Havrylov and Haiwen Huang and Dan Zhang and Andreas Geiger},
    year={2025},
    eprint={2505.02075},
    archivePrefix={arXiv},
    primaryClass={cs.CV},
    url={https://arxiv.org/abs/2505.02075}, 
}

Acknowledgements

This repository is based on SimpleClick and RITM, with most of the featurizers code adapted from FeatUp. We thank the authors of these open-source projects for their valuable contributions.
