Generalized Few-shot 3D Point Cloud Segmentation with Vision-Language Model

Zhaochong An · Guolei Sun† · Yun Liu† · Runjia Li · Junlin Han
Ender Konukoglu · Serge Belongie

CVPR 2025 (Paper)

Overview

🌟 Highlights

Key Contributions:

Model Side:

  • GFS-VL Framework: A novel approach for generalized few-shot 3D point cloud segmentation (GFS-PCS) that combines dense yet noisy knowledge from 3D vision-language models with precise yet sparse few-shot samples to achieve superior novel-class generalization.

Benchmarking Side:

  • Introduces two challenging GFS-PCS benchmarks with diverse novel classes for extensive generalization evaluation, laying a solid foundation for real-world GFS-PCS advancements.

📝 Citation

If you find our work useful, please cite:

@inproceedings{an2025generalized,
  title={Generalized Few-shot 3D Point Cloud Segmentation with Vision-Language Model},
  author={An, Zhaochong and Sun, Guolei and Liu, Yun and Li, Runjia and Han, Junlin and Konukoglu, Ender and Belongie, Serge},
  booktitle={CVPR},
  year={2025}
}

@inproceedings{an2024multimodality,
  title={Multimodality Helps Few-Shot 3D Point Cloud Semantic Segmentation},
  author={An, Zhaochong and Sun, Guolei and Liu, Yun and Li, Runjia and Wu, Min and Cheng, Ming-Ming and Konukoglu, Ender and Belongie, Serge},
  booktitle={ICLR},
  year={2025}
}

@inproceedings{an2024rethinking,
  title={Rethinking Few-shot 3D Point Cloud Semantic Segmentation},
  author={An, Zhaochong and Sun, Guolei and Liu, Yun and Liu, Fayao and Wu, Zongwei and Wang, Dan and Van Gool, Luc and Belongie, Serge},
  booktitle={CVPR},
  pages={3996--4006},
  year={2024}
}

📖 Table of Contents

  • 🛠️ Installation
  • 📦 Data Preparation
  • 🔄 Training
  • 📊 Evaluation & Visualization
  • 🎯 Model Zoo
  • Acknowledgement


🛠️ Installation

Requirements

  • CUDA: 11.8 and above
  • PyTorch: 1.13.0 and above

Our environment is tested on both RTX 3090 and A100 GPUs.

Environment Setup:

Manual Setup

python -m venv gfs_vl
source gfs_vl/bin/activate

pip install torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 --index-url https://download.pytorch.org/whl/cu118
pip install h5py pyyaml ninja sharedarray tensorboard tensorboardx addict einops scipy plyfile termcolor timm urllib3 fsspec==2024.2.0 easydict==1.13 yapf==0.40.1
pip install torch-cluster torch-scatter torch-sparse -f https://data.pyg.org/whl/torch-2.1.0+cu118.html
pip install torch-geometric
pip install git+https://github.com/openai/CLIP.git
pip install flash-attn --no-build-isolation
pip install spconv-cu118 # see https://github.com/traveller59/spconv for details

# Pointops CUDA Dependencies (choose one of the three options)
cd libs/pointops
# Option 1: Standard install
python setup.py install
# Option 2: Docker & Multi-GPU arch
TORCH_CUDA_ARCH_LIST="ARCH LIST" python setup.py install
# Option 3: For RTX 3090 (8.6) or A100 (8.0). More details in: https://arnon.dk/matching-sm-architectures-arch-and-gencode-for-various-nvidia-cards/
TORCH_CUDA_ARCH_LIST="8.0" python setup.py install

# For the second CUDA dependency, build with the same option (one of the three above)
cd ../../libs/pointops2
TORCH_CUDA_ARCH_LIST="8.0" python setup.py install
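To catch build problems early, you can run an optional sanity check inside the activated environment. It only imports the packages installed above (torch, torch_scatter, spconv, clip, and the freshly built pointops); the module names follow those packages, so adjust if your versions expose different names.

# Optional sanity check (run inside the gfs_vl environment; module names follow the packages above)
python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"
python -c "import torch_scatter, spconv, clip; print('core dependencies import OK')"
python -c "import pointops; print('pointops build OK')"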

After the above steps, install the dependencies for the 3D vision-language model (RegionPLC). Refer to the RegionPLC installation guide.

cd pointcept/models/PLA
TORCH_CUDA_ARCH_LIST="8.0" python3 setup.py develop

# Install softgroup_ops:
cd pcseg/external_libs/softgroup_ops
TORCH_CUDA_ARCH_LIST="8.0" python setup.py build_ext develop

# If the direct install fails, follow these steps:
# 1. Download google-sparsehash-2.0.3-1.tar.bz2 from:
#    https://anaconda.org/bioconda/google-sparsehash/files
# 2. Extract it:
mkdir -p ./google-sparsehash
tar -xvjf google-sparsehash-2.0.3-1.tar.bz2 -C ./google-sparsehash
# 3. Set include path:
export CPLUS_INCLUDE_PATH=$(pwd)/google-sparsehash/include:$CPLUS_INCLUDE_PATH
# 4. Build with the include directories:
TORCH_CUDA_ARCH_LIST="8.0" python setup.py build_ext --include-dirs=$CPLUS_INCLUDE_PATH develop
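If the rebuild still cannot locate the sparsehash headers, it helps to confirm that the extracted include directory exists and is on the path before retrying. This is only a diagnostic sketch based on the paths used in the steps above.

# Diagnostic: confirm the extracted headers are where CPLUS_INCLUDE_PATH points (paths from the steps above)
ls google-sparsehash/include        # should contain a sparsehash/ (and possibly a legacy google/) header directory
echo $CPLUS_INCLUDE_PATH            # should begin with .../google-sparsehash/include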

📦 Data Preparation

We follow the Pointcept Data Guidelines.

ScanNet & ScanNet200

  1. Download the ScanNet v2 dataset.
  2. Preprocess the raw data:
    # Set RAW_SCANNET_DIR to your downloaded dataset directory
    # Set PROCESSED_SCANNET_DIR to the desired output directory
    python pointcept/datasets/preprocessing/scannet/preprocess_scannet.py \
      --dataset_root ${RAW_SCANNET_DIR} \
      --output_root ${PROCESSED_SCANNET_DIR}
  • (Alternative): Download our preprocessed data from here (please agree to the official license).

After obtaining the dataset, either set data_root in configs to ${PROCESSED_SCANNET_DIR} or link the processed data:

ln -s ${PROCESSED_SCANNET_DIR} ${CODEBASE_DIR}/data/ScanNet200
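As an optional check, you can verify that the link resolves and that per-scene data exists under the usual Pointcept-style train/val split folders; this layout is an assumption based on the Pointcept preprocessing, so adjust if your output differs.

# Optional check of the linked data (assumes Pointcept-style train/val layout)
ls ${CODEBASE_DIR}/data/ScanNet200                  # expect train/ and val/
ls ${CODEBASE_DIR}/data/ScanNet200/train | wc -l    # number of processed training scenes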

ScanNet++

  1. Download the ScanNet++ dataset.
  2. Preprocess the raw data:
    # Set RAW_SCANNETPP_DIR to your downloaded dataset directory
    # Set PROCESSED_SCANNETPP_DIR to the desired output directory
    # NUM_WORKERS: number of parallel workers
    python pointcept/datasets/preprocessing/scannetpp/preprocess_scannetpp.py \
      --dataset_root ${RAW_SCANNETPP_DIR} \
      --output_root ${PROCESSED_SCANNETPP_DIR} \
      --num_workers ${NUM_WORKERS}
  3. Sample and chunk the large point clouds in the train/val splits (needed for training only):
    # For the training split (change --split to val for the validation split):
    python pointcept/datasets/preprocessing/sampling_chunking_data.py \
      --dataset_root ${PROCESSED_SCANNETPP_DIR} \
      --grid_size 0.01 \
      --chunk_range 6 6 \
      --chunk_stride 3 3 \
      --split train \
      --num_workers ${NUM_WORKERS}
  • (Alternative) Download our preprocessed data directly from here (please agree to the official license).

After obtaining the dataset, either set data_root in configs to ${PROCESSED_SCANNETPP_DIR} or link the processed data:

ln -s ${PROCESSED_SCANNETPP_DIR} ${CODEBASE_DIR}/data/ScanNetpp
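If you prepared both datasets, a quick look at the data/ folder confirms that the links resolve to the processed directories (folder names follow the link commands above):

# Optional: confirm both dataset links resolve (names per the ln -s commands above)
ls -l ${CODEBASE_DIR}/data                          # expect ScanNet200 and ScanNetpp symlinks
ls ${CODEBASE_DIR}/data/ScanNetpp                   # expect train/ and val/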

🔄 Training

1. Backbone Pretraining

  • Option A: Download our Pretrained weights

  • Option B: Train from scratch using the pretrain config from the configs folder. Training outputs will be saved into the folder specified by ${EXP_NAME}.

    sh scripts/train.sh -p python -d scannet200 -c semseg-pt-v3m1-0-gfspretrain -n ${EXP_NAME} -g 4

Replace -d scannet200 with scannet or scannetpp when training on those datasets.
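If you pretrain the backbone yourself, note where the checkpoint is written so you can point backbone_weight at it in the next step. The path below assumes the standard Pointcept experiment layout (exp/<dataset>/<experiment name>/model/); verify it against your own run.

# Assumed Pointcept-style output layout -- verify against your run
ls exp/scannet200/${EXP_NAME}/model/
# model_last.pth here is the checkpoint to use as backbone_weight in Registration Training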

2. Registration Training

  • Set the configuration file (-c) to:

    • semseg-pt-v3m1-0-gfsregistrain_k1.yaml for 1-shot registration.
    • semseg-pt-v3m1-0-gfsregistrain_k5.yaml for 5-shot registration.
  • Download the pretrained 3D VLM weights from Regionplc repo or our Huggingface repo.

  • Update the config file (via -o or by editing the YAML) to set:

    • vlm_3d_weight: path to the pretrained VLM weight.
    • backbone_weight: path from Backbone Pretraining.
    • data_root: corresponding dataset folder.

The regis_train_list field controls which registration sets are used (by default, training is performed on five different sets, and the final performance is reported as their average).

For example, for ScanNet200 (1-shot):

sh scripts/train.sh -p python -d scannet200 -c semseg-pt-v3m1-0-gfsregistrain_k1 -n ${EXP_NAME} -g 4

Replace -d scannet200 with scannet or scannetpp when training on those datasets.
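As a concrete sketch, a 5-shot run on ScanNet200 only changes the config name; the three paths are the ones you set beforehand (in the YAML or via -o, as noted above), and the values shown in the comments are placeholders.

# Before launching, point these config fields at your local paths (placeholder values):
#   vlm_3d_weight:   /path/to/regionplc_vlm_weight.pth     # downloaded 3D VLM weight
#   backbone_weight: /path/to/backbone_pretrain_weight.pth # from Backbone Pretraining
#   data_root:       data/ScanNet200
sh scripts/train.sh -p python -d scannet200 -c semseg-pt-v3m1-0-gfsregistrain_k5 -n ${EXP_NAME} -g 4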

📊 Evaluation & Visualization

Evaluation

By default, five fine-tuned weights (from five registration sets) are saved. To evaluate a specific weight, use the -w option:

sh scripts/test.sh -p python -d scannet -c semseg-pt-v3m1-0-gfsregistrain_k1 -n ${EXP_NAME} -w regis1_model_last -g 4

Note: Our evaluation is performed on whole scenes (unlike prior evaluations, which used only small blocks) to better simulate real-world scenarios.
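To evaluate all five registration weights in one go, a simple loop works, assuming the saved checkpoints follow the regis1_model_last naming pattern from the example above (regis1 through regis5):

# Assumes the five checkpoints are named regis1_model_last ... regis5_model_last
for i in 1 2 3 4 5; do
  sh scripts/test.sh -p python -d scannet -c semseg-pt-v3m1-0-gfsregistrain_k1 -n ${EXP_NAME} -w regis${i}_model_last -g 4
done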

Visualization

Add -o vis VISUALIZATION_SAVE_PATH to the evaluation command to automatically save files for visualization. Then, follow the COSeg visualization guide for high-quality visualization results.
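For example, appended to the 1-shot ScanNet evaluation command above (VISUALIZATION_SAVE_PATH is a placeholder for your output directory):

sh scripts/test.sh -p python -d scannet -c semseg-pt-v3m1-0-gfsregistrain_k1 -n ${EXP_NAME} -w regis1_model_last -g 4 -o vis ${VISUALIZATION_SAVE_PATH}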

🎯 Model Zoo

Model      Dataset     K-shot   Weights
sc_k1      ScanNet     1-shot   Download
sc_k5      ScanNet     5-shot   Download
sc200_k1   ScanNet200  1-shot   Download
sc200_k5   ScanNet200  5-shot   Download
scpp_k1    ScanNet++   1-shot   Download
scpp_k5    ScanNet++   5-shot   Download

Acknowledgement

This repository is developed by Zhaochong An and builds upon the excellent works of COSeg, Pointcept, RegionPLC, and Openscene. Many thanks to all contributors!

For questions or issues, feel free to reach out.
