Generalized Few-shot 3D Point Cloud Segmentation with Vision-Language Model

Zhaochong An · Guolei Sun† · Yun Liu† · Runjia Li · Junlin Han
Ender Konukoglu · Serge Belongie

CVPR 2025 (Paper)

Overview

🌟 Highlights

Key Contributions:

Model Side:

  • GFS-VL Framework: A novel approach for generalized few-shot 3D point cloud segmentation (GFS-PCS) that combines dense yet noisy knowledge from 3D vision-language models with precise yet sparse few-shot samples to achieve superior novel-class generalization.

Benchmarking Side:

  • Introduces two challenging GFS-PCS benchmarks with diverse novel classes for extensive generalization evaluation, laying a solid foundation for real-world GFS-PCS advancements.

📝 Citation

If you find our work useful, please cite:

@inproceedings{an2025generalized,
  title={Generalized Few-shot 3D Point Cloud Segmentation with Vision-Language Model},
  author={An, Zhaochong and Sun, Guolei and Liu, Yun and Li, Runjia and Han, Junlin and Konukoglu, Ender and Belongie, Serge},
  booktitle={CVPR},
  year={2025}
}

@inproceedings{an2024multimodality,
  title={Multimodality Helps Few-Shot 3D Point Cloud Semantic Segmentation},
  author={An, Zhaochong and Sun, Guolei and Liu, Yun and Li, Runjia and Wu, Min and Cheng, Ming-Ming and Konukoglu, Ender and Belongie, Serge},
  booktitle={ICLR},
  year={2025}
}

@inproceedings{an2024rethinking,
  title={Rethinking Few-shot 3D Point Cloud Semantic Segmentation},
  author={An, Zhaochong and Sun, Guolei and Liu, Yun and Liu, Fayao and Wu, Zongwei and Wang, Dan and Van Gool, Luc and Belongie, Serge},
  booktitle={CVPR},
  pages={3996--4006},
  year={2024}
}

📖 Table of Contents

  • 🛠️ Installation
  • 📦 Data Preparation
  • 🔄 Training
  • 📊 Evaluation & Visualization
  • 🎯 Model Zoo
  • Acknowledgement


🛠️ Installation

Requirements

  • CUDA: 11.8 and above
  • PyTorch: 1.13.0 and above

Our environment is tested on both RTX 3090 and A100 GPUs.

Environment Setup:

Manual Setup

python -m venv gfs_vl
source gfs_vl/bin/activate

pip install torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 --index-url https://download.pytorch.org/whl/cu118
pip install h5py pyyaml ninja sharedarray tensorboard tensorboardx addict einops scipy plyfile termcolor timm urllib3 fsspec==2024.2.0 easydict==1.13 yapf==0.40.1
pip install torch-cluster torch-scatter torch-sparse -f https://data.pyg.org/whl/torch-2.1.0+cu118.html
pip install torch-geometric
pip install git+https://github.com/openai/CLIP.git
pip install flash-attn --no-build-isolation
pip install spconv-cu118 # see https://github.com/traveller59/spconv for details

# Pointops CUDA Dependencies (choose one of the three options)
cd libs/pointops
# Option 1: Standard install
python setup.py install
# Option 2: Docker & Multi-GPU arch
TORCH_CUDA_ARCH_LIST="ARCH LIST" python setup.py install
# Option 3: For RTX 3090 (8.6) or A100 (8.0). More details in: https://arnon.dk/matching-sm-architectures-arch-and-gencode-for-various-nvidia-cards/
TORCH_CUDA_ARCH_LIST="8.0" python setup.py install

# For the second CUDA dependency, build with the same option (one of the three above)
cd ../../libs/pointops2
TORCH_CUDA_ARCH_LIST="8.0" python setup.py install
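To catch build problems early, you can run an optional sanity check inside the activated environment. It only imports the packages installed above (torch, torch_scatter, spconv, clip, and the freshly built pointops); the module names follow those packages, so adjust if your versions expose different names.

# Optional sanity check (run inside the gfs_vl environment; module names follow the packages above)
python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"
python -c "import torch_scatter, spconv, clip; print('core dependencies import OK')"
python -c "import pointops; print('pointops build OK')"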

After the above steps, install the dependencies for the 3D vision-language model (RegionPLC). Refer to the RegionPLC installation guide.

cd pointcept/models/PLA
TORCH_CUDA_ARCH_LIST="8.0" python3 setup.py develop

# Install softgroup_ops:
cd pcseg/external_libs/softgroup_ops
TORCH_CUDA_ARCH_LIST="8.0" python setup.py build_ext develop

# If the direct install fails, follow these steps:
# 1. Download google-sparsehash-2.0.3-1.tar.bz2 from:
#    https://anaconda.org/bioconda/google-sparsehash/files
# 2. Extract it:
mkdir -p ./google-sparsehash
tar -xvjf google-sparsehash-2.0.3-1.tar.bz2 -C ./google-sparsehash
# 3. Set include path:
export CPLUS_INCLUDE_PATH=$(pwd)/google-sparsehash/include:$CPLUS_INCLUDE_PATH
# 4. Build with the include directories:
TORCH_CUDA_ARCH_LIST="8.0" python setup.py build_ext --include-dirs=$CPLUS_INCLUDE_PATH develop
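If the rebuild still cannot locate the sparsehash headers, it helps to confirm that the extracted include directory exists and is on the path before retrying. This is only a diagnostic sketch based on the paths used in the steps above.

# Diagnostic: confirm the extracted headers are where CPLUS_INCLUDE_PATH points (paths from the steps above)
ls google-sparsehash/include        # should contain a sparsehash/ (and possibly a legacy google/) header directory
echo $CPLUS_INCLUDE_PATH            # should begin with .../google-sparsehash/include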

📦 Data Preparation

We follow the Pointcept Data Guidelines.

ScanNet & ScanNet200

  1. Download the ScanNet v2 dataset.
  2. Preprocess the raw data:
    # Set RAW_SCANNET_DIR to your downloaded dataset directory
    # Set PROCESSED_SCANNET_DIR to the desired output directory
    python pointcept/datasets/preprocessing/scannet/preprocess_scannet.py \
      --dataset_root ${RAW_SCANNET_DIR} \
      --output_root ${PROCESSED_SCANNET_DIR}
  • (Alternative): Download our preprocessed data from here (please agree to the official license).

After obtaining the dataset, either set data_root in configs to ${PROCESSED_SCANNET_DIR} or link the processed data:

ln -s ${PROCESSED_SCANNET_DIR} ${CODEBASE_DIR}/data/ScanNet200
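As an optional check, you can verify that the link resolves and that per-scene data exists under the usual Pointcept-style train/val split folders; this layout is an assumption based on the Pointcept preprocessing, so adjust if your output differs.

# Optional check of the linked data (assumes Pointcept-style train/val layout)
ls ${CODEBASE_DIR}/data/ScanNet200                  # expect train/ and val/
ls ${CODEBASE_DIR}/data/ScanNet200/train | wc -l    # number of processed training scenes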

ScanNet++

  1. Download the ScanNet++ dataset.
  2. Preprocess the raw data:
    # Set RAW_SCANNETPP_DIR to your downloaded dataset directory
    # Set PROCESSED_SCANNETPP_DIR to the desired output directory
    # NUM_WORKERS: number of parallel workers
    python pointcept/datasets/preprocessing/scannetpp/preprocess_scannetpp.py \
      --dataset_root ${RAW_SCANNETPP_DIR} \
      --output_root ${PROCESSED_SCANNETPP_DIR} \
      --num_workers ${NUM_WORKERS}
  3. Sample and chunk the large point clouds in the train/val splits (needed for training only):
    # For the training split (change --split to val for the validation split):
    python pointcept/datasets/preprocessing/sampling_chunking_data.py \
      --dataset_root ${PROCESSED_SCANNETPP_DIR} \
      --grid_size 0.01 \
      --chunk_range 6 6 \
      --chunk_stride 3 3 \
      --split train \
      --num_workers ${NUM_WORKERS}
  • (Alternative) Download our preprocessed data directly from here (please agree to the official license).

After obtaining the dataset, either set data_root in configs to ${PROCESSED_SCANNETPP_DIR} or link the processed data:

ln -s ${PROCESSED_SCANNETPP_DIR} ${CODEBASE_DIR}/data/ScanNetpp
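If you prepared both datasets, a quick look at the data/ folder confirms that the links resolve to the processed directories (folder names follow the link commands above):

# Optional: confirm both dataset links resolve (names per the ln -s commands above)
ls -l ${CODEBASE_DIR}/data                          # expect ScanNet200 and ScanNetpp symlinks
ls ${CODEBASE_DIR}/data/ScanNetpp                   # expect train/ and val/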

🔄 Training

1. Backbone Pretraining

  • Option A: Download our Pretrained weights

  • Option B: Train from scratch using the pretrain config from the configs folder. Training outputs will be saved into the folder specified by ${EXP_NAME}.

    sh scripts/train.sh -p python -d scannet200 -c semseg-pt-v3m1-0-gfspretrain -n ${EXP_NAME} -g 4

Replace -d scannet200 with scannet or scannetpp when training on those datasets.
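If you pretrain the backbone yourself, note where the checkpoint is written so you can point backbone_weight at it in the next step. The path below assumes the standard Pointcept experiment layout (exp/<dataset>/<experiment name>/model/); verify it against your own run.

# Assumed Pointcept-style output layout -- verify against your run
ls exp/scannet200/${EXP_NAME}/model/
# model_last.pth here is the checkpoint to use as backbone_weight in Registration Training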

2. Registration Training

  • Set the configuration file (-c) to:

    • semseg-pt-v3m1-0-gfsregistrain_k1.yaml for 1-shot registration.
    • semseg-pt-v3m1-0-gfsregistrain_k5.yaml for 5-shot registration.
  • Download the pretrained 3D VLM weights from Regionplc repo or our Huggingface repo.

  • Update the config file (via -o or by editing the YAML) to set:

    • vlm_3d_weight: path to the pretrained VLM weight.
    • backbone_weight: path from Backbone Pretraining.
    • data_root: corresponding dataset folder.

The regis_train_list field controls which registration sets are used (by default, training is performed on five different sets, and the final performance is reported as their average).

For example, for ScanNet200 (1-shot):

sh scripts/train.sh -p python -d scannet200 -c semseg-pt-v3m1-0-gfsregistrain_k1 -n ${EXP_NAME} -g 4

Replace -d scannet200 with scannet or scannetpp when training on those datasets.
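As a concrete sketch, a 5-shot run on ScanNet200 only changes the config name; the three paths are the ones you set beforehand (in the YAML or via -o, as noted above), and the values shown in the comments are placeholders.

# Before launching, point these config fields at your local paths (placeholder values):
#   vlm_3d_weight:   /path/to/regionplc_vlm_weight.pth     # downloaded 3D VLM weight
#   backbone_weight: /path/to/backbone_pretrain_weight.pth # from Backbone Pretraining
#   data_root:       data/ScanNet200
sh scripts/train.sh -p python -d scannet200 -c semseg-pt-v3m1-0-gfsregistrain_k5 -n ${EXP_NAME} -g 4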

📊 Evaluation & Visualization

Evaluation

By default, five fine-tuned weights (from five registration sets) are saved. To evaluate a specific weight, use the -w option:

sh scripts/test.sh -p python -d scannet -c semseg-pt-v3m1-0-gfsregistrain_k1 -n ${EXP_NAME} -w regis1_model_last -g 4

Note: Our evaluation is performed on whole scenes (unlike prior evaluations, which used only small blocks) to better simulate real-world scenarios.
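To evaluate all five registration weights in one go, a simple loop works, assuming the saved checkpoints follow the regis1_model_last naming pattern from the example above (regis1 through regis5):

# Assumes the five checkpoints are named regis1_model_last ... regis5_model_last
for i in 1 2 3 4 5; do
  sh scripts/test.sh -p python -d scannet -c semseg-pt-v3m1-0-gfsregistrain_k1 -n ${EXP_NAME} -w regis${i}_model_last -g 4
done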

Visualization

Add -o vis VISUALIZATION_SAVE_PATH to the evaluation command to automatically save files for visualization. Then, follow the COSeg visualization guide for high-quality visualization results.
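For example, appended to the 1-shot ScanNet evaluation command above (VISUALIZATION_SAVE_PATH is a placeholder for your output directory):

sh scripts/test.sh -p python -d scannet -c semseg-pt-v3m1-0-gfsregistrain_k1 -n ${EXP_NAME} -w regis1_model_last -g 4 -o vis ${VISUALIZATION_SAVE_PATH}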

🎯 Model Zoo

Model      Dataset     K-shot   Weights
sc_k1      ScanNet     1-shot   Download
sc_k5      ScanNet     5-shot   Download
sc200_k1   ScanNet200  1-shot   Download
sc200_k5   ScanNet200  5-shot   Download
scpp_k1    ScanNet++   1-shot   Download
scpp_k5    ScanNet++   5-shot   Download

Acknowledgement

This repository is developed by Zhaochong An and builds upon the excellent works of COSeg, Pointcept, RegionPLC, and Openscene. Many thanks to all contributors!

For questions or issues, feel free to reach out.
