Zhaochong An · Guolei Sun · Yun Liu · Runjia Li · Junlin Han · Ender Konukoglu · Serge Belongie
CVPR 2025 (Paper)
Model Side:
- GFS-VL Framework: A novel approach to generalized few-shot 3D point cloud segmentation (GFS-PCS) that combines dense but noisy knowledge from 3D vision-language models with precise yet sparse few-shot samples to achieve superior novel-class generalization.
Benchmarking Side:
- Introduces two challenging GFS-PCS benchmarks with diverse novel classes for extensive generalization evaluation, laying a solid foundation for real-world GFS-PCS advancements.
If you find our work useful, please cite our work:
@inproceedings{an2025generalized,
title={Generalized Few-shot 3D Point Cloud Segmentation with Vision-Language Model},
author={An, Zhaochong and Sun, Guolei and Liu, Yun and Li, Runjia and Han, Junlin and Konukoglu, Ender and Belongie, Serge},
booktitle={CVPR},
year={2025}
}
@inproceedings{an2024multimodality,
title={Multimodality Helps Few-Shot 3D Point Cloud Semantic Segmentation},
author={An, Zhaochong and Sun, Guolei and Liu, Yun and Li, Runjia and Wu, Min and Cheng, Ming-Ming and Konukoglu, Ender and Belongie, Serge},
booktitle={ICLR},
year={2025}
}
@inproceedings{an2024rethinking,
title={Rethinking Few-shot 3D Point Cloud Semantic Segmentation},
author={An, Zhaochong and Sun, Guolei and Liu, Yun and Liu, Fayao and Wu, Zongwei and Wang, Dan and Van Gool, Luc and Belongie, Serge},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={3996--4006},
year={2024}
}
- CUDA: 11.8 and above
- PyTorch: 1.13.0 and above
Our environment is tested on both RTX 3090 and A100 GPUs.
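To quickly confirm that the CUDA toolkit on your machine meets this requirement, you can run (optional; output depends on your local installation):

nvcc --version   # should report release 11.8 or newer
nvidia-smi       # lists the driver version and available GPUs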
Manual Setup
python -m venv gfs_vl
source gfs_vl/bin/activate
pip install torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 --index-url https://download.pytorch.org/whl/cu118
pip install h5py pyyaml ninja sharedarray tensorboard tensorboardx addict einops scipy plyfile termcolor timm urllib3 fsspec==2024.2.0 easydict==1.13 yapf==0.40.1
pip install torch-cluster torch-scatter torch-sparse -f https://data.pyg.org/whl/torch-2.1.0+cu118.html
pip install torch-geometric
pip install git+https://github.com/openai/CLIP.git
pip install flash-attn --no-build-isolation
pip install spconv-cu118 # see https://github.com/traveller59/spconv for details
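# Optional sanity check that the installs above see the GPU (versions assume the pins used here):
python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"
# Expect 2.1.0, 11.8, and True on a CUDA-capable machine.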
# Pointops CUDA Dependencies (choose one of the three options)
cd libs/pointops
# Option 1: Standard install
python setup.py install
# Option 2: Docker & Multi-GPU arch
TORCH_CUDA_ARCH_LIST="ARCH LIST" python setup.py install
# Option 3: Set the architecture explicitly, e.g. A100 (8.0) or RTX 3090 (8.6). More details in: https://arnon.dk/matching-sm-architectures-arch-and-gencode-for-various-nvidia-cards/
TORCH_CUDA_ARCH_LIST="8.0" python setup.py install   # use "8.6" for RTX 3090
# For the second CUDA dependency below, use the same choice among the three build options above
cd ../../libs/pointops2
TORCH_CUDA_ARCH_LIST="8.0" python setup.py install
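# Optional: verify that both extensions import (module names assumed to match the libs/ folder names):
python -c "import pointops; import pointops2"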
After the above steps, install the dependencies for the 3D vision-language model (RegionPLC). Refer to the RegionPLC installation guide.
cd pointcept/models/PLA
TORCH_CUDA_ARCH_LIST="8.0" python3 setup.py develop
# Install softgroup_ops:
cd pcseg/external_libs/softgroup_ops
TORCH_CUDA_ARCH_LIST="8.0" python setup.py build_ext develop
# If the direct install fails, follow these steps:
# 1. Download google-sparsehash-2.0.3-1.tar.bz2 from:
# https://anaconda.org/bioconda/google-sparsehash/files
# 2. Extract it:
mkdir -p ./google-sparsehash
tar -xvjf google-sparsehash-2.0.3-1.tar.bz2 -C ./google-sparsehash
# 3. Set include path:
export CPLUS_INCLUDE_PATH=$(pwd)/google-sparsehash/include:$CPLUS_INCLUDE_PATH
# 4. Build with the include directories:
TORCH_CUDA_ARCH_LIST="8.0" python setup.py build_ext --include-dirs=$CPLUS_INCLUDE_PATH develop
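# Optional: confirm the headers sit under the exported include path (layout assumed from the extracted archive):
ls $(pwd)/google-sparsehash/include/sparsehash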
We follow the Pointcept Data Guidelines.
- Download the ScanNet v2 dataset.
- Preprocess the raw data:
# Set RAW_SCANNET_DIR to your downloaded dataset directory
# Set PROCESSED_SCANNET_DIR to the desired output directory
python pointcept/datasets/preprocessing/scannet/preprocess_scannet.py \
  --dataset_root ${RAW_SCANNET_DIR} \
  --output_root ${PROCESSED_SCANNET_DIR}
- (Alternative): Download our preprocessed data from here (please agree to the official license).
After obtaining the dataset, either set data_root in the configs to ${PROCESSED_SCANNET_DIR} or link the processed data:
ln -s ${PROCESSED_SCANNET_DIR} ${CODEBASE_DIR}/data/ScanNet200
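Here ${CODEBASE_DIR} refers to the root of this repository. A minimal sketch with placeholder paths (adjust to your setup):

export PROCESSED_SCANNET_DIR=/path/to/scannet_processed
export CODEBASE_DIR=/path/to/this/repo
mkdir -p ${CODEBASE_DIR}/data
ln -s ${PROCESSED_SCANNET_DIR} ${CODEBASE_DIR}/data/ScanNet200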
- Download the ScanNet++ dataset.
- Preprocess the raw data:
# Set RAW_SCANNETPP_DIR to your downloaded dataset directory
# Set PROCESSED_SCANNETPP_DIR to the desired output directory
# NUM_WORKERS: number of parallel workers
python pointcept/datasets/preprocessing/scannetpp/preprocess_scannetpp.py \
  --dataset_root ${RAW_SCANNETPP_DIR} \
  --output_root ${PROCESSED_SCANNETPP_DIR} \
  --num_workers ${NUM_WORKERS}
- Sample and chunk the large point clouds in the train/val splits (needed for training only):
# For the training split (change --split to val for the validation split):
python pointcept/datasets/preprocessing/sampling_chunking_data.py \
  --dataset_root ${PROCESSED_SCANNETPP_DIR} \
  --grid_size 0.01 \
  --chunk_range 6 6 \
  --chunk_stride 3 3 \
  --split train \
  --num_workers ${NUM_WORKERS}
- (Alternative) Download our preprocessed data directly from here (please agree to the official license).
After obtaining the dataset, either set data_root in the configs to ${PROCESSED_SCANNETPP_DIR} or link the processed data:
ln -s ${PROCESSED_SCANNETPP_DIR} ${CODEBASE_DIR}/data/ScanNetpp
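For example, the corresponding chunking command for the validation split (identical parameters, only --split changes):

python pointcept/datasets/preprocessing/sampling_chunking_data.py \
  --dataset_root ${PROCESSED_SCANNETPP_DIR} \
  --grid_size 0.01 \
  --chunk_range 6 6 \
  --chunk_stride 3 3 \
  --split val \
  --num_workers ${NUM_WORKERS}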
- Option A: Download our pretrained weights.
- Option B: Train from scratch using the pretrain config from the configs folder. Training outputs will be saved into the folder specified by ${EXP_NAME}:

sh scripts/train.sh -p python -d scannet200 -c semseg-pt-v3m1-0-gfspretrain -n ${EXP_NAME} -g 4

Replace -d scannet200 with scannet or scannetpp when training on those datasets.
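For instance, the same pretraining run on ScanNet++ (assuming, as the note above implies, that only the dataset flag changes):

sh scripts/train.sh -p python -d scannetpp -c semseg-pt-v3m1-0-gfspretrain -n ${EXP_NAME} -g 4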
- Set the configuration file (-c) to:
  - semseg-pt-v3m1-0-gfsregistrain_k1.yaml for 1-shot registration.
  - semseg-pt-v3m1-0-gfsregistrain_k5.yaml for 5-shot registration.
- Download the pretrained 3D VLM weights from the RegionPLC repo or our Hugging Face repo.
- Update the config file by setting (-o):
  - vlm_3d_weight: path to the pretrained VLM weights.
  - backbone_weight: path to the weights from Backbone Pretraining.
  - data_root: the corresponding dataset folder.

The regis_train_list field controls which registration sets are used (by default, training runs on five different sets and the final performance is reported as their average).
For example, for ScanNet200 (1-shot):
sh scripts/train.sh -p python -d scannet200 -c semseg-pt-v3m1-0-gfsregistrain_k1 -n ${EXP_NAME} -g 4
Replace -d scannet200 with scannet or scannetpp when training on those datasets.
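For instance, a 5-shot registration run on ScanNet combines the dataset and config options above:

sh scripts/train.sh -p python -d scannet -c semseg-pt-v3m1-0-gfsregistrain_k5 -n ${EXP_NAME} -g 4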
By default, five fine-tuned weights (one per registration set) are saved. To evaluate a specific weight, use the -w option:
sh scripts/test.sh -p python -d scannet -c semseg-pt-v3m1-0-gfsregistrain_k1 -n ${EXP_NAME} -w regis1_model_last -g 4
Note: Our evaluation is performed on whole scenes (rather than on small blocks, as in prior evaluations) to better simulate real-world scenarios.
Add -o vis VISUALIZATION_SAVE_PATH to the evaluation command to automatically save files for visualization. Then follow the COSeg visualization guide for high-quality visualization results.
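For example, extending the evaluation command above with visualization output (the save path is a placeholder):

sh scripts/test.sh -p python -d scannet -c semseg-pt-v3m1-0-gfsregistrain_k1 -n ${EXP_NAME} -w regis1_model_last -g 4 -o vis /path/to/vis_output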
| Model | Dataset | K-shot | Weights |
|---|---|---|---|
| sc_k1 | ScanNet | 1-shot | Download |
| sc_k5 | ScanNet | 5-shot | Download |
| sc200_k1 | ScanNet200 | 1-shot | Download |
| sc200_k5 | ScanNet200 | 5-shot | Download |
| scpp_k1 | ScanNet++ | 1-shot | Download |
| scpp_k5 | ScanNet++ | 5-shot | Download |
This repository is developed by Zhaochong An and builds upon the excellent works of COSeg, Pointcept, RegionPLC, and Openscene. Many thanks to all contributors!
For questions or issues, feel free to reach out:
- Email: anzhaochong@outlook.com
- Join our Communication Group (WeChat):