Tackling View-Dependent Semantics in 3D Language Gaussian Splatting
Jiazhong Cen1, Xudong Zhou1, Jiemin Fang2✉, Changsong Wen1, Lingxi Xie2, Xiaopeng Zhang2, Wei Shen1✉, Qi Tian2
1MoE Key Lab of Artificial Intelligence, AI Institute, SJTU 2Huawei Inc.
TL;DR: We present LaGa, a language-driven open-vocabulary 3D scene understanding method built upon 3D Gaussian splatting, designed to effectively handle the view dependency of 3D semantics.
LaGa first decomposes the scene into a set of 3D objects with a contrastive learning framework. For each 3D object, LaGa collects its multi-view semantic features and adopts an adaptive K-means strategy to obtain a set of semantic descriptors. To alleviate the effect of noisy descriptors, LaGa employs a weighted descriptor relevance aggregation strategy, which weights each descriptor based on its alignment with the global feature of the corresponding object and the internal compactness of the corresponding feature cluster.
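To make the pipeline concrete, below is a minimal sketch of the descriptor construction and weighting step. The silhouette-based choice of K, the use of the mean multi-view feature as the object's global feature, the softmax combination of the two cues, and all function names are our illustrative assumptions, not the exact implementation; please refer to the paper for the precise formulation.

```python
# Illustrative sketch (assumptions, not LaGa's exact implementation) of
# adaptive K-means descriptors and the descriptor weighting described above.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def build_descriptors(feats, k_candidates=range(2, 9)):
    """Cluster one object's multi-view semantic features feats (N, D)
    into K descriptors, choosing K by silhouette score (assumption)."""
    if len(feats) < 3:  # too few views to cluster meaningfully
        g = feats.mean(axis=0, keepdims=True)
        return g, np.ones(1)
    best_k, best_score = 2, -1.0
    for k in k_candidates:
        if k >= len(feats):
            break
        labels = KMeans(n_clusters=k, n_init=10).fit_predict(feats)
        score = silhouette_score(feats, labels)
        if score > best_score:
            best_k, best_score = k, score
    km = KMeans(n_clusters=best_k, n_init=10).fit(feats)
    descriptors = km.cluster_centers_  # (K, D)
    # Internal compactness: tighter feature clusters get larger weight.
    compactness = np.array([
        -np.linalg.norm(feats[km.labels_ == i] - c, axis=1).mean()
        for i, c in enumerate(descriptors)
    ])
    # Alignment with the object's global feature (here: the mean feature).
    g = feats.mean(axis=0)
    alignment = descriptors @ g / (
        np.linalg.norm(descriptors, axis=1) * np.linalg.norm(g) + 1e-8)
    # Combine both cues into normalized weights (softmax is an assumption).
    w = np.exp(alignment + compactness)
    return descriptors, w / w.sum()
```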
git clone https://github.com/SJTU-DeepVisionLab/LaGa.git
cd LaGa
conda env create --file environment.yml
conda activate laga
The datasets can be downloaded from:
Note: For datasets not prepared for 3D Gaussian Splatting, you may need to run convert.py to convert the data. Please refer to the official repo of 3D-GS for more details.
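A typical invocation (from the 3D-GS repository; check its README for the exact flags your data requires) looks like:
python convert.py -s <path to dataset>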
data
├── 360_v2
│ └── [bicycle|bonsai|counter|garden|kitchen|room|stump]
│ └── [images|images_2|images_4|images_8]
│
├── lerf_ovs
│ └── [figurines|teatime|ramen|waldo_kitchen|label]
│ └── images
│
└── ...
- Train the 3D-GS scene
python train_scene.py -s <path to COLMAP or NeRF Synthetic dataset>
- Train affinity features (30000 iterations is the default; we believe a smaller number is sufficient)
python train_affinity_features.py -m <path to the pre-trained 3DGS model> --iterations 30000
- Inference and evaluation
Please follow the instructions in inference.ipynb.
Before running the GUI, make sure you have trained the 3D Gaussian Splatting model and the affinity features as described above, and have extracted the scene decomposition information and semantic descriptors with inference.ipynb. Then you can run the GUI to visualize and interact with the 3D scene and its semantic segmentation:
python laga_gui.py -m <path to the pre-trained 3DGS model> --scene_iteration 30000 --feature_iteration 30000
Note that the --scene_iteration and --feature_iteration arguments should match the iterations used during training.
- Use the mouse to rotate, zoom, and pan the 3D scene.
- Type a text prompt in the input box, and click 'Do Inference' to query LaGa.
- You can disable 'Postprocess' to reduce GPU memory consumption.
- 'ScoreThres' segments the objects whose relevance score is higher than the threshold (click 'preview_segmentation_in_2d' and 'segment3d' to show the segmentation results); see the sketch after this list.
- In 'Render option', 'RGB' denotes the rendered RGB image, 'RELEVANCE' shows the relevance score for the given text prompt, and 'DECOMPOSITION' shows the scene decomposition result.
- Click 'segment3d' first, then click 'Save as' to save the segmentation results.
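For reference, the sketch below illustrates how a text query could be scored against each object's descriptors via weighted relevance aggregation and then filtered with 'ScoreThres'. The function names and the cosine-similarity scoring are our assumptions for illustration; the actual logic lives in laga_gui.py and inference.ipynb.

```python
# Hypothetical sketch of text-query relevance and 'ScoreThres' filtering.
# Names and the cosine-similarity scoring are illustrative assumptions.
import numpy as np

def object_relevance(text_feat, descriptors, weights):
    """Weighted aggregation of per-descriptor cosine similarities."""
    sims = descriptors @ text_feat / (
        np.linalg.norm(descriptors, axis=1) * np.linalg.norm(text_feat) + 1e-8)
    return float(weights @ sims)

def segment_3d(text_feat, objects, score_thres=0.5):
    """Keep the 3D objects whose relevance exceeds ScoreThres.
    `objects` is a list of (descriptors, weights) pairs, one per object."""
    return [i for i, (d, w) in enumerate(objects)
            if object_relevance(text_feat, d, w) > score_thres]
```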
- [✓] Add an interactive GUI.
Thanks to the following projects for their valuable contributions:
If you find this project helpful for your research, please consider citing our paper and giving a ⭐.
@inproceedings{laga,
  title={Tackling View-Dependent Semantics in 3D Language Gaussian Splatting},
  author={Jiazhong Cen and Xudong Zhou and Jiemin Fang and Changsong Wen and Lingxi Xie and Xiaopeng Zhang and Wei Shen and Qi Tian},
  booktitle={ICML},
  year={2025}
}