Chuanrui Zhang * · Yingshuang Zou * · Zhuoling Li · Minmin Yi · Haoqian Wang †
a. Create a conda virtual environment and activate it.
conda create --name transplat -y python=3.10.14
conda activate transplat
conda install -y pip
b. Install PyTorch and torchvision.
pip install torch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 --index-url https://download.pytorch.org/whl/cu118
# Recommended torch==2.1.2
c. Install mmcv.
pip install openmim
mim install mmcv==2.1.0
d. Install other requirements.
pip install -r requirements.txt
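To confirm the environment is set up correctly, a quick optional sanity check like the one below can be run (this is only a suggested check, not part of the official setup):

```python
# Optional sanity check of the installed environment.
import torch
import torchvision
import mmcv

print("torch:", torch.__version__)              # expected 2.1.2
print("torchvision:", torchvision.__version__)  # expected 0.16.2
print("mmcv:", mmcv.__version__)                # expected 2.1.0
print("CUDA available:", torch.cuda.is_available())
```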
We use the same training datasets as pixelSplat and MVSplat. Below we quote pixelSplat's detailed instructions on getting datasets.
pixelSplat was trained using versions of the RealEstate10k and ACID datasets that were split into ~100 MB chunks for use on server cluster file systems. Small subsets of the Real Estate 10k and ACID datasets in this format can be found here. To use them, simply unzip them into a newly created datasets folder in the project root directory.
If you would like to convert downloaded versions of the Real Estate 10k and ACID datasets to our format, you can use the scripts here. Reach out to us (pixelSplat) if you want the full versions of our processed datasets, which are about 500 GB and 160 GB for Real Estate 10k and ACID respectively.
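If you want to verify that the chunked data is readable after unzipping, a sketch like the following can help. It assumes the pixelSplat/MVSplat chunk layout (each chunk is a `.torch` file holding a list of per-scene dicts); the directory and field names shown are illustrative and may differ from the actual data:

```python
# Minimal sketch: peek into one RealEstate10k/ACID chunk (assumed .torch format).
from pathlib import Path
import torch

chunk_dir = Path("datasets/re10k/test")        # hypothetical location of unzipped chunks
chunk_path = next(chunk_dir.glob("*.torch"))   # pick any chunk file

examples = torch.load(chunk_path)              # a chunk is assumed to hold a list of scenes
print(f"{chunk_path.name}: {len(examples)} scenes")
first = examples[0]
print("fields:", list(first.keys()))           # e.g. scene key, cameras, images (names may differ)
```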
We use the same testing datasets as MVSplat. Below we quote MVSplat's detailed instructions on getting datasets.
- Download the preprocessed DTU data dtu_training.rar.
- Convert DTU to chunks by running
python src/scripts/convert_dtu.py --input_dir PATH_TO_DTU --output_dir datasets/dtu
- [Optional] Generate the evaluation index by running
python src/scripts/generate_dtu_evaluation_index.py --n_contexts=N
where N is the number of context views. (For N=2 and N=3, we have already provided our tested version under /assets.)
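If you generate your own evaluation index, it may be useful to confirm what it contains before testing. The sketch below only assumes the index is a JSON file mapping scene names to selected view indices; the filename and field names are illustrative, not guaranteed:

```python
# Minimal sketch: inspect a generated DTU evaluation index (assumed JSON layout).
import json

index_path = "assets/evaluation_index_dtu_nctx2.json"  # hypothetical filename
with open(index_path) as f:
    index = json.load(f)

print(f"{len(index)} scenes in the index")
scene, entry = next(iter(index.items()))
print(scene, entry)  # expected to list context/target view indices per scene (names may differ)
```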
For inference, first prepare pretrained models.
- get the pretrained models of transplat and save them to /checkpoints
- get the pretrained models of Depth-Anything-V2-Base and save them to /checkpoints
- run the following:
# re10k
python -m src.main +experiment=re10k \
checkpointing.load=./checkpoints/re10k.ckpt \
mode=test \
dataset/view_sampler=evaluation \
test.compute_scores=true
# acid
python -m src.main +experiment=acid \
checkpointing.load=./checkpoints/acid.ckpt \
mode=test \
dataset/view_sampler=evaluation \
dataset.view_sampler.index_path=assets/evaluation_index_acid.json \
test.compute_scores=true
- the rendered novel views will be stored under outputs/test
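test.compute_scores=true reports the standard image metrics. As a rough illustration only (not the project's evaluation code), PSNR between one rendered view and its ground truth could be computed as below; the file paths are hypothetical and depend on how outputs/test is organized:

```python
# Minimal sketch: PSNR between a rendered view and its ground truth (paths are hypothetical).
import numpy as np
from PIL import Image

rendered = np.asarray(Image.open("outputs/test/scene_0000/color/0000.png"), dtype=np.float32) / 255.0
gt = np.asarray(Image.open("path/to/ground_truth/0000.png"), dtype=np.float32) / 255.0

mse = np.mean((rendered - gt) ** 2)
psnr = 10.0 * np.log10(1.0 / mse)  # images are assumed to be in [0, 1]
print(f"PSNR: {psnr:.2f} dB")
```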
You can find more running commands (e.g., Cross-Dataset Generalization) in run.sh.
Run the following:
# download the backbone pretrained weight from unimatch and save to 'checkpoints/'
wget 'https://s3.eu-central-1.amazonaws.com/avg-projects/unimatch/pretrained/gmdepth-scale1-resumeflowthings-scannet-5d9d7964.pth' -P checkpoints
# train mvsplat
CUDA_VISIBLE_DEVICES=0,1,2,4,5,6,7 python -m src.main +experiment=re10k data_loader.train.batch_size=2 wandb.mode=run wandb.name=transplat-re10k 2>&1 | tee transplat-re10k.log
Our models are trained on 7 RTX 3090 (24 GB) GPUs.
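Before launching training, it can save time to confirm that the downloaded UniMatch backbone weight loads cleanly. The sketch below only assumes it is a standard PyTorch checkpoint; the top-level key layout may differ:

```python
# Minimal sketch: verify the downloaded UniMatch backbone checkpoint loads (key layout may differ).
import torch

ckpt = torch.load(
    "checkpoints/gmdepth-scale1-resumeflowthings-scannet-5d9d7964.pth",
    map_location="cpu",
)
state_dict = ckpt.get("model", ckpt)  # some checkpoints nest weights under a 'model' key
num_params = sum(v.numel() for v in state_dict.values() if torch.is_tensor(v))
print(f"{len(state_dict)} tensors, {num_params / 1e6:.1f}M parameters")
```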
@article{zhang2024transplat,
title={Transplat: Generalizable 3d gaussian splatting from sparse multi-view images with transformers},
author={Zhang, Chuanrui and Zou, Yingshuang and Li, Zhuoling and Yi, Minmin and Wang, Haoqian},
journal={arXiv preprint arXiv:2408.13770},
year={2024}
}
The project is largely based on pixelSplat and MVSplat, and incorporates code snippets from UniMatch, Depth-Anything-V2, and the transformer architecture from mmdetection3d. Many thanks to these projects for their excellent contributions!