Chuanrui Zhang * · Yingshuang Zou * · Zhuoling Li · Minmin Yi · Haoqian Wang †
a. Create a conda virtual environment and activate it.
conda create --name transplat -y python=3.10.14
conda activate transplat
conda install -y pip
b. Install PyTorch and torchvision.
pip install torch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 --index-url https://download.pytorch.org/whl/cu118
# Recommended torch==2.1.2
c. Install mmcv.
pip install openmim
mim install mmcv==2.1.0
d. Install other requirements.
pip install -r requirements.txt
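To confirm the environment is set up correctly, a quick optional sanity check like the one below can be run (this is only a suggested check, not part of the official setup):

```python
# Optional sanity check of the installed environment.
import torch
import torchvision
import mmcv

print("torch:", torch.__version__)              # expected 2.1.2
print("torchvision:", torchvision.__version__)  # expected 0.16.2
print("mmcv:", mmcv.__version__)                # expected 2.1.0
print("CUDA available:", torch.cuda.is_available())
```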
We use the same training datasets as pixelSplat and MVSplat. Below we quote pixelSplat's detailed instructions on getting datasets.
pixelSplat was trained using versions of the RealEstate10k and ACID datasets that were split into ~100 MB chunks for use on server cluster file systems. Small subsets of the Real Estate 10k and ACID datasets in this format can be found here. To use them, simply unzip them into a newly created datasets folder in the project root directory.
If you would like to convert downloaded versions of the Real Estate 10k and ACID datasets to our format, you can use the scripts here. Reach out to us (pixelSplat) if you want the full versions of our processed datasets, which are about 500 GB and 160 GB for Real Estate 10k and ACID respectively.
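If you want to verify that the chunked data is readable after unzipping, a sketch like the following can help. It assumes the pixelSplat/MVSplat chunk layout (each chunk is a `.torch` file holding a list of per-scene dicts); the directory and field names shown are illustrative and may differ from the actual data:

```python
# Minimal sketch: peek into one RealEstate10k/ACID chunk (assumed .torch format).
from pathlib import Path
import torch

chunk_dir = Path("datasets/re10k/test")        # hypothetical location of unzipped chunks
chunk_path = next(chunk_dir.glob("*.torch"))   # pick any chunk file

examples = torch.load(chunk_path)              # a chunk is assumed to hold a list of scenes
print(f"{chunk_path.name}: {len(examples)} scenes")
first = examples[0]
print("fields:", list(first.keys()))           # e.g. scene key, cameras, images (names may differ)
```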
We use the same testing datasets as MVSplat. Below we quote MVSplat's detailed instructions on getting datasets.
- Download the preprocessed DTU data dtu_training.rar.
- Convert DTU to chunks by running
python src/scripts/convert_dtu.py --input_dir PATH_TO_DTU --output_dir datasets/dtu
- [Optional] Generate the evaluation index by running
python src/scripts/generate_dtu_evaluation_index.py --n_contexts=N
where N is the number of context views. (For N=2 and N=3, we have already provided our tested version under /assets.)
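If you generate your own evaluation index, it may be useful to confirm what it contains before testing. The sketch below only assumes the index is a JSON file mapping scene names to selected view indices; the filename and field names are illustrative, not guaranteed:

```python
# Minimal sketch: inspect a generated DTU evaluation index (assumed JSON layout).
import json

index_path = "assets/evaluation_index_dtu_nctx2.json"  # hypothetical filename
with open(index_path) as f:
    index = json.load(f)

print(f"{len(index)} scenes in the index")
scene, entry = next(iter(index.items()))
print(scene, entry)  # expected to list context/target view indices per scene (names may differ)
```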
For inference, first prepare pretrained models.
- get the pretrained models of transplat and save them to /checkpoints
- get the pretrained models of Depth-Anything-V2-Base and save them to /checkpoints
- run the following:
# re10k
python -m src.main +experiment=re10k \
checkpointing.load=./checkpoints/re10k.ckpt \
mode=test \
dataset/view_sampler=evaluation \
test.compute_scores=true
# acid
python -m src.main +experiment=acid \
checkpointing.load=./checkpoints/acid.ckpt \
mode=test \
dataset/view_sampler=evaluation \
dataset.view_sampler.index_path=assets/evaluation_index_acid.json \
test.compute_scores=true
- the rendered novel views will be stored under outputs/test
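test.compute_scores=true reports the standard image metrics. As a rough illustration only (not the project's evaluation code), PSNR between one rendered view and its ground truth could be computed as below; the file paths are hypothetical and depend on how outputs/test is organized:

```python
# Minimal sketch: PSNR between a rendered view and its ground truth (paths are hypothetical).
import numpy as np
from PIL import Image

rendered = np.asarray(Image.open("outputs/test/scene_0000/color/0000.png"), dtype=np.float32) / 255.0
gt = np.asarray(Image.open("path/to/ground_truth/0000.png"), dtype=np.float32) / 255.0

mse = np.mean((rendered - gt) ** 2)
psnr = 10.0 * np.log10(1.0 / mse)  # images are assumed to be in [0, 1]
print(f"PSNR: {psnr:.2f} dB")
```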
You can find more running commands (e.g., Cross-Dataset Generalization) in run.sh.
Run the following:
# download the backbone pretrained weight from unimatch and save to 'checkpoints/'
wget 'https://s3.eu-central-1.amazonaws.com/avg-projects/unimatch/pretrained/gmdepth-scale1-resumeflowthings-scannet-5d9d7964.pth' -P checkpoints
# train mvsplat
CUDA_VISIBLE_DEVICES=0,1,2,4,5,6,7 python -m src.main +experiment=re10k data_loader.train.batch_size=2 wandb.mode=run wandb.name=transplat-re10k 2>&1 | tee transplat-re10k.log
Our models are trained on 7 RTX 3090 (24 GB) GPUs.
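Before launching training, it can save time to confirm that the downloaded UniMatch backbone weight loads cleanly. The sketch below only assumes it is a standard PyTorch checkpoint; the top-level key layout may differ:

```python
# Minimal sketch: verify the downloaded UniMatch backbone checkpoint loads (key layout may differ).
import torch

ckpt = torch.load(
    "checkpoints/gmdepth-scale1-resumeflowthings-scannet-5d9d7964.pth",
    map_location="cpu",
)
state_dict = ckpt.get("model", ckpt)  # some checkpoints nest weights under a 'model' key
num_params = sum(v.numel() for v in state_dict.values() if torch.is_tensor(v))
print(f"{len(state_dict)} tensors, {num_params / 1e6:.1f}M parameters")
```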
@article{zhang2024transplat,
title={Transplat: Generalizable 3d gaussian splatting from sparse multi-view images with transformers},
author={Zhang, Chuanrui and Zou, Yingshuang and Li, Zhuoling and Yi, Minmin and Wang, Haoqian},
journal={arXiv preprint arXiv:2408.13770},
year={2024}
}
The project is largely based on pixelSplat and MVSplat, and incorporates code snippets from UniMatch, Depth-Anything-V2, and the transformer architecture from mmdetection3d. Many thanks to these projects for their excellent contributions!