TranSplat: Generalizable 3D Gaussian Splatting
from Sparse Multi-View Images with Transformers

Chuanrui Zhang *  ·  Yingshuang Zou *  ·  Zhuoling Li  ·  Minmin Yi  ·  Haoqian Wang †

AAAI 2025

Installation

a. Create a conda virtual environment and activate it.

conda create --name transplat -y python=3.10.14
conda activate transplat
conda install -y pip

b. Install PyTorch and torchvision.

pip install torch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 --index-url https://download.pytorch.org/whl/cu118
# Recommended torch==2.1.2

c. Install mmcv.

pip install openmim
mim install mmcv==2.1.0

d. Install other requirements.

pip install -r requirements.txt
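Once the requirements are installed, a quick stdlib-only sanity check (a sketch, not part of the official setup) can confirm that the key packages from steps b–d resolved; the import names below match the pip packages installed above:

```python
import importlib.util

# Import names for the packages installed in steps b-d above.
required = ["torch", "torchvision", "mmcv"]

# find_spec locates a package without importing it, so this is cheap.
missing = [name for name in required if importlib.util.find_spec(name) is None]
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are importable.")
```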

Acquiring Datasets

RealEstate10K and ACID

We use the same training datasets as pixelSplat and MVSplat. Below we quote pixelSplat's detailed instructions on getting datasets.

pixelSplat was trained using versions of the RealEstate10k and ACID datasets that were split into ~100 MB chunks for use on server cluster file systems. Small subsets of the Real Estate 10k and ACID datasets in this format can be found here. To use them, simply unzip them into a newly created datasets folder in the project root directory.

If you would like to convert downloaded versions of the Real Estate 10k and ACID datasets to our format, you can use the scripts here. Reach out to us (pixelSplat) if you want the full versions of our processed datasets, which are about 500 GB and 160 GB for Real Estate 10k and ACID respectively.
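After unzipping, the chunks should sit under per-split subdirectories such as `datasets/re10k/train` and `datasets/re10k/test`. A minimal sketch for verifying the layout, assuming pixelSplat-style `.torch` chunk files (the extension and directory names are assumptions about the processed format):

```python
from pathlib import Path

def count_chunks(root: str, pattern: str = "*.torch") -> dict[str, int]:
    """Count chunk files per split directory under a dataset root."""
    root_path = Path(root)
    if not root_path.is_dir():
        return {}
    return {
        split.name: len(list(split.glob(pattern)))
        for split in sorted(root_path.iterdir())
        if split.is_dir()
    }

# Prints {} if the dataset has not been unzipped yet.
print(count_chunks("datasets/re10k"))
```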

DTU (For Testing Only)

We use the same testing datasets as MVSplat. Below we quote MVSplat's detailed instructions on getting datasets.

  • Download the preprocessed DTU data dtu_training.rar.
  • Convert DTU to chunks by running python src/scripts/convert_dtu.py --input_dir PATH_TO_DTU --output_dir datasets/dtu
  • [Optional] Generate the evaluation index by running python src/scripts/generate_dtu_evaluation_index.py --n_contexts=N, where N is the number of context views. (For N=2 and N=3, we already provide our tested versions under assets/.)

Running the Code

Evaluation

For inference, first prepare the pretrained models.

  • download the pretrained TranSplat models and save them to ./checkpoints

  • download the pretrained Depth-Anything-V2-Base model and save it to ./checkpoints

  • run the following:

# re10k
python -m src.main +experiment=re10k \
checkpointing.load=./checkpoints/re10k.ckpt \
mode=test \
dataset/view_sampler=evaluation \
test.compute_scores=true 

# acid
python -m src.main +experiment=acid \
checkpointing.load=./checkpoints/acid.ckpt \
mode=test \
dataset/view_sampler=evaluation \
dataset.view_sampler.index_path=assets/evaluation_index_acid.json \
test.compute_scores=true 
  • the rendered novel views will be stored under outputs/test
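A small helper to gather the rendered views from outputs/test (a sketch; the exact file names and image format written by the test run are assumptions):

```python
from pathlib import Path

def list_renders(out_dir: str = "outputs/test",
                 exts: tuple[str, ...] = (".png", ".jpg")) -> list[str]:
    """Recursively collect rendered image paths under the output directory."""
    out = Path(out_dir)
    if not out.is_dir():
        return []
    # Match case-insensitively so .PNG / .JPG are also picked up.
    return sorted(str(p) for p in out.rglob("*") if p.suffix.lower() in exts)

print(len(list_renders()), "rendered views found")
```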

You can find more running commands (e.g., Cross-Dataset Generalization) in run.sh.

Training

Run the following:

# download the backbone pretrained weight from unimatch and save to 'checkpoints/'
wget 'https://s3.eu-central-1.amazonaws.com/avg-projects/unimatch/pretrained/gmdepth-scale1-resumeflowthings-scannet-5d9d7964.pth' -P checkpoints
# train transplat
CUDA_VISIBLE_DEVICES=0,1,2,4,5,6,7 python -m src.main +experiment=re10k data_loader.train.batch_size=2 wandb.mode=run wandb.name=transplat-re10k 2>&1 | tee transplat-re10k.log

Our models are trained on 7 RTX 3090 (24 GB) GPUs.
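Assuming the loader batch size is per device, as is typical for pixelSplat-derived Lightning training loops (an assumption, not stated in the command), the effective global batch size is simply GPUs × per-GPU batch:

```python
num_gpus = 7          # CUDA_VISIBLE_DEVICES=0,1,2,4,5,6,7
batch_per_gpu = 2     # data_loader.train.batch_size=2
effective_batch = num_gpus * batch_per_gpu
print(f"effective batch size: {effective_batch}")  # effective batch size: 14
```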

BibTeX

@article{zhang2024transplat,
  title={TranSplat: Generalizable 3D Gaussian Splatting from Sparse Multi-View Images with Transformers},
  author={Zhang, Chuanrui and Zou, Yingshuang and Li, Zhuoling and Yi, Minmin and Wang, Haoqian},
  journal={arXiv preprint arXiv:2408.13770},
  year={2024}
}

Acknowledgements

The project is largely based on pixelSplat and MVSplat, and incorporates numerous code snippets from UniMatch and Depth-Anything-V2, as well as transformer architecture from mmdetection3d. Many thanks to these projects for their excellent contributions!
