Official implementation of Fast3R: Towards 3D Reconstruction of 1000+ Images in One Forward Pass, CVPR 2025
Jianing Yang, Alexander Sax, Kevin J. Liang, Mikael Henaff, Hao Tang, Ang Cao, Joyce Chai, Franziska Meier, Matt Feiszli
```bash
# clone project
git clone https://github.com/facebookresearch/fast3r
cd fast3r

# create conda environment
conda create -n fast3r python=3.11 cmake=3.14.0 -y
conda activate fast3r

# install PyTorch (adjust the CUDA version according to your system)
conda install pytorch torchvision torchaudio pytorch-cuda=12.4 nvidia/label/cuda-12.4.0::cuda-toolkit -c pytorch -c nvidia

# install requirements
pip install -r requirements.txt

# install fast3r as a package (so you can import fast3r and use it in your own project)
pip install -e .
```
Note: Please make sure NOT to install the cuROPE module as in DUSt3R - it will mess up Fast3R's predictions.
Use the following command to run the demo:
```bash
python fast3r/viz/demo.py
```
This will automatically download the pre-trained model weights and config from Hugging Face.
The demo is a Gradio interface where you can upload images or a video and visualize the 3D reconstruction and camera pose estimation.
`fast3r/viz/demo.py` also serves as an example of how to use the model for inference.
To use Fast3R in your own project, you can import the `Fast3R` class from `fast3r.models.fast3r` and use it as a regular PyTorch model:
```python
import torch

from fast3r.dust3r.utils.image import load_images
from fast3r.dust3r.inference_multiview import inference
from fast3r.models.fast3r import Fast3R
from fast3r.models.multiview_dust3r_module import MultiViewDUSt3RLitModule

# --- Setup ---
# Load the model from Hugging Face.
# If you have networking issues, pre-download the HF checkpoint directory and change the path here to a local directory.
model = Fast3R.from_pretrained("jedyang97/Fast3R_ViT_Large_512")
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)

# Create a lightweight Lightning module wrapper for the model.
# This provides functions to estimate camera poses, evaluate 3D reconstruction, etc.
lit_module = MultiViewDUSt3RLitModule.load_for_inference(model)

# Set model to evaluation mode
model.eval()
lit_module.eval()

# --- Load Images ---
# Provide a list of image file paths. Images can come from different cameras and have different aspect ratios.
filelist = ["path/to/image1.jpg", "path/to/image2.jpg", "path/to/image3.jpg"]
images = load_images(filelist, size=512, verbose=True)

# --- Run Inference ---
# The inference function returns a dictionary with predictions and view information.
output_dict, profiling_info = inference(
    images,
    model,
    device,
    dtype=torch.float32,  # or torch.bfloat16 if your GPU supports it
    verbose=True,
    profiling=True,
)

# --- Estimate Camera Poses ---
# This step estimates the camera-to-world (c2w) poses for each view using PnP.
poses_c2w_batch, estimated_focals = MultiViewDUSt3RLitModule.estimate_camera_poses(
    output_dict['preds'],
    niter_PnP=100,
    focal_length_estimation_method='first_view_from_global_head',
)

# poses_c2w_batch is a list; the first element contains the estimated poses for each view.
camera_poses = poses_c2w_batch[0]

# Print camera poses for all views.
for view_idx, pose in enumerate(camera_poses):
    print(f"Camera Pose for view {view_idx}:")
    print(pose.shape)  # np.array of shape (4, 4), the camera-to-world transformation matrix

# --- Extract 3D Point Clouds for Each View ---
# Each element in output_dict['preds'] corresponds to a view's point map.
for view_idx, pred in enumerate(output_dict['preds']):
    point_cloud = pred['pts3d_in_other_view'].cpu().numpy()
    print(f"Point Cloud Shape for view {view_idx}: {point_cloud.shape}")  # (1, Height, Width, XYZ), e.g. (1, 368, 512, 3)
```
Train the model with a chosen experiment configuration from `configs/experiment/`:

```bash
python fast3r/train.py experiment=super_long_training/super_long_training
```
You can override any parameter from the command line following Hydra's override syntax:

```bash
python fast3r/train.py experiment=super_long_training/super_long_training trainer.max_epochs=20 trainer.num_nodes=2
```
To submit a multi-node training job with Slurm, use the following command:
```bash
python scripts/slurm/submit_train.py --nodes=<NODES> --experiment=<EXPERIMENT>
```
After training, you can run the demo with a Lightning checkpoint:

```bash
python fast3r/viz/demo.py --is_lightning_checkpoint --checkpoint_dir=/path/to/super_long_training_999999
```
To evaluate on 3D reconstruction or camera pose estimation tasks, run:

```bash
python fast3r/eval.py eval=<eval_config>
```
`<eval_config>` can be any of the evaluation configurations in `configs/eval/`. For example (full commands shown after this list):

- `ablation_recon_better_inference_hp/ablation_recon_better_inference_hp` evaluates 3D reconstruction on the DTU, 7-Scenes, and Neural-RGBD datasets.
- `eval_cam_pose/eval_cam_pose_10views` evaluates camera pose estimation with 10 views on the CO3D dataset.
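Substituting these config names into the command above gives, for instance:

```bash
# 3D reconstruction on DTU, 7-Scenes, and Neural-RGBD
python fast3r/eval.py eval=ablation_recon_better_inference_hp/ablation_recon_better_inference_hp

# camera pose estimation with 10 views on CO3D
python fast3r/eval.py eval=eval_cam_pose/eval_cam_pose_10views
```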
To evaluate camera poses on the RealEstate10K dataset, run:

```bash
python scripts/fast3r_re10k_pose_eval.py --subset_file scripts/re10k_test_1800.txt
```
To evaluate multi-view depth estimation on the Tanks and Temples, ETH3D, DTU, and ScanNet datasets, follow the data download and preparation guide of robustmvd, install that repo's `requirements.txt` into the current conda environment, and run:

```bash
python scripts/robustmvd_eval.py
```
Please follow DUSt3R's data preprocessing instructions to prepare the data for training and evaluation. The pre-processed data is compatible with the multi-view dataloaders in this repo.
For preprocessing the DTU, 7-Scenes, and NRGBD datasets for evaluation, we follow Spann3r's data processing instructions.
- Q: `httpcore.ConnectError: All connection attempts failed` when launching the demo?
  - See #34. Download the example videos into a local directory.
- Q: Data pre-processing for BlendedMVS: `train_list.txt` is missing?
  - See #33.
- Q: Loading a checkpoint to fine-tune Fast3R?
  - See #25.
- Q: Running the demo on Windows? (`TypeError: cannot pickle '_thread.RLock' object`)
  - See #28. Some more work is needed to make the demo compatible with Windows - we hope the community can contribute a PR!
- Q: Completely messed-up point cloud output?
  - See #21. Please make sure the cuROPE module is NOT installed.
- Q: My GPU doesn't support FlashAttention / `No available kernel. Aborting execution`?
  - See #17. Use the `attn_implementation=pytorch_auto` option instead.
- Q: `TypeError: Fast3R.__init__() missing 3 required positional arguments: 'encoder_args', 'decoder_args', and 'head_args'`?
  - See #7. This is caused by a networking issue when downloading the model from Hugging Face in some countries (e.g., China) - please pre-download the model checkpoint with a working network configuration and load the model from a local path instead (a minimal sketch follows this list).
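For reference, here is a minimal sketch of that workaround. It assumes the `huggingface_hub` package is available in your environment and uses the same repo id as the inference example above; run the download step wherever Hugging Face is reachable.

```python
from huggingface_hub import snapshot_download
from fast3r.models.fast3r import Fast3R

# Mirror the checkpoint directory locally (returns the local path; reuses the cache if present).
local_dir = snapshot_download(repo_id="jedyang97/Fast3R_ViT_Large_512")

# Load the model from the local directory instead of the Hub.
model = Fast3R.from_pretrained(local_dir)
```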
The code and models are licensed under the FAIR NC Research License.
See contributing and the code of conduct.
```bibtex
@InProceedings{Yang_2025_Fast3R,
    title={Fast3R: Towards 3D Reconstruction of 1000+ Images in One Forward Pass},
    author={Jianing Yang and Alexander Sax and Kevin J. Liang and Mikael Henaff and Hao Tang and Ang Cao and Joyce Chai and Franziska Meier and Matt Feiszli},
    booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month={June},
    year={2025},
}
```
Fast3R is built upon a foundation of remarkable open-source projects. We deeply appreciate the contributions of these projects and their communities, whose efforts have significantly advanced the field and made this work possible.