
⚡️Fast3R: Towards 3D Reconstruction of 1000+ Images in One Forward Pass

${{\color{Red}\Huge{\textsf{ CVPR\ 2025\ }}}}$

Paper | Project Website | Gradio Demo | Hugging Face Model

Teaser Image

Official implementation of Fast3R: Towards 3D Reconstruction of 1000+ Images in One Forward Pass, CVPR 2025

Jianing Yang, Alexander Sax, Kevin J. Liang, Mikael Henaff, Hao Tang, Ang Cao, Joyce Chai, Franziska Meier, Matt Feiszli

Installation

# clone project
git clone https://github.com/facebookresearch/fast3r
cd fast3r

# create conda environment
conda create -n fast3r python=3.11 cmake=3.14.0 -y
conda activate fast3r

# install PyTorch (adjust cuda version according to your system)
conda install pytorch torchvision torchaudio pytorch-cuda=12.4 nvidia/label/cuda-12.4.0::cuda-toolkit -c pytorch -c nvidia

# install requirements
pip install -r requirements.txt

# install fast3r as a package (so you can import fast3r and use it in your own project)
pip install -e .

Note: Do NOT install the cuROPE module as you would for DUSt3R - it would corrupt Fast3R's predictions.
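
A quick post-install sanity check (a minimal sketch; the assumption that DUSt3R's CUDA RoPE kernels would be importable as curope is ours, not stated in this README):

import importlib.util

# fast3r should be importable after `pip install -e .`
assert importlib.util.find_spec("fast3r") is not None, "fast3r is not installed"

# DUSt3R's cuROPE kernels (assumed import name: curope) must NOT be present,
# since they would corrupt Fast3R's predictions.
assert importlib.util.find_spec("curope") is None, "curope is installed - please remove it"
print("OK: fast3r installed, curope absent")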

Demo

Use the following command to run the demo:

python fast3r/viz/demo.py

This will automatically download the pre-trained model weights and config from Hugging Face.

The demo is a Gradio interface where you can upload images or a video and visualize the 3D reconstruction and camera pose estimation.

fast3r/viz/demo.py also serves as an example of how to use the model for inference.

Demo GIF 1 Demo GIF 2
Left: Upload a video. Right: Visualize the 3D Reconstruction
A further example shows how to visualize the confidence heatmap, step through frames one by one, and render a GIF:
Demo GIF 3

Using Fast3R in Your Own Project

To use Fast3R in your own project, you can import the Fast3R class from fast3r.models.fast3r and use it as a regular PyTorch model.

import torch
from fast3r.dust3r.utils.image import load_images
from fast3r.dust3r.inference_multiview import inference
from fast3r.models.fast3r import Fast3R
from fast3r.models.multiview_dust3r_module import MultiViewDUSt3RLitModule

# --- Setup ---
# Load the model from Hugging Face
model = Fast3R.from_pretrained("jedyang97/Fast3R_ViT_Large_512")  # If you have networking issues, pre-download the HF checkpoint directory and point this path to the local copy instead
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)

# Create a lightweight lightning module wrapper for the model.
# This provides functions to estimate camera poses, evaluate 3D reconstruction, etc.
lit_module = MultiViewDUSt3RLitModule.load_for_inference(model)

# Set model to evaluation mode
model.eval()
lit_module.eval()

# --- Load Images ---
# Provide a list of image file paths. Images can come from different cameras and aspect ratios.
filelist = ["path/to/image1.jpg", "path/to/image2.jpg", "path/to/image3.jpg"]
images = load_images(filelist, size=512, verbose=True)
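# Note (hedged, following DUSt3R's convention rather than this README): load_images
# resizes the long side to `size` and returns a list of dicts, one per image, each
# holding the image tensor plus metadata; see fast3r.dust3r.utils.image for the exact keys.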

# --- Run Inference ---
# The inference function returns a dictionary with predictions and view information.
output_dict, profiling_info = inference(
    images,
    model,
    device,
    dtype=torch.float32,  # or use torch.bfloat16 if supported
    verbose=True,
    profiling=True,
)
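
# Optional dtype choice (hedged addition; torch.cuda.is_bf16_supported() is a standard
# PyTorch API, the rest of this line is illustrative):
#   dtype = torch.bfloat16 if torch.cuda.is_bf16_supported() else torch.float32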

# --- Estimate Camera Poses ---
# This step estimates the camera-to-world (c2w) poses for each view using PnP.
poses_c2w_batch, estimated_focals = MultiViewDUSt3RLitModule.estimate_camera_poses(
    output_dict['preds'],
    niter_PnP=100,
    focal_length_estimation_method='first_view_from_global_head'
)
# poses_c2w_batch is a list; the first element contains the estimated poses for each view.
camera_poses = poses_c2w_batch[0]

# Print camera poses for all views.
for view_idx, pose in enumerate(camera_poses):
    print(f"Camera Pose for view {view_idx}:")
    print(pose.shape)  # np.array of shape (4, 4), the camera-to-world transformation matrix

# --- Extract 3D Point Clouds for Each View ---
# Each element in output_dict['preds'] corresponds to a view's point map.
for view_idx, pred in enumerate(output_dict['preds']):
    point_cloud = pred['pts3d_in_other_view'].cpu().numpy()
    print(f"Point Cloud Shape for view {view_idx}: {point_cloud.shape}")  # shape: (1, 368, 512, 3), i.e., (1, Height, Width, XYZ)

Training

Train the model with a chosen experiment configuration from configs/experiment/:

python fast3r/train.py experiment=super_long_training/super_long_training

You can override any parameter from the command line using Hydra's override syntax:

python fast3r/train.py experiment=super_long_training/super_long_training trainer.max_epochs=20 trainer.num_nodes=2

To submit a multi-node training job with Slurm, use the following command:

python scripts/slurm/submit_train.py --nodes=<NODES> --experiment=<EXPERIMENT>

After training, you can run the demo from a Lightning checkpoint with the following command:

python fast3r/viz/demo.py --is_lightning_checkpoint --checkpoint_dir=/path/to/super_long_training_999999

Evaluation

To evaluate on 3D reconstruction or camera pose estimation tasks, run:

python fast3r/eval.py eval=<eval_config>

<eval_config> can be any of the evaluation configurations in configs/eval/. For example:

  • ablation_recon_better_inference_hp/ablation_recon_better_inference_hp evaluates the 3D reconstruction on DTU, 7-Scenes and Neural-RGBD datasets.
  • eval_cam_pose/eval_cam_pose_10views evaluates the camera pose estimation on 10 views on CO3D dataset.

To evaluate camera poses on RealEstate10K dataset, run:

python scripts/fast3r_re10k_pose_eval.py  --subset_file scripts/re10k_test_1800.txt

To evaluate multi-view depth estimation on the Tanks and Temples, ETH3D, DTU, and ScanNet datasets, follow robustmvd's data download and preparation guide, install that repo's requirements.txt into the current conda environment, and run:

python scripts/robustmvd_eval.py

Dataset Preprocessing

Please follow DUSt3R's data preprocessing instructions to prepare the data for training and evaluation. The pre-processed data is compatible with the multi-view dataloaders in this repo.

For preprocessing the DTU, 7-Scenes, and NRGBD datasets for evaluation, we follow Spann3r's data processing instructions.

FAQ

  • Q: httpcore.ConnectError: All connection attempts failed when launching the demo?
    • See #34. Download the example videos into a local directory.
  • Q: Data pre-processing for BlendedMVS, train_list.txt is missing?
  • Q: Loading checkpoint to fine-tune Fast3R?
  • Q: Running demo on Windows? (TypeError: cannot pickle '_thread.RLock' object)
    • See #28. It seems that some more work is needed to make the demo compatible with Windows - we hope the community could contribute a PR!
  • Q: Completely messed-up point cloud output?
    • See #21. Please make sure the cuROPE module is NOT installed.
  • Q: My GPU doesn't support FlashAttention / No available kernel. Aborting execution?
    • See #17. Use the attn_implementation=pytorch_auto option instead.
  • Q: TypeError: Fast3R.__init__() missing 3 required positional arguments: 'encoder_args', 'decoder_args', and 'head_args'
    • See #7. This is caused by a networking issue when downloading the model from Hugging Face in some countries (e.g., China) - pre-download the model checkpoint with a working network configuration and load the model from a local path instead, as sketched below.
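
For the pre-download workaround, here is a hedged sketch using the standard huggingface_hub API (snapshot_download is a real function of that library; loading from the returned local path mirrors the comment in the usage example above):

from huggingface_hub import snapshot_download
from fast3r.models.fast3r import Fast3R

# Download the checkpoint repository once, on a network where this succeeds.
local_dir = snapshot_download("jedyang97/Fast3R_ViT_Large_512")

# Then load from the local path instead of the Hub ID.
model = Fast3R.from_pretrained(local_dir)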

License

The code and models are licensed under the FAIR NC Research License.

Contributing

See contributing and the code of conduct.

Citation

@InProceedings{Yang_2025_Fast3R,
    title={Fast3R: Towards 3D Reconstruction of 1000+ Images in One Forward Pass},
    author={Jianing Yang and Alexander Sax and Kevin J. Liang and Mikael Henaff and Hao Tang and Ang Cao and Joyce Chai and Franziska Meier and Matt Feiszli},
    booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month={June},
    year={2025},
}

Acknowledgement

Fast3R is built upon a foundation of remarkable open-source projects. We deeply appreciate the contributions of these projects and their communities, whose efforts have significantly advanced the field and made this work possible.

Star History

Star History Chart
