We will implement an entire incremental Structure-from-Motion (SfM) pipeline. We start by implementing an initialization technique for incremental SfM, i.e., reconstruction from two views. Next, given known intrinsics and the extrinsics of the first two cameras, we implement incremental SfM over 4 given images. Finally, we run off-the-shelf tools such as COLMAP on our own sequences to generate reconstructions.
Implementing initialization for incremental SfM, i.e., reconstruction from two views.
- Two "real world" images are in the `data/monument` folder, which also contains "noisy" keypoint matches.
View #1 | View #2 |
---|---|

Recovered relative pose:

Rotation: [[-0.99120836, -0.00829782, -0.13204971], [-0.07078976, -0.80991137, 0.58226487], [-0.11178009, 0.58649358, 0.80220352]]

Translation: [-0.00779614, -0.18063356, -1.]
- Use the eight-point algorithm to estimate the fundamental matrix F. Since the correspondences can be noisy, use RANSAC.
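The RANSAC loop around the normalized eight-point algorithm can be sketched as follows. This is a minimal numpy sketch, not the required solution: the iteration count, the Sampson-distance threshold, and the function names are illustrative assumptions, and points are assumed to be in normalized image coordinates.

```python
import numpy as np

def eight_point_F(x1, x2):
    """Normalized eight-point estimate of F from N >= 8 pairs (Nx2 arrays)."""
    def normalize(x):
        mean = x.mean(axis=0)
        scale = np.sqrt(2) / np.mean(np.linalg.norm(x - mean, axis=1))
        T = np.array([[scale, 0.0, -scale * mean[0]],
                      [0.0, scale, -scale * mean[1]],
                      [0.0, 0.0, 1.0]])
        return np.hstack([x, np.ones((len(x), 1))]) @ T.T, T

    p1, T1 = normalize(x1)
    p2, T2 = normalize(x2)
    # One row per correspondence: the epipolar constraint p2^T F p1 = 0
    # is linear in the 9 entries of F (vectorized via a Kronecker product).
    A = np.array([np.kron(a, b) for a, b in zip(p2, p1)])
    F = np.linalg.svd(A)[2][-1].reshape(3, 3)
    U, S, Vt = np.linalg.svd(F)
    F = U @ np.diag([S[0], S[1], 0.0]) @ Vt  # enforce rank 2
    return T2.T @ F @ T1                     # undo the normalization

def ransac_F(x1, x2, iters=500, thresh=1e-4, seed=0):
    """RANSAC over minimal 8-point samples; inliers scored by Sampson distance."""
    rng = np.random.default_rng(seed)
    n = len(x1)
    h1 = np.hstack([x1, np.ones((n, 1))])
    h2 = np.hstack([x2, np.ones((n, 1))])
    best = np.zeros(n, dtype=bool)
    for _ in range(iters):
        idx = rng.choice(n, 8, replace=False)
        F = eight_point_F(x1[idx], x2[idx])
        l2 = h1 @ F.T                        # epipolar lines in image 2
        l1 = h2 @ F                          # epipolar lines in image 1
        sampson = np.sum(h2 * l2, axis=1) ** 2 / (
            l2[:, 0] ** 2 + l2[:, 1] ** 2 + l1[:, 0] ** 2 + l1[:, 1] ** 2)
        inliers = sampson < thresh
        if inliers.sum() > best.sum():
            best = inliers
    return eight_point_F(x1[best], x2[best]), best  # refit on all inliers
```

On the noisy monument matches the inlier threshold would be tuned to the actual pixel noise level; the value above merely suits noiseless synthetic data.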
- Compute the Essential matrix as $E = K_2^T F K_1$. Set the first two singular values of E equal to their mean and the third to 0. This singularizes E, i.e., projects it onto the essential manifold.
- Initialize the first camera at the world centre, axis-aligned.
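The singularization step can be sketched as below (a minimal numpy sketch; the function name is our own):

```python
import numpy as np

def essential_from_F(F, K1, K2):
    """E = K2^T F K1, projected onto the essential manifold by replacing
    the two leading singular values with their mean and zeroing the third."""
    E = K2.T @ F @ K1
    U, S, Vt = np.linalg.svd(E)
    s = (S[0] + S[1]) / 2.0
    return U @ np.diag([s, s, 0.0]) @ Vt
```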
- Decompose E to find the 4 possible extrinsics:
$[U W V^T | u_3], [U W V^T | -u_3], [U W^T V^T | u_3], [U W^T V^T | -u_3]$, where W = [[0, -1, 0], [1, 0, 0], [0, 0, 1]] and $u_3$ is the last column of $U$.
- Choose the matrix for which all points lie in front of both cameras. Equivalently, you can find the camera that minimizes reprojection loss.
- Use the obtained camera matrix to triangulate and get 3D points.
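The decomposition, cheirality check, and triangulation steps above can be sketched together as follows (a minimal numpy sketch under noiseless assumptions; the function names and the linear DLT triangulation are our own choices):

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """DLT triangulation of one correspondence (normalized image coords)."""
    A = np.stack([x1[0] * P1[2] - P1[0], x1[1] * P1[2] - P1[1],
                  x2[0] * P2[2] - P2[0], x2[1] * P2[2] - P2[1]])
    X = np.linalg.svd(A)[2][-1]
    return X[:3] / X[3]

def pose_from_E(E, x1, x2):
    """Among the 4 candidate (R, t) from E, pick the one that puts the most
    triangulated points in front of both cameras; returns (R, t, 3D points)."""
    U, _, Vt = np.linalg.svd(E)
    if np.linalg.det(U) < 0:   # sign fixes so both R candidates are rotations
        U = -U
    if np.linalg.det(Vt) < 0:
        Vt = -Vt
    W = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
    u3 = U[:, 2]
    P1 = np.hstack([np.eye(3), np.zeros((3, 1))])  # first camera at origin
    best = None
    for R in (U @ W @ Vt, U @ W.T @ Vt):
        for t in (u3, -u3):
            P2 = np.hstack([R, t[:, None]])
            Xs = np.array([triangulate(P1, P2, a, b) for a, b in zip(x1, x2)])
            # depth in camera 1 is Xs[:, 2]; depth in camera 2 is (R X + t)_z
            n_front = int(np.sum((Xs[:, 2] > 0) & (Xs @ R[2] + t[2] > 0)))
            if best is None or n_front > best[0]:
                best = (n_front, R, t, Xs)
    return best[1], best[2], best[3]
```

Note that t is recovered only up to scale (here unit norm), so the triangulated scene inherits that global scale ambiguity.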
Incremental SfM for 4 images. The "clean" 2D keypoint correspondences across all pairs of these images are in the folder `data/data_cow/correspondences/`.
Starting from 2 images (whose extrinsics are provided) and assuming that the intrinsics remain fixed for all four images, we will incrementally add each of the other 2 images.
- Images: `data/data_cow/images`
- Correspondences (all pairs): `data/data_cow/correspondences`
- Cameras: `data/data_cow/cameras`
Using Camera #1 and #2 | After adding Camera #3 | After adding Camera #4 |
---|---|---|
Camera #3 | Camera #4 |
---|---|
Rotation: [[0.98466192, 0.00646281, 0.17435346], [-0.00986351, 0.99977751, 0.0186452], [-0.17419416, -0.02007895, 0.98450659]] | Rotation: [[0.99030413, 0.0111717, -0.13846633], [-0.00780679, 0.99966144, 0.02482063], [0.13869674, -0.023499, 0.99005607]] |
Translation: [-3.01585037e-03, -6.43987804e-03, 9.97564903e+00] | Translation: [-5.82749044e-04, -5.34493938e-03, 9.99912505e+00] |
- Triangulate the 2D correspondences between images 1 and 2 (known to be exact) using the camera matrices of cameras 1 and 2. This gives the first set of 3D points.
- Find 2D points in image 3 corresponding to 2D points in image 1 and image 2. These are the points in image 3 for which 3D points are now available.
- Use the newly obtained 2D-3D correspondences for camera 3 to solve the pose estimation (PnP) problem. This gives the extrinsics of camera 3.
- Triangulate the point correspondences between images 1 and 3, and between images 2 and 3, to obtain a new set of 3D points.
- Repeat steps 2 through 4 to find the extrinsics of camera 4 and add new points. Include correspondences from images 1, 2, and 3.
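Since the intrinsics are known and the correspondences are clean, the pose-estimation step above can be sketched as a linear DLT resection from 2D-3D correspondences (a minimal numpy sketch; in a noisy setting one would instead use a robust PnP solver, and the function name here is our own):

```python
import numpy as np

def pnp_dlt(X, x):
    """Recover (R, t) of a camera from N >= 6 3D points X (Nx3) and their
    normalized 2D projections x (Nx2) via linear resectioning (DLT)."""
    n = len(X)
    Xh = np.hstack([X, np.ones((n, 1))])
    # Each point contributes two rows of A p = 0, where p = vec(P) for the
    # 3x4 projection P, from x * (P3 . Xh) = P1 . Xh (and likewise for y).
    A = np.zeros((2 * n, 12))
    A[0::2, 0:4] = Xh
    A[0::2, 8:12] = -x[:, 0:1] * Xh
    A[1::2, 4:8] = Xh
    A[1::2, 8:12] = -x[:, 1:2] * Xh
    P = np.linalg.svd(A)[2][-1].reshape(3, 4)
    if np.linalg.det(P[:, :3]) < 0:              # fix the overall sign
        P = -P
    P /= np.linalg.det(P[:, :3]) ** (1.0 / 3.0)  # fix the overall scale
    U, _, Vt = np.linalg.svd(P[:, :3])
    return U @ Vt, P[:, 3]                       # nearest rotation, translation
```

Because the 3D points from the earlier triangulation already carry the scene scale, this resection recovers the full (unscaled) translation of camera 3, unlike the two-view initialization.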
For this part, we will run an off-the-shelf incremental SfM toolbox, COLMAP (via the COLMAP GUI), on our own captured multi-view images.
For this reconstruction, we choose our own data: a sequence containing rigid objects, any single object (e.g., a mug or a vase in our vicinity), or any scene we wish to reconstruct in 3D.
- bunny (36 views)
- sacre_coeur (100 views)
Example Multi-view images -- bunny | Output |
---|---|
Example Multi-view images -- sacre_coeur | Output |
---|---|
When we reduce the number of images such that the overlap between views decreases, the reconstruction starts to fail. This happens because feature extraction cannot find matching features between views, and the downstream pipeline therefore breaks. Here we show results with a varying number of views on the bunny dataset. With 4 and 9 views, the reconstruction fails entirely. With 18 views, (i) only the cameras with sufficient overlap are reconstructed, (ii) the reconstruction is sparse, and (iii) the cameras are estimated with poor accuracy. With 36 views, the reconstruction appears accurate, with all cameras predicted with high accuracy.
Number of views | Output |
---|---|
4 | Failed with No good initial image pair found. |
9 | Failed with No good initial image pair found. |
18 (large overlap b/w views) | |
18 (small overlap b/w views) | |
36 | |
The pipeline breaks with insufficient views or little to no overlap, as shown above. Here, we also test how COLMAP performs when the input views are texture-less. In such cases, feature matching fails to find reliable matches. As tested on the paper roll dataset, the reconstruction fails to recover either the paper roll or the cameras accurately. The reconstructed points are very sparse, with no discernible shape. The paper roll has a flat texture and the floor has repeating patterns. Therefore, COLMAP is sensitive to the texture of the input views.
Example Multi-view images | Output |
---|---|