Description
Hello,
I am trying to use COLMAP for sparse reconstruction of scenes captured by my robot. There seems to be a lot of confusion (me included) about the best way to perform 3D reconstruction with priors (poses, point cloud, extrinsics, intrinsics, etc.).
Some details on my data capture/setup:
My robot has a camera rig consisting of 4 cameras (front, left, back, right) with known intrinsic and extrinsic calibration:
# Camera list with one line of data per camera:
# CAMERA_ID, MODEL, WIDTH, HEIGHT, PARAMS[]
1 PINHOLE 848 480 310.487854 310.625031 211.092224 120.834244
2 PINHOLE 848 480 308.943939 309.020142 214.595245 119.040802
3 PINHOLE 848 480 308.943939 309.020142 214.595245 119.040802
4 PINHOLE 848 480 310.487854 310.625031 211.092224 120.834244
We captured a total of 2136 images during this recording (534 images per camera). The first few entries of our images.txt:
# Image list with one line of data per image:
# IMAGE_ID, QW, QX, QY, QZ, TX, TY, TZ, CAMERA_ID, NAME
# POINTS2D[] as (X, Y, POINT3D_ID)
1 0.694648 0.000000 0.719350 0.000000 3.317731 0.000000 6.073411 1 images/camera_1/snapshot_000001.png
2 0.896736 0.442221 0.015666 0.007726 -6.173411 -2.296332 1.635991 2 images/camera_2/snapshot_000001.png
3 -0.000000 0.694648 0.000000 0.719350 3.317731 0.000000 -6.273410 3 images/camera_3/snapshot_000001.png
4 -0.015666 -0.007726 0.896736 0.442221 6.173411 2.967934 -2.403423 4 images/camera_4/snapshot_000001.png
5 0.694565 0.000000 0.719430 0.000000 3.317392 0.000000 6.071439 1 images/camera_1/snapshot_000002.png
6 0.896734 0.442220 0.015770 0.007777 -6.171439 -2.296064 1.635785 2 images/camera_2/snapshot_000002.png
7 -0.000000 0.694565 0.000000 0.719430 3.317393 0.000000 -6.271439 3 images/camera_3/snapshot_000002.png
8 -0.015770 -0.007777 0.896734 0.442220 6.171439 2.967665 -2.403217 4 images/camera_4/snapshot_000002.png
9 0.694607 0.000000 0.719390 0.000000 3.316566 0.000000 6.068971 1 images/camera_1/snapshot_000003.png
10 0.896735 0.442221 0.015717 0.007751 -6.168971 -2.295408 1.635281 2 images/camera_2/snapshot_000003.png
11 -0.000000 0.694607 0.000000 0.719390 3.316566 0.000000 -6.268971 3 images/camera_3/snapshot_000003.png
12 -0.015717 -0.007751 0.896735 0.442221 6.168971 2.967009 -2.402713 4 images/camera_4/snapshot_000003.png
The images are arranged in per-camera subfolders:
(base) ubuntu@ubuntu:~$ ls $DATASET_DIR/processed_colmap/images/
camera_1 camera_2 camera_3 camera_4
My process:
colmap feature_extractor \
--database_path $DATASET_DIR/database.db \
--image_path $DATASET_DIR/processed_colmap/images \
--ImageReader.camera_model PINHOLE \
--SiftExtraction.use_gpu 1 \
--ImageReader.single_camera_per_folder 1 \
--ImageReader.camera_params "310.487854,310.625031,211.092224,120.834244"
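Since --ImageReader.camera_params only accepts a single set of parameters for all folders, one way to then give each camera its own calibrated intrinsics is to update the cameras table directly. A minimal sketch, assuming COLMAP's standard schema (params stored as a float64 blob, as in scripts/python/database.py); the camera_ids here are an assumption and must match whatever feature_extractor assigned to each folder:

import sqlite3
import numpy as np

# Calibrated PINHOLE params (fx, fy, cx, cy) per camera, taken from the
# calibration table above. Check the cameras table first to confirm which
# camera_id belongs to which folder.
CAMERA_PARAMS = {
    1: (310.487854, 310.625031, 211.092224, 120.834244),
    2: (308.943939, 309.020142, 214.595245, 119.040802),
    3: (308.943939, 309.020142, 214.595245, 119.040802),
    4: (310.487854, 310.625031, 211.092224, 120.834244),
}

conn = sqlite3.connect("database.db")
cur = conn.cursor()
for camera_id, params in CAMERA_PARAMS.items():
    blob = np.asarray(params, dtype=np.float64).tobytes()
    # COLMAP stores params as a float64 blob; prior_focal_length=1 marks the
    # focal length as a trusted prior.
    cur.execute(
        "UPDATE cameras SET params=?, prior_focal_length=1 WHERE camera_id=?",
        (blob, camera_id),
    )
conn.commit()
conn.close()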
colmap sequential_matcher \
--database_path $DATASET_DIR/database.db \
--SiftMatching.use_gpu 1 \
--SiftMatching.max_num_matches 10000
(I would like to use the vocab_tree_matcher here for loop detection, but I am running into the issues described in #2720, #527, #681.)
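Before triangulating, I also want a quick, non-visual check that matching produced enough verified matches. A rough sketch that queries the database directly (it assumes COLMAP's standard pair_id encoding, pair_id = image_id1 * 2147483647 + image_id2):

import sqlite3

MAX_IMAGE_ID = 2147483647  # 2**31 - 1, used in COLMAP's pair_id encoding

conn = sqlite3.connect("database.db")
cur = conn.cursor()
# two_view_geometries holds the geometrically verified matches per image pair.
query = "SELECT pair_id, rows FROM two_view_geometries ORDER BY rows DESC LIMIT 20"
for pair_id, num_matches in cur.execute(query):
    image_id1, image_id2 = divmod(pair_id, MAX_IMAGE_ID)
    print(f"images ({image_id1}, {image_id2}): {num_matches} verified matches")
conn.close()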
I then run a custom Python script to update the database with the pose data. The relevant part is here:
cursor.execute('''
UPDATE images
SET qw=?, qx=?, qy=?, qz=?, tx=?, ty=?, tz=?
WHERE name=?
''', (*img_data['quat'], *img_data['trans'], db_name))
This updates the qvec and tvec for each image in the database.
From here, the COLMAP docs recommend running point_triangulator. First, I convert the text model to .bin:
colmap model_converter \
--input_path $DATASET_DIR/processed_colmap \
--output_path $DATASET_DIR/converted \
--output_type BIN
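For completeness, the layout of $DATASET_DIR/processed_colmap before conversion, following the COLMAP FAQ on triangulating with known poses (the POINTS2D line under each image is left empty and points3D.txt is an empty file):

$DATASET_DIR/processed_colmap/
    images/          # camera_1 ... camera_4 subfolders
    cameras.txt      # the 4 calibrated cameras listed above
    images.txt       # one pose line per image; the POINTS2D line left empty
    points3D.txt     # empty file; point_triangulator will create the points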
Now we have the .bin files needed for point_triangulator. As a sanity check, I visualize the converted model in the GUI:
Poses for all 4 cameras at each "snapshot" look correct, and the distances between consecutive snapshots look plausible. I believe the relative trajectory and scale are valid here, but if anyone has insights, please let me know.
With the converted model in place, I run point_triangulator as the COLMAP docs recommend:
colmap point_triangulator \
--database_path $DATASET_DIR/database.db \
--image_path $DATASET_DIR/processed_colmap \
--input_path $DATASET_DIR/converted \
--output_path $DATASET_DIR/triangulated
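As a numeric (rather than visual) sanity check of the triangulated model, model_analyzer prints summary statistics:
colmap model_analyzer \
--path $DATASET_DIR/triangulated
It reports the number of registered images, 3D points, mean track length, and mean reprojection error; a very large mean reprojection error or very short tracks would suggest the poses/intrinsics fed into the triangulation are inconsistent, rather than just a visualization issue.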
point_triangulator runs without error; however, the resulting point cloud seems to contain a lot of invalid 3D points:
Many of these points appear invalid; my belief is that they sit far too low (as if they are below the floor). The recording path runs directly alongside some tall walls, and I would expect that structure to show up in the resulting point cloud.
For reference, here is the point cloud captured by the robot's lidar during the recording. (The scans are cropped to heights between 1 m and 4 m, as we want to omit the floor and ceiling and capture the walls.) It looks very different from the point_triangulator output:
My thoughts:
- Is there a mismatch between coordinate system conventions here? How might I go about validating this outside of visual verification? (See the projection sketch after this list.)
- My goal is to use this COLMAP output for 3DGS. Given that I already have ground-truth camera poses, camera intrinsics, camera extrinsics, and lidar scan data, is it possible to create these files myself and provide them to 3DGS? More specifically, every image in images.txt needs a corresponding "# POINTS2D[] as (X, Y, POINT3D_ID)" line. How might one achieve this? Index every point, and for every image filter out the points which lie outside the camera frustum, then record the x/y and ID of the remaining points? (Again, see the sketch below.)
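For both questions, the check I have in mind is the same, sketched below: transform the lidar points into each camera using the COLMAP convention (the quaternion/translation in images.txt map world to camera, p_cam = R(q) * p_world + t), project with the PINHOLE intrinsics, and keep only the points that land inside the image with positive depth. Overlaying the projected points on the actual images would validate the coordinate convention, and the surviving (x, y, point_index) triples are essentially the POINTS2D entries (with matching POINT3D_ID/TRACK entries in points3D.txt) needed to hand-build the sparse model for 3DGS. This is a rough sketch that assumes the lidar cloud is expressed in the same world frame as the image poses; occlusion is ignored.

import numpy as np

def qvec2rotmat(qvec):
    # COLMAP quaternion order is (qw, qx, qy, qz); returns the world-to-camera rotation.
    w, x, y, z = qvec
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z),     2 * (x * z + w * y)],
        [2 * (x * y + w * z),     1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y),     2 * (y * z + w * x),     1 - 2 * (x * x + y * y)],
    ])

def project_lidar(points_world, qvec, tvec, fx, fy, cx, cy, width, height):
    """Project Nx3 lidar points into one image using p_cam = R * p_world + t.
    Returns the pixel coordinates and the indices (into the lidar cloud) of
    the points that fall inside the image with positive depth."""
    R = qvec2rotmat(qvec)
    p_cam = points_world @ R.T + np.asarray(tvec)
    z = p_cam[:, 2]
    with np.errstate(divide="ignore", invalid="ignore"):
        u = fx * p_cam[:, 0] / z + cx
        v = fy * p_cam[:, 1] / z + cy
    visible = (z > 0.1) & (u >= 0) & (u < width) & (v >= 0) & (v < height)
    point3d_ids = np.nonzero(visible)[0]  # doubles as the POINT3D_ID index
    return np.stack([u[visible], v[visible]], axis=1), point3d_ids

# Example for one image (values copied from images.txt / cameras.txt above):
# lidar = np.load("lidar_points.npy")  # hypothetical Nx3 world-frame cloud
# qvec = (0.694648, 0.0, 0.719350, 0.0)
# tvec = (3.317731, 0.0, 6.073411)
# xy, ids = project_lidar(lidar, qvec, tvec,
#                         310.487854, 310.625031, 211.092224, 120.834244, 848, 480)

If the projected lidar points do not land on the walls in the images, that would point to a pose-convention (world-to-camera vs. camera-to-world) or extrinsic problem; if they do, writing out images.txt with the (x, y, POINT3D_ID) rows and a points3D.txt whose TRACK[] lists the (IMAGE_ID, POINT2D_IDX) pairs should be enough to feed 3DGS.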
Any assistance here is appreciated.