Description
Hello,
I am trying to use COLMAP for sparse reconstruction of scenes captured by my robot. There seems to be a lot of confusion (me included) about the best way to perform 3D reconstruction with priors (poses, point cloud, extrinsics, intrinsics, etc.).
Some details on my data capture/setup:
My robot has a camera rig consisting of 4 cameras (front, left, back, right) with known intrinsic and extrinsic calibration:
# Camera list with one line of data per camera:
# CAMERA_ID, MODEL, WIDTH, HEIGHT, PARAMS[]
1 PINHOLE 848 480 310.487854 310.625031 211.092224 120.834244
2 PINHOLE 848 480 308.943939 309.020142 214.595245 119.040802
3 PINHOLE 848 480 308.943939 309.020142 214.595245 119.040802
4 PINHOLE 848 480 310.487854 310.625031 211.092224 120.834244
We captured a total of 2136 images during this recording (534 images per camera). The first few entries of our images.txt:
# Image list with one line of data per image:
# IMAGE_ID, QW, QX, QY, QZ, TX, TY, TZ, CAMERA_ID, NAME
# POINTS2D[] as (X, Y, POINT3D_ID)
1 0.694648 0.000000 0.719350 0.000000 3.317731 0.000000 6.073411 1 images/camera_1/snapshot_000001.png
2 0.896736 0.442221 0.015666 0.007726 -6.173411 -2.296332 1.635991 2 images/camera_2/snapshot_000001.png
3 -0.000000 0.694648 0.000000 0.719350 3.317731 0.000000 -6.273410 3 images/camera_3/snapshot_000001.png
4 -0.015666 -0.007726 0.896736 0.442221 6.173411 2.967934 -2.403423 4 images/camera_4/snapshot_000001.png
5 0.694565 0.000000 0.719430 0.000000 3.317392 0.000000 6.071439 1 images/camera_1/snapshot_000002.png
6 0.896734 0.442220 0.015770 0.007777 -6.171439 -2.296064 1.635785 2 images/camera_2/snapshot_000002.png
7 -0.000000 0.694565 0.000000 0.719430 3.317393 0.000000 -6.271439 3 images/camera_3/snapshot_000002.png
8 -0.015770 -0.007777 0.896734 0.442220 6.171439 2.967665 -2.403217 4 images/camera_4/snapshot_000002.png
9 0.694607 0.000000 0.719390 0.000000 3.316566 0.000000 6.068971 1 images/camera_1/snapshot_000003.png
10 0.896735 0.442221 0.015717 0.007751 -6.168971 -2.295408 1.635281 2 images/camera_2/snapshot_000003.png
11 -0.000000 0.694607 0.000000 0.719390 3.316566 0.000000 -6.268971 3 images/camera_3/snapshot_000003.png
12 -0.015717 -0.007751 0.896735 0.442221 6.168971 2.967009 -2.402713 4 images/camera_4/snapshot_000003.png
The images are arranged in per-camera subfolders:
(base) ubuntu@ubuntu:~$ ls $DATASET_DIR/processed_colmap/images/
camera_1 camera_2 camera_3 camera_4
My process:
colmap feature_extractor \
--database_path $DATASET_DIR/database.db \
--image_path $DATASET_DIR/processed_colmap/images \
--ImageReader.camera_model PINHOLE \
--SiftExtraction.use_gpu 1 \
--ImageReader.single_camera_per_folder 1 \
--ImageReader.camera_params "310.487854,310.625031,211.092224,120.834244"
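Since --ImageReader.camera_params only accepts a single set of parameters for all folders, one way to then give each camera its own calibrated intrinsics is to update the cameras table directly. A minimal sketch, assuming COLMAP's standard schema (params stored as a float64 blob, as in scripts/python/database.py); the camera_ids here are an assumption and must match whatever feature_extractor assigned to each folder:

import sqlite3
import numpy as np

# Calibrated PINHOLE params (fx, fy, cx, cy) per camera, taken from the
# calibration table above. Check the cameras table first to confirm which
# camera_id belongs to which folder.
CAMERA_PARAMS = {
    1: (310.487854, 310.625031, 211.092224, 120.834244),
    2: (308.943939, 309.020142, 214.595245, 119.040802),
    3: (308.943939, 309.020142, 214.595245, 119.040802),
    4: (310.487854, 310.625031, 211.092224, 120.834244),
}

conn = sqlite3.connect("database.db")
cur = conn.cursor()
for camera_id, params in CAMERA_PARAMS.items():
    blob = np.asarray(params, dtype=np.float64).tobytes()
    # COLMAP stores params as a float64 blob; prior_focal_length=1 marks the
    # focal length as a trusted prior.
    cur.execute(
        "UPDATE cameras SET params=?, prior_focal_length=1 WHERE camera_id=?",
        (blob, camera_id),
    )
conn.commit()
conn.close()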
colmap sequential_matcher \
--database_path $DATASET_DIR/database.db \
--SiftMatching.use_gpu 1 \
--SiftMatching.max_num_matches 10000
(I would like to use the vocab_tree_matcher here for loop detection, but I am running into the issues described in #2720, #527, #681.)
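Before triangulating, I also want a quick, non-visual check that matching produced enough verified matches. A rough sketch that queries the database directly (it assumes COLMAP's standard pair_id encoding, pair_id = image_id1 * 2147483647 + image_id2):

import sqlite3

MAX_IMAGE_ID = 2147483647  # 2**31 - 1, used in COLMAP's pair_id encoding

conn = sqlite3.connect("database.db")
cur = conn.cursor()
# two_view_geometries holds the geometrically verified matches per image pair.
query = "SELECT pair_id, rows FROM two_view_geometries ORDER BY rows DESC LIMIT 20"
for pair_id, num_matches in cur.execute(query):
    image_id1, image_id2 = divmod(pair_id, MAX_IMAGE_ID)
    print(f"images ({image_id1}, {image_id2}): {num_matches} verified matches")
conn.close()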
I then run a custom Python script to update the database with the pose data. The relevant part is here:
cursor.execute('''
UPDATE images
SET qw=?, qx=?, qy=?, qz=?, tx=?, ty=?, tz=?
WHERE name=?
''', (*img_data['quat'], *img_data['trans'], db_name))
This updates the qvec and tvec for each image in the database.
From here, the COLMAP docs recommend running point_triangulator. First, I convert the text model to .bin:
colmap model_converter \
--input_path $DATASET_DIR/processed_colmap \
--output_path $DATASET_DIR/converted \
--output_type BIN
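For completeness, the layout of $DATASET_DIR/processed_colmap before conversion, following the COLMAP FAQ on triangulating with known poses (the POINTS2D line under each image is left empty and points3D.txt is an empty file):

$DATASET_DIR/processed_colmap/
    images/          # camera_1 ... camera_4 subfolders
    cameras.txt      # the 4 calibrated cameras listed above
    images.txt       # one pose line per image; the POINTS2D line left empty
    points3D.txt     # empty file; point_triangulator will create the points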
Now we have the .bin files needed for point_triangulator. As a sanity check, I visualize the converted model in the GUI:
Poses for all 4 cameras at each "snapshot" look correct, and the distances between consecutive snapshots look plausible. I believe the relative trajectory and scale are valid here, but if anyone has insights, please let me know.
With the converted model in place, I run point_triangulator as the COLMAP docs recommend:
colmap point_triangulator \
--database_path $DATASET_DIR/database.db \
--image_path $DATASET_DIR/processed_colmap \
--input_path $DATASET_DIR/converted \
--output_path $DATASET_DIR/triangulated
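As a numeric (rather than visual) sanity check of the triangulated model, model_analyzer prints summary statistics:
colmap model_analyzer \
--path $DATASET_DIR/triangulated
It reports the number of registered images, 3D points, mean track length, and mean reprojection error; a very large mean reprojection error or very short tracks would suggest the poses/intrinsics fed into the triangulation are inconsistent, rather than just a visualization issue.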
point_triangulator runs without error; however, the resulting point cloud seems to contain a lot of invalid 3D points:
Many of these points appear invalid; my belief is that they sit far too low (as if they are below the floor). The recording path runs directly alongside some tall walls, and I would expect that structure to show up in the resulting point cloud.
For reference, here is the point cloud captured by the robot's lidar during the recording. (The scans are cropped to heights between 1 m and 4 m, as we want to omit the floor and ceiling and capture the walls.) It looks very different from the point_triangulator output:
My thoughts:
- Is there a mismatch between coordinate system conventions here? How might I go about validating this outside of visual verification? (See the projection sketch after this list.)
- My goal is to use this COLMAP output for 3DGS. Given that I already have ground-truth camera poses, camera intrinsics, camera extrinsics, and lidar scan data, is it possible to create these files myself and provide them to 3DGS? More specifically, every image in images.txt needs a corresponding "# POINTS2D[] as (X, Y, POINT3D_ID)" line. How might one achieve this? Index every point, and for every image filter out the points which lie outside the camera frustum, then record the x/y and ID of the remaining points? (Again, see the sketch below.)
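For both questions, the check I have in mind is the same, sketched below: transform the lidar points into each camera using the COLMAP convention (the quaternion/translation in images.txt map world to camera, p_cam = R(q) * p_world + t), project with the PINHOLE intrinsics, and keep only the points that land inside the image with positive depth. Overlaying the projected points on the actual images would validate the coordinate convention, and the surviving (x, y, point_index) triples are essentially the POINTS2D entries (with matching POINT3D_ID/TRACK entries in points3D.txt) needed to hand-build the sparse model for 3DGS. This is a rough sketch that assumes the lidar cloud is expressed in the same world frame as the image poses; occlusion is ignored.

import numpy as np

def qvec2rotmat(qvec):
    # COLMAP quaternion order is (qw, qx, qy, qz); returns the world-to-camera rotation.
    w, x, y, z = qvec
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z),     2 * (x * z + w * y)],
        [2 * (x * y + w * z),     1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y),     2 * (y * z + w * x),     1 - 2 * (x * x + y * y)],
    ])

def project_lidar(points_world, qvec, tvec, fx, fy, cx, cy, width, height):
    """Project Nx3 lidar points into one image using p_cam = R * p_world + t.
    Returns the pixel coordinates and the indices (into the lidar cloud) of
    the points that fall inside the image with positive depth."""
    R = qvec2rotmat(qvec)
    p_cam = points_world @ R.T + np.asarray(tvec)
    z = p_cam[:, 2]
    with np.errstate(divide="ignore", invalid="ignore"):
        u = fx * p_cam[:, 0] / z + cx
        v = fy * p_cam[:, 1] / z + cy
    visible = (z > 0.1) & (u >= 0) & (u < width) & (v >= 0) & (v < height)
    point3d_ids = np.nonzero(visible)[0]  # doubles as the POINT3D_ID index
    return np.stack([u[visible], v[visible]], axis=1), point3d_ids

# Example for one image (values copied from images.txt / cameras.txt above):
# lidar = np.load("lidar_points.npy")  # hypothetical Nx3 world-frame cloud
# qvec = (0.694648, 0.0, 0.719350, 0.0)
# tvec = (3.317731, 0.0, 6.073411)
# xy, ids = project_lidar(lidar, qvec, tvec,
#                         310.487854, 310.625031, 211.092224, 120.834244, 848, 480)

If the projected lidar points do not land on the walls in the images, that would point to a pose-convention (world-to-camera vs. camera-to-world) or extrinsic problem; if they do, writing out images.txt with the (x, y, POINT3D_ID) rows and a points3D.txt whose TRACK[] lists the (IMAGE_ID, POINT2D_IDX) pairs should be enough to feed 3DGS.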
Any assistance here is appreciated.