Abstract
When a precise 3D reconstruction of an object or person is attempted, one typically starts from a multi-view setup with cameras spread around the investigation area. The matching joints are then triangulated to retrieve their 3D coordinates. However, calibrating such a setup typically requires dedicated equipment and elaborate test procedures. In this paper, we demonstrate a calibration method based only on the detection of one or more people walking through the field of view. In effect, this allows the calibration to happen simultaneously with the measurements being taken, which is practical in uncontrolled environments. We also show that this calibration procedure is more accurate than a typical incremental calibration procedure using a chessboard. Conceptually, the novelty we propose is to drive the calibration with semantic information (e.g. the position of the left shoulder) rather than appearance-based information, since semantic information is far less viewpoint-dependent. Note that while we use human pose keypoints here, car keypoints could serve the same role in larger outdoor scenes.
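To make the idea concrete, the minimal Python/OpenCV sketch below shows how matched 2D joint detections (e.g. the left shoulder of a walking person seen by two cameras over many frames) can take the place of chessboard corners: the relative camera pose is estimated with the five-point algorithm inside RANSAC and the joints are then triangulated. This is an illustration only, not the pipeline evaluated in the paper; the function name, the assumption of shared known intrinsics K and the RANSAC threshold are illustrative choices.

```python
import cv2
import numpy as np


def calibrate_pair_from_joints(joints_cam1, joints_cam2, K):
    """Relative pose of camera 2 w.r.t. camera 1 from matched 2D joint
    detections (N x 2 pixel coordinates of the same physical joints,
    e.g. left shoulders collected over many frames), given a shared
    intrinsic matrix K (3 x 3)."""
    pts1 = np.ascontiguousarray(joints_cam1, dtype=np.float64)
    pts2 = np.ascontiguousarray(joints_cam2, dtype=np.float64)

    # Five-point algorithm inside RANSAC; the loose threshold tolerates
    # the pixel noise of joint detectors (an illustrative value).
    E, inliers = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC,
                                      prob=0.999, threshold=3.0)

    # Decompose E into rotation R and unit-norm translation t.
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=inliers)

    # Triangulate the joints (3D positions are defined up to global scale).
    P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = K @ np.hstack([R, t])
    pts4d = cv2.triangulatePoints(P1, P2, pts1.T, pts2.T)
    joints_3d = (pts4d[:3] / pts4d[3]).T
    return R, t, joints_3d
```

With more than two cameras, such pairwise estimates would typically only initialize an incremental structure-from-motion reconstruction followed by bundle adjustment; the thresholds discussed in the notes below relate to that stage.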
Notes
Available in the OpenCV library as the function cv2.solvePnP.
Downloaded from https://www.epfl.ch/labs/cvlab/data/data-pom-index-php
Some thresholds were increased tenfold (i.e. five_point_algo_threshold, triangulation_threshold, resection_threshold and bundle_outlier_fixed_threshold) to account for joint-detection error, and retriangulation_ratio and bundle_new_points_ratio were set to 0.8 (a sketch applying these overrides follows these notes). No extensive tuning was needed for these parameters.
Performed using DLT from [21].
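The parameter names in the note above appear to match those of an OpenSfM-style incremental structure-from-motion configuration. Assuming such a pipeline, a minimal sketch of the overrides could look as follows; the baseline values shown are illustrative placeholders rather than the library's actual defaults.

```python
# Illustrative sketch of the threshold overrides described in the notes,
# assuming an OpenSfM-style configuration dictionary. Baseline values are
# placeholders, not the library's actual defaults.
baseline_config = {
    "five_point_algo_threshold": 0.004,       # placeholder
    "triangulation_threshold": 0.006,         # placeholder
    "resection_threshold": 0.004,             # placeholder
    "bundle_outlier_fixed_threshold": 0.006,  # placeholder
    "retriangulation_ratio": 1.2,             # placeholder
    "bundle_new_points_ratio": 1.2,           # placeholder
}


def relax_for_joint_detections(config):
    """Relax the four reprojection-style thresholds tenfold (joint detectors
    are noisier than corner features) and set both ratios to 0.8, as stated
    in the note above."""
    relaxed = dict(config)
    for key in ("five_point_algo_threshold", "triangulation_threshold",
                "resection_threshold", "bundle_outlier_fixed_threshold"):
        relaxed[key] = 10 * relaxed[key]
    relaxed["retriangulation_ratio"] = 0.8
    relaxed["bundle_new_points_ratio"] = 0.8
    return relaxed


config = relax_for_joint_detections(baseline_config)
```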
References
Claeys, A., Hoedt, S., Domken, C., Aghezzaf, E., Claeys, D., Cottyn, J.: Methodology to integrate ergonomics information in contextualized digital work instructions. In: 9th CIRP Conference on Assembly Technology and Systems, Procedia CIRP, vol. 106, pp. 168–173 (2022)
Tripicchio, P., D’Avella, S., Camacho-Gonzalez, G., Landolfi, L., Baris, G., Avizzano, C.A., Filippeschi, A.: Multi-camera extrinsic calibration for real-time tracking in large outdoor environments. J. Sens. Actuator Netw. 11, 40 (2022)
Lepetit, V., Moreno-Noguer, F., Fua, P.: EPnP: an accurate O(n) solution to the PnP problem. Int. J. Comput. Vision 81, 155–166 (2009)
Cefalu, A., Haala, N., Fritsch, D.: Structureless bundle adjustment with self-calibration using accumulated constraints. In: ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, vol. III-3 (2016)
Agarwal, S., Furukawa, Y., Snavely, N., Simon, I., Curless, B., Seitz, S., Szeliski, R.: Building Rome in a day. In: ICCV (2009)
Svoboda, T., Martinec, D., Pajdla, T.: A convenient multicamera self-calibration for virtual environments. Presence 14(4), 407–422 (2005)
Lowe, D.G.: Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vision 60(2), 91–110 (2004)
Xie, T., Dai, K., Wang, K., Li, R., Zhao, L.: DeepMatcher: a deep transformer-based network for robust and accurate local feature matching. arXiv:2301.02993 (2023)
Fleuret, F., Berclaz, J., Lengagne, R., Fua, P.: Multi-camera people tracking with a probabilistic occupancy map. IEEE Trans Pattern Anal Machine Intell 30(2), 267–282 (2008). https://doi.org/10.1109/TPAMI.2007.1174
Puwein, J., Ballan, L., Ziegler, R., Pollefeys, M.: Joint camera pose estimation and 3D human pose estimation in a multi-camera setup. In: Proceedings of the Asian Conference on Computer Vision, pp. 473–487. Springer (2014)
Takahashi, K., Mikami, D., Isogawa, M., Kimata, H.: Human pose as calibration pattern: 3D human pose estimation with multiple unsynchronized and uncalibrated cameras. In: Conference on Computer Vision and Pattern Recognition Workshops (CVPR) (2018)
Xu, Y., Li, Y.J., Weng, X., Kitani, K.: Wide-baseline multi-camera calibration using person re-identification. In: Conference on Computer Vision and Pattern Recognition Workshops (CVPR) (2021)
Kendall, A., Grimes, M., Cipolla, R.: PoseNet: a convolutional network for real-time 6-DOF camera relocalization. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2938–2946 (2015)
Cheng, B., Xiao, B., Wang, J., Shi, H., Huang, T.S., Zhang, L.: HigherHRNet: scale-aware representation learning for bottom-up human pose estimation. arXiv:1908.10357 (2019)
Ester, M., Kriegel, H.P., Sander, J., Xu, X.: A density-based algorithm for discovering clusters in large spatial databases with noise. In: Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, ISBN 1-57735-004-9, pp. 226–231 (1996)
Dehaeck, S., Domken, C., Bey-Temsamani, A., Abedrabbo, G.: A strong geometric baseline for cross-view matching of multi-person 3D pose estimation from multi-view images. In: Image Analysis and Processing – ICIAP 2022, ISBN 978-3-031-06430-2, pp. 77–88 (2022)
Fischler, M.A., Bolles, R.C.: Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Commun. ACM 24(6), 381–395 (1981)
Dong, J., Jiang, W., Huang, Q., Bao, H., Zhou, X.: Fast and robust multi-person 3D pose estimation from multiple views. In: Conference on Computer Vision and Pattern Recognition Workshops (CVPR) (2019)
Tanke, J., Gall, J.: Iterative greedy matching for 3D human pose tracking from multiple views. In: German Conference on Pattern Recognition (2019)
Gendreau, M., Potvin, J.: Handbook of Metaheuristics. Springer, ISBN 978-3-319-91085-7 (2019)
Hartley, R.I., Zisserman, A.: Multiple View Geometry in Computer Vision. Cambridge University Press, Cambridge, England (2004)
Song, X., Wang, P., Zhou, D., Zhu, R., Guan, C., Dai, Y., Su, H., Li, H., Yang, R.: ApolloCar3D: a large 3D car instance understanding benchmark for autonomous driving. arXiv:1811.12222 (2018)
Acknowledgements
We would like to thank E. Kikken for the many interesting discussions and E. Hage for constructing the Unity dataset. This research received funding from the Flanders Make ‘2018-134-Ergo-Eyehand-CONV-ICON’ project.
About this article
Cite this article
Dehaeck, S., Domken, C., Bey-Temsamani, A. et al. Wide-baseline multi-camera calibration from a room filled with people. Machine Vision and Applications 34, 45 (2023). https://doi.org/10.1007/s00138-023-01395-1