Abstract
In this paper, we propose a method for absolute 3D human pose estimation (HPE) with an uncalibrated monocular camera. When workers’ movements are analyzed with an existing uncalibrated camera, previous methods cannot estimate the absolute human pose because the camera parameters are unknown. Our method overcomes this limitation by determining the position and scale of humans from the poses of surrounding objects. Specifically, we estimate the intrinsic and extrinsic parameters of the camera through user-guided manual manipulation. The estimated human pose is then transformed from local coordinates to global coordinates for each frame. This absolute coordinate representation allows real-time prediction of human movements relative to objects. To assess the efficacy of our method, we conducted three kinds of experiments. A user study showed that the proposed user-guided method achieves accurate estimation of camera parameters. A quantitative evaluation on a public dataset demonstrated that our method can predict human pose with practical accuracy, providing a benchmark for future enhancements. A qualitative evaluation on our own dataset showed that our method can easily generate digital twin representations across diverse environments and camera positions.
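The local-to-global transformation mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation; it only assumes the common pinhole convention x_cam = R · x_world + t for the extrinsic parameters, so that a joint is mapped back with x_world = Rᵀ (x_cam − t). The function name `camera_to_world` and the sample values are hypothetical.

```python
from typing import List, Sequence

Vec3 = Sequence[float]
Mat3 = Sequence[Sequence[float]]


def camera_to_world(joints_cam: Sequence[Vec3],
                    rotation: Mat3,
                    translation: Vec3) -> List[List[float]]:
    """Map 3D joints from camera (local) to world (global) coordinates.

    Assumes the extrinsics follow x_cam = R @ x_world + t, with R a
    3x3 rotation matrix (orthonormal) and t a 3-vector. This convention
    is an assumption of the sketch, not taken from the paper.
    """
    joints_world = []
    for joint in joints_cam:
        # Remove the translation first: d = x_cam - t
        d = [joint[i] - translation[i] for i in range(3)]
        # R is orthonormal, so its inverse is its transpose:
        # x_world[i] = sum_k R[k][i] * d[k]
        joints_world.append(
            [sum(rotation[k][i] * d[k] for k in range(3)) for i in range(3)]
        )
    return joints_world


# Hypothetical extrinsics: 90-degree rotation about the z-axis, camera offset 2 along z.
R_example = [[0.0, -1.0, 0.0],
             [1.0, 0.0, 0.0],
             [0.0, 0.0, 1.0]]
t_example = [0.0, 0.0, 2.0]

# A single joint expressed in camera coordinates.
pose_world = camera_to_world([[0.0, 1.0, 2.0]], R_example, t_example)
```

Applying the same call to every frame's joint list yields a pose sequence in one shared coordinate system, which is what makes distances between a person and the surrounding objects directly comparable.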
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Moteki, A., Hirai, Y., Suzuki, G., Saito, H. (2024). Monocular Absolute 3D Human Pose Estimation with an Uncalibrated Fixed Camera. In: Irie, G., Shin, C., Shibata, T., Nakamura, K. (eds) Frontiers of Computer Vision. IW-FCV 2024. Communications in Computer and Information Science, vol 2143. Springer, Singapore. https://doi.org/10.1007/978-981-97-4249-3_5
DOI: https://doi.org/10.1007/978-981-97-4249-3_5
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-4248-6
Online ISBN: 978-981-97-4249-3
eBook Packages: Computer Science (R0)