Monocular Absolute 3D Human Pose Estimation with an Uncalibrated Fixed Camera

Part of the book series: Communications in Computer and Information Science (CCIS, volume 2143)

Included in the following conference series:

International Workshop on Frontiers of Computer Vision

Abstract

In this paper, we propose a method for absolute 3D human pose estimation (HPE) with an uncalibrated monocular camera. When workers' movements are analyzed with an existing uncalibrated camera, previous methods cannot estimate absolute human pose because the camera parameters are unknown. Our proposed method overcomes this limitation by determining the position and scale of humans based on the pose of surrounding objects. Specifically, we estimate the intrinsic and extrinsic parameters of the camera through user-guided manual manipulation. Subsequently, the estimated human pose is transformed from local coordinates to global coordinates for each frame. This absolute coordinate representation allows for real-time prediction of human movements relative to objects. To assess the efficacy of our method, we conducted three kinds of experiments. A user study revealed that the proposed user-guided method achieves accurate estimation of camera parameters. Quantitative evaluation using a public dataset demonstrated that our method can predict human pose with practical accuracy, providing a benchmark for future enhancements. Qualitative evaluation with a unique dataset showed that our method could easily generate digital twin representations across diverse environments and camera positions.
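
The local-to-global transformation mentioned in the abstract can be illustrated with a minimal sketch. It assumes the common pinhole convention x_cam = R x_world + t for the extrinsic parameters and that the joints are already expressed in metric camera coordinates. The function name `camera_to_world` and the example values are hypothetical and not taken from the paper, whose actual parameterization and scale handling may differ.

```python
import numpy as np


def camera_to_world(joints_cam, R, t):
    """Map (J, 3) joints from camera (local) to world (global) coordinates.

    Assumption (not from the paper): extrinsics follow x_cam = R @ x_world + t,
    so the inverse is x_world = R.T @ (x_cam - t).
    """
    joints_cam = np.asarray(joints_cam, dtype=float)
    R = np.asarray(R, dtype=float)
    t = np.asarray(t, dtype=float)
    # For row vectors, R.T @ v becomes v @ R.
    return (joints_cam - t) @ R


if __name__ == "__main__":
    # Hypothetical extrinsics: 90-degree rotation about the z axis plus an offset.
    R = np.array([[0.0, -1.0, 0.0],
                  [1.0, 0.0, 0.0],
                  [0.0, 0.0, 1.0]])
    t = np.array([0.5, -0.2, 3.0])

    # Two example joints given in world coordinates, pushed into the camera frame.
    joints_world = np.array([[1.0, 2.0, 0.0],
                             [0.0, 0.0, 1.7]])
    joints_cam = joints_world @ R.T + t

    # Applying the inverse transform per frame should recover the world positions.
    recovered = camera_to_world(joints_cam, R, t)
    print(np.allclose(recovered, joints_world))
```

With a fixed camera, R and t would be estimated once and the same transform applied to every frame, which is what makes a per-frame global representation inexpensive in this kind of setup.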

Author information

Authors and Affiliations

Graduate School of Science and Technology, Keio University, Yokohama, Japan
Atsunori Moteki & Hideo Saito
Artificial Intelligence Laboratory, Fujitsu Limited, Kawasaki, Japan
Atsunori Moteki, Yukio Hirai & Genta Suzuki

Authors

Atsunori Moteki
Yukio Hirai
Genta Suzuki
Hideo Saito

Corresponding author

Correspondence to Atsunori Moteki.

Editor information

Editors and Affiliations

Tokyo University of Science, Tokyo, Japan
Go Irie
Chonnam National University, Gwangju, Korea (Republic of)
Choonsung Shin
NEC Corporation, Kawasaki, Kanagawa, Japan
Takashi Shibata
Tokyo University of Science, Tokyo, Japan
Kazuaki Nakamura

About this paper

Cite this paper

Moteki, A., Hirai, Y., Suzuki, G., Saito, H. (2024). Monocular Absolute 3D Human Pose Estimation with an Uncalibrated Fixed Camera. In: Irie, G., Shin, C., Shibata, T., Nakamura, K. (eds) Frontiers of Computer Vision. IW-FCV 2024. Communications in Computer and Information Science, vol 2143. Springer, Singapore. https://doi.org/10.1007/978-981-97-4249-3_5

DOI: https://doi.org/10.1007/978-981-97-4249-3_5
Published: 30 June 2024
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-4248-6
Online ISBN: 978-981-97-4249-3
eBook Packages: Computer Science, Computer Science (R0)
