Multi-person 3D Pose Estimation in Crowded Scenes Based on Multi-view Geometry

He Chen¹², Pengfei Guo¹², Pengfei Li¹², Gim Hee Lee¹³ & Gregory Chirikjian¹²,¹³

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12348)

Included in the following conference series:

European Conference on Computer Vision

Abstract

Epipolar constraints are at the core of feature matching and depth estimation in current multi-person multi-camera 3D human pose estimation methods. Despite the satisfactory performance of this formulation in sparser crowd scenes, its effectiveness is frequently challenged under denser crowd circumstances mainly due to two sources of ambiguity. The first is the mismatch of human joints resulting from the simple cues provided by the Euclidean distances between joints and epipolar lines. The second is the lack of robustness from the naive formulation of the problem as a least squares minimization. In this paper, we depart from the multi-person 3D pose estimation formulation, and instead reformulate it as crowd pose estimation. Our method consists of two key components: a graph model for fast cross-view matching, and a maximum a posteriori (MAP) estimator for the reconstruction of the 3D human poses. We demonstrate the effectiveness and superiority of our proposed method on four benchmark datasets. Our code is available at: https://github.com/HeCraneChen/3D-Crowd-Pose-Estimation-Based-on-MVG.
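
To make the baseline formulation concrete, the following is a minimal sketch of the two ingredients the abstract identifies as sources of ambiguity: the point-to-epipolar-line distance used as a matching cue, and a plain linear least-squares triangulation. It is not the authors' graph model or MAP estimator. The function names, the toy camera parameters, and the choice of the DLT formulation are all assumptions made for illustration.

```python
import numpy as np


def fundamental_from_projections(P1, P2):
    """Return F with x2^T F x1 = 0 for 3x4 projection matrices P1, P2."""
    # The camera centre of view 1 is the right null vector of P1.
    _, _, vt = np.linalg.svd(P1)
    C1 = vt[-1]
    e2 = P2 @ C1  # epipole: image of the first camera centre in view 2
    e2_cross = np.array([
        [0.0, -e2[2], e2[1]],
        [e2[2], 0.0, -e2[0]],
        [-e2[1], e2[0], 0.0],
    ])
    return e2_cross @ P2 @ np.linalg.pinv(P1)


def epipolar_distance(F, x1, x2):
    """Euclidean distance (pixels) from x2 to the epipolar line of x1 in view 2."""
    a, b, c = F @ np.append(x1, 1.0)
    return abs(a * x2[0] + b * x2[1] + c) / np.hypot(a, b)


def triangulate_least_squares(points_2d, projections):
    """Linear (DLT) least-squares triangulation of one joint from >= 2 views."""
    rows = []
    for (u, v), P in zip(points_2d, projections):
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    # The solution is the right singular vector of the smallest singular value.
    _, _, vt = np.linalg.svd(np.asarray(rows))
    X = vt[-1]
    return X[:3] / X[3]


def project(P, X):
    """Project a 3D point with a 3x4 projection matrix to pixel coordinates."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]


# Toy setup (assumed values): two identical cameras, 1 m horizontal baseline.
K = np.array([[1000.0, 0.0, 500.0], [0.0, 1000.0, 500.0], [0.0, 0.0, 1.0]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
joint = np.array([0.2, -0.1, 5.0])

x1, x2 = project(P1, joint), project(P2, joint)
F = fundamental_from_projections(P1, P2)
d_true = epipolar_distance(F, x1, x2)              # correct match: near zero
d_wrong = epipolar_distance(F, x1, x2 + [0, 20])   # detection 20 px off the line
X_hat = triangulate_least_squares([x1, x2], [P1, P2])
```

In a dense crowd, joints of several people can lie close to the same epipolar line, so a small distance alone does not identify the correct match, and an unweighted least-squares solve gives a mismatched or noisy detection the same influence as a correct one.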

H. Chen and P. Guo: Equal first author contribution.

G. H. Lee and G. Chirikjian: Jointly supervised this work.

Acknowledgements

The authors would like to thank Yawei Li and Weixiao Liu for useful discussions. This work is supported in part by the Office of Naval Research Award N00014-17-1-2142 and the Singapore MOE Tier 1 grant R-252-000-A65-114.

Author information

Authors and Affiliations

The Johns Hopkins University, Baltimore, USA
He Chen, Pengfei Guo, Pengfei Li & Gregory Chirikjian
National University of Singapore, Singapore, Singapore
Gim Hee Lee & Gregory Chirikjian

Corresponding author

Correspondence to He Chen.

Editor information

Editors and Affiliations

University of Oxford, Oxford, UK
Andrea Vedaldi
Graz University of Technology, Graz, Austria
Horst Bischof
University of Freiburg, Freiburg im Breisgau, Germany
Thomas Brox
University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
Jan-Michael Frahm

About this paper

Cite this paper

Chen, H., Guo, P., Li, P., Lee, G.H., Chirikjian, G. (2020). Multi-person 3D Pose Estimation in Crowded Scenes Based on Multi-view Geometry. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science, vol 12348. Springer, Cham. https://doi.org/10.1007/978-3-030-58580-8_32

DOI: https://doi.org/10.1007/978-3-030-58580-8_32
Published: 03 December 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-58579-2
Online ISBN: 978-3-030-58580-8
eBook Packages: Computer Science, Computer Science (R0)
