Motion Capture from Internet Videos

  • Conference paper
Computer Vision – ECCV 2020 (ECCV 2020)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12347)


Abstract

Recent advances in image-based human pose estimation make it possible to capture 3D human motion from a single RGB video. However, the inherent depth ambiguity and self-occlusion of a single view prevent monocular capture from matching the motion quality of multi-view reconstruction. While multi-view videos are uncommon, videos of a celebrity performing a specific action are usually abundant on the Internet. Even though these videos were recorded at different times, they encode the same motion characteristics of the person. Therefore, we propose to capture human motion by jointly analyzing these Internet videos rather than processing each video separately. This new task poses challenges that existing methods cannot address: the videos are unsynchronized, the camera viewpoints are unknown, the background scenes differ, and the human motions are not exactly the same across videos. To address these challenges, we propose a novel optimization-based framework and experimentally demonstrate that it recovers much more precise and detailed motion from multiple videos than monocular motion capture methods.

J. Dong and Q. Shuai—Equal contribution.



Acknowledgement

The authors would like to acknowledge support from NSFC (No. 61806176) and Fundamental Research Funds for the Central Universities (2019QNA5022).

Corresponding author

Correspondence to Hujun Bao .

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Dong, J., Shuai, Q., Zhang, Y., Liu, X., Zhou, X., Bao, H. (2020). Motion Capture from Internet Videos. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, JM. (eds) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science, vol 12347. Springer, Cham. https://doi.org/10.1007/978-3-030-58536-5_13

  • DOI: https://doi.org/10.1007/978-3-030-58536-5_13

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-58535-8

  • Online ISBN: 978-3-030-58536-5

  • eBook Packages: Computer Science, Computer Science (R0)
