Abstract
Local optimization and filtering have been widely applied to model-based 3D human motion capture. Global stochastic optimization has recently been proposed as a promising alternative for tracking and initialization. To benefit from both optimization and filtering, we introduce a multi-layer framework that combines stochastic optimization, filtering, and local optimization. The first layer relies on interacting simulated annealing and some weak prior information on physical constraints. The second layer refines the estimates by filtering and local optimization, so that accuracy increases and ambiguities are resolved over time without imposing restrictions on the dynamics. In our experimental evaluation, we demonstrate that the multi-layer framework yields significant improvements and provide quantitative 3D pose tracking results for the complete HumanEva-II dataset. The paper further comprises a comparison of global stochastic optimization with particle filtering, annealed particle filtering, and local optimization.
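The layered idea can be illustrated with a small sketch. This is not the authors' implementation: it assumes a toy two-dimensional multimodal energy in place of an image-based pose error, uses a simplified annealed particle search (weighting, resampling, Gaussian diffusion) as a stand-in for interacting simulated annealing, and replaces the second layer with a plain finite-difference gradient descent. Temporal filtering is omitted, and all function names and parameter values are hypothetical.

```python
import math
import random


def energy(x):
    """Toy multimodal energy (assumption); global minimum 0 at the origin."""
    return sum(xi * xi + 2.0 * (1.0 - math.cos(3.0 * xi)) for xi in x)


def annealed_particle_search(energy_fn, dim, n_particles=200, n_iters=15,
                             beta0=0.5, beta_growth=1.5,
                             sigma0=1.0, sigma_decay=0.8, rng=None):
    """First layer (simplified): global stochastic search over particles."""
    rng = rng or random.Random(0)
    particles = [[rng.uniform(-4.0, 4.0) for _ in range(dim)]
                 for _ in range(n_particles)]
    beta, sigma = beta0, sigma0
    for _ in range(n_iters):
        energies = [energy_fn(p) for p in particles]
        e_min = min(energies)
        # Weighting: Boltzmann weights, shifted by e_min for numerical safety.
        weights = [math.exp(-beta * (e - e_min)) for e in energies]
        # Selection: resample particles in proportion to their weights.
        chosen = rng.choices(particles, weights=weights, k=n_particles)
        # Mutation: diffuse each copy with Gaussian noise.
        particles = [[v + rng.gauss(0.0, sigma) for v in p] for p in chosen]
        beta *= beta_growth    # sharpen the weighting over time
        sigma *= sigma_decay   # shrink the search radius over time
    # Return the lowest-energy particle as the coarse estimate.
    return min(particles, key=energy_fn)


def local_refine(energy_fn, x, step=0.05, n_steps=200, eps=1e-5):
    """Second layer (simplified): gradient descent with step halving."""
    x = list(x)
    for _ in range(n_steps):
        grad = []
        for d in range(len(x)):
            plus, minus = list(x), list(x)
            plus[d] += eps
            minus[d] -= eps
            # Central finite difference along coordinate d.
            grad.append((energy_fn(plus) - energy_fn(minus)) / (2.0 * eps))
        candidate = [xi - step * g for xi, g in zip(x, grad)]
        if energy_fn(candidate) < energy_fn(x):
            x = candidate      # accept only strict improvements
        else:
            step *= 0.5        # otherwise shrink the step
    return x


if __name__ == "__main__":
    coarse = annealed_particle_search(energy, dim=2)
    refined = local_refine(energy, coarse)
    print("coarse :", coarse, energy(coarse))
    print("refined:", refined, energy(refined))
```

Because the refinement step only accepts candidates that lower the energy, the refined estimate is never worse than the coarse one under this toy objective. The stochastic layer is what makes it likely that the starting point already lies in the right basin.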
Rights and permissions
Open Access This is an open access article distributed under the terms of the Creative Commons Attribution Noncommercial License (https://creativecommons.org/licenses/by-nc/2.0), which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.
About this article
Cite this article
Gall, J., Rosenhahn, B., Brox, T. et al. Optimization and Filtering for Human Motion Capture. Int J Comput Vis 87, 75–92 (2010). https://doi.org/10.1007/s11263-008-0173-1
DOI: https://doi.org/10.1007/s11263-008-0173-1