
Tracking in object action space

Published: 01 July 2013

Abstract

In this paper we focus on the joint problem of tracking humans and recognizing human actions in scenarios such as a kitchen or a setting where a robot cooperates with a human, e.g., for a manufacturing task. In these scenarios, the human interacts with objects directly, by using or manipulating them or by, e.g., pointing at them, as in "Give me that...". Recognizing these types of human actions is difficult because (a) they ought to be recognized independently of scene parameters such as the viewing direction, and (b) the actions are parametric, where the parameters are either object-dependent or, as in the case of a pointing direction, convey important information in themselves. One common way to achieve recognition is 3D human body tracking followed by action recognition based on the captured tracking data. For the kind of scenarios considered here, we argue that 3D body tracking and action recognition should be seen as an intertwined problem that is primed by the objects on which the actions are applied. In this paper, we look at human body tracking and action recognition from an object-driven perspective. Instead of the space of human body poses, we consider the space of object affordances, i.e., the space of possible actions that can be applied to a given object. This way, 3D body tracking reduces to action tracking in the object- (and context-) primed parameter space of the object affordances, which reduces the high-dimensional joint space to a low-dimensional action space. In our approach, we use parametric hidden Markov models to represent parametric movements; particle filtering is used to track in the space of action parameters. We demonstrate the effectiveness of the approach on synthetic and real image sequences using single-arm, upper-body human actions that involve objects.
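The core idea, tracking with a particle filter over a low-dimensional action-parameter space rather than the full joint space, can be illustrated with a minimal sketch. This is not the paper's implementation: the 1-D pointing-angle model `pointing_pose`, the noise levels, and the random-walk dynamics are all simplifying assumptions made here for illustration; in the paper the state would be the parameter space of a parametric HMM primed by the object.

```python
import numpy as np

rng = np.random.default_rng(0)

def pointing_pose(theta):
    """Hypothetical parametric action model: maps a 1-D action
    parameter (pointing angle, radians) to an observed 2-D hand
    position. Stands in for the output of a parametric HMM."""
    return np.array([np.cos(theta), np.sin(theta)])

def particle_filter(observations, n_particles=500, obs_noise=0.1):
    """Bootstrap particle filter over the action parameter.
    The state space is the 1-D action parameter, not the
    high-dimensional body pose."""
    particles = rng.uniform(-np.pi, np.pi, n_particles)
    estimates = []
    for z in observations:
        # Propagate: small random walk in action-parameter space.
        particles = particles + rng.normal(0.0, 0.05, n_particles)
        # Weight: likelihood of the observed hand position
        # under each particle's predicted pose.
        preds = np.stack([pointing_pose(t) for t in particles])
        sq_err = np.sum((preds - z) ** 2, axis=1)
        weights = np.exp(-sq_err / (2 * obs_noise ** 2))
        weights /= weights.sum()
        # Estimate, then resample (multinomial).
        estimates.append(float(np.sum(weights * particles)))
        idx = rng.choice(n_particles, n_particles, p=weights)
        particles = particles[idx]
    return estimates

# Simulate a pointing action whose angle drifts from 0.2 to 1.0 rad.
true_thetas = np.linspace(0.2, 1.0, 30)
obs = [pointing_pose(t) + rng.normal(0.0, 0.05, 2) for t in true_thetas]
est = particle_filter(obs)
```

The payoff of the object-driven formulation is visible in the state dimension: the filter above needs only a handful of particles in a 1-D parameter space, whereas a particle filter over a full upper-body joint configuration would face a state space of dozens of dimensions.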

Cited By

  • (2021)Syntactic Pattern Recognition in Computer VisionACM Computing Surveys10.1145/344724154:3(1-35)Online publication date: 17-Apr-2021
  • (2019)Action representations in roboticsInternational Journal of Robotics Research10.1177/027836491983502038:5(518-562)Online publication date: 1-Apr-2019

Published In

Computer Vision and Image Understanding  Volume 117, Issue 7
July, 2013
71 pages

Publisher

Elsevier Science Inc.

United States

Author Tags

  1. Action recognition
  2. Parametric gestures
  3. Pose estimation
  4. Tracking
