
A unified tree-based framework for joint action localization, recognition and segmentation

Published: 01 October 2013

Abstract

A unified tree-based framework for joint action localization, recognition and segmentation is proposed. An action is represented as a sequence of joint HOG-flow descriptors extracted independently from each frame. During training, a set of action prototypes is first learned by k-means clustering, and a binary tree model is then constructed over the prototypes by hierarchical k-means clustering. Each tree node is characterized by a HOG-flow descriptor and a rejection threshold, and an initial action segmentation mask is defined for each leaf node (corresponding to a prototype). During testing, an action is localized by mapping each test frame to its nearest-neighbor prototype using a fast tree search, followed by local-search-based tracking and global-filtering-based location refinement. An action is recognized by maximizing the sum of the joint probabilities of the action category and action prototype given the input sequence. An action pose in a test frame is segmented by the GrabCut algorithm, using the initial segmentation mask from the matched leaf node as the user labeling. Our approach does not rely on background subtraction and enables action localization and recognition under realistic and challenging conditions (such as crowded backgrounds). Experimental results show that our approach achieves state-of-the-art performance on the Weizmann, CMU action, and UCF sports action datasets.
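
The prototype-tree training and matching stages can be made concrete with a short sketch. The code below is a minimal illustration, not the authors' implementation: it builds a binary tree over prototype descriptors by recursive 2-means and descends it greedily to the nearest leaf, assuming L2 distance on HOG-flow descriptor vectors. The names (`PrototypeNode`, `build_prototype_tree`, `tree_search`) are invented for this example, and the per-node rejection thresholds used for localization are omitted.

```python
import numpy as np
from sklearn.cluster import KMeans

class PrototypeNode:
    """Binary-tree node over action-prototype descriptors."""
    def __init__(self, centroid, indices):
        self.centroid = centroid    # mean HOG-flow descriptor of this subtree
        self.indices = indices      # prototype indices covered by this node
        self.left = None
        self.right = None

def build_prototype_tree(prototypes, indices=None):
    """Hierarchical 2-means: split until each leaf holds one prototype."""
    if indices is None:
        indices = np.arange(len(prototypes))
    node = PrototypeNode(prototypes[indices].mean(axis=0), indices)
    if len(indices) > 1:
        labels = KMeans(n_clusters=2, n_init=10).fit_predict(prototypes[indices])
        node.left = build_prototype_tree(prototypes, indices[labels == 0])
        node.right = build_prototype_tree(prototypes, indices[labels == 1])
    return node

def tree_search(node, descriptor):
    """Greedy descent toward the nearest prototype leaf (logarithmic cost)."""
    while node.left is not None:
        go_left = (np.linalg.norm(descriptor - node.left.centroid)
                   <= np.linalg.norm(descriptor - node.right.centroid))
        node = node.left if go_left else node.right
    return node.indices[0]          # index of the matched prototype
```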
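
The recognition rule, read literally from the abstract, sums a joint class/prototype probability over the frames of the sequence. A sketch under that reading, reusing `tree_search` from above and assuming a `joint_prob[c, p]` table approximating P(class c, prototype p) has been estimated on the training set (the table and its estimation are assumptions for this example, not details from the paper):

```python
import numpy as np

def recognize_action(frame_descriptors, root, joint_prob):
    """Pick the class whose summed joint probability over frames is largest.

    joint_prob: (num_classes, num_prototypes) array; entry [c, p] is an
    estimate of P(class = c, prototype = p), assumed learned at training time.
    """
    scores = np.zeros(joint_prob.shape[0])
    for x in frame_descriptors:
        p = tree_search(root, x)    # nearest-prototype leaf for this frame
        scores += joint_prob[:, p]
    return int(np.argmax(scores))   # predicted action category
```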
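
The segmentation step maps naturally onto OpenCV's GrabCut. A minimal sketch, assuming the matched leaf's silhouette mask has already been scaled and placed over the localized window in the test frame; the function name and the choice to seed the silhouette as "probable foreground" are illustrative assumptions:

```python
import cv2
import numpy as np

def segment_pose(frame_bgr, silhouette_mask, iterations=5):
    """Run GrabCut seeded by a prototype silhouette (the 'user labeling').

    silhouette_mask: uint8 array, nonzero inside the prototype silhouette.
    """
    # Seed silhouette pixels as probable foreground and the rest as probable
    # background; GrabCut then refines the boundary with iterated graph cuts.
    mask = np.where(silhouette_mask > 0,
                    cv2.GC_PR_FGD, cv2.GC_PR_BGD).astype(np.uint8)
    bgd_model = np.zeros((1, 65), np.float64)   # internal GMM buffers
    fgd_model = np.zeros((1, 65), np.float64)
    cv2.grabCut(frame_bgr, mask, None, bgd_model, fgd_model,
                iterations, cv2.GC_INIT_WITH_MASK)
    # Foreground = pixels labeled definite or probable foreground.
    return np.isin(mask, [cv2.GC_FGD, cv2.GC_PR_FGD]).astype(np.uint8)
```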




      Published In

      Computer Vision and Image Understanding, Volume 117, Issue 10
      October 2013, 344 pages

      Publisher

      Elsevier Science Inc.

      United States


      Author Tags

      1. Action localization
      2. Action recognition
      3. Action segmentation
      4. Hierarchical k-means clustering
      5. Silhouette-based segmentation mask
      6. Tree classifier

      Qualifiers

      • Article


      Cited By

      • (2018) "Action recognition by using kernels on aclets sequences", Computer Vision and Image Understanding 144:C, 3-13. https://doi.org/10.1016/j.cviu.2015.09.003
      • (2017) "Power difference template for action recognition", Machine Vision and Applications 28(5-6), 463-473. https://doi.org/10.1007/s00138-017-0848-0
      • (2016) "Gradient-layer feature transform for action detection and recognition", Journal of Visual Communication and Image Representation 40:PA, 159-167. https://doi.org/10.1016/j.jvcir.2016.06.023
      • (2015) "Real-Time Human Action Recognition Using CNN Over Temporal Images for Static Video Surveillance Cameras", Proceedings, Part II, of the 16th Pacific-Rim Conference on Advances in Multimedia Information Processing (PCM 2015), Volume 9315, 330-339. https://doi.org/10.1007/978-3-319-24078-7_33
