
A unified tree-based framework for joint action localization, recognition and segmentation

Published: 01 October 2013

Abstract

A unified tree-based framework for joint action localization, recognition and segmentation is proposed. An action is represented as a sequence of joint HOG-flow descriptors extracted independently from each frame. During training, a set of action prototypes is first learned by k-means clustering, and a binary tree model is then constructed over the prototypes by hierarchical k-means clustering. Each tree node is characterized by a HOG-flow descriptor and a rejection threshold, and an initial action segmentation mask is defined for each leaf node (corresponding to a prototype). During testing, an action is localized by mapping each test frame to its nearest-neighbor prototype using a fast tree search, followed by local-search-based tracking and global-filtering-based location refinement. An action is recognized by maximizing the sum of the joint probabilities of the action category and action prototype given the input sequence. An action pose in a test frame is segmented by the GrabCut algorithm, using the initial segmentation mask from the matched leaf node as the user labeling. Our approach does not rely on background subtraction and enables action localization and recognition under realistic and challenging conditions (such as crowded backgrounds). Experimental results show that our approach achieves state-of-the-art performance on the Weizmann, CMU action, and UCF sports action datasets.
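
The prototype-tree training and matching stages can be made concrete with a short sketch. The code below is a minimal illustration, not the authors' implementation: it builds a binary tree over prototype descriptors by recursive 2-means and descends it greedily to the nearest leaf, assuming L2 distance on HOG-flow descriptor vectors. The names (`PrototypeNode`, `build_prototype_tree`, `tree_search`) are invented for this example, and the per-node rejection thresholds used for localization are omitted.

```python
import numpy as np
from sklearn.cluster import KMeans

class PrototypeNode:
    """Binary-tree node over action-prototype descriptors."""
    def __init__(self, centroid, indices):
        self.centroid = centroid    # mean HOG-flow descriptor of this subtree
        self.indices = indices      # prototype indices covered by this node
        self.left = None
        self.right = None

def build_prototype_tree(prototypes, indices=None):
    """Hierarchical 2-means: split until each leaf holds one prototype."""
    if indices is None:
        indices = np.arange(len(prototypes))
    node = PrototypeNode(prototypes[indices].mean(axis=0), indices)
    if len(indices) > 1:
        labels = KMeans(n_clusters=2, n_init=10).fit_predict(prototypes[indices])
        node.left = build_prototype_tree(prototypes, indices[labels == 0])
        node.right = build_prototype_tree(prototypes, indices[labels == 1])
    return node

def tree_search(node, descriptor):
    """Greedy descent toward the nearest prototype leaf (logarithmic cost)."""
    while node.left is not None:
        go_left = (np.linalg.norm(descriptor - node.left.centroid)
                   <= np.linalg.norm(descriptor - node.right.centroid))
        node = node.left if go_left else node.right
    return node.indices[0]          # index of the matched prototype
```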
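
The recognition rule, read literally from the abstract, sums a joint class/prototype probability over the frames of the sequence. A sketch under that reading, reusing `tree_search` from above and assuming a `joint_prob[c, p]` table approximating P(class c, prototype p) has been estimated on the training set (the table and its estimation are assumptions for this example, not details from the paper):

```python
import numpy as np

def recognize_action(frame_descriptors, root, joint_prob):
    """Pick the class whose summed joint probability over frames is largest.

    joint_prob: (num_classes, num_prototypes) array; entry [c, p] is an
    estimate of P(class = c, prototype = p), assumed learned at training time.
    """
    scores = np.zeros(joint_prob.shape[0])
    for x in frame_descriptors:
        p = tree_search(root, x)    # nearest-prototype leaf for this frame
        scores += joint_prob[:, p]
    return int(np.argmax(scores))   # predicted action category
```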
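
The segmentation step maps naturally onto OpenCV's GrabCut. A minimal sketch, assuming the matched leaf's silhouette mask has already been scaled and placed over the localized window in the test frame; the function name and the choice to seed the silhouette as "probable foreground" are illustrative assumptions:

```python
import cv2
import numpy as np

def segment_pose(frame_bgr, silhouette_mask, iterations=5):
    """Run GrabCut seeded by a prototype silhouette (the 'user labeling').

    silhouette_mask: uint8 array, nonzero inside the prototype silhouette.
    """
    # Seed silhouette pixels as probable foreground and the rest as probable
    # background; GrabCut then refines the boundary with iterated graph cuts.
    mask = np.where(silhouette_mask > 0,
                    cv2.GC_PR_FGD, cv2.GC_PR_BGD).astype(np.uint8)
    bgd_model = np.zeros((1, 65), np.float64)   # internal GMM buffers
    fgd_model = np.zeros((1, 65), np.float64)
    cv2.grabCut(frame_bgr, mask, None, bgd_model, fgd_model,
                iterations, cv2.GC_INIT_WITH_MASK)
    # Foreground = pixels labeled definite or probable foreground.
    return np.isin(mask, [cv2.GC_FGD, cv2.GC_PR_FGD]).astype(np.uint8)
```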




      Published In

      Computer Vision and Image Understanding, Volume 117, Issue 10
      October 2013, 344 pages

      Publisher

      Elsevier Science Inc.

      United States


      Author Tags

      1. Action localization
      2. Action recognition
      3. Action segmentation
      4. Hierarchical k-means clustering
      5. Silhouette-based segmentation mask
      6. Tree classifier

      Qualifiers

      • Article


      Cited By

      • (2018) "Action recognition by using kernels on aclets sequences", Computer Vision and Image Understanding 144:C, 3-13. https://doi.org/10.1016/j.cviu.2015.09.003
      • (2017) "Power difference template for action recognition", Machine Vision and Applications 28(5-6), 463-473. https://doi.org/10.1007/s00138-017-0848-0
      • (2016) "Gradient-layer feature transform for action detection and recognition", Journal of Visual Communication and Image Representation 40:PA, 159-167. https://doi.org/10.1016/j.jvcir.2016.06.023
      • (2015) "Real-Time Human Action Recognition Using CNN Over Temporal Images for Static Video Surveillance Cameras", Proceedings, Part II, of the 16th Pacific-Rim Conference on Advances in Multimedia Information Processing (PCM 2015), Volume 9315, 330-339. https://doi.org/10.1007/978-3-319-24078-7_33
