DOI: 10.1007/978-3-319-42996-0_10
Article

Joint Recognition and Segmentation of Actions via Probabilistic Integration of Spatio-Temporal Fisher Vectors

Published: 19 April 2016

Abstract

We propose a hierarchical approach to multi-action recognition that performs joint classification and segmentation. A given video containing several consecutive actions is processed via a sequence of overlapping temporal windows. Each frame in a temporal window is represented through selective low-level spatio-temporal features which efficiently capture relevant local dynamics. Features from each window are represented as a Fisher vector, which captures first and second order statistics. Instead of directly classifying each Fisher vector, it is converted into a vector of class probabilities. The final classification decision for each frame is then obtained by integrating the class probabilities at the frame level, which exploits the overlapping of the temporal windows. Experiments were performed on two datasets: s-KTH (a stitched version of the KTH dataset to simulate multi-actions) and the challenging CMU-MMAC dataset. On s-KTH, the proposed approach achieves an accuracy of 85.0%, significantly outperforming two recent approaches based on GMMs and HMMs which obtained 78.3% and 71.2%, respectively. On CMU-MMAC, the proposed approach achieves an accuracy of 40.9%, outperforming the GMM and HMM approaches which obtained 33.7% and 38.4%, respectively. Furthermore, the proposed system is on average 40 times faster than the GMM based approach.
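The frame-level integration step described in the abstract, where each frame's decision averages the class-probability vectors of all overlapping windows covering it, can be sketched as below. This is a minimal illustration, not the paper's implementation: the window classifier is stubbed out, and the window length and stride values are assumed for the example rather than taken from the paper.

```python
import numpy as np

def integrate_window_probabilities(n_frames, window_probs, windows):
    """Average class-probability vectors over all windows covering each frame.

    n_frames:     total number of frames in the video
    window_probs: (n_windows, n_classes) class probabilities, one row per window
    windows:      list of (start, end) frame ranges, end exclusive
    Returns a per-frame class label obtained by argmax of the averaged probabilities.
    """
    n_classes = window_probs.shape[1]
    acc = np.zeros((n_frames, n_classes))
    counts = np.zeros(n_frames)
    for (start, end), probs in zip(windows, window_probs):
        acc[start:end] += probs        # accumulate evidence from each covering window
        counts[start:end] += 1
    acc /= counts[:, None]             # average over the overlapping windows
    return acc.argmax(axis=1)          # final per-frame classification decision

# Example: 6 frames, 2 classes, windows of length 4 with stride 2 (illustrative values)
windows = [(0, 4), (2, 6)]
window_probs = np.array([[0.9, 0.1],   # first window votes for class 0
                         [0.2, 0.8]])  # second window votes for class 1
labels = integrate_window_probabilities(6, window_probs, windows)
```

In this toy example, frames covered by only one window take that window's vote, while frames 2–3, which lie in the overlap, average both probability vectors; this averaging of soft votes is what smooths decisions near action boundaries.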


Cited By

  • (2017) Real-time Action Recognition Based on Key Frame Detection. In: Proceedings of the 9th International Conference on Machine Learning and Computing, pp. 272–277. DOI: 10.1145/3055635.3056569. Online publication date: 24-Feb-2017


Published In

Revised Selected Papers of the PAKDD 2016 Workshops on Trends and Applications in Knowledge Discovery and Data Mining - Volume 9794
April 2016
269 pages
ISBN:9783319429953
  • Editors: Huiping Cao, Jinyan Li, Ruili Wang

Publisher

Springer-Verlag

Berlin, Heidelberg
