Abstract
This paper presents “Action-Gons”, a mid-level representation for action recognition in videos. Actions in videos exhibit the regularity of human behavior as well as a large degree of variation. One key property that distinguishes actions from image scenes may be the amount of interaction among body parts, although scenes also exhibit structured patterns in 2D images. Here, we study high-order statistics of the interactions among regions of interest in actions and propose a mid-level representation for action recognition, inspired by the Julesz school of n-gon statistics. We propose a systematic learning process to build an over-complete dictionary of “Action-Gons”. We first extract motion clusters, called action units, and then sequentially learn a pool of action-gons of different granularities that model different degrees of interaction among action units. We validate the discriminative power of the learned action-gons on three challenging video datasets and show clear advantages over existing methods.
This work was done when Yuwang Wang was an intern at Microsoft Research.
Notes
1. Code is available from: http://research.microsoft.com/en-us/downloads/dad6c31e-2c04-471f-b724-ded18bf70fe3.
2. Code is based on http://www.cs.cornell.edu/people/tj/svm_light/svm_struct.html.
Acknowledgements
Zhuowen Tu is supported by NSF IIS-1216528 (IIS-1360566) and NSF award IIS-0844566 (IIS-1360568).
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Wang, Y., Wang, B., Yu, Y., Dai, Q., Tu, Z. (2015). Action-Gons: Action Recognition with a Discriminative Dictionary of Structured Elements with Varying Granularity. In: Cremers, D., Reid, I., Saito, H., Yang, MH. (eds) Computer Vision -- ACCV 2014. Lecture Notes in Computer Science, vol 9007. Springer, Cham. https://doi.org/10.1007/978-3-319-16814-2_17
DOI: https://doi.org/10.1007/978-3-319-16814-2_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-16813-5
Online ISBN: 978-3-319-16814-2
eBook Packages: Computer Science, Computer Science (R0)