Abstract
We present a survey of recent, efficient space-time methods for action recognition, selecting the methods with the highest accuracy on challenging datasets such as HMDB51, UCF101 and Hollywood2. The survey focuses on the two main space-time approaches: hand-crafted features and deep-learned features. We explain the selected pipelines intuitively and review good practices used in state-of-the-art methods, including the best descriptors, encoding methods, deep architectures and classifiers; the strongest methods are described in more detail. Finally, we discuss how these methods can be improved in both speed and accuracy and propose directions for further work.
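For context, the hand-crafted branch reviewed in the survey typically follows a descriptor-encoding-classifier pipeline, for example dense-trajectory descriptors encoded as Fisher vectors and classified with a linear SVM. The sketch below is only an illustration of that generic pipeline, not the authors' implementation; the use of scikit-learn, the toy random data standing in for extracted descriptors, and the parameter choices (number of GMM components, SVM C) are assumptions made for the example.

    # Minimal sketch of a hand-crafted action-recognition pipeline:
    # local descriptors -> Fisher vector encoding -> linear SVM.
    # Descriptor extraction (e.g. HOG/HOF/MBH along dense trajectories)
    # is assumed to have been done already.
    import numpy as np
    from sklearn.mixture import GaussianMixture
    from sklearn.svm import LinearSVC

    def fisher_vector(descriptors, gmm):
        """Encode a set of local descriptors (T x D) as a Fisher vector of length 2*K*D."""
        T, D = descriptors.shape
        gamma = gmm.predict_proba(descriptors)          # soft assignments, T x K
        w, mu = gmm.weights_, gmm.means_                # shapes: (K,), (K, D)
        sigma = np.sqrt(gmm.covariances_)               # (K, D), diagonal covariances
        fv = []
        for k in range(gmm.n_components):
            diff = (descriptors - mu[k]) / sigma[k]     # normalized residuals, T x D
            g = gamma[:, k:k + 1]
            grad_mu = (g * diff).sum(axis=0) / (T * np.sqrt(w[k]))
            grad_sigma = (g * (diff ** 2 - 1)).sum(axis=0) / (T * np.sqrt(2 * w[k]))
            fv.extend([grad_mu, grad_sigma])
        fv = np.concatenate(fv)
        fv = np.sign(fv) * np.sqrt(np.abs(fv))          # power (signed square-root) normalization
        return fv / (np.linalg.norm(fv) + 1e-12)        # L2 normalization

    # Toy usage: random "descriptors" stand in for per-video trajectory features.
    rng = np.random.default_rng(0)
    videos = [rng.normal(size=(200, 32)) for _ in range(20)]
    labels = rng.integers(0, 2, size=20)
    gmm = GaussianMixture(n_components=8, covariance_type='diag', random_state=0)
    gmm.fit(np.vstack(videos))                          # codebook learned on pooled descriptors
    X = np.stack([fisher_vector(v, gmm) for v in videos])
    clf = LinearSVC(C=100.0).fit(X, labels)             # linear SVM on the encoded videos

The deep-learning branch replaces the hand-crafted descriptors and encoding with features learned by convolutional networks (for example two-stream or 3D-convolutional architectures), but the final classification stage is often a linear classifier of the same kind.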
Acknowledgments
This work has been supported by the National Centre for Research and Development (project UOD-DEM-1-183/001 “Intelligent video analysis system for behavior and event recognition in surveillance networks”).
Copyright information
© 2016 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wojciechowski, S. et al. (2016). Selected Space-Time Based Methods for Action Recognition. In: Nguyen, N.T., Trawiński, B., Fujita, H., Hong, TP. (eds) Intelligent Information and Database Systems. ACIIDS 2016. Lecture Notes in Computer Science, vol 9622. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-49390-8_41
DOI: https://doi.org/10.1007/978-3-662-49390-8_41
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-49389-2
Online ISBN: 978-3-662-49390-8