Abstract
The explosive growth of surveillance cameras and its 7 * 24 recording period brings massive surveillance videos data. Therefore how to efficiently retrieve the rare but important event information inside the videos is eager to be solved. Recently deep convolutinal networks shows its outstanding performance in event recognition on general videos. Hence we study the characteristic of surveillance video context and propose a very competitive ConvNets approach for real-time event recognition on surveillance videos. Our approach adopts two-steam ConvNets to respectively recognition spatial and temporal information of one action. In particular, we propose to use fast feature cascades and motion history image as the template of spatial and temporal stream. We conducted our experiments on UCF-ARG and UT-interaction dataset. The experimental results show that our approach acquires superior recognition accuracy and runs in real-time.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
Notes
References
Brox, T., Bruhn, A., Papenberg, N., Weickert, J.: High accuracy optical flow estimation based on a theory for warping. In: Pajdla, T., Matas, J.G. (eds.) ECCV 2004. LNCS, vol. 3024, pp. 25–36. Springer, Heidelberg (2004)
UCF-ARG Data Set. http://crcv.ucf.edu/data/UCF-ARG.php. Accessed 10 Nov 2015
Baccouche, M., Mamalet, F., Wolf, C., Garcia, C., Baskurt, A.: Sequential deep learning for human action recognition. In: Salah, A.A., Lepri, B. (eds.) HBU 2011. LNCS, vol. 7065, pp. 29–39. Springer, Heidelberg (2011)
Benenson, R., Mathias, M., Timofte, R., Van Gool, L.: Pedestrian detection at 100 frames per second. In: CVPR (2012)
Bilinski, P., Bremond, F.: Statistics of pairwise co-occurring local spatio-temporal features for human action recognition. In: Fusiello, A., Murino, V., Cucchiara, R. (eds.) ECCV 2012 Ws/Demos, Part I. LNCS, vol. 7583, pp. 311–320. Springer, Heidelberg (2012)
Bradski, G.: Dr. Dobb’s J. Softw. Tools (2000). Article ID 2236121
Cropley, J.: Top video surveillance trends for 2016, February 2016. https://technology.ihs.com/api/binary/572252
Davis, J.W., Bobick, A.E.: The representation and recognition of human movement using temporal templates. In: IEEE Computer Society Conference on CVPR, 1997, pp. 928–934. IEEE (1997)
Dollár, P., Belongie, S., Perona, P.: The fastest pedestrian detector in the west. In: BMVC, vol. 2, p. 7. Citeseer (2010)
Donahue, J., Anne Hendricks, L., Guadarrama, S., Rohrbach, M., Venugopalan, S., Saenko, K., Darrell, T.: Long-term recurrent convolutional networks for visual recognition and description. In: CVPR, pp. 2625–2634 (2015)
Ji, S., Xu, W., Yang, M., Yu, K.: 3D convolutional neural networks for human action recognition. IEEE Trans. Pattern Anal. Mach. Intell. 35(1), 221–231 (2013)
Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J., Girshick, R., Guadarrama, S., Darrell, T.: Caffe: convolutional architecture for fast feature embedding. In: Proceedings of ACMMM, pp. 675–678. ACM (2014)
Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., Fei-Fei, L.: Large-scale video classification with convolutional neural networks. In: 2014 IEEE Conference on CVPR, pp. 1725–1732. IEEE (2014)
Laptev, I., Marszałek, M., Schmid, C., Rozenfeld, B.: Learning realistic human actions from movies. In: 2008 IEEE Conference on CVPR, pp. 1–8. IEEE (2008)
Ryoo, M.S., Chen, C.-C., Aggarwal, J.K., Roy-Chowdhury, A.: An overview of contest on semantic description of human activities (SDHA) 2010. In: Ünay, D., Çataltepe, Z., Aksoy, S. (eds.) ICPR 2010. LNCS, vol. 6388, pp. 270–285. Springer, Heidelberg (2010)
Simonyan, K., Zisserman, A.: Two-stream convolutional networks for action recognition in videos. In: NIPS, pp. 568–576 (2014)
Wang, L., Xiong, Y., Wang, Z., Qiao, Y.: Towards good practices for very deep two-stream convnets. CoRR abs/1507.02159 (2015)
Wu, Z., Wang, X., Jiang, Y.G., Ye, H., Xue, X.: Modeling spatial-temporal clues in a hybrid deep learning framework for video classification. In: Proceedings of the 23rd ACM MM, pp. 461–470. ACM (2015)
Ye, H., Wu, Z., Zhao, R.W., Wang, X., Jiang, Y.G., Xue, X.: Evaluating two-stream CNN for video classification. In: ICMR 2015, pp. 435–442. ACM (2015)
Yue-Hei Ng, J., Hausknecht, M., Vijayanarasimhan, S., Vinyals, O., Monga, R., Toderici, G.: Beyond short snippets: deep networks for video classification. In: 2015 IEEE Conference on CVPR, pp. 4694–4702 (2015)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2016 Springer International Publishing AG
About this paper
Cite this paper
Luo, S., Yang, H., Wang, C., Che, X., Meinel, C. (2016). Real-Time Action Recognition in Surveillance Videos Using ConvNets. In: Hirose, A., Ozawa, S., Doya, K., Ikeda, K., Lee, M., Liu, D. (eds) Neural Information Processing. ICONIP 2016. Lecture Notes in Computer Science(), vol 9949. Springer, Cham. https://doi.org/10.1007/978-3-319-46675-0_58
Download citation
DOI: https://doi.org/10.1007/978-3-319-46675-0_58
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-46674-3
Online ISBN: 978-3-319-46675-0
eBook Packages: Computer ScienceComputer Science (R0)