Abstract
This paper examines how the choice of local descriptor affects the performance of a human action classifier in the presence of static occlusion. The question matters when human action classification is applied to surveillance video, which is noisy, crowded, complex and incomplete. In real-world scenarios, a person is often occluded by an object while carrying out an action. However, it is unclear how the performance of existing action descriptors is affected by the associated loss of information. We evaluate and compare the classification performance of state-of-the-art local human action descriptors under varying degrees of static occlusion. We consider four local action descriptors: Trajectory (TRAJ), Histogram of Oriented Gradients (HOG), Histogram of Optical Flow (HOF) and Motion Boundary Histogram (MBH). These descriptors are combined with a standard bag-of-features representation and a Support Vector Machine classifier for action recognition. We investigate the performance of the descriptors, and of their possible combinations, with respect to varying amounts of artificial occlusion in the KTH action dataset. This preliminary investigation shows that MBH in combination with TRAJ has the best performance in the case of partial occlusion, while TRAJ in combination with MBH achieves the best results in the presence of heavy occlusion.
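The two mechanical steps named in the abstract, artificial static occlusion and the bag-of-features representation, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the abstract does not say where the occluder is placed or how the codebook is built, so blanking the bottom rows of each frame, the function names `occlude_static` and `bag_of_features`, and the toy codebook are all assumptions. The Support Vector Machine stage is left out.

```python
import numpy as np


def occlude_static(video, fraction, fill=0):
    """Blank the bottom `fraction` of every frame of a (T, H, W) video.

    Assumption: the occluder is a fixed horizontal band at the bottom of
    the frame. The abstract does not specify the placement.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1]")
    occluded = video.copy()
    height = video.shape[1]
    rows = int(round(fraction * height))
    if rows > 0:
        # The same rows are masked in every frame, so the occlusion is static.
        occluded[:, height - rows:, :] = fill
    return occluded


def bag_of_features(descriptors, codebook):
    """Return an L1-normalised histogram of nearest-codeword assignments.

    descriptors: (N, D) array of local descriptors from one video.
    codebook:    (K, D) array of visual words (hypothetical; typically
                 obtained by clustering training descriptors).
    """
    # Squared Euclidean distance from every descriptor to every codeword.
    diffs = descriptors[:, None, :] - codebook[None, :, :]
    dists = np.sum(diffs ** 2, axis=2)
    nearest = np.argmin(dists, axis=1)
    hist = np.bincount(nearest, minlength=codebook.shape[0]).astype(float)
    total = hist.sum()
    return hist / total if total > 0 else hist


if __name__ == "__main__":
    video = np.ones((4, 10, 8))           # toy clip: 4 frames of 10x8 pixels
    half = occlude_static(video, 0.5)     # occlude half of each frame
    print(half.sum())

    codebook = np.array([[0.0, 0.0], [10.0, 10.0]])
    descs = np.array([[1.0, 0.0], [9.0, 9.0], [0.0, 1.0], [11.0, 10.0]])
    print(bag_of_features(descs, codebook))
```

In an evaluation of this kind, descriptors would be extracted from the occluded video rather than the original, and the resulting histograms would be passed to the classifier, so that any drop in accuracy can be attributed to the information lost under the mask.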
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Jargalsaikhan, I., Direkoglu, C., Little, S., O’Connor, N.E. (2014). An Evaluation of Local Action Descriptors for Human Action Classification in the Presence of Occlusion. In: Gurrin, C., Hopfgartner, F., Hurst, W., Johansen, H., Lee, H., O’Connor, N. (eds) MultiMedia Modeling. MMM 2014. Lecture Notes in Computer Science, vol 8326. Springer, Cham. https://doi.org/10.1007/978-3-319-04117-9_6
DOI: https://doi.org/10.1007/978-3-319-04117-9_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-04116-2
Online ISBN: 978-3-319-04117-9
eBook Packages: Computer Science, Computer Science (R0)