Abstract
In this paper, we present an action recognition system that uses a compact 3D descriptor to represent action information and a self-organizing map (SOM) to learn and recognize actions. Histogram of Gradient 3D (HOG3D) performs better than other descriptors currently used for action recognition. However, it is complex to compute, and it uses a vector of 960 elements to describe a single interest point. We therefore propose a compact descriptor that shortens the support region of each interest point and combines symmetric bins after orientation quantization. In addition, only the top-valued bin of the quantized vector is kept, which avoids setting a threshold experimentally. Compared with HOG3D, our descriptor uses 80 bins to describe a point, which greatly reduces the computational complexity. The compact descriptor is used to learn and recognize actions by considering the probability of local features in the SOM, and the results show that our system outperforms others on both the KTH and Hollywood datasets.
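The two descriptor changes named in the abstract (merging symmetric orientation bins and keeping only the top-valued bin) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the 20 orientation bins are the vertex directions of a regular dodecahedron, that opposite directions are merged into 10 axes, that the winning bin receives the gradient magnitude, and that a point is described by 8 cells of 10 bins each to reach 80 values. The abstract confirms none of these details, and all function names are hypothetical.

```python
import math

PHI = (1 + math.sqrt(5)) / 2  # golden ratio


def dodecahedron_axes():
    """Return 10 unit axes, one per antipodal pair of the 20 dodecahedron vertices.

    Assumption: these stand in for the "symmetric bins" merged in the abstract.
    """
    verts = [(x, y, z) for x in (1, -1) for y in (1, -1) for z in (1, -1)]
    a, b = 1 / PHI, PHI
    for s in (1, -1):
        for t in (1, -1):
            verts += [(0, s * a, t * b), (s * a, t * b, 0), (s * b, 0, t * a)]
    axes = []
    for v in verts:
        # Keep the vertex whose first non-zero coordinate is positive,
        # which selects exactly one member of each antipodal pair.
        first = next(c for c in v if c != 0)
        if first > 0:
            n = math.sqrt(sum(c * c for c in v))
            axes.append(tuple(c / n for c in v))
    return axes


def quantize(gradient, axes):
    """Quantize one 3D gradient into a 10-bin vector with a single non-zero bin.

    Taking |dot| merges opposite orientations. Only the top bin is kept and
    it receives the gradient magnitude (an assumption). Ties go to the first axis.
    """
    hist = [0.0] * len(axes)
    mag = math.sqrt(sum(c * c for c in gradient))
    if mag == 0:
        return hist
    proj = [abs(sum(g * a for g, a in zip(gradient, ax))) for ax in axes]
    top = max(range(len(axes)), key=lambda i: proj[i])
    hist[top] = mag
    return hist


def compact_descriptor(cells, axes=None):
    """Concatenate per-cell histograms and L2-normalize the result.

    `cells` is a list of lists of 3D gradients; 8 cells give 8 * 10 = 80 values.
    """
    axes = axes or dodecahedron_axes()
    desc = []
    for gradients in cells:
        hist = [0.0] * len(axes)
        for g in gradients:
            for i, v in enumerate(quantize(g, axes)):
                hist[i] += v
        desc.extend(hist)
    norm = math.sqrt(sum(v * v for v in desc))
    return [v / norm for v in desc] if norm > 0 else desc
```

Under these assumptions, a gradient and its negation fall into the same bin, which is what halves the orientation histogram. Keeping only the winning bin replaces a tuned threshold with a simple arg-max.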
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ji, Y., Shimada, A., Taniguchi, Ri. (2010). Human Action Recognition by SOM Considering the Probability of Spatio-temporal Features. In: Wong, K.W., Mendis, B.S.U., Bouzerdoum, A. (eds) Neural Information Processing. Models and Applications. ICONIP 2010. Lecture Notes in Computer Science, vol 6444. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-17534-3_48
DOI: https://doi.org/10.1007/978-3-642-17534-3_48
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-17533-6
Online ISBN: 978-3-642-17534-3
eBook Packages: Computer Science, Computer Science (R0)