[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
article

Coloring Action Recognition in Still Images

Published: 01 December 2013 Publication History

Abstract

In this article we investigate the problem of human action recognition in static images. By action recognition we intend a class of problems which includes both action classification and action detection (i.e. simultaneous localization and classification). Bag-of-words image representations yield promising results for action classification, and deformable part models perform very well object detection. The representations for action recognition typically use only shape cues and ignore color information. Inspired by the recent success of color in image classification and object detection, we investigate the potential of color for action classification and detection in static images. We perform a comprehensive evaluation of color descriptors and fusion approaches for action recognition. Experiments were conducted on the three datasets most used for benchmarking action recognition in still images: Willow, PASCAL VOC 2010 and Stanford-40. Our experiments demonstrate that incorporating color information considerably improves recognition performance, and that a descriptor based on color names outperforms pure color descriptors. Our experiments demonstrate that late fusion of color and shape information outperforms other approaches on action recognition. Finally, we show that the different color---shape fusion approaches result in complementary information and combining them yields state-of-the-art performance for action classification.

References

[1]
Benavente, R., Vanrell, M., & Baldrich, R. (2008). Parametric fuzzy sets for automatic color naming. Journal of the Optical Society of America A, 25(10), 2582-2593.
[2]
Berlin, B., & Kay, P. (1969). Basic color terms: Their universality and evolution. Berkeley, CA: University of California Press.
[3]
Bosch, A., Zisserman, A., & Munoz, X. (2006). Scene classification via plsa. In Proceedings of the European conference on computer vision.
[4]
Bosch, A., Zisserman, A., & Munoz, X. (2008). Scene classification using a hybrid generative/discriminative approach. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30(4), 712-727.
[5]
Dalal, N., & Triggs, B. (2005). Histograms of oriented gradients for human detection. In Conference on computer vision and pattern recognition.
[6]
Delaitre, V., Laptev, I., & Sivic, J. (2010). Recognizing human actions in still images: a study of bag-of-features and part-based representations. In Proceedings of the British machine vision conference.
[7]
Delaitre, V., Sivic, J., & Laptev, I. (2011). Learning person-object interactions for action recognition in still images. In Advances in neural information processing systems.
[8]
Desai, C., & Ramanan, D. (2012). Detecting actions, poses, and objects with relational phraselets. In Proceedings of the European conference on computer vision.
[9]
Elfiky, N., Khan, F. S., van deWeijer, J., & Gonzalez, J. (2012). Discriminative compact pyramids for object and scene recognition. Pattern Recognition, 45(4), 1627-1636.
[10]
Everingham, M., Gool, L.V., Williams, C.K.I., JWinn, Zisserman A. (2009). The pascal visual object classes challenge 2009 (VOC2009) results.
[11]
Everingham, M., Gool, L. J. V., Williams, C. K. I., Winn, J. M., & Zisserman, A. (2010). The pascal visual object classes (voc) challenge. International Journal of Computer Vision, 88(2), 303-338.
[12]
Felsberg, M., & Hedborg, J. (2007). Real-time view-based pose recognition and interpolation for tracking initialization. Journal of Real-Time Image Processing, 2(3), 103-115.
[13]
Felzenszwalb, P. F., Girshick, R. B., McAllester, D. A., & Ramanan, D. (2010). Object detection with discriminatively trained part-based models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(9), 1627-1645.
[14]
Gaidon, A., Harchaoui, Z., & Schmid, C. (2011). Actom sequence models for efficient action detection. In Conference on computer vision and pattern recognition.
[15]
Gehler, P. V., & Nowozin, S. (2009). On feature combination for multiclass object classification. In Proceedings of IEEE international conference on computer vision.
[16]
Geusebroek, J. M., van den Boomgaard, R., Smeulders, A. W. M., & Geerts, H. (2001). Color invariance. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(12), 1338-1350.
[17]
Hoiem, D., Chodpathumwan, Y., & Dai, Q. (2012). Diagnosing error in object detectors. In European conference on computer vision.
[18]
Hu, Y., Cao, L., Lv, F., Yan, S., Gong, Y., & Huang, T. S. (2009). Action detection in complex scenes with spatial and temporal ambiguities. In Proceedings of IEEE international conference on computer vision.
[19]
Khan, F. S., van de Weijer, J., Bagdanov, A. D., & Vanrell, M. (2011). Portmanteau vocabularies for multi-cue image representations. In Advances in neural information processing systems.
[20]
Khan, F. S., Anwer, R. M., van deWeijer, J., Bagdanov, A. D., Vanrell, M., & Lopez, A. M. (2012a). Color attributes for object detection. In Conference on computer vision and pattern recognition.
[21]
Khan, F. S., van de Weijer, J., & Vanrell, M. (2012b). Modulating shape features by color attention for object recognition. International Journal of Computer Vision, 98(1), 49-64.
[22]
Lan, Z. Z., Bao, L., Yu, S. I., Liu, W., & Hauptmann, A. G. (2012). Double fusion for multimedia event detection. In Multimedia Modeling.
[23]
Lazebnik, S., Schmid, C., & Ponce, J. (2006). Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In IEEE conference on computer vision & pattern recognition.
[24]
Lenz, R., Bui, T. H., & Hernandez-Andres, J. (2005). Group theoretical structure of spectral spaces. Journal of Mathematical Imaging and Vision, 23(3), 297-313.
[25]
Li, L. J., Su, H., Xing, E. P., & Li, F. F. (2010). Object bank: A high-level image representation for scene classification and semantic feature sparsification. In Advances in neural information processing systems.
[26]
Lowe, D. G. (2004). Distinctive image features from scale-invariant points. International Journal of Computer Vision, 60(2), 91-110.
[27]
Maji, S., Bourdev, L. D., & Malik, J. (2011). Action recognition from a distributed representation of pose and appearance. In Computer vision and pattern recognition.
[28]
Mullen, K. T. (1985). The contrast sensitivity of human colour vision to red-green and blue-yellow chromatic gratings. The Journal of Physiology, 359, 381-400.
[29]
Pagani, A., Stricker, D., & Felsberg, M. (2009). Integral p-channels for fast and robust region matching. In Proceedings of international consortium for intergenerational programmes.
[30]
Prest, A., Schmid, C., & Ferrari, V. (2012). Weakly supervised learning of interactions between humans and objects. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(3), 601-614.
[31]
van de Sande, K. E. A., Gevers, T., & Snoek, C. G. M. (2010). Evaluating color descriptors for object and scene recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(9), 1582-1596.
[32]
Shapovalova, N., Gong, W., Pedersoli, M., Roca, F. X., & Gonzalez, J. (2011). On importance of interactions and context in human action recognition. In Iberian conference on pattern recognition and image analysis.
[33]
Sharma, G., Jurie, F., & Schmid, C. (2012). Discriminative spatial saliency for image classification. In Conference on computer vision and pattern recognition.
[34]
Sharma, G., Jurie, F., & Schmid, C. (2013). Expanded parts model for human attribute and action recognition in still images. In Conference on computer vision and pattern recognition.
[35]
Tran, D., & Yuan, J. (2012). Max-margin structured output regression for spatio-temporal action localization. In Advances in neural information processing systems.
[36]
Vedaldi, A., Gulshan, V., Varma, M., & Zisserman, A. (2009). Multiple kernels for object detection. In Proceedings of IEEE international conference on computer vision.
[37]
Vigo, D. A. R., Khan, F. S., van de Weijer, J. & Gevers, T. (2010). The impact of color on bag-of-words based object recognition. In Indian council of philosophical research.
[38]
Wang, J., Yang, J., Yu, K., Lv, F., Huang, T. S., & Gong, Y. (2010). Locality-constrained linear coding for image classification. In Conference on computer vision and pattern recognition.
[39]
van de Weijer, J., & Schmid, C. (2006). Coloring local feature extraction. In Proceedings of the European conference on computer vision.
[40]
van de Weijer, J., & Schmid, C. (2007). Applying color names to image description. In International consortium for intergenerational programmes.
[41]
van de Weijer, J., Schmid, C., Verbeek, J. J., & Larlus, D. (2009). Learning color names for real-world applications. IEEE Transaction in Image Processing (TIP), 18(7), 1512-1524.
[42]
Yao, B., & Li, F. F. (2012). Recognizing human-object interactions in still images by modeling the mutual context of objects and human poses. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(9), 1691-1703.
[43]
Yao, B., Jiang, X., Khosla, A., Lin, A. L., Guibas, L. J., & Li, F. F. (2011). Human action recognition by learning bases of action attributes and parts. In Proceedings of IEEE international conference on computer vision.
[44]
Yuan, J., Liu, Z., & Wu, Y. (2011). Discriminative video pattern search for efficient action detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(9), 1728-1743.
[45]
Zhang, J., Marszalek, M., Lazebnik, S., & Schmid, C. (2007). Local features and kernels for classification of texture and object catergories: An in-depth study. A comprehensive study. International Journal of Computer Vision, 73(2), 213-218.
[46]
Zhang, J., Huang, K., Yu, Y., & Tan, T. (2010). Boosted local structured hog-lbp for object localization. In IEEE conference on computer vision & pattern recognition.

Cited By

View all
  • (2023)Regional Attention Network (RAN) for Head Pose and Fine-Grained Gesture RecognitionIEEE Transactions on Affective Computing10.1109/TAFFC.2020.303184114:1(549-562)Online publication date: 1-Jan-2023
  • (2022)Anomaly Analysis in Images and Videos: A Comprehensive ReviewACM Computing Surveys10.1145/354401455:7(1-37)Online publication date: 15-Dec-2022
  • (2021)Two-stream spatiotemporal feature fusion for human action recognitionThe Visual Computer: International Journal of Computer Graphics10.1007/s00371-020-01940-337:7(1821-1835)Online publication date: 1-Jul-2021
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image International Journal of Computer Vision
International Journal of Computer Vision  Volume 105, Issue 3
December 2013
111 pages

Publisher

Kluwer Academic Publishers

United States

Publication History

Published: 01 December 2013

Author Tags

  1. Action recognition
  2. Color features
  3. Image representation

Qualifiers

  • Article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)0
  • Downloads (Last 6 weeks)0
Reflects downloads up to 15 Jan 2025

Other Metrics

Citations

Cited By

View all
  • (2023)Regional Attention Network (RAN) for Head Pose and Fine-Grained Gesture RecognitionIEEE Transactions on Affective Computing10.1109/TAFFC.2020.303184114:1(549-562)Online publication date: 1-Jan-2023
  • (2022)Anomaly Analysis in Images and Videos: A Comprehensive ReviewACM Computing Surveys10.1145/354401455:7(1-37)Online publication date: 15-Dec-2022
  • (2021)Two-stream spatiotemporal feature fusion for human action recognitionThe Visual Computer: International Journal of Computer Graphics10.1007/s00371-020-01940-337:7(1821-1835)Online publication date: 1-Jul-2021
  • (2020)Deep Ensemble Learning for Human Action Recognition in Still ImagesComplexity10.1155/2020/94286122020Online publication date: 1-Jan-2020
  • (2020)SITUP: Scale Invariant Tracking Using Average Peak-to-Correlation EnergyIEEE Transactions on Image Processing10.1109/TIP.2019.296269429(3546-3557)Online publication date: 1-Jan-2020
  • (2020)Neighborhood min distance descriptor for kinship verificationMultimedia Tools and Applications10.1007/s11042-020-08906-679:29-30(20861-20880)Online publication date: 1-Aug-2020
  • (2019)Multi-script text versus non-text classification of regions in scene imagesJournal of Visual Communication and Image Representation10.1016/j.jvcir.2019.04.00762:C(23-42)Online publication date: 1-Jul-2019
  • (2018)A New Probabilistic Representation of Color Image Pixels and Its ApplicationsIEEE Transactions on Image Processing10.1109/TIP.2018.288358028:4(2037-2050)Online publication date: 11-Dec-2018
  • (2018)Kernel correlation filters for visual tracking with adaptive fusion of heterogeneous cuesNeurocomputing10.1016/j.neucom.2018.01.068286:C(109-120)Online publication date: 19-Apr-2018
  • (2018)Beyond Eleven Color Names for Image UnderstandingMachine Vision and Applications10.1007/s00138-017-0902-y29:2(361-373)Online publication date: 1-Feb-2018
  • Show More Cited By

View Options

View options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media