Video classification with Densely extracted HOG/HOF/MBH features: an evaluation of the accuracy/computational efficiency trade-off

  • Regular Paper
International Journal of Multimedia Information Retrieval

Abstract

The current state of the art in video classification is based on Bag-of-Words using local visual descriptors. Most commonly these are histogram of oriented gradients (HOG), histogram of optical flow (HOF) and motion boundary histograms (MBH) descriptors. While such an approach is very powerful for classification, it is also computationally expensive. This paper addresses the problem of computational efficiency. Specifically: (1) we propose several speed-ups for densely sampled HOG, HOF and MBH descriptors and release Matlab code; (2) we investigate the trade-off between accuracy and computational efficiency of descriptors in terms of frame sampling rate and type of optical flow method; (3) we investigate the trade-off between accuracy and computational efficiency for computing the feature vocabulary, comparing most of the commonly adopted vector quantization techniques: \(k\)-means, hierarchical \(k\)-means, Random Forests, Fisher Vectors and VLAD.
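To make the vector quantization step concrete, the following is a minimal Python/NumPy sketch of two of the encodings named in the abstract: a hard-assignment Bag-of-Words histogram and a VLAD vector. It is not the Matlab code released with the paper. The function names, the toy two-word vocabulary, and the choice of L1 normalization for the histogram and L2 normalization for VLAD are illustrative assumptions, and the vocabulary is assumed to have been learned beforehand (for example with \(k\)-means).

```python
import numpy as np

def assign_to_words(descriptors, vocabulary):
    """Index of the nearest visual word for each descriptor (hard assignment)."""
    # Squared Euclidean distances between every descriptor and every word: (N, K).
    diffs = descriptors[:, None, :] - vocabulary[None, :, :]
    return (diffs ** 2).sum(axis=2).argmin(axis=1)

def bow_histogram(descriptors, vocabulary):
    """Bag-of-Words: count descriptors per visual word, then L1-normalize."""
    words = assign_to_words(descriptors, vocabulary)
    hist = np.bincount(words, minlength=len(vocabulary)).astype(float)
    return hist / max(hist.sum(), 1.0)

def vlad_encode(descriptors, vocabulary):
    """VLAD: sum the residuals to each word, concatenate, then L2-normalize."""
    words = assign_to_words(descriptors, vocabulary)
    k, d = vocabulary.shape
    vlad = np.zeros((k, d))
    for i in range(k):
        members = descriptors[words == i]
        if len(members):
            vlad[i] = (members - vocabulary[i]).sum(axis=0)
    vlad = vlad.ravel()
    norm = np.linalg.norm(vlad)
    return vlad / norm if norm > 0 else vlad

# Toy data (hypothetical): three 2-D descriptors and a two-word vocabulary.
vocabulary = np.array([[0.0, 0.0], [10.0, 10.0]])
descriptors = np.array([[1.0, 0.0], [0.0, 1.0], [9.0, 10.0]])
print(bow_histogram(descriptors, vocabulary))
print(vlad_encode(descriptors, vocabulary))
```

The Bag-of-Words vector has one entry per visual word, whereas the VLAD vector has one entry per word and descriptor dimension, so VLAD keeps more information per word at the cost of a longer representation.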

Notes

  1. http://homepages.inf.ed.ac.uk/juijling/index.php#page=software.

  2. http://opencv.org.

  3. https://github.com/kyamagu/mexopencv.

Acknowledgments

This work was supported by the European 7th Framework Program, under grant xLiMe (FP7-611346) and by the FIRB project S-PATTERNS.

Author information

Corresponding author

Correspondence to Nicu Sebe.

About this article

Cite this article

Uijlings, J., Duta, I.C., Sangineto, E. et al. Video classification with Densely extracted HOG/HOF/MBH features: an evaluation of the accuracy/computational efficiency trade-off. Int J Multimed Info Retr 4, 33–44 (2015). https://doi.org/10.1007/s13735-014-0069-5

  • DOI: https://doi.org/10.1007/s13735-014-0069-5
