A review on human action analysis in videos for retrieval applications

Authors:

Mohsen Ramezani,

Farzin Yaghmaee

Artificial Intelligence Review, Volume 46, Issue 4

Pages 485 - 514

https://doi.org/10.1007/s10462-016-9473-y

Published: 01 December 2016

Abstract

The number of videos available on the Internet has increased significantly. Content-based video retrieval is used to find the items users want within this large body of video data. Memorizing the details of videos and the intricate relations between the objects they contain can be considered the major challenges of this big data topic. A large portion of video data relates to humans. Human action retrieval has therefore been introduced as a new big data topic that seeks to find video objects based on the human action they contain. It has been applied in domains such as video search, intelligent human-computer interaction, robotics, video surveillance and human behavior analysis. Variations in rotation, scale and style, together with the above-mentioned challenges of big video data, can affect retrieval accuracy. This paper presents a survey of human action retrieval studies in which the methodologies are analyzed from the perspectives of action representation and retrieval. Limitations and common datasets of human action retrieval are introduced before the state-of-the-art methodologies are described.

A review on human action analysis in videos for retrieval applications
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision tasks
  2. Computer graphics
    1. Animation

Information & Contributors

Information

Published In

Artificial Intelligence Review Volume 46, Issue 4

December 2016

143 pages

ISSN:0269-2821

Copyright © 2016 Springer Science+Business Media Dordrecht.

Publisher

Kluwer Academic Publishers

United States


Cited By

Córdoba-Tlaxcalteco M, Benítez-Guerrero E (2023) Human Event Recognition in Smart Classrooms Using Computer Vision: A Systematic Literature Review. Programming and Computing Software 49(8):625-642. Online publication date: 1-Dec-2023
https://dl.acm.org/doi/10.1134/S0361768823080066
Xia H, Zhan Y (2023) Deep cascaded action attention network for weakly-supervised temporal action localization. Multimedia Tools and Applications 82(19):29769-29787. Online publication date: 15-Mar-2023
https://dl.acm.org/doi/10.1007/s11042-023-14670-0
Kong Y, Fu Y (2022) Human Action Recognition and Prediction: A Survey. International Journal of Computer Vision 130(5):1366-1401. Online publication date: 1-May-2022
https://dl.acm.org/doi/10.1007/s11263-022-01594-9
Bi M, Li J, Liu X, Zhang Q, Yang Z (2022) Action-Aware Network with Upper and Lower Limit Loss for Weakly-Supervised Temporal Action Localization. Neural Processing Letters 55(4):4307-4324. Online publication date: 24-Sep-2022
https://dl.acm.org/doi/10.1007/s11063-022-11042-x
Chen M, Gao J, Yang S, Xu C (2022) Dual-Evidential Learning for Weakly-supervised Temporal Action Localization. Computer Vision – ECCV 2022, pp 192-208. Online publication date: 23-Oct-2022
https://dl.acm.org/doi/10.1007/978-3-031-19772-7_12
Xu C, Liu R, Zhang T, Cui Z, Yang J, Hu C (2021) Dual-Stream Structured Graph Convolution Network for Skeleton-Based Action Recognition. ACM Transactions on Multimedia Computing, Communications, and Applications 17(4):1-22. Online publication date: 12-Nov-2021
https://dl.acm.org/doi/10.1145/3450410
Rashmi M, Ashwin T, Guddeti R (2021) Surveillance video analysis for student action recognition and localization inside computer laboratories of a smart campus. Multimedia Tools and Applications 80(2):2907-2929. Online publication date: 1-Jan-2021
https://dl.acm.org/doi/10.1007/s11042-020-09741-5
Dash S, Mishra S, Srujan Raju K, Narasimha Prasad L (2021) RETRACTED ARTICLE: Human action recognition using a hybrid deep learning heuristic. Soft Computing - A Fusion of Foundations, Methodologies and Applications 25(20):13079-13092. Online publication date: 1-Oct-2021
https://dl.acm.org/doi/10.1007/s00500-021-06149-7
Duhme M, Memmesheimer R, Paulus D (2021) Fusion-GCN: Multimodal Action Recognition Using Graph Convolutional Networks. Pattern Recognition, pp 265-281. Online publication date: 28-Sep-2021
https://dl.acm.org/doi/10.1007/978-3-030-92659-5_17
Roselind Johnson D, Uthariaraj V (2020) A Novel Parameter Initialization Technique Using RBM-NN for Human Action Recognition. Computational Intelligence and Neuroscience 2020. Online publication date: 1-Jan-2020
https://dl.acm.org/doi/10.1155/2020/8852404