
A review on human action analysis in videos for retrieval applications

Published: 01 December 2016

Abstract

The number of videos available on the Internet has increased significantly. Content-based video retrieval is used to find the items users want within this large volume of video data. Memorizing the details of videos and the intricate relations between the objects they contain are the major challenges of this big data topic. A large portion of video data relates to humans. Human action retrieval has therefore been introduced as a new big data topic that seeks to find videos based on the human actions they contain. It has been applied in domains such as video search, intelligent human-computer interaction, robotics, video surveillance and human behavior analysis. Challenges such as variations in rotation, scale and style, together with the big video data challenges mentioned above, can affect retrieval accuracy. This paper presents a survey of human action retrieval studies in which the methodologies are analyzed from the perspectives of action representation and retrieval. The limitations and common datasets of human action retrieval are introduced before the state-of-the-art methodologies are described.



Published In

Artificial Intelligence Review, Volume 46, Issue 4
December 2016
143 pages

Publisher

Kluwer Academic Publishers
United States

Author Tags

1. Action representation
2. Action search
3. Big data
4. Human action recognition
5. Human action retrieval

