Abstract
In the context of video-based image classification, image annotation plays a vital role in improving classification decisions based on image semantics. Several methods, such as manual and semi-supervised annotation, have been introduced to perform image annotation; however, formal specification, high cost, high probability of error, and computation time remain major issues. To overcome these issues, we propose a new image annotation technique consisting of three tiers: frame extraction, interest point generation, and clustering. The aim of the proposed technique is to automate the generation of labels for video frames. Moreover, an evaluation model is used to assess the effectiveness of the proposed technique. The promising results indicate its effectiveness (77% in terms of Adjusted Rand Index) in the context of label generation for video frames. Finally, a comparative analysis between existing techniques and the proposed methodology is presented.
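The evaluation metric named in the abstract, the Adjusted Rand Index, compares a predicted clustering of frames against ground-truth labels and corrects the plain Rand Index for chance agreement. The paper's own pipeline and implementation are not shown here; the following is only a minimal, self-contained sketch of the metric itself, assuming two label lists over the same set of items:

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index between two labelings of the same items.

    Returns 1.0 for identical partitions (up to label renaming),
    ~0.0 for random labelings, and can be negative for labelings
    worse than chance.
    """
    n = len(labels_true)
    # Contingency counts: how many items fall in each (true, pred) cluster pair.
    pair_counts = Counter(zip(labels_true, labels_pred))
    row_sums = Counter(labels_true)
    col_sums = Counter(labels_pred)

    sum_ij = sum(comb(c, 2) for c in pair_counts.values())
    sum_rows = sum(comb(c, 2) for c in row_sums.values())
    sum_cols = sum(comb(c, 2) for c in col_sums.values())

    expected = sum_rows * sum_cols / comb(n, 2)  # chance-level agreement
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:  # degenerate: single cluster or all singletons
        return 1.0
    return (sum_ij - expected) / (max_index - expected)
```

For example, two partitions that group items identically but use different label names score 1.0, while partially disagreeing partitions score between 0 and 1; this label-permutation invariance is what makes the index suitable for judging automatically generated frame labels against human annotations.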
Waheed, M., Hussain, S., Khan, A.A. et al. A methodology for image annotation of human actions in videos. Multimed Tools Appl 79, 24347–24365 (2020). https://doi.org/10.1007/s11042-020-09091-2