Abstract
In the context of video-based image classification, image annotation plays a vital role in improving classification decisions based on image semantics. Several methods, such as manual and semi-supervised annotation, have been introduced to perform image annotation; however, formal specification, high cost, high probability of error, and computation time remain major issues. To overcome these issues, we propose a new image annotation technique consisting of three tiers: frame extraction, interest point generation, and clustering. The aim of the proposed technique is to automate the generation of labels for video frames. Moreover, an evaluation model is used to assess the effectiveness of the proposed technique. The promising results indicate its effectiveness (77% in terms of Adjusted Rand Index) in the context of label generation for video frames. Finally, a comparative analysis between existing techniques and the proposed methodology is presented.
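The evaluation metric named in the abstract, the Adjusted Rand Index, compares a predicted clustering of frames against ground-truth labels and corrects the plain Rand Index for chance agreement. The paper's own pipeline and implementation are not shown here; the following is only a minimal, self-contained sketch of the metric itself, assuming two label lists over the same set of items:

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index between two labelings of the same items.

    Returns 1.0 for identical partitions (up to label renaming),
    ~0.0 for random labelings, and can be negative for labelings
    worse than chance.
    """
    n = len(labels_true)
    # Contingency counts: how many items fall in each (true, pred) cluster pair.
    pair_counts = Counter(zip(labels_true, labels_pred))
    row_sums = Counter(labels_true)
    col_sums = Counter(labels_pred)

    sum_ij = sum(comb(c, 2) for c in pair_counts.values())
    sum_rows = sum(comb(c, 2) for c in row_sums.values())
    sum_cols = sum(comb(c, 2) for c in col_sums.values())

    expected = sum_rows * sum_cols / comb(n, 2)  # chance-level agreement
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:  # degenerate: single cluster or all singletons
        return 1.0
    return (sum_ij - expected) / (max_index - expected)
```

For example, two partitions that group items identically but use different label names score 1.0, while partially disagreeing partitions score between 0 and 1; this label-permutation invariance is what makes the index suitable for judging automatically generated frame labels against human annotations.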
Waheed, M., Hussain, S., Khan, A.A. et al. A methodology for image annotation of human actions in videos. Multimed Tools Appl 79, 24347–24365 (2020). https://doi.org/10.1007/s11042-020-09091-2