Abstract
Training and validation sets of labeled data are essential components of supervised learning for building a classification model. During training, most learning algorithms use all images from the given training set to estimate the model’s parameters. For video classification in particular, a keyframe extraction technique is required to select representative frames for training, and this selection is commonly based on simple heuristics such as frame differences computed over low-level features. Since some learning algorithms are noise sensitive, frames must be selected carefully so that the model’s optimization is accomplished more accurately and quickly. In this paper we analyze four methodologies for selecting representative frames from a periocular video database: one based on threshold calculation (T), a modified Kennard-Stone (KS) model, a third based on the sum of absolute differences in the LUV colorspace, and random sampling. To evaluate the selected image sets we use two deep network methodologies: feature extraction (FE) and fine tuning (FT). The results show that, with a reduced number of training images, we can achieve the same accuracy as with the complete database by using the modified KS refinement methodology and the FT evaluation method.
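For reference, the sketch below illustrates the classic Kennard-Stone selection procedure (Kennard and Stone 1969) applied to per-frame feature vectors, which underlies the modified KS refinement mentioned above. The function name, the NumPy-only implementation, and the 512-dimensional random descriptors in the usage example are illustrative assumptions; the paper's modified variant and the other three selection strategies are not reproduced here.

```python
import numpy as np

def kennard_stone(features, n_select):
    """Minimal sketch of classic Kennard-Stone selection (assumed form).

    features : (n_frames, n_dims) array of per-frame feature vectors
    n_select : number of representative frames to keep
    Returns the indices of the selected frames.
    """
    # Pairwise Euclidean distances between all frames.
    dist = np.linalg.norm(features[:, None, :] - features[None, :, :], axis=-1)

    # Start with the two mutually most distant frames.
    i, j = np.unravel_index(np.argmax(dist), dist.shape)
    selected = [int(i), int(j)]
    remaining = [k for k in range(len(features)) if k not in selected]

    while len(selected) < n_select and remaining:
        # Distance of every remaining frame to its closest selected frame.
        min_dist = dist[np.ix_(remaining, selected)].min(axis=1)
        # Pick the frame that is farthest from the current selection.
        next_idx = remaining[int(np.argmax(min_dist))]
        selected.append(next_idx)
        remaining.remove(next_idx)

    return np.array(selected)

# Usage example with hypothetical frame descriptors:
# select 20 representative frames out of 300.
frames = np.random.rand(300, 512)
keyframes = kennard_stone(frames, n_select=20)
```

The maximin rule (always adding the frame farthest from the current selection) is what makes KS-style selection spread the training subset over the feature space instead of clustering around near-duplicate frames.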
References
Al-Obaydy WNI, Suandi SA (2020) Automatic pose normalization for open-set single-sample face recognition in video surveillance. Multimed Tools Appl 79:12
Alonso-Fernandez F, Bigun J, Englund C (2018) Expression recognition using the periocular region: A feasibility study. In: 2018 14th International conference on signal-image technology internet-based systems (SITIS), pp 536–541
Ambika DR, Radhika KR, Seshachalam D (2012) The eye says it all: Periocular region methodologies. In: 2012 International conference on multimedia computing and systems, pp 180–185
Angiulli F, Astorino A (2010) Scaling up support vector machines using nearest neighbor condensation. IEEE Trans Neural Netw 21(2):351–357
Balcázar J, Dai Y, Watanabe O (2001) A random sampling technique for training support vector machines. In: Abe N, Khardon R, Zeugmann T (eds) Algorithmic learning theory. Springer, Berlin, pp 119–134
Barra S, Bisogni C, Nappi M, Ricciardi S (2019) F-fid: fast fuzzy-based iris de-noising for mobile security applications. Multimed Tools Appl 01:1–21
Barros de Almeida M, de Padua Braga A, Braga JP (2000) Svm-km: speeding svms learning with a priori cluster selection and k-means. In: Proceedings. vol.1. Sixth Brazilian symposium on neural networks, pp 162–167
Barroso E, Santos G, Cardoso L, Padole C, Proença H (2016) Periocular recognition: how much facial expressions affect performance? vol 19
Bulut E, Capin T (2007) Key frame extraction from motion capture data by curve saliency. CASA
Cervantes J, Lamont FG, López-Chau A, Mazahua LR, Ruíz JS (2015) Data selection based on decision tree for svm classification on large data sets. Appl Soft Comput 37:787–798. Online. Available: http://www.sciencedirect.com/science/article/pii/S1568494615005591
de Sousa LC (2008) Espectroscopia na região do infravermelho próximo para predição de características da madeira para produção de celulose. Ph.D. dissertation Universidade Federal de Viçosa
de Sousa LC, Gomide JL, Milagres FR, de Almeida DP (2011) Desenvolvimento de modelos de calibração nirs para minimização das análises de madeira de eucalyptus spp. Ciência Florestal 21(3):591–599. Online. Available: http://www.scielo.br/pdf/cflo/v21n3/1980-5098-cflo-21-03-00591.pdf
de Souza JM, Gonzaga A (2019) Human iris feature extraction under pupil size variation using local texture descriptors. Multimed Tools Appl. Online. Available: https://doi.org/10.1007/s11042-019-7371-4
Ding J, Chen B, Liu H, Huang M (2016) Convolutional neural network with data augmentation for sar target recognition. IEEE Geosci Remote Sens Lett 13(3):364–368
Donahue J, Jia Y, Vinyals O, Hoffman J, Zhang N, Tzeng E, Darrell T, Xing EP (2014) Decaf: A deep convolutional activation feature for generic visual recognition. In: Jebara T (ed) Proceedings of the 31st international conference on machine learning, ser. Proceedings of Machine Learning Research, Bejing, China: PMLR, vol 32, pp 647–655. Online. Available: http://proceedings.mlr.press/v32/donahue14.html
Ferraz CT, Saito JH (2018) A comprehensive analysis of local binary convolutional neural network for fast face recognition in surveillance video. In: Proceedings of the 24th Brazilian Symposium on Multimedia and the Web, ser. WebMedia ’18. ACM, New York, pp 265–268. Online. Available: https://doi.org/10.1145/3243082.3267444
Gawande U, Hajari K, Golhar Y (2020) Deep learning approach to key frame detection in human action videos. In: Sadollah A, Sinha TS (eds) Recent trends in computational intelligence. Rijeka: IntechOpen, ch. 7. Online. Available: https://doi.org/10.5772/intechopen.91188
González-Lozoya S, de la Calleja J, Pellegrin L, Escalante HJ, Medina M, Benitez-Ruiz A (2020) Recognition of facial expressions based on cnn features. Multimed Tools Appl
Hannane R, Elboushaki A, Afdel K, Naghabhushan P, Javed M (2016) An efficient method for video shot boundary detection and keyframe extraction using sift-point distribution histogram. Int J Multimed Info Retrieval 5:89–104. Online. Available: https://link.springer.com/article/10.1007/s13735-016-0095-6
He K, Girshick RB, Dollár P (2018) Rethinking imagenet pre-training, arXiv:1811.08883
He K, Zhang X, Ren S, Sun J (2015) Deep residual learning for image recognition, arXiv:1512.03385
Hernandez-Diaz K, Alonso-Fernandez F, Bigün J (2018) Periocular recognition using CNN features off-the-shelf, arXiv:1809.06157
Jogin M, Mohana MS, Divya GD, Meghana RK, Apoorva S (2018) Feature extraction using convolution neural networks (cnn) and deep learning. In: 2018 3rd IEEE International conference on recent trends in electronics, information communication technology (RTEICT), pp 2319–2323
Kaudki O, Bhurchandi K (2018) A robust iris recognition approach using fuzzy edge processing technique. In: 2018 9th International conference on computing, communication and networking technologies (ICCCNT), pp 1–6
Kennard R, Stone L (1969) Computer aided design of experiments. Technometrics 11:137–148
Lee YW, Kim KW, Hoang TM, Arsalan M, Park KR (2019) Deep residual cnn-based ocular recognition based on rough pupil detection in the images by nir camera sensor. Sensors 19
Morais CLM, Santos MCD, Lima KMG, Martin FL (2019) Improving data splitting for classification applications in spectrochemical analyses employing a random-mutation Kennard-Stone algorithm approach. Bioinformatics 35:5257–5263. Online. Available: https://doi.org/10.1093/bioinformatics/btz421
Muhammad K, Hussain T, Baik SW (2018) Efficient cnn based summarization of surveillance videos for resource-constrained devices. Pattern Recogn Lett. Online. Available: http://www.sciencedirect.com/science/article/pii/S0167865518303842
Nalepa J, Kawulok M (2019) Selecting training sets for support vector machines: a review. Artif Intell Rev 52(2):857–900. https://doi.org/10.1007/s10462-017-9611-1
Nguyen K, Fookes C, Ross A, Sridharan S (2018) Iris recognition with off-the-shelf cnn features: A deep learning perspective. IEEE Access 6:18848–18855
Nigam I, Vatsa M, Singh R (2015) Ocular biometrics: A survey of modalities and fusion approaches. Information Fusion 26:1–35. Online. Available: http://www.sciencedirect.com/science/article/pii/S1566253515000354
Ojala T, Pietikäinen M, Mäenpää T (2002) Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Trans Pattern Anal Mach Intell 24(7):971–987. https://doi.org/10.1109/TPAMI.2002.1017623
Ouyang S, Zhong L, Luo R (2018) The comparison and analysis of extracting video key frame. IOP Conference Series: Materials Science and Engineering 359:012010
Padole C, Proenca H (2012) Periocular recognition: Analysis of performance degradation factors. In: 2012 5th IAPR International conference on biometrics (ICB), pp 439–445
Paul MKA, Kavitha J, Rani PAJ (2018) Key-frame extraction techniques: A review. Recent Patents on Computer Science (Discontinued) 11(1):3–16
Proenca H, Neves JC (2017) Irina: Iris recognition (even) in inaccurately segmented data. In: The IEEE conference on computer vision and pattern recognition (CVPR)
Qi X, Liu C, Schuckers S (2018) Boosting face in video recognition via cnn based key frame extraction. In: 2018 International conference on biometrics (ICB), pp 132–139
Ravanbakhsh M, Mousavi H, Rastegari M, Murino V, Davis LS (2015) Action recognition with image based CNN features, arXiv:1512.03980
Sajjad M, Khan S, Muhammad K, Wu W, Ullah A, Baik SW (2019) Multi-grade brain tumor classification using deep cnn with extensive data augmentation. Journal of Computational Science 30:174–182. Online. Available: http://www.sciencedirect.com/science/article/pii/S1877750318307385
Saptoro A, Tadé M (2012) A modified kennard-stone algorithm for optimal division of data for developing artificial neural network models. Chem Prod Process Model, vol 7
Schanda J (2007) Colorimetry: Understanding the CIE System. Wiley. Online. Available: https://books.google.com.br/books?id=uZadszSGe9MC
Tong S, Chang E (2001) Support vector machine active learning for image retrieval. In: Proceedings of the Ninth ACM International Conference on Multimedia, ser. MULTIMEDIA ’01. ACM, New York, pp 107–118. https://doi.org/10.1145/500141.500159
Tran L, Choi D (2020) Data augmentation for inertial sensor-based gait deep neural network. IEEE Access 8:12364–12378
Verbiest N, Derrac J, Cornelis C, García S, Herrera F (2016) Evolutionary wrapper approaches for training set selection as preprocessing mechanism for support vector machines: Experimental evaluation and support vector analysis. Appl Soft Comput 38:10–22. Online. Available: http://www.sciencedirect.com/science/article/pii/S1568494615005761
Yang HS, Lee JM, Jeong SKW, Kim S, Moon YS (2019) Improved quality keyframe selection method for hd video. KSII Trans Internet Info Sys 13(6):3074–3091
Acknowledgments
The authors would like to thank NVIDIA for the GPU donation. We would also like to thank CAPES (Coordenação de Aperfeiçoamento de Pessoal de Nível Superior) for financial support, Finance Code 001.
Cite this article
Toledo Ferraz, C., Barcellos, W., Pereira Junior, O. et al. A comparison among keyframe extraction techniques for CNN classification based on video periocular images. Multimed Tools Appl 80, 12843–12856 (2021). https://doi.org/10.1007/s11042-020-10384-9