Improving Audiovisual Content Annotation Through a Semi-automated Process Based on Deep Learning

Luís Vilaça¹⁹,
Paula Viana^19,20,
Pedro Carvalho²⁰ &
…
Teresa Andrade^20,21

Part of the book series: Advances in Intelligent Systems and Computing ((AISC,volume 942))

Included in the following conference series:

International Conference on Soft Computing and Pattern Recognition

491 Accesses
3 Citations

Abstract

Over the last years, Deep Learning has become one of the most popular research fields of Artificial Intelligence. Several approaches have been developed to address conventional challenges of AI. In computer vision, these methods provide the means to solve tasks like image classification, object identification and extraction of features.

In this paper, some approaches to face detection and recognition are presented and analyzed, in order to identify the one with the best performance. The main objective is to automate the annotation of a large dataset and to avoid the costy and time-consuming process of content annotation. The approach follows the concept of incremental learning and a R-CNN model was implemented. Tests were conducted with the objective of detecting and recognizing one personality within image and video content.

Results coming from this initial automatic process are then made available to an auxiliary tool that enables further validation of the annotations prior to uploading them to the archive.

Tests show that, even with a small size dataset, the results obtained are satisfactory.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic

£29.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Chapter: GBP 19.95; Price includes VAT (United Kingdom)

eBook: GBP 103.50; Price includes VAT (United Kingdom)

Softcover Book: GBP 129.99; Price includes VAT (United Kingdom)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Lead Artist Identification from Music Videos Using Auto-encoders

Understanding videos with face recognition: a complete pipeline and applications

Article 15 June 2022

Unconstrained Still/Video-Based Face Verification with Deep Convolutional Neural Networks

Article 01 July 2017

References

Darkflow repository. https://github.com/thtrieu/darkflow. Accessed 09 July 2018
Belhumeur, P.N., Hespanha, J.P., Kriegman, D.J.: Eigenfaces vs. fisherfaces: recognition using class specific linear projection. Technical report, Yale University, New Haven, United States (1997)
Article Google Scholar
Bertini, M., Del Bimbo, A., Torniai, C.: Automatic video annotation using ontologies extended with visual information. In: Proceedings of the 13th Annual ACM International Conference on Multimedia, MULTIMEDIA 2005, pp. 395–398. ACM, New York (2005)
Google Scholar
Canny, J.: A computational approach to edge detection. IEEE Trans. Pattern Anal. Mach. Intell. PAMI-8(6), 679–698 (1986)
Article Google Scholar
Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection, vol. 1, pp. 886–893, June 2005
Google Scholar
Girshick, R.: Fast R-CNN. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1440–1448 (2015)
Google Scholar
Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. CoRR abs/1311.2524 (2013)
Google Scholar
He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask R-CNN. In: 2017 IEEE International Conference on Computer Vision (ICCV), pp. 2980–2988. IEEE (2017)
Google Scholar
Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: efficient convolutional neural networks for mobile vision applications. CoRR abs/1704.04861 (2017)
Google Scholar
Howell, A.J., Buxton, H.: Invariance in radial basis function neural networks in human face classification. Neural Process. Lett. 2(3), 26–30 (1995)
Article Google Scholar
Kotropoulos, C., Pitas, I.: Rule-based face detection in frontal views. In: Proceedings International Conference on Acoustics, Speech and Signal Processing, vol. 4, pp. 2537–2540 (1997)
Google Scholar
Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks, pp. 1097–1105 (2012)
Google Scholar
Lanitis, A., Taylor, C.J., Cootes, T.F.: Automatic interpretation and coding of face images using flexible models. IEEE Trans. Pattern Anal. Mach. Intell. 19(7), 743–756 (1997)
Article Google Scholar
Larson, M., Soleymani, M., Serdyukov, P., Rudinac, S., Wartena, C., Murdock, V., Friedland, G., Ordelman, R., Jones, G.J.F.: Automatic tagging and geotagging in video collections and communities. In: Proceedings 1st ACM International Conference on Multimedia Retrieval, ICMR 2011, pp. 51:1–51:8. ACM, New York (2011)
Google Scholar
Lawrence, S., Giles, C.L., Tsoi, A.C., Back, A.D.: Face recognition: a convolutional neural-network approach. IEEE Trans. Neural Netw. 8(1), 98–113 (1997)
Article Google Scholar
Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.Y., Berg, A.C.: SSD: single shot multibox detector. In: European Conference on Computer Vision, pp. 21–37. Springer (2016)
Google Scholar
Moxley, E., Mei, T., Hua, X., Ma, W., Manjunath, B.S.: Automatic video annotation through search and mining. In: 2008 IEEE International Conference on Multimedia and Expo, pp. 685–688, June 2008
Google Scholar
Osuna, E., Freund, R., Girosit, F.: Training support vector machines: an application to face detection. In: Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 130–136, June 1997
Google Scholar
Pinto, J.P., Viana, P.: TAG4VD: a game for collaborative video annotation. In: Proceedings of the 2013 ACM International Workshop on Immersive Media Experiences, ImmersiveMe 2013, pp. 25–28. ACM, New York (2013)
Google Scholar
Pinto, J.P., Viana, P.: Using the crowd to boost video annotation processes: a game based approach. In: Proceedings of the 12th European Conference on Visual Media Production, CVMP 2015, pp. 22:1–22:1. ACM, New York (2015)
Google Scholar
Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: unified, real-time object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016)
Google Scholar
Redmon, J., Farhadi, A.: YOLO9000: better, faster, stronger. CoRR abs/1612.08242 (2016)
Google Scholar
Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks (2015)
Google Scholar
Sirohey, S.A.: Human face segmentation and identification. Technical report (1993)
Google Scholar
Sirovich, L., Kirby, M.: Low-dimensional procedure for the characterization of human faces. J. Opt. Soc. Am. A 4(3), 519–524 (1987)
Article Google Scholar
Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. CoRR abs/1512.00567 (2015)
Google Scholar
Tsukamoto, A., Lee, C.W., Tsuji, S.: Detection and pose estimation of human face with synthesized image models. In: Proceedings of 12th International Conference on Pattern Recognition, vol. 1, pp. 754–757, October 1994
Google Scholar
Tukamoto, A.: Detection and tracking of human face with synthesized templates. In: Proceedings of the ACCV 1993, pp. 183–186 (1993)
Google Scholar
Vapnik, V.: The Nature of Statistical Learning Theory. Springer, New York (2013)
MATH Google Scholar
Viana, P., Pinto, J.P.: A collaborative approach for semantic time-based video annotation using gamification. Hum.-Centric Comput. Inf. Sci. 7(1), 13 (2017)
Article Google Scholar
Viola, P., Jones, M.: Rapid object detection using a boosted cascade of simple features, vol. 1, pp. I-511–I-518 (2001)
Google Scholar
Yang, G., Huang, T.S.: Human face detection in a complex background. Pattern Recognit. 27(1), 53–63 (1994)
Article Google Scholar
Yang, M.H., Ahuja, N.: Detecting human faces in color images. In: Proceedings of the International Conference on Image Processing, ICIP 1998, vol. 1, pp. 127–130, October 1998
Google Scholar

Download references

Acknowledgments

The work presented was partially supported by the following projects: FourEyes, a Research Line within project “TEC4Growth: Pervasive Intelligence, Enhancers and Proofs of Concept with Industrial Impact/NORTE-01- 0145-FEDER-000020” financed by the North Portugal Regional Operational Programme (NORTE 2020), under the PORTUGAL 2020 Partnership Agreement, and through the European Regional Development Fund (ERDF); FotoInMotion funded by H2020 Framework Programme of the European Commission.

Author information

Authors and Affiliations

School of Engineering, Polytechnic of Porto, Porto, Portugal
Luís Vilaça & Paula Viana
INESC TEC, Porto, Portugal
Paula Viana, Pedro Carvalho & Teresa Andrade
Faculty of Engineering, University of Porto, Porto, Portugal
Teresa Andrade

Authors

Luís Vilaça
View author publications
You can also search for this author in PubMed Google Scholar
Paula Viana
View author publications
You can also search for this author in PubMed Google Scholar
Pedro Carvalho
View author publications
You can also search for this author in PubMed Google Scholar
Teresa Andrade
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Paula Viana .

Editor information

Editors and Affiliations

School of Engineering, Instituto Superior de Engenharia (ISEP/IPP), Porto, Portugal
Ana Maria Madureira
Machine Intelligence Research Labs (MIR), Auburn, WA, USA
Ajith Abraham
Machine Intelligence Research Labs (MIR Labs), Auburn, WA, USA
Niketa Gandhi
Politécnico de Leiria, Leiria, Portugal
Catarina Silva
Politécnico de Leiria, Leiria, Portugal
Mário Antunes

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Vilaça, L., Viana, P., Carvalho, P., Andrade, T. (2020). Improving Audiovisual Content Annotation Through a Semi-automated Process Based on Deep Learning. In: Madureira, A., Abraham, A., Gandhi, N., Silva, C., Antunes, M. (eds) Proceedings of the Tenth International Conference on Soft Computing and Pattern Recognition (SoCPaR 2018). SoCPaR 2018. Advances in Intelligent Systems and Computing, vol 942. Springer, Cham. https://doi.org/10.1007/978-3-030-17065-3_7

Download citation

DOI: https://doi.org/10.1007/978-3-030-17065-3_7
Published: 10 April 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-17064-6
Online ISBN: 978-3-030-17065-3
eBook Packages: Intelligent Technologies and RoboticsIntelligent Technologies and Robotics (R0)

Publish with us

Policies and ethics