Shot Classification and Keyframe Detection for Vision Based Speakers Diarization in Parliamentary Debates

Pedro A. Marín-Reyes,
Javier Lorenzo-Navarro,
Modesto Castrillón-Santana &
Elena Sánchez-Nielsen

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9868)

Included in the following conference series:

Conference of the Spanish Association for Artificial Intelligence

Abstract

Automatic labelling of speakers is an essential task for speaker diarization in parliamentary debates, given the huge amount of video data to annotate. In this paper, we address speaker diarization as a visual speaker re-identification problem, with special emphasis on the analysis of different shot types. We propose two approaches that make use of convolutional neural networks (CNNs) and biometric traits for keyframe extraction. The approaches are evaluated on challenging real-world datasets from the Canary Islands Parliament and compared with a similar approach that does not analyze the shot type. Results show that using a CNN for shot classification together with biometric traits improves re-identification performance by an average of 9.8%.
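
As a rough illustration of the kind of pipeline the abstract outlines, the sketch below keeps only frames from shots labelled as showing the speaker and then picks one keyframe per shot using a face-based score. It is not the authors' implementation: the shot labels, the `Frame` fields, the `face_score` value and the "highest score wins" rule are all assumptions made for this example, and the shot label is taken as given rather than computed by a CNN.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import List

# Hypothetical shot labels; the abstract does not list the actual categories.
SPEAKER_SHOTS = {"close_up"}


@dataclass
class Frame:
    index: int         # position of the frame in the video
    shot_id: int       # shot the frame belongs to (assumed already segmented)
    shot_type: str     # label a shot classifier (e.g. a CNN) would output
    face_score: float  # assumed biometric cue, e.g. relative face size; 0 if no face


def select_keyframes(frames: List[Frame]) -> List[Frame]:
    """Return one keyframe per speaker shot: the frame with the highest face_score."""
    keyframes = []
    # groupby needs its input sorted by the grouping key.
    ordered = sorted(frames, key=lambda f: (f.shot_id, f.index))
    for _, shot in groupby(ordered, key=lambda f: f.shot_id):
        candidates = [
            f for f in shot
            if f.shot_type in SPEAKER_SHOTS and f.face_score > 0
        ]
        if candidates:
            keyframes.append(max(candidates, key=lambda f: f.face_score))
    return keyframes


frames = [
    Frame(0, 0, "wide", 0.0),
    Frame(1, 0, "wide", 0.1),
    Frame(2, 1, "close_up", 0.4),
    Frame(3, 1, "close_up", 0.7),
    Frame(4, 2, "close_up", 0.0),
]
keyframes = select_keyframes(frames)
print([f.index for f in keyframes])
```

In this toy input, the wide shot is discarded by its label and the last close-up shot is discarded because no face was scored in it, so only the middle shot contributes a keyframe. The selected keyframes would then be passed to a re-identification step, which is outside the scope of this sketch.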

This work has been partially supported by the Spanish Government under the projects TIN2011-24598 and TIN2015-64395-R.

Author information

Authors and Affiliations

Instituto Universitario SIANI, Universidad de Las Palmas de Gran Canaria, 35017, Las Palmas, Spain
Pedro A. Marín-Reyes, Javier Lorenzo-Navarro & Modesto Castrillón-Santana
Departamento de Ingeniería Informática y de Sistemas, Universidad de la Laguna, 38271, Santa Cruz de Tenerife, Spain
Elena Sánchez-Nielsen

Corresponding author

Correspondence to Pedro A. Marín-Reyes.

Editor information

Editors and Affiliations

Artificial Intelligence Center, University of Oviedo, Gijón, Spain
Oscar Luaces
University of Castilla-La Mancha, Albacete, Spain
José A. Gámez
Public University of Navarre, Pamplona, Spain
Edurne Barrenechea
Universidad Pablo de Olavide, Sevilla, Spain
Alicia Troncoso
Public University of Navarre, Pamplona, Navarra, Spain
Mikel Galar
University of Salamanca, Salamanca, Spain
Héctor Quintián
University of Salamanca, Salamanca, Spain
Emilio Corchado

About this paper

Cite this paper

Marín-Reyes, P.A., Lorenzo-Navarro, J., Castrillón-Santana, M., Sánchez-Nielsen, E. (2016). Shot Classification and Keyframe Detection for Vision Based Speakers Diarization in Parliamentary Debates. In: Luaces, O., et al. Advances in Artificial Intelligence. CAEPIA 2016. Lecture Notes in Computer Science, vol 9868. Springer, Cham. https://doi.org/10.1007/978-3-319-44636-3_5

DOI: https://doi.org/10.1007/978-3-319-44636-3_5
Published: 08 September 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-44635-6
Online ISBN: 978-3-319-44636-3
eBook Packages: Computer Science, Computer Science (R0)
