Who is Really Talking? A Visual-Based Speaker Diarization Strategy

Pedro A. Marín-Reyes¹⁶,
Javier Lorenzo-Navarro¹⁶,
Modesto Castrillón-Santana¹⁶ &
Elena Sánchez-Nielsen¹⁷

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10672)

Included in the following conference series:

International Conference on Computer Aided Systems Theory


Abstract

Speaker activity at the Canary Islands Parliament is recorded and later annotated manually. This task can be modelled as a diarization problem, that is, the task of automatically annotating who is speaking and when. In this paper, we propose using the visual cue to solve the diarization task. This approach requires detecting individuals, determining which one is speaking, and extracting features for matching. To test the performance of our proposal, we evaluate four different strategies based on the visual shot features.
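The matching step and the "who and when" output described above can be illustrated with a minimal sketch. It is not the method of the chapter: the abstract does not specify the features or the matching rule, so the cosine-similarity nearest-neighbour match, the `gallery` of one reference feature vector per known speaker, and all function names are assumptions made for illustration only. Each shot is assumed to arrive as a `(start, end, feature)` tuple for the person already identified as speaking.

```python
import math
from dataclasses import dataclass


@dataclass
class Segment:
    start: float   # segment start time in seconds
    end: float     # segment end time in seconds
    speaker: str   # label of the matched speaker


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match_speaker(feature, gallery):
    """Return the gallery label whose reference vector is most similar (assumed rule)."""
    return max(gallery, key=lambda name: cosine_similarity(feature, gallery[name]))


def diarize(shots, gallery):
    """Label each shot, then merge consecutive shots with the same speaker."""
    segments = []
    for start, end, feature in sorted(shots, key=lambda s: s[0]):
        speaker = match_speaker(feature, gallery)
        if segments and segments[-1].speaker == speaker:
            segments[-1].end = end  # extend the current speaker turn
        else:
            segments.append(Segment(start, end, speaker))
    return segments


# Hypothetical two-dimensional features for two known speakers.
gallery = {"A": [1.0, 0.0], "B": [0.0, 1.0]}
shots = [(0.0, 5.0, [0.9, 0.1]), (5.0, 8.0, [0.8, 0.2]), (8.0, 12.0, [0.1, 0.9])]
result = diarize(shots, gallery)
```

Here `result` holds one segment per speaker turn, which is the form a manual annotation of the session would otherwise take.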

Notes

1. Videos available at http://www.parcan.es.

Acknowledgement

This work is partially supported by the Government of Spain through TIN2015-64395-R, and by the Ministerio de Economía y Competitividad, Government of Spain, and FEDER funds of the European Union through TIN2016-78919-R (MINECO/FEDER).

Author information

Authors and Affiliations

Instituto Universitario SIANI, Universidad de las Palmas de Gran Canaria, 35017, Las Palmas, Spain
Pedro A. Marín-Reyes, Javier Lorenzo-Navarro & Modesto Castrillón-Santana
Departamento de Ingeniería Informática y de Sistemas, Universidad de la Laguna, 38271, Santa Cruz de Tenerife, Spain
Elena Sánchez-Nielsen

Corresponding author

Correspondence to Pedro A. Marín-Reyes.

Editor information

Editors and Affiliations

University of Las Palmas de Gran Canaria, Las Palmas de Gran Canaria, Spain
Roberto Moreno-Díaz
Johannes Kepler University Linz, Linz, Austria
Franz Pichler
University of Las Palmas de Gran Canaria, Las Palmas de Gran Canaria, Spain
Alexis Quesada-Arencibia

About this paper

Cite this paper

Marín-Reyes, P.A., Lorenzo-Navarro, J., Castrillón-Santana, M., Sánchez-Nielsen, E. (2018). Who is Really Talking? A Visual-Based Speaker Diarization Strategy. In: Moreno-Díaz, R., Pichler, F., Quesada-Arencibia, A. (eds) Computer Aided Systems Theory – EUROCAST 2017. EUROCAST 2017. Lecture Notes in Computer Science, vol 10672. Springer, Cham. https://doi.org/10.1007/978-3-319-74727-9_38

DOI: https://doi.org/10.1007/978-3-319-74727-9_38
Published: 26 January 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-74726-2
Online ISBN: 978-3-319-74727-9
eBook Packages: Computer Science, Computer Science (R0)
