
Who is Really Talking? A Visual-Based Speaker Diarization Strategy

  • Conference paper
Computer Aided Systems Theory – EUROCAST 2017 (EUROCAST 2017)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10672)

Abstract

Speaker activity at the Canary Islands Parliament is recorded and later annotated manually. This task can be modelled as a diarization problem, that is, automatically annotating who is speaking and when. In this paper, we propose using the visual cue to solve the diarization task. This approach requires detecting individuals, determining which one is speaking, and extracting features for matching. To test the performance of our proposal, we evaluate four different strategies based on the visual shot features.
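As a rough illustration of the matching step and of the "who and when" output that diarization produces, the sketch below assigns each video shot to the closest known speaker and merges adjacent shots with the same label. It is not the method evaluated in the paper: the gallery of per-speaker reference vectors, the two-dimensional feature vectors, the cosine distance, and the names `match_speaker` and `diarize` are all assumptions made for this example.

```python
from math import sqrt


def cosine_distance(a, b):
    """Cosine distance between two non-zero feature vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def match_speaker(feature, gallery):
    """Return the gallery identity whose reference vector is closest to `feature`."""
    return min(gallery, key=lambda name: cosine_distance(feature, gallery[name]))


def diarize(shots, gallery):
    """Turn (start, end, feature) shots into merged (start, end, speaker) segments."""
    segments = []
    for start, end, feature in shots:
        name = match_speaker(feature, gallery)
        # Extend the previous segment when the same speaker continues without a gap.
        if segments and segments[-1][2] == name and segments[-1][1] == start:
            segments[-1] = (segments[-1][0], end, name)
        else:
            segments.append((start, end, name))
    return segments


# Hypothetical data: two enrolled speakers and three consecutive shots (times in seconds).
gallery = {"speaker_A": [1.0, 0.0], "speaker_B": [0.0, 1.0]}
shots = [(0.0, 4.0, [0.9, 0.1]), (4.0, 9.0, [0.8, 0.3]), (9.0, 12.0, [0.2, 0.7])]
segments = diarize(shots, gallery)
print(segments)
```

With this toy input, the first two shots are matched to the same identity and merged into one segment, so the output lists one interval per speaker turn rather than one per shot.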

Notes

  1. Videos available at http://www.parcan.es.

Acknowledgement

This work is partially supported by the Government of Spain through TIN2015-64395-R and by the Ministerio de Economía y Competitividad, Government of Spain, and FEDER funds of the European Union through TIN2016-78919-R (MINECO/FEDER).

Author information

Corresponding author

Correspondence to Pedro A. Marín-Reyes.

Copyright information

© 2018 Springer International Publishing AG

About this paper

Cite this paper

Marín-Reyes, P.A., Lorenzo-Navarro, J., Castrillón-Santana, M., Sánchez-Nielsen, E. (2018). Who is Really Talking? A Visual-Based Speaker Diarization Strategy. In: Moreno-Díaz, R., Pichler, F., Quesada-Arencibia, A. (eds) Computer Aided Systems Theory – EUROCAST 2017. EUROCAST 2017. Lecture Notes in Computer Science, vol 10672. Springer, Cham. https://doi.org/10.1007/978-3-319-74727-9_38

  • DOI: https://doi.org/10.1007/978-3-319-74727-9_38

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-74726-2

  • Online ISBN: 978-3-319-74727-9

  • eBook Packages: Computer Science, Computer Science (R0)
