Shot Classification and Keyframe Detection for Vision Based Speakers Diarization in Parliamentary Debates

  • Conference paper
Advances in Artificial Intelligence (CAEPIA 2016)

Abstract

Automatic labelling of speakers is an essential task for speaker diarization in parliamentary debates, given the huge amount of video data to annotate. In this paper, we address the speaker diarization problem as a visual speaker re-identification issue, with special emphasis on the analysis of different shot types. We propose two approaches that make use of convolutional neural networks (CNNs) and biometric traits for keyframe extraction. The approaches are evaluated on challenging real-world datasets from the Canary Islands Parliament and contrasted with a similar approach that does not analyze the shot type. Results show that using CNNs for shot classification together with biometric traits improves re-identification performance by 9.8 % on average.

This work has been partially supported by the Spanish Government under the projects TIN2011-24598 and TIN2015-64395-R.

Author information

Corresponding author

Correspondence to Pedro A. Marín-Reyes.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Marín-Reyes, P.A., Lorenzo-Navarro, J., Castrillón-Santana, M., Sánchez-Nielsen, E. (2016). Shot Classification and Keyframe Detection for Vision Based Speakers Diarization in Parliamentary Debates. In: Luaces, O., et al. Advances in Artificial Intelligence. CAEPIA 2016. Lecture Notes in Computer Science, vol 9868. Springer, Cham. https://doi.org/10.1007/978-3-319-44636-3_5

  • DOI: https://doi.org/10.1007/978-3-319-44636-3_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-44635-6

  • Online ISBN: 978-3-319-44636-3

  • eBook Packages: Computer Science, Computer Science (R0)
