Abstract
During online speaker diarization, a single speaker may end up being represented by several different models. Such a situation degrades diarization results, because the diarization system treats every change of model as a change of speaker. In this article we describe a method for detecting this situation and propose several ways of resolving it. Experiments show that the most suitable option is to treat the multiple GMMs as belonging to a single speaker, i.e. to update all of them with the same data every time one of them is assigned a new segment. In that case, the Diarization Error Rate improved by 30.69% relative to the baseline system.
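The preferred option from the abstract, updating every model of a speaker whenever one of them receives a segment, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the class `LinkedModels`, its method names, and the model identifiers are hypothetical, the detection step is taken as already done (its result is passed to `link`), and appending to a list stands in for an actual GMM adaptation step. It is not the authors' implementation.

```python
from collections import defaultdict


class LinkedModels:
    """Hypothetical bookkeeping for models judged to belong to one speaker."""

    def __init__(self):
        self.parent = {}                  # union-find parent pointers
        self.updates = defaultdict(list)  # model id -> segments it was updated with

    def add_model(self, model_id):
        # A new model starts as its own group.
        self.parent.setdefault(model_id, model_id)

    def find(self, model_id):
        # Follow parent pointers to the group representative (with path halving).
        while self.parent[model_id] != model_id:
            self.parent[model_id] = self.parent[self.parent[model_id]]
            model_id = self.parent[model_id]
        return model_id

    def link(self, a, b):
        # Record that models a and b were detected as the same speaker.
        self.parent[self.find(a)] = self.find(b)

    def group(self, model_id):
        # All models currently treated as the same speaker as model_id.
        root = self.find(model_id)
        return sorted(m for m in self.parent if self.find(m) == root)

    def assign_segment(self, model_id, segment):
        # Update every model in the group with the same data.
        # Appending to a list is a placeholder for a real GMM update.
        for m in self.group(model_id):
            self.updates[m].append(segment)
        # The group representative serves as the speaker label, so a switch
        # between linked models is not reported as a speaker change.
        return self.find(model_id)


models = LinkedModels()
for name in ("gmm_1", "gmm_2", "gmm_3"):
    models.add_model(name)
models.link("gmm_1", "gmm_2")             # assumed output of the detection step
label = models.assign_segment("gmm_2", "segment_0")
print(label, dict(models.updates))
```

Using the group representative as the output label is one possible design choice: it keeps the individual models intact while preventing a switch between linked models from being counted as a speaker change.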
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Kunešová, M., Radová, V. (2015). Ideas for Clustering of Similar Models of a Speaker in an Online Speaker Diarization System. In: Král, P., Matoušek, V. (eds) Text, Speech, and Dialogue. TSD 2015. Lecture Notes in Computer Science, vol 9302. Springer, Cham. https://doi.org/10.1007/978-3-319-24033-6_26
DOI: https://doi.org/10.1007/978-3-319-24033-6_26
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-24032-9
Online ISBN: 978-3-319-24033-6
eBook Packages: Computer Science (R0)