Abstract
In this paper we describe the ICSI-SRI entry in the Rich Transcription 2005 Spring Meeting Recognition Evaluation. The current system is based on the ICSI-SRI clustering system for Broadcast News (BN), with extra modules to process the different meeting tasks in which we participated. Our base system uses agglomerative clustering with a modified Bayesian Information Criterion (BIC) measure to determine when to stop merging clusters and to decide which pairs of clusters to merge. This approach does not require any pre-trained models, thus increasing robustness and simplifying the port from BN to the meetings domain. For the meetings domain, we have added several features to our baseline clustering system, including a “purification” module that tries to keep the clusters acoustically homogeneous throughout the clustering process, and a delay&sum beamforming algorithm that enhances signal quality for the multiple distant microphones (MDM) sub-task. In post-evaluation work we further improved the delay&sum algorithm, experimented with a new speech/non-speech detector, and proposed a new system for the lecture room environment.
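To make the idea of BIC-driven agglomerative clustering concrete, the following is a minimal sketch and not the ICSI-SRI implementation. The abstract does not specify the modified BIC measure, so the sketch substitutes the textbook delta-BIC between single full-covariance Gaussians, with a hypothetical penalty weight `lam`. The function names (`gaussian_loglik`, `merge_score`, `agglomerate`) are illustrative assumptions. The same score is used both to pick the pair to merge and to decide when to stop, and no pre-trained models are involved.

```python
import numpy as np


def gaussian_loglik(x):
    """Approximate maximum log-likelihood of the rows of x under one
    full-covariance Gaussian fitted to x itself (ML estimate)."""
    n, d = x.shape
    # Small diagonal term keeps the covariance well conditioned (assumption).
    cov = np.atleast_2d(np.cov(x, rowvar=False, bias=True)) + 1e-6 * np.eye(d)
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * n * (d * np.log(2.0 * np.pi) + logdet + d)


def merge_score(a, b, lam=1.0):
    """Textbook delta-BIC, signed so that a positive value favours merging.
    This is NOT the paper's modified BIC; lam is a hypothetical penalty weight."""
    merged = np.vstack([a, b])
    n, d = merged.shape
    n_params = d + d * (d + 1) / 2  # mean plus symmetric covariance
    penalty = 0.5 * lam * n_params * np.log(n)
    gain = gaussian_loglik(a) + gaussian_loglik(b) - gaussian_loglik(merged)
    return penalty - gain


def agglomerate(segments, lam=1.0):
    """Greedy bottom-up clustering: merge the best-scoring pair until no
    pair has a positive score. Returns lists of original segment indices."""
    data = [np.asarray(s, dtype=float) for s in segments]
    members = [[i] for i in range(len(data))]
    while len(data) > 1:
        best, best_pair = -np.inf, None
        for i in range(len(data)):
            for j in range(i + 1, len(data)):
                score = merge_score(data[i], data[j], lam)
                if score > best:
                    best, best_pair = score, (i, j)
        if best <= 0:  # stopping criterion: no merge is supported
            break
        i, j = best_pair
        data[i] = np.vstack([data[i], data[j]])
        members[i] = members[i] + members[j]
        del data[j], members[j]
    return members
```

Given a list of feature matrices (one per initial segment, frames in rows), `agglomerate` returns groups of segment indices, one group per hypothesised speaker. A real diarization system would typically model each cluster with a Gaussian mixture and re-segment between merges; those steps are omitted here.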
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Anguera, X., Wooters, C., Peskin, B., Aguiló, M. (2006). Robust Speaker Segmentation for Meetings: The ICSI-SRI Spring 2005 Diarization System. In: Renals, S., Bengio, S. (eds) Machine Learning for Multimodal Interaction. MLMI 2005. Lecture Notes in Computer Science, vol 3869. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11677482_34
DOI: https://doi.org/10.1007/11677482_34
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-32549-9
Online ISBN: 978-3-540-32550-5
eBook Packages: Computer Science (R0)