
Multistage speaker diarization of broadcast news

Published: 01 September 2006

Abstract

This paper describes recent advances in speaker diarization with a multistage segmentation and clustering system that incorporates a speaker identification step. The system builds upon the baseline audio partitioner used in the LIMSI broadcast news transcription system. The baseline partitioner provides high cluster purity, but tends to split the data of speakers with a large quantity of speech into several segment clusters. Several improvements to the baseline system have been made. First, the iterative Gaussian mixture model (GMM) clustering has been replaced by a Bayesian information criterion (BIC) agglomerative clustering. Second, an additional clustering stage has been added, using a GMM-based speaker identification method. Finally, a post-processing stage refines the segment boundaries using the output of a transcription system. On the National Institute of Standards and Technology (NIST) RT-04F and ESTER evaluation data, the multistage system reduces the speaker error by over 70% relative to the baseline system, and by 40% to 50% relative to a single-stage BIC clustering system.
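The abstract names BIC agglomerative clustering but does not give its formulation. As a rough illustration only, the sketch below uses the commonly cited delta-BIC merge criterion with one full-covariance Gaussian per cluster; this modelling choice, the penalty weight `lam`, and the function names `delta_bic` and `bic_agglomerative` are assumptions made for the example and are not taken from the paper. Clusters are merged greedily while the best pair has a negative delta-BIC.

```python
import numpy as np


def delta_bic(x, y, lam=1.0):
    """Delta-BIC for merging two clusters of feature vectors.

    Assumption: each cluster is modelled by a single full-covariance
    Gaussian. Negative values favour merging the two clusters.
    """
    n_x, n_y = len(x), len(y)
    n = n_x + n_y
    d = x.shape[1]

    def logdet(frames):
        # bias=True gives the maximum-likelihood covariance estimate.
        cov = np.atleast_2d(np.cov(frames, rowvar=False, bias=True))
        return np.linalg.slogdet(cov)[1]

    merged = np.vstack([x, y])
    # Likelihood lost by describing both clusters with one Gaussian.
    gain = 0.5 * (n * logdet(merged) - n_x * logdet(x) - n_y * logdet(y))
    # Complexity penalty for the extra mean and covariance parameters.
    penalty = 0.5 * lam * (d + d * (d + 1) / 2) * np.log(n)
    return gain - penalty


def bic_agglomerative(segments, lam=1.0):
    """Greedy bottom-up clustering; returns lists of segment indices."""
    clusters = [np.asarray(s, dtype=float) for s in segments]
    members = [[i] for i in range(len(clusters))]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                score = delta_bic(clusters[i], clusters[j], lam)
                if best is None or score < best[0]:
                    best = (score, i, j)
        score, i, j = best
        if score >= 0:  # no remaining pair is better off merged
            break
        clusters[i] = np.vstack([clusters[i], clusters[j]])
        members[i] = members[i] + members[j]
        del clusters[j]
        del members[j]
    return members
```

With synthetic segments, two drawn from one distribution and a third from a well-separated one, the first two would be expected to merge while the third stays apart. A real system would operate on cepstral features per speech segment, and the stopping threshold is typically tuned through `lam`.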


Published In

IEEE Transactions on Audio, Speech, and Language Processing, Volume 14, Issue 5
September 2006
392 pages

Publisher

IEEE Press

Publication History

Published: 01 September 2006

Author Tags

  1. Bayesian information criterion (BIC) clustering
  2. speaker diarization
  3. speaker identification (SID)
  4. speaker segmentation and clustering

Qualifiers

  • Research-article
