research-article

Multistage speaker diarization of broadcast news

Authors: C. Barras, Xuan Zhu, S. Meignier, J.-L. Gauvain

IEEE Transactions on Audio, Speech, and Language Processing, Volume 14, Issue 5

Pages 1505 - 1512

https://doi.org/10.1109/TASL.2006.878261

Published: 01 September 2006

Abstract

This paper describes recent advances in speaker diarization with a multistage segmentation and clustering system, which incorporates a speaker identification step. This system builds upon the baseline audio partitioner used in the LIMSI broadcast news transcription system. The baseline partitioner provides high cluster purity, but tends to split the data of speakers with a large quantity of speech into several segment clusters. Several improvements to the baseline system have been made. First, the iterative Gaussian mixture model (GMM) clustering has been replaced by a Bayesian information criterion (BIC) agglomerative clustering. Second, an additional clustering stage has been added, using a GMM-based speaker identification method. Finally, a post-processing stage refines the segment boundaries using the output of a transcription system. On the National Institute of Standards and Technology (NIST) RT-04F and ESTER evaluation data, the multistage system reduces the speaker error by over 70% relative to the baseline system, and gives a reduction of between 40% and 50% relative to a single-stage BIC clustering system.
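The BIC agglomerative clustering step named in the abstract can be illustrated with a minimal sketch. The abstract gives no formulas or settings, so everything below is an assumption: the textbook ΔBIC merge criterion with one full-covariance Gaussian per cluster, a penalty computed on the combined frame count of the two candidate clusters, and the hypothetical names `delta_bic`, `bic_cluster`, and `lam` (the penalty weight). It is not the configuration of the LIMSI system.

```python
import numpy as np


def _logdet_cov(frames):
    """Log-determinant of the maximum-likelihood covariance of the frames."""
    cov = np.atleast_2d(np.cov(frames, rowvar=False, bias=True))
    _, value = np.linalg.slogdet(cov)
    return value


def delta_bic(x, y, lam=1.0):
    """Textbook delta-BIC for merging clusters x and y (rows are frames).

    A negative value means one Gaussian explains the pooled data well
    enough that the merge is accepted.
    """
    n1, n2 = len(x), len(y)
    n, d = n1 + n2, x.shape[1]
    gain = 0.5 * (n * _logdet_cov(np.vstack([x, y]))
                  - n1 * _logdet_cov(x)
                  - n2 * _logdet_cov(y))
    # Penalty for the extra mean and covariance parameters of a second Gaussian.
    penalty = 0.5 * lam * (d + d * (d + 1) / 2) * np.log(n)
    return gain - penalty


def bic_cluster(segments, lam=1.0):
    """Greedily merge the pair with the lowest delta-BIC until none is negative."""
    clusters = [np.asarray(s, dtype=float) for s in segments]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                score = delta_bic(clusters[i], clusters[j], lam)
                if best is None or score < best[0]:
                    best = (score, i, j)
        score, i, j = best
        if score >= 0:  # stopping criterion: no merge improves BIC
            break
        clusters[i] = np.vstack([clusters[i], clusters[j]])
        del clusters[j]
    return clusters
```

Each input segment is an array of acoustic feature frames (one row per frame). Raising `lam` makes merging more likely and so yields fewer, larger clusters; lowering it favours purity at the cost of splitting speakers, which is the trade-off the abstract describes for the baseline partitioner.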

Recommendations

Adaptive speaker diarization of broadcast news based on factor analysis

Subspace methods benefit the entire speaker diarization process. Additional measures have to be taken to suppress nuisance variability. Subspace methods pave the way for adaptive speaker segmentation and clustering. The introduction of factor analysis ...

An overview of automatic speaker diarization systems

Audio diarization is the process of annotating an input audio channel with information that attributes (possibly overlapping) temporal regions of signal energy to their specific sources. These sources can include particular speakers, music, background ...

Speaker Diarization For Multiple-Distant-Microphone Meetings Using Several Sources of Information

Human-machine interaction in meetings requires the localization and identification of the speakers interacting with the system as well as the recognition of the words spoken. A seminal step toward this goal is the field of rich transcription research, ...

Information

Published In

IEEE Transactions on Audio, Speech, and Language Processing, Volume 14, Issue 5

September 2006

392 pages

ISSN: 1558-7916

Publisher

IEEE Press

Qualifiers

Research-article

Bibliometrics

Total Citations: 34
Total Downloads: 0
Downloads (Last 12 months): 0
Downloads (Last 6 weeks): 0

Reflects downloads up to 31 Dec 2024

Cited By

Mishra J, Prasanna S (2024). Spoken Language Change Detection Inspired by Speaker Change Detection. Circuits, Systems, and Signal Processing 43:10 (6373-6398). DOI: 10.1007/s00034-024-02743-w. Online publication date: 1-Oct-2024
https://dl.acm.org/doi/10.1007/s00034-024-02743-w
Dabbabi K, Hajji S, Cherif A (2020). Real-Time Implementation of Speaker Diarization System on Raspberry PI3 Using TLBO Clustering Algorithm. Circuits, Systems, and Signal Processing 39:8 (4094-4109). DOI: 10.1007/s00034-020-01357-2. Online publication date: 1-Aug-2020
https://dl.acm.org/doi/10.1007/s00034-020-01357-2
Viñals I, Ortega A, Villalba J, Miguel A, Lleida E (2019). Unsupervised adaptation of PLDA models for broadcast diarization. EURASIP Journal on Audio, Speech, and Music Processing 2019:1. DOI: 10.1186/s13636-019-0167-7. Online publication date: 27-Dec-2019
https://dl.acm.org/doi/10.1186/s13636-019-0167-7
Le N, Odobez J (2019). Improving speech embedding using crossmodal transfer learning with audio-visual data. Multimedia Tools and Applications 78:11 (15681-15704). DOI: 10.1007/s11042-018-6992-3. Online publication date: 1-Jun-2019
https://dl.acm.org/doi/10.1007/s11042-018-6992-3
Hansen J, Najafian M, Lileikyte R, Irvin D, Rous B (2019). Speech and language processing for assessing child–adult interaction based on diarization and location. International Journal of Speech Technology 22:3 (697-709). DOI: 10.1007/s10772-019-09590-0. Online publication date: 1-Sep-2019
https://dl.acm.org/doi/10.1007/s10772-019-09590-0
Le Lan G, Charlet D, Larcher A, Meignier S (2018). An Adaptive Method for Cross-Recording Speaker Diarization. IEEE/ACM Transactions on Audio, Speech and Language Processing 26:10 (1821-1832). DOI: 10.1109/TASLP.2018.2844025. Online publication date: 1-Oct-2018
https://dl.acm.org/doi/10.1109/TASLP.2018.2844025
Dabbabi K, Hajji S, Cherif A (2017). Integration of evolutionary computation algorithms and new AUTO-TLBO technique in the speaker clustering stage for speaker diarization of broadcast news. EURASIP Journal on Audio, Speech, and Music Processing 2017:1 (1-15). DOI: 10.1186/s13636-017-0117-1. Online publication date: 1-Dec-2017
https://dl.acm.org/doi/10.1186/s13636-017-0117-1
Le N, Odobez J, Lank E, Vinciarelli A, Hoggan E, Subramanian S, Brewster S (2017). A domain adaptation approach to improve speaker turn embedding using face representation. Proceedings of the 19th ACM International Conference on Multimodal Interaction (411-415). DOI: 10.1145/3136755.3136800. Online publication date: 3-Nov-2017
https://dl.acm.org/doi/10.1145/3136755.3136800
Church K, Zhu W, Vopicka J, Pelecanos J, Dimitriadis D, Fousek P (2017). Speaker diarization: A perspective on challenges and opportunities from theory to practice. 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (4950-4954). DOI: 10.1109/ICASSP.2017.7953098. Online publication date: 5-Mar-2017
https://dl.acm.org/doi/10.1109/ICASSP.2017.7953098
Komatsu T, Kondo R (2017). Detection of anomaly acoustic scenes based on a temporal dissimilarity model. 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (376-380). DOI: 10.1109/ICASSP.2017.7952181. Online publication date: 5-Mar-2017
https://dl.acm.org/doi/10.1109/ICASSP.2017.7952181
