Computer Science > Sound

arXiv:2401.15993 (cs)

[Submitted on 29 Jan 2024]

Title: Continuous Target Speech Extraction: Enhancing Personalized Diarization and Extraction on Complex Recordings

Authors: He Zhao, Hangting Chen, Jianwei Yu, Yuehai Wang

Abstract: Target speaker extraction (TSE) aims to extract the target speaker's voice from the input mixture. Previous studies have concentrated on high-overlapping scenarios. However, real-world applications often involve more complex scenarios, such as variable speaker overlap and target speaker absence. In this paper, we introduce a framework to perform continuous TSE (C-TSE), comprising a target speaker voice activation detection (TSVAD) model and a TSE model. This framework significantly improves TSE performance on similar speakers and enhances personalization, which is lacking in traditional diarization methods. In detail, unlike conventional TSVAD, which is deployed to refine diarization results, the proposed attention-based target speaker voice activation detection (A-TSVAD) directly generates timestamps of the target speaker. We also explore different ways of integrating A-TSVAD and TSE by comparing cascaded and parallel methods. The framework's effectiveness is assessed using a range of metrics, including diarization and enhancement metrics. Our experiments demonstrate that A-TSVAD outperforms conventional methods in reducing diarization errors. Furthermore, integrating A-TSVAD and TSE in a cascaded manner further enhances extraction accuracy.
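The abstract does not give implementation details, so the following is only an illustrative sketch of how timestamps from a detector could gate the output of an extractor in a cascaded setup. It is not the authors' method: the function names, the 0.5 threshold, and the frame hop are all hypothetical, and the activity probabilities and extracted samples are assumed to come from models that are not shown here.

```python
from typing import List, Tuple


def probs_to_segments(probs: List[float], threshold: float = 0.5) -> List[Tuple[int, int]]:
    """Convert per-frame target-speaker activity probabilities into
    half-open (start_frame, end_frame) segments. The threshold is an assumption."""
    segments = []
    start = None
    for i, p in enumerate(probs):
        active = p >= threshold
        if active and start is None:
            start = i                      # a new active run begins
        elif not active and start is not None:
            segments.append((start, i))    # the active run ended at frame i
            start = None
    if start is not None:                  # run extends to the last frame
        segments.append((start, len(probs)))
    return segments


def gate_by_segments(samples: List[float],
                     segments: List[Tuple[int, int]],
                     hop: int) -> List[float]:
    """Keep extracted samples only inside detected segments and zero the rest.
    `hop` is the (hypothetical) number of audio samples per detector frame."""
    out = [0.0] * len(samples)
    for start_frame, end_frame in segments:
        lo = min(len(samples), start_frame * hop)
        hi = min(len(samples), end_frame * hop)
        out[lo:hi] = samples[lo:hi]
    return out


# Toy usage: 5 detector frames, 2 audio samples per frame.
probs = [0.1, 0.9, 0.8, 0.2, 0.7]
extracted = [float(n) for n in range(1, 11)]
segments = probs_to_segments(probs)
gated = gate_by_segments(extracted, segments, hop=2)
```

Gating like this forces silence whenever the detector reports the target speaker as absent, which is one simple way to handle the target-absence case the abstract mentions; a parallel integration would combine the two models differently.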

Comments:	8 pages, 6 figures
Subjects:	Sound (cs.SD); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2401.15993 [cs.SD]
	(or arXiv:2401.15993v1 [cs.SD] for this version)
	https://doi.org/10.48550/arXiv.2401.15993

Submission history

From: He Zhao
[v1] Mon, 29 Jan 2024 09:23:26 UTC (704 KB)
