Computer Science > Sound

arXiv:2310.19602 (cs)

[Submitted on 30 Oct 2023]

Title: DCHT: Deep Complex Hybrid Transformer for Speech Enhancement

Authors: Jialu Li, Junhui Li, Pu Wang, Youshan Zhang

Abstract: Most current deep learning-based approaches to speech enhancement operate in only one domain, either the spectrogram or the waveform. Although a cross-domain transformer combining waveform- and spectrogram-domain inputs has been proposed, there is still room to improve its performance. In this paper, we present a novel deep complex hybrid transformer that integrates spectrogram-domain and waveform-domain approaches to improve speech enhancement performance. The proposed model consists of two parts: a complex Swin-Unet in the spectrogram domain and a dual-path transformer network (DPTnet) in the waveform domain. We first construct a complex Swin-Unet in the spectrogram domain and perform speech enhancement on the complex audio spectrum. We then improve DPTnet by adding memory-compressed attention. Our model is capable of learning multi-domain features, so that noise is suppressed in both domains in a complementary way. Experimental results on the BirdSoundsDenoising dataset and the VCTK+DEMAND dataset indicate that our method outperforms state-of-the-art methods.
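
The abstract names memory-compressed attention without describing it. As a rough illustration only, the sketch below shows the general idea as it is commonly understood: keys and values are shortened along the time axis before attention, so each query attends over fewer positions. Everything here is an assumption rather than the authors' implementation: the function names `compress` and `memory_compressed_attention` are hypothetical, the `stride` value is arbitrary, and mean pooling stands in for whatever learned compression the paper actually uses.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def compress(seq, stride):
    """Shorten a sequence of vectors by averaging non-overlapping groups.

    Assumption: mean pooling is a stand-in for a learned compression layer.
    """
    out = []
    for start in range(0, len(seq), stride):
        group = seq[start:start + stride]
        dim = len(group[0])
        out.append([sum(v[d] for v in group) / len(group) for d in range(dim)])
    return out


def memory_compressed_attention(queries, keys, values, stride=3):
    """Scaled dot-product attention over compressed keys and values."""
    k = compress(keys, stride)
    v = compress(values, stride)
    scale = 1.0 / math.sqrt(len(queries[0]))
    dim = len(v[0])
    output = []
    for q in queries:
        # One score per compressed key, not per original time step.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, key)) for key in k]
        weights = softmax(scores)
        output.append(
            [sum(w * vec[d] for w, vec in zip(weights, v)) for d in range(dim)]
        )
    return output


# Toy input: 6 frames with 2 features each (made-up values).
frames = [[float(t), float(t % 2)] for t in range(6)]
out = memory_compressed_attention(frames, frames, frames, stride=3)
```

With six input frames and a stride of 3, each query attends over two compressed positions instead of six, while the output keeps one vector per query frame.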

Comments:	IEEE DDP conference
Subjects:	Sound (cs.SD); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2310.19602 [cs.SD]
	(or arXiv:2310.19602v1 [cs.SD] for this version)
	https://doi.org/10.48550/arXiv.2310.19602

Submission history

From: Youshan Zhang
[v1] Mon, 30 Oct 2023 14:58:11 UTC (1,084 KB)
