arXiv:2410.05920 (cs)

[Submitted on 8 Oct 2024 (v1), last revised 31 Oct 2024 (this version, v3)]

Title: FINALLY: fast and universal speech enhancement with studio-like quality

Authors: Nicholas Babaev, Kirill Tamogashev, Azat Saginbaev, Ivan Shchekotov, Hanbin Bae, Hosang Sung, WonJun Lee, Hoon-Young Cho, Pavel Andreev

Abstract: In this paper, we address the challenge of speech enhancement in real-world recordings, which often contain various forms of distortion, such as background noise, reverberation, and microphone artifacts. We revisit the use of Generative Adversarial Networks (GANs) for speech enhancement and theoretically show that GANs are naturally inclined to seek the point of maximum density within the conditional clean speech distribution, which, as we argue, is essential for the speech enhancement task. We study various feature extractors for perceptual loss to facilitate the stability of adversarial training, developing a methodology for probing the structure of the feature space. This leads us to integrate a WavLM-based perceptual loss into the MS-STFT adversarial training pipeline, creating an effective and stable training procedure for the speech enhancement model. The resulting speech enhancement model, which we refer to as FINALLY, builds upon the HiFi++ architecture, augmented with a WavLM encoder and a novel training pipeline. Empirical results on various datasets confirm our model's ability to produce clear, high-quality speech at 48 kHz, achieving state-of-the-art performance in the field of speech enhancement. Demo page: this https URL

Comments:	Accepted to NeurIPS 2024
Subjects:	Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2410.05920 [cs.SD]
	(or arXiv:2410.05920v3 [cs.SD] for this version)
	https://doi.org/10.48550/arXiv.2410.05920

Submission history

From: Pavel Andreev
[v1] Tue, 8 Oct 2024 11:16:03 UTC (1,815 KB)
[v2] Sat, 26 Oct 2024 19:15:39 UTC (1,815 KB)
[v3] Thu, 31 Oct 2024 08:47:01 UTC (1,815 KB)
