Computer Science > Sound

arXiv:2212.05294 (cs)

[Submitted on 10 Dec 2022 (v1), last revised 13 Dec 2022 (this version, v2)]

Title: Variational Speech Waveform Compression to Catalyze Semantic Communications

Authors: Shengshi Yao, Zixuan Xiao, Sixian Wang, Jincheng Dai, Kai Niu, Ping Zhang

Abstract: We propose a novel neural waveform compression method to catalyze emerging speech semantic communications. By introducing nonlinear transforms and variational modeling, we effectively capture the dependencies within speech frames and estimate the probability distribution of the speech features more accurately, which yields better compression performance. In particular, the speech signals are analyzed and synthesized by a pair of nonlinear transforms, yielding latent features. An entropy model with a hyperprior is built to capture the probability distribution of the latent features, followed by quantization and entropy coding. The proposed waveform codec can be optimized flexibly towards an arbitrary rate; another appealing feature is that it can be easily optimized for any differentiable loss function, including the perceptual losses used in semantic communications. To further improve fidelity, we incorporate residual coding to mitigate the degradation arising from quantization distortion in the latent space. Results indicate that, at the same performance, the proposed method saves up to 27% of the coding rate compared with the widely used adaptive multi-rate wideband (AMR-WB) codec as well as emerging neural waveform coding methods.
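
The rate side of such a codec can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes, as is common in hyperprior-based learned compression, that each quantized latent is modeled by a Gaussian whose mean and scale would be supplied by the hyperprior, and that training minimizes a generic rate plus weighted distortion objective. The function names, the `lam` weight, and the use of mean squared error as the distortion are all hypothetical choices made for illustration.

```python
import math


def gaussian_cdf(x, mean, scale):
    """Cumulative distribution function of a Gaussian."""
    return 0.5 * (1.0 + math.erf((x - mean) / (scale * math.sqrt(2.0))))


def quantize(latents):
    """Round each latent to the nearest integer (Python rounds ties to even)."""
    return [float(round(v)) for v in latents]


def estimated_bits(symbols, means, scales, floor=1e-9):
    """Estimate the entropy-coded length in bits of integer symbols.

    Each symbol's probability is the Gaussian mass on the unit-width bin
    centered on it. In a hyperprior codec the means and scales would be
    predicted from side information; here they are plain inputs (assumption).
    """
    total = 0.0
    for s, m, sc in zip(symbols, means, scales):
        p = gaussian_cdf(s + 0.5, m, sc) - gaussian_cdf(s - 0.5, m, sc)
        total += -math.log2(max(p, floor))  # floor avoids log2(0)
    return total


def rate_distortion_loss(original, reconstructed, bits, lam):
    """Bits per sample plus lam times mean squared error (illustrative form)."""
    n = len(original)
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / n
    return bits / n + lam * mse


if __name__ == "__main__":
    latents = [0.2, -1.7, 2.4]
    symbols = quantize(latents)
    # A sharp, well-centered model costs fewer bits than a broad one.
    sharp = estimated_bits(symbols, means=symbols, scales=[0.3] * 3)
    broad = estimated_bits(symbols, means=[0.0] * 3, scales=[5.0] * 3)
    print(f"sharp model: {sharp:.2f} bits, broad model: {broad:.2f} bits")
```

The comparison in the usage block reflects the abstract's claim in miniature: the more accurately the entropy model predicts each latent, the fewer bits the entropy coder needs. Swapping the mean squared error for any other differentiable distortion (for example a perceptual loss) leaves the rest of the objective unchanged.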

Subjects:	Sound (cs.SD); Information Theory (cs.IT); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2212.05294 [cs.SD]
	(or arXiv:2212.05294v2 [cs.SD] for this version)
	https://doi.org/10.48550/arXiv.2212.05294

Submission history

From: Jincheng Dai
[v1] Sat, 10 Dec 2022 12:52:59 UTC (441 KB)
[v2] Tue, 13 Dec 2022 13:20:57 UTC (441 KB)
