Mamun et al., 2023 - Google Patents
CFTNet: Complex-valued frequency transformation network for speech enhancement
- Document ID
- 1455954277290340212
- Author
- Mamun N
- Hansen J
- Publication year
- 2023
- Publication venue
- Proc. INTERSPEECH 2023
Snippet
It is widely known that the presence of multi-speaker babble noise greatly degrades speech intelligibility. However, suppressing noise without creating artifacts in human speech is challenging in environments with a low signal-to-noise ratio (SNR), and even more so if …
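The title points to complex-valued, frequency-domain processing of noisy speech. As a rough, hypothetical illustration only (not the authors' CFTNet architecture or code), the sketch below shows the generic complex-spectrogram masking setup such models operate in: the noisy waveform is transformed with an STFT, a predicted complex mask is applied to the real and imaginary parts jointly, and the enhanced waveform is reconstructed with the inverse STFT. The function, parameter names, and the placeholder `mask_net` are assumptions introduced for illustration.

```python
# Hypothetical illustration of complex-spectrogram masking for speech
# enhancement; this is NOT the CFTNet architecture from the paper, only the
# generic setup that complex-valued, frequency-domain enhancers operate in.
import torch


def enhance_with_complex_mask(noisy: torch.Tensor,
                              mask_net,              # placeholder: complex spectrum -> complex mask
                              n_fft: int = 512,
                              hop: int = 128) -> torch.Tensor:
    """Apply a complex ratio mask predicted by `mask_net` to a noisy waveform."""
    window = torch.hann_window(n_fft)
    # Complex STFT of the noisy speech: shape (freq, frames).
    spec = torch.stft(noisy, n_fft=n_fft, hop_length=hop,
                      window=window, return_complex=True)
    # The model sees real and imaginary parts and predicts a complex mask.
    feats = torch.stack([spec.real, spec.imag], dim=0)    # (2, freq, frames)
    mask_ri = mask_net(feats)                             # (2, freq, frames), assumed
    mask = torch.complex(mask_ri[0], mask_ri[1])
    # Complex multiplication rescales magnitude and corrects phase jointly.
    enhanced_spec = mask * spec
    return torch.istft(enhanced_spec, n_fft=n_fft, hop_length=hop,
                       window=window, length=noisy.shape[-1])


if __name__ == "__main__":
    # Identity "network" for a smoke test: mask = 1 + 0j everywhere.
    dummy_net = lambda feats: torch.stack([torch.ones_like(feats[0]),
                                           torch.zeros_like(feats[0])])
    x = torch.randn(16000)                                 # 1 s of audio at 16 kHz
    y = enhance_with_complex_mask(x, dummy_net)
    print(y.shape)                                         # torch.Size([16000])
```

With the identity mask the output reproduces the input, which is a convenient check that the STFT/iSTFT round trip is consistent before a learned mask estimator is plugged in.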
Classifications
- G—PHYSICS
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
      - G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
        - G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
          - G10L21/0208—Noise filtering
            - G10L21/0216—Noise filtering characterised by the method used for estimating noise
          - G10L21/038—Speech enhancement, e.g. noise reduction or echo cancellation using band spreading techniques
        - G10L21/003—Changing voice quality, e.g. pitch or formants
          - G10L21/007—Changing voice quality, e.g. pitch or formants characterised by the process used
            - G10L21/013—Adapting to target pitch
      - G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
        - G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use
          - G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use for comparison or discrimination
            - G10L25/66—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use for comparison or discrimination for extracting parameters related to health condition
        - G10L25/90—Pitch determination of speech signals
      - G10L19/00—Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis
        - G10L19/04—Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis using predictive techniques
      - G10L15/00—Speech recognition
        - G10L15/08—Speech classification or search
        - G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
        - G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
      - G10L17/00—Speaker identification or verification
        - G10L17/04—Training, enrolment or model building
        - G10L17/26—Recognition of special voice characteristics, e.g. for use in lie detectors; Recognition of animal voices
Similar Documents
Publication | Title |
---|---|
Li et al. | Glance and gaze: A collaborative learning framework for single-channel speech enhancement | |
Hou et al. | Audio-visual speech enhancement using multimodal deep convolutional neural networks | |
Krishna et al. | Speech synthesis using EEG | |
Su et al. | Bandwidth extension is all you need | |
US10957303B2 (en) | Training apparatus, speech synthesis system, and speech synthesis method | |
CN111081268A (en) | Phase-correlated shared deep convolutional neural network speech enhancement method | |
Venkataramani et al. | Adaptive front-ends for end-to-end source separation | |
Koizumi et al. | SpecGrad: Diffusion probabilistic model based neural vocoder with adaptive noise spectral shaping | |
Pascual et al. | Time-domain speech enhancement using generative adversarial networks | |
Li et al. | Real-time monaural speech enhancement with short-time discrete cosine transform | |
US9484044B1 (en) | Voice enhancement and/or speech features extraction on noisy audio signals using successively refined transforms | |
Geng et al. | End-to-end speech enhancement based on discrete cosine transform | |
Koizumi et al. | WaveFit: An iterative and non-autoregressive neural vocoder based on fixed-point iteration | |
Adiga et al. | Speech Enhancement for Noise-Robust Speech Synthesis Using Wasserstein GAN. | |
Li et al. | Filtering and refining: A collaborative-style framework for single-channel speech enhancement | |
Hou et al. | Audio-visual speech enhancement based on multimodal deep convolutional neural network | |
Su et al. | Perceptually-motivated environment-specific speech enhancement | |
Yu et al. | Reconstructing speech from real-time articulatory MRI using neural vocoders | |
CN111326170A (en) | Method and device for converting ear voice into normal voice by combining time-frequency domain expansion convolution | |
CN108198566A (en) | Information processing method and device, electronic device and storage medium | |
Hamsa et al. | Speaker identification from emotional and noisy speech using learned voice segregation and speech VGG | |
Schröter et al. | Low latency speech enhancement for hearing aids using deep filtering | |
Mamun et al. | CFTNet: Complex-valued frequency transformation network for speech enhancement | |
Ding et al. | Ultraspeech: Speech enhancement by interaction between ultrasound and speech | |
Chen et al. | CITISEN: A Deep Learning-Based Speech Signal-Processing Mobile Application |