[go: up one dir, main page]
More Web Proxy on the site http://driver.im/

Mamun et al., 2023 - Google Patents

CFTNet: Complex-valued frequency transformation network for speech enhancement

Mamun et al., 2023

View PDF
Document ID
1455954277290340212
Author
Mamun N
Hansen J
Publication year
Publication venue
Proc. INTERSPEECH 2023

External Links

Snippet

It is widely known that the presence of multi-speaker babble noise greatly degrades speech intelligibility. However, suppressing noise without creating artifacts in human speech is challenging in environments with a low signal-to-noise ratio (SNR), and even more so if …
Continue reading at www.researchgate.net (PDF) (other versions)

Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208Noise filtering
    • G10L21/0216Noise filtering characterised by the method used for estimating noise
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/003Changing voice quality, e.g. pitch or formants
    • G10L21/007Changing voice quality, e.g. pitch or formants characterised by the process used
    • G10L21/013Adapting to target pitch
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/038Speech enhancement, e.g. noise reduction or echo cancellation using band spreading techniques
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
    • G10L25/48Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use
    • G10L25/51Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/66Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use for comparison or discrimination for extracting parameters related to health condition
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis
    • G10L19/04Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis using predictive techniques
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/08Speech classification or search
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00Speaker identification or verification
    • G10L17/04Training, enrolment or model building
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/02Feature extraction for speech recognition; Selection of recognition unit
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00Speaker identification or verification
    • G10L17/26Recognition of special voice characteristics, e.g. for use in lie detectors; Recognition of animal voices
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
    • G10L25/90Pitch determination of speech signals
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/06Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice

Similar Documents

Publication Publication Date Title
Li et al. Glance and gaze: A collaborative learning framework for single-channel speech enhancement
Hou et al. Audio-visual speech enhancement using multimodal deep convolutional neural networks
Krishna et al. Speech synthesis using EEG
Su et al. Bandwidth extension is all you need
US10957303B2 (en) Training apparatus, speech synthesis system, and speech synthesis method
CN111081268A (en) Phase-correlated shared deep convolutional neural network speech enhancement method
Venkataramani et al. Adaptive front-ends for end-to-end source separation
Koizumi et al. SpecGrad: Diffusion probabilistic model based neural vocoder with adaptive noise spectral shaping
Pascual et al. Time-domain speech enhancement using generative adversarial networks
Li et al. Real-time monaural speech enhancement with short-time discrete cosine transform
US9484044B1 (en) Voice enhancement and/or speech features extraction on noisy audio signals using successively refined transforms
Geng et al. End-to-end speech enhancement based on discrete cosine transform
Koizumi et al. WaveFit: An iterative and non-autoregressive neural vocoder based on fixed-point iteration
Adiga et al. Speech Enhancement for Noise-Robust Speech Synthesis Using Wasserstein GAN.
Li et al. Filtering and refining: A collaborative-style framework for single-channel speech enhancement
Hou et al. Audio-visual speech enhancement based on multimodal deep convolutional neural network
Su et al. Perceptually-motivated environment-specific speech enhancement
Yu et al. Reconstructing speech from real-time articulatory MRI using neural vocoders
CN111326170A (en) Method and device for converting ear voice into normal voice by combining time-frequency domain expansion convolution
CN108198566A (en) Information processing method and device, electronic device and storage medium
Hamsa et al. Speaker identification from emotional and noisy speech using learned voice segregation and speech VGG
Schröter et al. Low latency speech enhancement for hearing aids using deep filtering
Mamun et al. CFTNet: Complex-valued frequency transformation network for speech enhancement
Ding et al. Ultraspeech: Speech enhancement by interaction between ultrasound and speech
Chen et al. CITISEN: A Deep Learning-Based Speech Signal-Processing Mobile Application