Deep neural network‐based linear predictive parameter estimations for speech enhancement
Li et al., 2017
- Document ID
- 15318799945069053548
- Author
- Li Y
- Kang S
- Publication year
- 2017
- Publication venue
- IET Signal Processing
Snippet
This study presents a speech enhancement technique to improve noise corrupted speech via deep neural network (DNN)‐based linear predictive (LP) parameter estimations of speech and noise. With regard to the LP coefficient estimation, an enhanced estimation …
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/14—Speech classification or search using statistical models, e.g. hidden Markov models [HMMs]
- G10L15/142—Hidden Markov Models [HMMs]
- G10L15/144—Training of HMMs
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis using predictive techniques
- G10L19/08—Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use for comparison or discrimination
- G10L25/66—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use for comparison or discrimination for extracting parameters related to health condition
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 characterised by the type of extracted parameters
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/003—Changing voice quality, e.g. pitch or formants
- G10L21/007—Changing voice quality, e.g. pitch or formants characterised by the process used
- G10L21/013—Adapting to target pitch
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/065—Adaptation
- G10L15/07—Adaptation to the speaker
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computer systems based on biological models
- G06N3/02—Computer systems based on biological models using neural network models
- G06N3/08—Learning methods
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
- G10L17/26—Recognition of special voice characteristics, e.g. for use in lie detectors; Recognition of animal voices
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N99/00—Subject matter not provided for in other groups of this subclass
- G06N99/005—Learning machines, i.e. computer in which a programme is changed according to experience gained by the machine itself during a complete run
Similar Documents
Publication | Title | Publication Date |
---|---|---|
Huang et al. | Joint optimization of masks and deep recurrent neural networks for monaural source separation | |
Cutajar et al. | Comparative study of automatic speech recognition techniques | |
Li et al. | Deep neural network‐based linear predictive parameter estimations for speech enhancement | |
Luo et al. | Emotional voice conversion using neural networks with arbitrary scales F0 based on wavelet transform | |
Coto-Jiménez et al. | Improving automatic speech recognition containing additive noise using deep denoising autoencoders of LSTM networks | |
KP et al. | ELM speaker identification for limited dataset using multitaper based MFCC and PNCC features with fusion score | |
Li et al. | Artificial bandwidth extension using deep neural network‐based spectral envelope estimation and enhanced excitation estimation | |
Li et al. | A conditional generative model for speech enhancement | |
Zöhrer et al. | Representation learning for single-channel source separation and bandwidth extension | |
Li et al. | Whisper‐to‐speech conversion using restricted Boltzmann machine arrays | |
Dwijayanti et al. | Enhancement of speech dynamics for voice activity detection using DNN | |
Jagadeeshwar et al. | ASERNet: Automatic speech emotion recognition system using MFCC-based LPC approach with deep learning CNN | |
Cheng et al. | DNN-based speech enhancement with self-attention on feature dimension | |
Hagen et al. | Recent advances in the multi-stream HMM/ANN hybrid approach to noise robust ASR | |
Djeffal et al. | Noise-robust speech recognition: A comparative analysis of lstm and cnn approaches | |
Goh et al. | Robust speech recognition system using bidirectional Kalman filter | |
Wang et al. | Towards efficient recurrent architectures: a deep LSTM neural network applied to speech enhancement and recognition | |
Gupta et al. | High‐band feature extraction for artificial bandwidth extension using deep neural network and H∞ optimisation | |
Park et al. | Unsupervised speech domain adaptation based on disentangled representation learning for robust speech recognition | |
Biswas et al. | Admissible wavelet packet sub‐band based harmonic energy features using ANOVA fusion techniques for Hindi phoneme recognition | |
Silva et al. | Intelligent genetic fuzzy inference system for speech recognition: An approach from low order feature based on discrete cosine transform | |
Coto-Jiménez | Robustness of LSTM neural networks for the enhancement of spectral parameters in noisy speech signals | |
Parvathala et al. | Neural comb filtering using sliding window attention network for speech enhancement | |
Wang et al. | Multi‐stage attention network for monaural speech enhancement | |
Nandi et al. | Implicit excitation source features for robust language identification |