Li et al., 2017 - Google Patents

Deep neural network‐based linear predictive parameter estimations for speech enhancement

Document ID: 15318799945069053548
Author: Li Y; Kang S
Publication year: 2017
Publication venue: IET Signal Processing

Snippet

This study presents a speech enhancement technique that improves noise-corrupted speech via deep neural network (DNN)‐based linear predictive (LP) parameter estimation for both speech and noise. With regard to the LP coefficient estimation, an enhanced estimation …

Classifications

- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/14—Speech classification or search using statistical models, e.g. hidden Markov models [HMMs]
- G10L15/142—Hidden Markov Models [HMMs]
- G10L15/144—Training of HMMs
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis using predictive techniques
- G10L19/08—Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use for comparison or discrimination
- G10L25/66—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use for comparison or discrimination for extracting parameters related to health condition
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 characterised by the type of extracted parameters
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/003—Changing voice quality, e.g. pitch or formants
- G10L21/007—Changing voice quality, e.g. pitch or formants characterised by the process used
- G10L21/013—Adapting to target pitch
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/065—Adaptation
- G10L15/07—Adaptation to the speaker
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computer systems based on biological models
- G06N3/02—Computer systems based on biological models using neural network models
- G06N3/08—Learning methods
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
- G10L17/26—Recognition of special voice characteristics, e.g. for use in lie detectors; Recognition of animal voices
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N99/00—Subject matter not provided for in other groups of this subclass
- G06N99/005—Learning machines, i.e. computer in which a programme is changed according to experience gained by the machine itself during a complete run

Similar Documents

Publication	Publication Date	Title
Huang et al.	2015	Joint optimization of masks and deep recurrent neural networks for monaural source separation
Cutajar et al.	2013	Comparative study of automatic speech recognition techniques
Li et al.	2017	Deep neural network‐based linear predictive parameter estimations for speech enhancement
Luo et al.	2017	Emotional voice conversion using neural networks with arbitrary scales F0 based on wavelet transform
Coto-Jiménez et al.	2016	Improving automatic speech recognition containing additive noise using deep denoising autoencoders of LSTM networks
KP et al.	2020	ELM speaker identification for limited dataset using multitaper based MFCC and PNCC features with fusion score
Li et al.	2016	Artificial bandwidth extension using deep neural network‐based spectral envelope estimation and enhanced excitation estimation
Li et al.	2018	A conditional generative model for speech enhancement
Zöhrer et al.	2015	Representation learning for single-channel source separation and bandwidth extension
Li et al.	2014	Whisper‐to‐speech conversion using restricted Boltzmann machine arrays
Dwijayanti et al.	2018	Enhancement of speech dynamics for voice activity detection using DNN
Jagadeeshwar et al.	2023	ASERNet: Automatic speech emotion recognition system using MFCC-based LPC approach with deep learning CNN
Cheng et al.	2020	DNN-based speech enhancement with self-attention on feature dimension
Hagen et al.	2005	Recent advances in the multi-stream HMM/ANN hybrid approach to noise robust ASR
Djeffal et al.	2023	Noise-robust speech recognition: A comparative analysis of lstm and cnn approaches
Goh et al.	2015	Robust speech recognition system using bidirectional Kalman filter
Wang et al.	2024	Towards efficient recurrent architectures: a deep LSTM neural network applied to speech enhancement and recognition
Gupta et al.	2020	High‐band feature extraction for artificial bandwidth extension using deep neural network and H∞ optimisation
Park et al.	2019	Unsupervised speech domain adaptation based on disentangled representation learning for robust speech recognition
Biswas et al.	2016	Admissible wavelet packet sub‐band based harmonic energy features using ANOVA fusion techniques for Hindi phoneme recognition
Silva et al.	2014	Intelligent genetic fuzzy inference system for speech recognition: An approach from low order feature based on discrete cosine transform
Coto-Jiménez	2018	Robustness of LSTM neural networks for the enhancement of spectral parameters in noisy speech signals
Parvathala et al.	2023	Neural comb filtering using sliding window attention network for speech enhancement
Wang et al.	2023	Multi‐stage attention network for monaural speech enhancement
Nandi et al.	2015	Implicit excitation source features for robust language identification