Hermansky et al., 1993 - Google Patents

Recognition of speech in additive and convolutional noise based on RASTA spectral processing

Document ID: 13882774964746088672
Author: Hermansky H; Morgan N; Hirsch H
Publication year: 1993
Publication venue: 1993 IEEE International Conference on Acoustics, Speech, and Signal Processing

Snippet

RASTA (relative spectral) processing is studied in a spectral domain which is linear-like for small spectral values and logarithmic-like for large spectral values. Experiments with a recognizer trained on clean speech and test data degraded by both convolutional and …
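
The snippet only describes how the spectral domain behaves, so the following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes the lin-log mapping has the form ln(1 + J·x), and it uses band-pass filter coefficients as commonly quoted in descriptions of RASTA. The names `lin_log` and `rasta_filter`, the constant `j`, and the `pole` value are hypothetical choices that this record does not confirm.

```python
import numpy as np


def lin_log(power_spectrum, j=1e-6):
    """Assumed lin-log mapping ln(1 + j*x).

    Roughly j*x when j*x << 1 (linear-like) and roughly ln(j*x)
    when j*x >> 1 (logarithmic-like). The value of j is illustrative.
    """
    return np.log1p(j * np.asarray(power_spectrum, dtype=float))


def rasta_filter(trajectories, pole=0.98):
    """Band-pass filter each band's time trajectory.

    trajectories: 2-D array, shape (frames, bands).
    Causal difference equation (coefficients are an assumption):
        y[t] = pole*y[t-1] + 0.1*(2x[t] + x[t-1] - x[t-3] - 2x[t-4])
    """
    x = np.asarray(trajectories, dtype=float)
    numer = 0.1 * np.array([2.0, 1.0, 0.0, -1.0, -2.0])
    # Zero-pad four frames so that x[t-4] exists for the first frames.
    padded = np.vstack([np.zeros((4, x.shape[1])), x])
    y = np.zeros_like(x)
    prev = np.zeros(x.shape[1])
    for t in range(x.shape[0]):
        window = padded[t:t + 5][::-1]  # rows: x[t], x[t-1], ..., x[t-4]
        prev = numer @ window + pole * prev
        y[t] = prev
    return y
```

Because the numerator coefficients sum to zero, a component that is constant over time in the compressed domain (which is how a fixed channel appears where the mapping is logarithmic-like) decays toward zero at the filter output, while faster spectral changes pass through.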

Classifications

- G—PHYSICS
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
      - G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
        - G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
          - G10L21/0208—Noise filtering
            - G10L21/0216—Noise filtering characterised by the method used for estimating noise
              - G10L2021/02161—Number of inputs available containing the signal or the noise to be suppressed
                - G10L2021/02166—Microphone arrays; Beamforming
              - G10L21/0232—Processing in the frequency domain
          - G10L21/0202—Applications
            - G10L21/0205—Enhancement of intelligibility of clean or coded speech
      - G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
        - G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 characterised by the type of extracted parameters
          - G10L25/18—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band
        - G10L25/78—Detection of presence or absence of voice signals
        - G10L25/27—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 characterised by the analysis technique
          - G10L25/30—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 characterised by the analysis technique using neural networks
      - G10L15/00—Speech recognition
        - G10L15/08—Speech classification or search
        - G10L15/20—Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech
        - G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
      - G10L19/00—Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis
      - G10L17/00—Speaker identification or verification
        - G10L17/04—Training, enrolment or model building

Similar Documents

Publication	Publication Date	Title
Hermansky et al.	1993	Recognition of speech in additive and convolutional noise based on RASTA spectral processing
Hermansky et al.	1994	RASTA processing of speech
US6289309B1 (en)	2001-09-11	Noise spectrum tracking for speech enhancement
Xiao et al.	2008	Normalization of the speech modulation spectra for robust speech recognition
US20090163168A1 (en)	2009-06-25	Efficient initialization of iterative parameter estimation
Chen et al.	2008	Fundamentals of noise reduction
Wan et al.	1999	Networks for speech enhancement
Shao et al.	2007	A generalized time–frequency subtraction method for robust speech enhancement based on wavelet filter banks modeling of human auditory system
US5963899A (en)	1999-10-05	Method and system for region based filtering of speech
Nandkumar et al.	1995	Dual-channel iterative speech enhancement with constraints on an auditory-based spectrum
WO2006114101A1 (en)	2006-11-02	Detection of speech present in a noisy signal and speech enhancement making use thereof
Alam et al.	2012	Robust feature extraction for speech recognition by enhancing auditory spectrum
Taşmaz et al.	2008	Speech enhancement based on undecimated wavelet packet-perceptual filterbanks and MMSE–STSA estimation in various noise environments
Flynn et al.	2008	Combined speech enhancement and auditory modelling for robust distributed speech recognition
Kermorvant	1999	A comparison of noise reduction techniques for robust speech recognition
Milner et al.	1994	Comparison of some noise-compensation methods for speech recognition in adverse environments
Shao et al.	2005	A versatile speech enhancement system based on perceptual wavelet denoising
Maganti et al.	2012	A perceptual masking approach for noise robust speech recognition
Avendano et al.	1997	On the effects of short-term spectrum smoothing in channel normalization
Haton	1995	Automatic recognition of noisy speech
Chen et al.	2001	Robust MFCCs derived from differentiated power spectrum
KR100198713B1 (en)	1999-06-15	Noise processing method using nomalization of spectral magnitude and cepstral transformation in speech recognition apparatus
Vali et al.	2006	Robust speech recognition by modifying clean and telephone feature vectors using bidirectional neural network
Dionelis	2018	On single-channel speech enhancement and on non-linear modulation-domain Kalman filtering
Zhu et al.	2003	Using noise reduction and spectral emphasis techniques to improve ASR performance in noisy conditions