Hermansky et al., 1993 - Google Patents
Recognition of speech in additive and convolutional noise based on RASTA spectral processingHermansky et al., 1993
View PDF- Document ID
- 13882774964746088672
- Author
- Hermansky H
- Morgan N
- Hirsch H
- Publication year
- Publication venue
- 1993 IEEE International Conference on Acoustics, Speech, and Signal Processing
External Links
Snippet
RASTA (relative spectral) processing is studied in a spectral domain which is linear-like for small spectral values and logarithmic-like for large spectral values. Experiments with a recognizer trained on clean speech and test data degraded by both convolutional and …
- 230000003595 spectral 0 title abstract description 35
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
- G10L2021/02161—Number of inputs available containing the signal or the noise to be suppressed
- G10L2021/02166—Microphone arrays; Beamforming
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
- G10L21/0232—Processing in the frequency domain
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0202—Applications
- G10L21/0205—Enhancement of intelligibility of clean or coded speech
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 characterised by the type of extracted parameters
- G10L25/18—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/20—Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
- G10L25/78—Detection of presence or absence of voice signals
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
- G10L25/27—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 characterised by the analysis technique
- G10L25/30—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 characterised by the analysis technique using neural networks
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
- G10L17/04—Training, enrolment or model building
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Hermansky et al. | Recognition of speech in additive and convolutional noise based on RASTA spectral processing | |
Hermansky et al. | RASTA processing of speech | |
US6289309B1 (en) | Noise spectrum tracking for speech enhancement | |
Xiao et al. | Normalization of the speech modulation spectra for robust speech recognition | |
US20090163168A1 (en) | Efficient initialization of iterative parameter estimation | |
Chen et al. | Fundamentals of noise reduction | |
Wan et al. | Networks for speech enhancement | |
Shao et al. | A generalized time–frequency subtraction method for robust speech enhancement based on wavelet filter banks modeling of human auditory system | |
US5963899A (en) | Method and system for region based filtering of speech | |
Nandkumar et al. | Dual-channel iterative speech enhancement with constraints on an auditory-based spectrum | |
WO2006114101A1 (en) | Detection of speech present in a noisy signal and speech enhancement making use thereof | |
Alam et al. | Robust feature extraction for speech recognition by enhancing auditory spectrum | |
Taşmaz et al. | Speech enhancement based on undecimated wavelet packet-perceptual filterbanks and MMSE–STSA estimation in various noise environments | |
Flynn et al. | Combined speech enhancement and auditory modelling for robust distributed speech recognition | |
Kermorvant | A comparison of noise reduction techniques for robust speech recognition | |
Milner et al. | Comparison of some noise-compensation methods for speech recognition in adverse environments | |
Shao et al. | A versatile speech enhancement system based on perceptual wavelet denoising | |
Maganti et al. | A perceptual masking approach for noise robust speech recognition | |
Avendano et al. | On the effects of short-term spectrum smoothing in channel normalization | |
Haton | Automatic recognition of noisy speech | |
Chen et al. | Robust MFCCs derived from differentiated power spectrum | |
KR100198713B1 (en) | Noise processing method using nomalization of spectral magnitude and cepstral transformation in speech recognition apparatus | |
Vali et al. | Robust speech recognition by modifying clean and telephone feature vectors using bidirectional neural network. | |
Dionelis | On single-channel speech enhancement and on non-linear modulation-domain Kalman filtering | |
Zhu et al. | Using noise reduction and spectral emphasis techniques to improve ASR performance in noisy conditions |