Ganapathy et al., 2012 - Google Patents
Temporal resolution analysis in frequency domain linear prediction
- Document ID: 8900086826768209790
- Authors: Ganapathy S; Hermansky H
- Publication year: 2012
- Publication venue: The Journal of the Acoustical Society of America
Snippet
Frequency domain linear prediction (FDLP) is a technique for auto-regressive modeling of Hilbert envelopes. In this letter, the resolution properties of the FDLP model are investigated using synthetic signals with impulses immersed in noise. The effects of various factors are …
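The snippet describes FDLP only in one line, so a minimal sketch may help readers unfamiliar with the technique. The code below is an assumed, illustrative implementation (not the authors' code from the letter): linear prediction is applied to the DCT of a time-domain segment, and the resulting all-pole model approximates the segment's squared Hilbert envelope. The function name `fdlp_envelope`, the model order, and the noise level are hypothetical choices made for the example.

```python
# Minimal FDLP sketch (assumed implementation): LP on the DCT of a segment
# yields an all-pole model of the segment's temporal (Hilbert) envelope.
import numpy as np
from scipy.fft import dct
from scipy.linalg import solve_toeplitz

def fdlp_envelope(segment, order=40):
    """Order-`order` FDLP estimate of the squared Hilbert envelope,
    sampled at len(segment) points."""
    n = len(segment)
    # 1. DCT of the time-domain segment (type-II, orthonormal).
    c = dct(segment, type=2, norm='ortho')
    # 2. Autocorrelation of the DCT coefficients up to lag `order`.
    r = np.correlate(c, c, mode='full')[n - 1:n + order]
    # 3. Yule-Walker (Toeplitz) equations for the prediction coefficients.
    lp = solve_toeplitz((r[:order], r[:order]), r[1:order + 1])
    a = np.concatenate(([1.0], -lp))        # A(z) = 1 - sum_k lp[k] z^-k
    # 4. The all-pole response 1/|A|^2, evaluated along the DCT index axis,
    #    approximates the squared Hilbert envelope as a function of time.
    h = np.fft.rfft(a, 2 * n)[:n]
    return 1.0 / (np.abs(h) ** 2 + 1e-12)

# Synthetic test in the spirit of the letter: an impulse immersed in noise.
rng = np.random.default_rng(0)
x = 0.05 * rng.standard_normal(1024)
x[512] += 1.0
env = fdlp_envelope(x, order=40)
print(int(np.argmax(env)))  # envelope peak, expected near sample 512
```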
Classifications
All codes below fall under G—PHYSICS > G10—MUSICAL INSTRUMENTS; ACOUSTICS > G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING, except the final entry (G06F17/00), which falls under G—PHYSICS > G06—COMPUTING; CALCULATING; COUNTING > G06F—ELECTRICAL DIGITAL DATA PROCESSING.
- G10L15/00—Speech recognition
  - G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
  - G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice > G10L15/065—Adaptation > G10L15/07—Adaptation to the speaker
  - G10L15/08—Speech classification or search
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
  - G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation > G10L21/0208—Noise filtering
  - G10L21/003—Changing voice quality, e.g. pitch or formants > G10L21/007—Changing voice quality, e.g. pitch or formants characterised by the process used > G10L21/013—Adapting to target pitch
  - G10L21/06—Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
  - G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 characterised by the type of extracted parameters
  - G10L25/90—Pitch determination of speech signals
- G10L19/00—Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis
  - G10L19/02—... using spectral analysis, e.g. transform vocoders or subband vocoders
  - G10L19/04—... using predictive techniques
- G10L17/00—Speaker identification or verification
- G10L13/00—Speech synthesis; Text to speech systems
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
Similar Documents
Publication | Title
---|---
Zahorian et al. | A spectral/temporal method for robust fundamental frequency tracking
Das et al. | Exploring different attributes of source information for speaker verification with limited test data
Schädler et al. | Separable spectro-temporal Gabor filter bank features: Reducing the complexity of robust features for automatic speech recognition
US20150302845A1 (en) | Estimation system of spectral envelopes and group delays for sound analysis and synthesis, and audio signal synthesis system
Sheikhan et al. | Using DTW neural–based MFCC warping to improve emotional speech recognition
Jiang et al. | Geometric methods for spectral analysis
Ganchev | Contemporary methods for speech parameterization
Tavakoli et al. | A spatial modeling approach for linguistic object data: Analyzing dialect sound variations across Great Britain
Williamson et al. | Estimating nonnegative matrix model activations with deep neural networks to increase perceptual speech quality
Peer et al. | Phase-aware deep speech enhancement: It's all about the frame length
Meyer et al. | Effect of speech-intrinsic variations on human and automatic recognition of spoken phonemes
Ganapathy et al. | Modulation frequency features for phoneme recognition in noisy speech
Priyadarshani et al. | Dynamic time warping based speech recognition for isolated Sinhala words
Ganapathy et al. | Temporal envelope compensation for robust phoneme recognition using modulation spectrum
Mesgarani et al. | Toward optimizing stream fusion in multistream recognition of speech
Kadiri et al. | Mel-frequency cepstral coefficients derived using the zero-time windowing spectrum for classification of phonation types in singing
Islam et al. | Supervised single channel speech enhancement based on stationary wavelet transforms and non-negative matrix factorization with concatenated framing process and subband smooth ratio mask
Gowda et al. | Quasi-closed phase forward-backward linear prediction analysis of speech for accurate formant detection and estimation
Yegnanarayana | Group delay spectrogram of speech signals without phase wrapping
Jokinen et al. | Estimating the spectral tilt of the glottal source from telephone speech using a deep neural network
Ganapathy et al. | Temporal resolution analysis in frequency domain linear prediction
Chi et al. | Spectro-temporal modulation energy based mask for robust speaker identification
CN117935789A (en) | Speech recognition method, system, equipment and storage medium
Hu et al. | Learnable spectral dimension compression mapping for full-band speech enhancement
Fulop | Accuracy of formant measurement for synthesized vowels using the reassigned spectrogram and comparison with linear prediction