Ganapathy et al., 2012 - Google Patents
Temporal resolution analysis in frequency domain linear predictionGanapathy et al., 2012
View HTML- Document ID
- 8900086826768209790
- Author
- Ganapathy S
- Hermansky H
- Publication year
- Publication venue
- The Journal of the Acoustical Society of America
External Links
Snippet
Frequency domain linear prediction (FDLP) is a technique for auto-regressive modeling of Hilbert envelopes. In this letter, the resolution properties of the FDLP model are investigated using synthetic signals with impulses immersed in noise. The effect of various factors are …
- 238000004458 analytical method 0 title abstract description 20
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/003—Changing voice quality, e.g. pitch or formants
- G10L21/007—Changing voice quality, e.g. pitch or formants characterised by the process used
- G10L21/013—Adapting to target pitch
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/06—Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 characterised by the type of extracted parameters
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/065—Adaptation
- G10L15/07—Adaptation to the speaker
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
- G10L25/90—Pitch determination of speech signals
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis
- G10L19/02—Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
- G10L17/26—Recognition of special voice characteristics, e.g. for use in lie detectors; Recognition of animal voices
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Alku et al. | Formant frequency estimation of high-pitched vowels using weighted linear prediction | |
Zahorian et al. | A spectral/temporal method for robust fundamental frequency tracking | |
Das et al. | Exploring different attributes of source information for speaker verification with limited test data | |
Ghosh et al. | A generalized smoothness criterion for acoustic-to-articulatory inversion | |
Schädler et al. | Separable spectro-temporal Gabor filter bank features: Reducing the complexity of robust features for automatic speech recognition | |
Dissen et al. | Formant estimation and tracking: A deep learning approach | |
Sheikhan et al. | Using DTW neural–based MFCC warping to improve emotional speech recognition | |
Mittal et al. | Study of characteristics of aperiodicity in Noh voices | |
Ramakrishnan et al. | Voice source characterization using pitch synchronous discrete cosine transform for speaker identification | |
Meer | Automatic alignment for New Englishes: Applying state-of-the-art aligners to Trinidadian English | |
Hermus et al. | Perceptual audio modeling with exponentially damped sinusoids | |
Williamson et al. | Estimating nonnegative matrix model activations with deep neural networks to increase perceptual speech quality | |
Haridas et al. | A novel approach to improve the speech intelligibility using fractional delta-amplitude modulation spectrogram | |
Guglani et al. | Automatic speech recognition system with pitch dependent features for Punjabi language on KALDI toolkit | |
Hoang et al. | Blind phone segmentation based on spectral change detection using Legendre polynomial approximation | |
Priyadarshani et al. | Dynamic time warping based speech recognition for isolated Sinhala words | |
Meyer et al. | Effect of speech-intrinsic variations on human and automatic recognition of spoken phonemes | |
Ganapathy et al. | Modulation frequency features for phoneme recognition in noisy speech | |
Ganapathy et al. | Temporal envelope compensation for robust phoneme recognition using modulation spectrum | |
Prathosh et al. | Estimation of voice-onset time in continuous speech using temporal measures | |
Kadiri et al. | Mel-frequency cepstral coefficients derived using the zero-time windowing spectrum for classification of phonation types in singing | |
Gowda et al. | Quasi-closed phase forward-backward linear prediction analysis of speech for accurate formant detection and estimation | |
Mesgarani et al. | Toward optimizing stream fusion in multistream recognition of speech | |
Jokinen et al. | Estimating the spectral tilt of the glottal source from telephone speech using a deep neural network | |
Ganapathy et al. | Temporal resolution analysis in frequency domain linear prediction |