Ganapathy et al., 2012 - Google Patents
Temporal resolution analysis in frequency domain linear prediction
- Document ID: 8900086826768209790
- Authors: Ganapathy S; Hermansky H
- Publication year: 2012
- Publication venue: The Journal of the Acoustical Society of America
Snippet
Frequency domain linear prediction (FDLP) is a technique for auto-regressive modeling of Hilbert envelopes. In this letter, the resolution properties of the FDLP model are investigated using synthetic signals with impulses immersed in noise. The effects of various factors are …
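The snippet describes FDLP only in one line, so a minimal sketch may help readers unfamiliar with the technique. The code below is an assumed, illustrative implementation (not the authors' code from the letter): linear prediction is applied to the DCT of a time-domain segment, and the resulting all-pole model approximates the segment's squared Hilbert envelope. The function name `fdlp_envelope`, the model order, and the noise level are hypothetical choices made for the example.

```python
# Minimal FDLP sketch (assumed implementation): LP on the DCT of a segment
# yields an all-pole model of the segment's temporal (Hilbert) envelope.
import numpy as np
from scipy.fft import dct
from scipy.linalg import solve_toeplitz

def fdlp_envelope(segment, order=40):
    """Order-`order` FDLP estimate of the squared Hilbert envelope,
    sampled at len(segment) points."""
    n = len(segment)
    # 1. DCT of the time-domain segment (type-II, orthonormal).
    c = dct(segment, type=2, norm='ortho')
    # 2. Autocorrelation of the DCT coefficients up to lag `order`.
    r = np.correlate(c, c, mode='full')[n - 1:n + order]
    # 3. Yule-Walker (Toeplitz) equations for the prediction coefficients.
    lp = solve_toeplitz((r[:order], r[:order]), r[1:order + 1])
    a = np.concatenate(([1.0], -lp))        # A(z) = 1 - sum_k lp[k] z^-k
    # 4. The all-pole response 1/|A|^2, evaluated along the DCT index axis,
    #    approximates the squared Hilbert envelope as a function of time.
    h = np.fft.rfft(a, 2 * n)[:n]
    return 1.0 / (np.abs(h) ** 2 + 1e-12)

# Synthetic test in the spirit of the letter: an impulse immersed in noise.
rng = np.random.default_rng(0)
x = 0.05 * rng.standard_normal(1024)
x[512] += 1.0
env = fdlp_envelope(x, order=40)
print(int(np.argmax(env)))  # envelope peak, expected near sample 512
```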
Classifications
All codes below fall under G—PHYSICS > G10—MUSICAL INSTRUMENTS; ACOUSTICS > G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING, except the final entry (G06F17/00), which falls under G—PHYSICS > G06—COMPUTING; CALCULATING; COUNTING > G06F—ELECTRICAL DIGITAL DATA PROCESSING.
- G10L15/00—Speech recognition
  - G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
  - G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice > G10L15/065—Adaptation > G10L15/07—Adaptation to the speaker
  - G10L15/08—Speech classification or search
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
  - G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation > G10L21/0208—Noise filtering
  - G10L21/003—Changing voice quality, e.g. pitch or formants > G10L21/007—Changing voice quality, e.g. pitch or formants characterised by the process used > G10L21/013—Adapting to target pitch
  - G10L21/06—Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
  - G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 characterised by the type of extracted parameters
  - G10L25/90—Pitch determination of speech signals
- G10L19/00—Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis
  - G10L19/02—... using spectral analysis, e.g. transform vocoders or subband vocoders
  - G10L19/04—... using predictive techniques
- G10L17/00—Speaker identification or verification
- G10L13/00—Speech synthesis; Text to speech systems
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
Similar Documents
Publication | Title
---|---
Zahorian et al. | A spectral/temporal method for robust fundamental frequency tracking
Das et al. | Exploring different attributes of source information for speaker verification with limited test data
Schädler et al. | Separable spectro-temporal Gabor filter bank features: Reducing the complexity of robust features for automatic speech recognition
US20150302845A1 (en) | Estimation system of spectral envelopes and group delays for sound analysis and synthesis, and audio signal synthesis system
Sheikhan et al. | Using DTW neural–based MFCC warping to improve emotional speech recognition
Jiang et al. | Geometric methods for spectral analysis
Ganchev | Contemporary methods for speech parameterization
Tavakoli et al. | A spatial modeling approach for linguistic object data: Analyzing dialect sound variations across Great Britain
Williamson et al. | Estimating nonnegative matrix model activations with deep neural networks to increase perceptual speech quality
Peer et al. | Phase-aware deep speech enhancement: It's all about the frame length
Meyer et al. | Effect of speech-intrinsic variations on human and automatic recognition of spoken phonemes
Ganapathy et al. | Modulation frequency features for phoneme recognition in noisy speech
Priyadarshani et al. | Dynamic time warping based speech recognition for isolated Sinhala words
Ganapathy et al. | Temporal envelope compensation for robust phoneme recognition using modulation spectrum
Mesgarani et al. | Toward optimizing stream fusion in multistream recognition of speech
Kadiri et al. | Mel-frequency cepstral coefficients derived using the zero-time windowing spectrum for classification of phonation types in singing
Islam et al. | Supervised single channel speech enhancement based on stationary wavelet transforms and non-negative matrix factorization with concatenated framing process and subband smooth ratio mask
Gowda et al. | Quasi-closed phase forward-backward linear prediction analysis of speech for accurate formant detection and estimation
Yegnanarayana | Group delay spectrogram of speech signals without phase wrapping
Jokinen et al. | Estimating the spectral tilt of the glottal source from telephone speech using a deep neural network
Ganapathy et al. | Temporal resolution analysis in frequency domain linear prediction
Chi et al. | Spectro-temporal modulation energy based mask for robust speaker identification
CN117935789A (en) | Speech recognition method, system, equipment and storage medium
Hu et al. | Learnable spectral dimension compression mapping for full-band speech enhancement
Fulop | Accuracy of formant measurement for synthesized vowels using the reassigned spectrogram and comparison with linear prediction