Liu et al., 2014 - Google Patents
An iterative framework for unsupervised learning in the plda based speaker verificationLiu et al., 2014
View PDF- Document ID
- 6440993109566674728
- Author
- Liu W
- Yu Z
- Li M
- Publication year
- Publication venue
- The 9th International Symposium on Chinese Spoken Language Processing
External Links
Snippet
We present an iterative and unsupervised learning approach for the speaker verification task. In conventional speaker verification, Probabilistic Linear Discriminant Analysis (PLDA) has been widely used as a supervised backend. However, PLDA requires fully labeled …
- 238000004458 analytical method 0 abstract description 5
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/14—Speech classification or search using statistical models, e.g. hidden Markov models [HMMs]
- G10L15/142—Hidden Markov Models [HMMs]
- G10L15/144—Training of HMMs
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/065—Adaptation
- G10L15/07—Adaptation to the speaker
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/183—Speech classification or search using natural language modelling using context dependencies, e.g. language models
- G10L15/187—Phonemic context, e.g. pronunciation rules, phonotactical constraints or phoneme n-grams
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
- G10L17/04—Training, enrolment or model building
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
- G10L17/06—Decision making techniques; Pattern matching strategies
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use for comparison or discrimination
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 characterised by the type of extracted parameters
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6267—Classification techniques
- G06K9/6268—Classification techniques relating to the classification paradigm, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6217—Design or setup of recognition systems and techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Richardson et al. | A unified deep neural network for speaker and language recognition | |
Ali et al. | Automatic dialect detection in arabic broadcast speech | |
Singer et al. | The MITLL NIST LRE 2011 language recognition system | |
Bahari et al. | Accent recognition using i-vector, gaussian mean supervector and gaussian posterior probability supervector for spontaneous telephone speech | |
Ji et al. | Ensemble Learning for Countermeasure of Audio Replay Spoofing Attack in ASVspoof2017. | |
Sadjadi et al. | Speaker age estimation on conversational telephone speech using senone posterior based i-vectors | |
Wang et al. | Using parallel tokenizers with DTW matrix combination for low-resource spoken term detection | |
Van Segbroeck et al. | Rapid language identification | |
Li et al. | Speaker verification and spoken language identification using a generalized i-vector framework with phonetic tokenizations and tandem features | |
Li et al. | Generalized i-vector representation with phonetic tokenizations and tandem features for both text independent and text dependent speaker verification | |
Franco et al. | Adaptive and discriminative modeling for improved mispronunciation detection | |
Zhang et al. | Dialect Recognition Based on Unsupervised Bottleneck Features. | |
Maghsoodi et al. | Speaker recognition with random digit strings using uncertainty normalized HMM-based i-vectors | |
Ma et al. | Spoken language recognition using ensemble classifiers | |
Gholamdokht Firooz et al. | Spoken language recognition using a new conditional cascade method to combine acoustic and phonetic results | |
Mackova et al. | A study of acoustic features for emotional speaker recognition in I-vector representation | |
Campbell et al. | The MIT-LL/IBM 2006 speaker recognition system: High-performance reduced-complexity recognition | |
Liu et al. | An iterative framework for unsupervised learning in the plda based speaker verification | |
Yamamoto et al. | Denoising autoencoder-based speaker feature restoration for utterances of short duration. | |
Safari et al. | From features to speaker vectors by means of restricted boltzmann machine adaptation | |
Van Segbroeck et al. | UBM fused total variability modeling for language identification. | |
Fernando et al. | Eigenfeatures: An alternative to Shifted Delta Coefficients for Language Identification | |
Suo et al. | Using SVM as back-end classifier for language identification | |
Diez et al. | On the complementarity of phone posterior probabilities for improved speaker recognition | |
Shahin et al. | One-class SVMs based pronunciation verification approach |