Liu et al., 2014 - Google Patents

An iterative framework for unsupervised learning in the plda based speaker verification

Liu et al., 2014

Document ID: 6440993109566674728
Author: Liu W; Yu Z; Li M
Publication year: 2014
Publication venue: The 9th International Symposium on Chinese Spoken Language Processing

External Links

Cited by

Snippet

We present an iterative and unsupervised learning approach for the speaker verification task. In conventional speaker verification, Probabilistic Linear Discriminant Analysis (PLDA) has been widely used as a supervised backend. However, PLDA requires fully labeled …

Continue reading at citeseerx.ist.psu.edu (PDF) (other versions)

238000004458 analytical method 0 abstract description 5

Classifications

- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/14—Speech classification or search using statistical models, e.g. hidden Markov models [HMMs]
- G10L15/142—Hidden Markov Models [HMMs]
- G10L15/144—Training of HMMs
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/065—Adaptation
- G10L15/07—Adaptation to the speaker
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/183—Speech classification or search using natural language modelling using context dependencies, e.g. language models
- G10L15/187—Phonemic context, e.g. pronunciation rules, phonotactical constraints or phoneme n-grams
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
- G10L17/04—Training, enrolment or model building
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
- G10L17/06—Decision making techniques; Pattern matching strategies
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use for comparison or discrimination
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 characterised by the type of extracted parameters
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6267—Classification techniques
- G06K9/6268—Classification techniques relating to the classification paradigm, e.g. parametric or non-parametric approaches
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6217—Design or setup of recognition systems and techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems

Similar Documents

Publication	Publication Date	Title
Richardson et al.	2015	A unified deep neural network for speaker and language recognition
Ali et al.	2015	Automatic dialect detection in arabic broadcast speech
Singer et al.	2012	The MITLL NIST LRE 2011 language recognition system
Bahari et al.	2013	Accent recognition using i-vector, gaussian mean supervector and gaussian posterior probability supervector for spontaneous telephone speech
Ji et al.	2017	Ensemble Learning for Countermeasure of Audio Replay Spoofing Attack in ASVspoof2017.
Sadjadi et al.	2016	Speaker age estimation on conversational telephone speech using senone posterior based i-vectors
Wang et al.	2013	Using parallel tokenizers with DTW matrix combination for low-resource spoken term detection
Van Segbroeck et al.	2015	Rapid language identification
Li et al.	2014	Speaker verification and spoken language identification using a generalized i-vector framework with phonetic tokenizations and tandem features
Li et al.	2016	Generalized i-vector representation with phonetic tokenizations and tandem features for both text independent and text dependent speaker verification
Franco et al.	2014	Adaptive and discriminative modeling for improved mispronunciation detection
Zhang et al.	2017	Dialect Recognition Based on Unsupervised Bottleneck Features.
Maghsoodi et al.	2019	Speaker recognition with random digit strings using uncertainty normalized HMM-based i-vectors
Ma et al.	2007	Spoken language recognition using ensemble classifiers
Gholamdokht Firooz et al.	2018	Spoken language recognition using a new conditional cascade method to combine acoustic and phonetic results
Mackova et al.	2015	A study of acoustic features for emotional speaker recognition in I-vector representation
Campbell et al.	2007	The MIT-LL/IBM 2006 speaker recognition system: High-performance reduced-complexity recognition
Liu et al.	2014	An iterative framework for unsupervised learning in the plda based speaker verification
Yamamoto et al.	2015	Denoising autoencoder-based speaker feature restoration for utterances of short duration.
Safari et al.	2016	From features to speaker vectors by means of restricted boltzmann machine adaptation
Van Segbroeck et al.	2014	UBM fused total variability modeling for language identification.
Fernando et al.	2016	Eigenfeatures: An alternative to Shifted Delta Coefficients for Language Identification
Suo et al.	2008	Using SVM as back-end classifier for language identification
Diez et al.	2014	On the complementarity of phone posterior probabilities for improved speaker recognition
Shahin et al.	2018	One-class SVMs based pronunciation verification approach