Huang et al., 2019 - Google Patents

Intel Far-Field Speaker Recognition System for VOiCES Challenge 2019.

Document ID: 7184272683492917668
Author: Huang J; Bocklet T
Publication year: 2019
Publication venue: Interspeech

Snippet

This paper describes Intel's speaker recognition systems for the VOiCES from a Distance Challenge 2019. Our submission consists of a ResNet50 system and four x-vector systems trained with different data augmentation and input features. Our novel contributions include the use …

Classifications

- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/14—Speech classification or search using statistical models, e.g. hidden Markov models [HMMs]
- G10L15/142—Hidden Markov Models [HMMs]
- G10L15/144—Training of HMMs
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/065—Adaptation
- G10L15/07—Adaptation to the speaker
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
- G10L17/04—Training, enrolment or model building
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
- G10L17/06—Decision making techniques; Pattern matching strategies
- G10L17/10—Multimodal systems, i.e. based on the integration of multiple recognition engines or fusion of expert systems
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis
- G10L19/008—Multichannel audio signal coding or decoding, i.e. using interchannel correlation to reduce redundancies, e.g. joint-stereo, intensity-coding, matrixing
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 characterised by the type of extracted parameters
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation

Similar Documents

Publication	Publication Date	Title
Liu et al.	2018	GMM and CNN hybrid method for short utterance speaker recognition
Xie et al.	2019	Utterance-level aggregation for speaker recognition in the wild
Qin et al.	2020	HI-MIA: A far-field text-dependent speaker verification database and the baselines
Nugraha et al.	2016	Multichannel audio source separation with deep neural networks
Kwon et al.	2021	The ins and outs of speaker recognition: lessons from VoxSRC 2020
Heigold et al.	2016	End-to-end text-dependent speaker verification
Zhao et al.	2018	Wasserstein GAN and waveform loss-based acoustic model training for multi-speaker text-to-speech synthesis systems using a WaveNet vocoder
Huang et al.	2019	Intel Far-Field Speaker Recognition System for VOiCES Challenge 2019.
Cai et al.	2020	Within-sample variability-invariant loss for robust speaker recognition under noisy environments
Qin et al.	2020	The INTERSPEECH 2020 far-field speaker verification challenge
CN110047504B (en)	2021-08-20	Speaker identification method under identity vector x-vector linear transformation
Wang et al.	2019	Discriminative neural embedding learning for short-duration text-independent speaker verification
CN103794207A (en)	2014-05-14	Dual-mode voice identity recognition method
Hsu et al.	2018	Scalable factorized hierarchical variational autoencoder training
CN109427328A (en)	2019-03-05	A kind of multicenter voice recognition methods based on filter network acoustic model
Bai et al.	2020	Speaker verification by partial AUC optimization with Mahalanobis distance metric learning
Pardede et al.	2018	Convolutional neural network and feature transformation for distant speech recognition
CN117746908A (en)	2024-03-22	Voice emotion recognition method based on time-frequency characteristic separation type transducer cross fusion architecture
Cai et al.	2019	The DKU system for the speaker recognition task of the 2019 VOiCES from a distance challenge
Kataria et al.	2021	Deep feature CycleGANs: Speaker identity preserving non-parallel microphone-telephone domain adaptation for speaker verification
Wang et al.	2020	Cross-domain adaptation with discrepancy minimization for text-independent forensic speaker verification
Zhang et al.	2021	Multi-level transfer learning from near-field to far-field speaker verification
Dowerah et al.	2023	Joint optimization of diffusion probabilistic-based multichannel speech enhancement with far-field speaker verification
Zheng et al.	2022	The SpeakIn speaker verification system for far-field speaker verification challenge 2022
Zheng et al.	2023	Unisound system for VoxCeleb speaker recognition challenge 2023