Intel Far-Field Speaker Recognition System for VOiCES Challenge 2019. Huang et al., 2019
- Document ID
- 7184272683492917668
- Author
- Huang J
- Bocklet T
- Publication year
- 2019
- Publication venue
- Interspeech
Snippet
This paper describes Intel's speaker recognition systems for the VOiCES from a Distance Challenge 2019. Our submission consists of a ResNet50 and four x-vector systems trained with different data augmentation and input features. Our novel contributions include the use …
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/14—Speech classification or search using statistical models, e.g. hidden Markov models [HMMs]
- G10L15/142—Hidden Markov Models [HMMs]
- G10L15/144—Training of HMMs
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/065—Adaptation
- G10L15/07—Adaptation to the speaker
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
- G10L17/04—Training, enrolment or model building
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
- G10L17/06—Decision making techniques; Pattern matching strategies
- G10L17/10—Multimodal systems, i.e. based on the integration of multiple recognition engines or fusion of expert systems
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis
- G10L19/008—Multichannel audio signal coding or decoding, i.e. using interchannel correlation to reduce redundancies, e.g. joint-stereo, intensity-coding, matrixing
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 characterised by the type of extracted parameters
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Liu et al. | | GMM and CNN hybrid method for short utterance speaker recognition |
Xie et al. | | Utterance-level aggregation for speaker recognition in the wild |
Qin et al. | | HI-MIA: A far-field text-dependent speaker verification database and the baselines |
Nugraha et al. | | Multichannel audio source separation with deep neural networks |
Kwon et al. | | The ins and outs of speaker recognition: lessons from VoxSRC 2020 |
Heigold et al. | | End-to-end text-dependent speaker verification |
Zhao et al. | | Wasserstein GAN and waveform loss-based acoustic model training for multi-speaker text-to-speech synthesis systems using a WaveNet vocoder |
Huang et al. | | Intel Far-Field Speaker Recognition System for VOiCES Challenge 2019. |
Cai et al. | | Within-sample variability-invariant loss for robust speaker recognition under noisy environments |
Qin et al. | | The INTERSPEECH 2020 far-field speaker verification challenge |
CN110047504B (en) | | Speaker identification method under identity vector x-vector linear transformation |
Wang et al. | | Discriminative neural embedding learning for short-duration text-independent speaker verification |
CN103794207A (en) | | Dual-mode voice identity recognition method |
Hsu et al. | | Scalable factorized hierarchical variational autoencoder training |
CN109427328A (en) | | A kind of multicenter voice recognition methods based on filter network acoustic model |
Bai et al. | | Speaker verification by partial AUC optimization with Mahalanobis distance metric learning |
Pardede et al. | | Convolutional neural network and feature transformation for distant speech recognition |
CN117746908A (en) | | Voice emotion recognition method based on time-frequency characteristic separation type transducer cross fusion architecture |
Cai et al. | | The DKU system for the speaker recognition task of the 2019 VOiCES from a distance challenge |
Kataria et al. | | Deep feature CycleGANs: Speaker identity preserving non-parallel microphone-telephone domain adaptation for speaker verification |
Wang et al. | | Cross-domain adaptation with discrepancy minimization for text-independent forensic speaker verification |
Zhang et al. | | Multi-level transfer learning from near-field to far-field speaker verification |
Dowerah et al. | | Joint optimization of diffusion probabilistic-based multichannel speech enhancement with far-field speaker verification |
Zheng et al. | | The SpeakIn speaker verification system for far-field speaker verification challenge 2022 |
Zheng et al. | | Unisound system for VoxCeleb speaker recognition challenge 2023 |