Zheng et al., 2023 - Google Patents

MSRANet: Learning discriminative embeddings for speaker verification via channel and spatial attention mechanism in alterable scenarios

Zheng et al., 2023

View PDF

Document ID: 14884651667386331311
Author: Zheng Q; Chen Z; Liu H; Lu Y; Li J; Liu T
Publication year: 2023
Publication venue: Expert Systems with Applications

External Links

Cited by

Snippet

Speaker embeddings have become the most popular feature representation in speaker verification. Improving the robustness of speaker embedding extraction systems is a crucial problem. A multi-scale residual aggregation network (MSRANet), which is a simple but …

Continue reading at papers.ssrn.com (PDF) (other versions)

238000002474 experimental method 0 abstract description 23

Classifications

- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/065—Adaptation
- G10L15/07—Adaptation to the speaker
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use for comparison or discrimination
- G10L25/66—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use for comparison or discrimination for extracting parameters related to health condition
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
- G10L17/26—Recognition of special voice characteristics, e.g. for use in lie detectors; Recognition of animal voices
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
- G10L17/04—Training, enrolment or model building
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
- G10L17/06—Decision making techniques; Pattern matching strategies
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
- G10L17/005—Speaker recognisers specially adapted for particular applications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/28—Constructional details of speech recognition systems
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 characterised by the type of extracted parameters
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/003—Changing voice quality, e.g. pitch or formants
- G10L21/007—Changing voice quality, e.g. pitch or formants characterised by the process used
- G10L21/013—Adapting to target pitch
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6217—Design or setup of recognition systems and techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6267—Classification techniques
- G06K9/6268—Classification techniques relating to the classification paradigm, e.g. parametric or non-parametric approaches
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N99/00—Subject matter not provided for in other groups of this subclass
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/00221—Acquiring or recognising human faces, facial parts, facial sketches, facial expressions
- G06K9/00268—Feature extraction; Face representation
- G06K9/00281—Local features and components; Facial parts ; Occluding parts, e.g. glasses; Geometrical relationships

Similar Documents

Publication	Publication Date	Title
Tirumala et al.	2017	Speaker identification features extraction methods: A systematic review
Zheng et al.	2023	MSRANet: Learning discriminative embeddings for speaker verification via channel and spatial attention mechanism in alterable scenarios
CN111048097B (en)	2022-11-29	Twin network voiceprint recognition method based on 3D convolution
Khdier et al.	2021	Deep learning algorithms based voiceprint recognition system in noisy environment
Guo et al.	2018	Deep neural network based i-vector mapping for speaker verification using short utterances
Hasannezhad et al.	2022	PACDNN: A phase-aware composite deep neural network for speech enhancement
Jadhav et al.	2018	A Speaker Recognition System Using Gaussian Mixture Model, EM Algorithm and K-Means Clustering
Xue et al.	2023	Cross-modal information fusion for voice spoofing detection
CN112992155B (en)	2022-10-14	Far-field voice speaker recognition method and device based on residual error neural network
Singh	2024	A text independent speaker identification system using ANN, RNN, and CNN classification technique
Wang et al.	2021	Robust speaker recognition based on stacked auto-encoders
Chao et al.	2014	Improving generation performance of speech emotion recognition by denoising autoencoders
Saritha et al.	2024	Deep Learning-Based End-to-End Speaker Identification Using Time–Frequency Representation of Speech Signal
Chakravarty et al.	2024	A lightweight feature extraction technique for deepfake audio detection
CN112562725A (en)	2021-03-26	Mixed voice emotion classification method based on spectrogram and capsule network
Radha et al.	2024	Speech and speaker recognition using raw waveform modeling for adult and children’s speech: A comprehensive review
Wu et al.	2019	The DKU-LENOVO Systems for the INTERSPEECH 2019 Computational Paralinguistic Challenge.
Li et al.	2023	Few-shot speaker identification using lightweight prototypical network with feature grouping and interaction
Ng et al.	2018	Teacher-student training for text-independent speaker recognition
Tailor et al.	2022	Deep learning approach for spoken digit recognition in Gujarati language
Pentapati et al.	2022	Dilated Convolution and MelSpectrum for Speaker Identification using Simple Deep Network
Zhipeng et al.	2019	Voiceprint recognition based on BP Neural Network and CNN
Hu et al.	2021	Speaker Recognition Based on 3DCNN-LSTM.
GS et al.	2022	Synthetic speech classification using bidirectional LSTM Networks
Anguraj et al.	2022	Analysis of influencing features with spectral feature extraction and multi-class classification using deep neural network for speech recognition system