Shastri et al., 2022 - Google Patents

Adversarial Synthesis based Data Augmentation for Speech Classification

Shastri et al., 2022

Document ID: 3736463336945351465
Author: Shastri P; Patil C; Wanere P; Mahajan S; Bhatt A
Publication year: 2022
Publication venue: 2022 International Conference on Signal and Information Processing (IConSIP)

External Links

Cited by

Snippet

Speech classification plays a vital role in modern audio processing, with the rise in technologies like home assistants and speech-based control devices. Deep learning-based algorithms have played a big role in developing such technologies. Deep learning …

Continue reading at ieeexplore.ieee.org (other versions)

230000003416 augmentation 0 title abstract description 22

Classifications

- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6267—Classification techniques
- G06K9/6268—Classification techniques relating to the classification paradigm, e.g. parametric or non-parametric approaches
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/14—Speech classification or search using statistical models, e.g. hidden Markov models [HMMs]
- G10L15/142—Hidden Markov Models [HMMs]
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/065—Adaptation
- G10L15/07—Adaptation to the speaker
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6267—Classification techniques
- G06K9/6279—Classification techniques relating to the number of classes
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
- G10L2015/0635—Training updating or merging of old and new templates; Mean values; Weighting
- G10L2015/0636—Threshold criteria for the updating
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6217—Design or setup of recognition systems and techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 characterised by the type of extracted parameters
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use for comparison or discrimination
- G10L25/66—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use for comparison or discrimination for extracting parameters related to health condition
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computer systems based on biological models
- G06N3/02—Computer systems based on biological models using neural network models
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N99/00—Subject matter not provided for in other groups of this subclass
- G06N99/005—Learning machines, i.e. computer in which a programme is changed according to experience gained by the machine itself during a complete run
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/36—Image preprocessing, i.e. processing the image information without deciding about the identity of the image
- G06K9/46—Extraction of features or characteristics of the image
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility

Similar Documents

Publication	Publication Date	Title
Neumann et al.	2017	Attentive convolutional neural network based speech emotion recognition: A study on the impact of input features, signal length, and acted speech
Deb et al.	2017	Emotion classification using segmentation of vowel-like and non-vowel-like regions
Ghahabi et al.	2017	Deep learning backend for single and multisession i-vector speaker recognition
Hsu et al.	2019	Transfer learning from audio-visual grounding to speech recognition
Basak et al.	2023	Challenges and Limitations in Speech Recognition Technology: A Critical Review of Speech Signal Processing Algorithms, Tools and Systems.
CN105654940A (en)	2016-06-08	Voice synthesis method and device
Scherer et al.	2009	Classifier fusion for emotion recognition from speech
Cord-Landwehr et al.	2023	Frame-wise and overlap-robust speaker embeddings for meeting diarization
CN117253493A (en)	2023-12-19	Audio encoding method for speech generation task, electronic device, and storage medium
Reuter et al.	2023	Multilingual query-by-example keyword spotting with metric learning and phoneme-to-embedding mapping
Kim et al.	2022	Ada-vad: Unpaired adversarial domain adaptation for noise-robust voice activity detection
JP6816047B2 (en)	2021-01-20	Objective utterance estimation model learning device, objective utterance determination device, objective utterance estimation model learning method, objective utterance determination method, program
Monteiro et al.	2019	Combining Speaker Recognition and Metric Learning for Speaker-Dependent Representation Learning.
Yu et al.	2021	Language Recognition Based on Unsupervised Pretrained Models.
Shastri et al.	2022	Adversarial Synthesis based Data Augmentation for Speech Classification
Djeffal et al.	2023	Noise-robust speech recognition: A comparative analysis of LSTM and CNN approaches
CN114333762B (en)	2022-11-18	Expressive force-based speech synthesis method, expressive force-based speech synthesis system, electronic device and storage medium
Sukvichai et al.	2021	Automatic speech recognition for Thai sentence based on MFCC and CNNs
Bhanushali et al.	2024	Adversarial attacks on automatic speech recognition (ASR): A survey
Nahar et al.	2020	Effect of data augmentation on dnn-based vad for automatic speech recognition in noisy environment
Bovbjerg et al.	2024	Self-Supervised Pretraining for Robust Personalized Voice Activity Detection in Adverse Conditions
Vidal et al.	2023	Mispronunciation detection using self-supervised speech representations
El Hajji et al.	2019	Transfer Learning based Audio Classification for a noisy and speechless recordings detection task, in a classroom context.
Lee et al.	2008	Spoken language recognition using support vector machines with generative front-end
Dhakal	2018	Novel Architectures for Human Voice and Environmental Sound Recognitionusing Machine Learning Algorithms