Meghanani et al., 2023 - Google Patents

Deriving translational acoustic sub-word embeddings

Document ID: 6094570466286242255
Authors: Meghanani A; Hain T
Publication year: 2023
Publication venue: 2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)

Snippet

There is a growing interest in understanding the representational geometry of acoustic word embeddings (AWEs), which are fixed-dimensional representations of spoken words. However, not much research has been conducted on acoustic sub-word embeddings …

Classifications

- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/27—Automatic analysis, e.g. parsing
- G06F17/2765—Recognition
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L2015/088—Word spotting
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/3061—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F17/30634—Querying
- G06F17/30657—Query processing
- G06F17/30675—Query execution
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/28—Processing or translating of natural language
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/065—Adaptation
- G10L15/07—Adaptation to the speaker
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6267—Classification techniques
- G06K9/6268—Classification techniques relating to the classification paradigm, e.g. parametric or non-parametric approaches
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
- G10L17/26—Recognition of special voice characteristics, e.g. for use in lie detectors; Recognition of animal voices
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N99/00—Subject matter not provided for in other groups of this subclass

Similar Documents

Publication	Publication Date	Title
Baevski et al.	2019	Effectiveness of self-supervised pre-training for speech recognition
Kamper et al.	2017	A segmental framework for fully-unsupervised large-vocabulary speech recognition
Poddar et al.	2018	Speaker verification with short utterances: a review of challenges, trends and opportunities
Li et al.	2020	Speaker-invariant affective representation learning via adversarial training
Sun et al.	2015	Weighted spectral features based on local Hu moments for speech emotion recognition
Pan et al.	2019	Automatic hierarchical attention neural network for detecting AD
Oflazoglu et al.	2013	Recognizing emotion from Turkish speech using acoustic features
India Massana et al.	2017	LSTM neural network-based speaker segmentation using acoustic and language modelling
Jassim et al.	2017	Speech emotion classification using combined neurogram and INTERSPEECH 2010 paralinguistic challenge features
Guo et al.	2020	Text classification by contrastive learning and cross-lingual data augmentation for Alzheimer’s disease detection
Van Staden et al.	2021	A comparison of self-supervised speech representations as input features for unsupervised acoustic word embeddings
Yang et al.	2022	Autoregressive predictive coding: A comprehensive study
Bhati et al.	2020	Self-expressing autoencoders for unsupervised spoken term discovery
Wataraka Gamage et al.	2018	Speech-based continuous emotion prediction by learning perception responses related to salient events: A study based on vocal affect bursts and cross-cultural affect in AVEC 2018
Tang et al.	2023	A New Benchmark of Aphasia Speech Recognition and Detection Based on E-Branchformer and Multi-task Learning
Birla	2018	A robust unsupervised pattern discovery and clustering of speech signals
Iakushkin et al.	2018	Russian-language speech recognition system based on DeepSpeech
Vlasenko et al.	2021	Fusion of acoustic and linguistic information using supervised autoencoder for improved emotion recognition
Mestre et al.	2023	Augmenting pre-trained language models with audio feature embedding for argumentation mining in political debates
Singhal et al.	2022	Estimation of Accuracy in Human Gender Identification and Recall Values Based on Voice Signals Using Different Classifiers
Thiruvaran et al.	2015	Spectral shifting of speaker‐specific information for narrow band telephonic speaker recognition
Wazir et al.	2022	Deep learning-based detection of inappropriate speech content for film censorship
Saleem et al.	2019	Voice conversion and spoofed voice detection from parallel English and Urdu corpus using cyclic GANs
Chen et al.	2015	Topic segmentation on spoken documents using self-validated acoustic cuts