Meghanani et al., 2023 - Google Patents
Deriving translational acoustic sub-word embeddings
- Document ID: 6094570466286242255
- Authors: Meghanani A; Hain T
- Publication year: 2023
- Publication venue: 2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)

Snippet
There is a growing interest in understanding the representational geometry of acoustic word embeddings (AWEs), which are fixed-dimensional representations of spoken words. However, not much research has been conducted on acoustic sub-word embeddings …
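To make the snippet's definition concrete: an acoustic word embedding maps a variable-length sequence of acoustic frames to a single fixed-dimensional vector. The sketch below illustrates only that shape contract with simple mean-pooling over random MFCC-like features; it is not the method of the paper, which learns its embeddings.

```python
import numpy as np

def mean_pool_embedding(frames: np.ndarray) -> np.ndarray:
    """Map a (num_frames, feature_dim) sequence to one (feature_dim,) vector.

    Mean-pooling is the simplest fixed-dimensional mapping; learned AWE
    models replace this with a trained encoder.
    """
    return frames.mean(axis=0)

# Two "spoken words" of different durations (20 vs. 35 frames of
# 39-dimensional MFCC-like features) both map to 39-dim embeddings.
word_a = np.random.randn(20, 39)
word_b = np.random.randn(35, 39)
emb_a = mean_pool_embedding(word_a)
emb_b = mean_pool_embedding(word_b)
print(emb_a.shape, emb_b.shape)  # both (39,)
```

Because every word, regardless of duration, lands in the same fixed-dimensional space, embeddings can be compared directly (e.g. by cosine distance), which is what makes questions about their representational geometry well-posed.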
Classifications
- G10L15/18 — Speech classification or search using natural language modelling
- G06F17/2765 — Recognition (automatic analysis, e.g. parsing, of natural language data)
- G10L2015/088 — Word spotting
- G06F17/30675 — Query execution (information retrieval of unstructured textual data)
- G06F17/28 — Processing or translating of natural language
- G10L15/07 — Adaptation to the speaker (training of speech recognition systems)
- G06K9/6268 — Classification techniques relating to the classification paradigm, e.g. parametric or non-parametric approaches
- G10L17/26 — Recognition of special voice characteristics, e.g. for use in lie detectors; recognition of animal voices
- G10L19/00 — Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; coding or decoding of speech or audio signals
- G10L25/00 — Speech or voice analysis techniques not restricted to a single one of groups G10L15/00–G10L21/00
- G10L13/00 — Speech synthesis; text-to-speech systems
- G06N99/00 — Subject matter not provided for in other groups of this subclass
Similar Documents
Publication | Title
---|---
Baevski et al. | Effectiveness of self-supervised pre-training for speech recognition
Kamper et al. | A segmental framework for fully-unsupervised large-vocabulary speech recognition
Poddar et al. | Speaker verification with short utterances: a review of challenges, trends and opportunities
Li et al. | Speaker-invariant affective representation learning via adversarial training
Sun et al. | Weighted spectral features based on local Hu moments for speech emotion recognition
Pan et al. | Automatic hierarchical attention neural network for detecting AD
Oflazoglu et al. | Recognizing emotion from Turkish speech using acoustic features
India Massana et al. | LSTM neural network-based speaker segmentation using acoustic and language modelling
Jassim et al. | Speech emotion classification using combined neurogram and INTERSPEECH 2010 paralinguistic challenge features
Guo et al. | Text classification by contrastive learning and cross-lingual data augmentation for Alzheimer's disease detection
Van Staden et al. | A comparison of self-supervised speech representations as input features for unsupervised acoustic word embeddings
Yang et al. | Autoregressive predictive coding: a comprehensive study
Bhati et al. | Self-expressing autoencoders for unsupervised spoken term discovery
Wataraka Gamage et al. | Speech-based continuous emotion prediction by learning perception responses related to salient events: a study based on vocal affect bursts and cross-cultural affect in AVEC 2018
Tang et al. | A new benchmark of aphasia speech recognition and detection based on E-Branchformer and multi-task learning
Birla | A robust unsupervised pattern discovery and clustering of speech signals
Iakushkin et al. | Russian-language speech recognition system based on DeepSpeech
Vlasenko et al. | Fusion of acoustic and linguistic information using supervised autoencoder for improved emotion recognition
Mestre et al. | Augmenting pre-trained language models with audio feature embedding for argumentation mining in political debates
Meghanani et al. | Deriving translational acoustic sub-word embeddings
Singhal et al. | Estimation of accuracy in human gender identification and recall values based on voice signals using different classifiers
Thiruvaran et al. | Spectral shifting of speaker-specific information for narrow-band telephonic speaker recognition
Wazir et al. | Deep learning-based detection of inappropriate speech content for film censorship
Saleem et al. | Voice conversion and spoofed voice detection from parallel English and Urdu corpus using cyclic GANs