Meghanani et al., 2023 - Google Patents

Deriving translational acoustic sub-word embeddings

Document ID
6094570466286242255
Author
Meghanani A
Hain T
Publication year
2023
Publication venue
2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)

Snippet

There is a growing interest in understanding the representational geometry of acoustic word embeddings (AWEs), which are fixed-dimensional representations of spoken words. However, not much research has been conducted on acoustic sub-word embeddings …
Continue reading at eprints.whiterose.ac.uk (PDF) (other versions)

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/18 Speech classification or search using natural language modelling
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/20 Handling natural language data
    • G06F17/27 Automatic analysis, e.g. parsing
    • G06F17/2765 Recognition
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L2015/088 Word spotting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30 Information retrieval; Database structures therefor; File system structures therefor
    • G06F17/3061 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F17/30634 Querying
    • G06F17/30657 Query processing
    • G06F17/30675 Query execution
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/20 Handling natural language data
    • G06F17/28 Processing or translating of natural language
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/065 Adaptation
    • G10L15/07 Adaptation to the speaker
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06K RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00 Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/62 Methods or arrangements for recognition using electronic means
    • G06K9/6267 Classification techniques
    • G06K9/6268 Classification techniques relating to the classification paradigm, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification
    • G10L17/26 Recognition of special voice characteristics, e.g. for use in lie detectors; Recognition of animal voices
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 Speech synthesis; Text to speech systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06N COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N99/00 Subject matter not provided for in other groups of this subclass

Similar Documents

Publication Publication Date Title
Baevski et al. Effectiveness of self-supervised pre-training for speech recognition
Kamper et al. A segmental framework for fully-unsupervised large-vocabulary speech recognition
Poddar et al. Speaker verification with short utterances: a review of challenges, trends and opportunities
Li et al. Speaker-invariant affective representation learning via adversarial training
Sun et al. Weighted spectral features based on local Hu moments for speech emotion recognition
Pan et al. Automatic hierarchical attention neural network for detecting AD
Oflazoglu et al. Recognizing emotion from Turkish speech using acoustic features
India Massana et al. LSTM neural network-based speaker segmentation using acoustic and language modelling
Jassim et al. Speech emotion classification using combined neurogram and INTERSPEECH 2010 paralinguistic challenge features
Guo et al. Text classification by contrastive learning and cross-lingual data augmentation for Alzheimer’s disease detection
Van Staden et al. A comparison of self-supervised speech representations as input features for unsupervised acoustic word embeddings
Yang et al. Autoregressive predictive coding: A comprehensive study
Bhati et al. Self-expressing autoencoders for unsupervised spoken term discovery
Wataraka Gamage et al. Speech-based continuous emotion prediction by learning perception responses related to salient events: A study based on vocal affect bursts and cross-cultural affect in AVEC 2018
Tang et al. A New Benchmark of Aphasia Speech Recognition and Detection Based on E-Branchformer and Multi-task Learning
Birla A robust unsupervised pattern discovery and clustering of speech signals
Iakushkin et al. Russian-language speech recognition system based on DeepSpeech
Vlasenko et al. Fusion of acoustic and linguistic information using supervised autoencoder for improved emotion recognition
Mestre et al. Augmenting pre-trained language models with audio feature embedding for argumentation mining in political debates
Meghanani et al. Deriving translational acoustic sub-word embeddings
Singhal et al. Estimation of Accuracy in Human Gender Identification and Recall Values Based on Voice Signals Using Different Classifiers
Thiruvaran et al. Spectral shifting of speaker‐specific information for narrow band telephonic speaker recognition
Wazir et al. Deep learning-based detection of inappropriate speech content for film censorship
Saleem et al. Voice conversion and spoofed voice detection from parallel English and Urdu corpus using cyclic GANs
Chen et al. Topic segmentation on spoken documents using self-validated acoustic cuts