Dumpala et al., 2021 - Google Patents
Significance of speaker embeddings and temporal context for depression detection
- Document ID
- 13349071616198953094
- Author
- Dumpala S
- Rodriguez S
- Rempel S
- Uher R
- Oore S
- Publication year
- 2021
- Publication venue
- arXiv preprint arXiv:2107.13969
Snippet
Depression detection from speech has attracted a lot of attention in recent years. However, the significance of speaker-specific information in depression detection has not yet been explored. In this work, we analyze the significance of speaker embeddings for the task of …
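The snippet centers on speaker embeddings as input features for depression detection. As a rough illustration only (not code from the paper), the sketch below shows how a speaker embedding might be extracted from a single utterance using SpeechBrain's pretrained ECAPA-TDNN speaker-verification encoder; the checkpoint name, save directory, and audio path are placeholders, and the paper's own embedding extractor and pipeline may differ.

```python
# Hypothetical sketch: extract one speaker embedding from one utterance.
# Assumes SpeechBrain's public ECAPA-TDNN checkpoint, not the paper's setup.
import torchaudio
from speechbrain.pretrained import EncoderClassifier

# Load a pretrained speaker-verification encoder (expects 16 kHz mono audio).
encoder = EncoderClassifier.from_hparams(
    source="speechbrain/spkrec-ecapa-voxceleb",  # placeholder checkpoint
    savedir="pretrained_spkrec",                 # placeholder cache directory
)

# "utterance.wav" is a placeholder path to a 16 kHz mono recording.
signal, sample_rate = torchaudio.load("utterance.wav")

# encode_batch returns a tensor of shape (batch, 1, embedding_dim); the
# fixed-length vector summarizes speaker characteristics of the clip.
embedding = encoder.encode_batch(signal)
print(embedding.squeeze().shape)  # e.g. torch.Size([192]) for ECAPA-TDNN
```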
Concepts
- detection method (title, abstract, description: 58 occurrences)
Classifications
- G—PHYSICS
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
      - G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
        - G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use
          - G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use for comparison or discrimination
            - G10L25/66—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use for comparison or discrimination for extracting parameters related to health condition
- G—PHYSICS
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
      - G10L15/00—Speech recognition
        - G10L15/08—Speech classification or search
          - G10L15/18—Speech classification or search using natural language modelling
            - G10L15/1822—Parsing for meaning understanding
- G—PHYSICS
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
      - G10L15/00—Speech recognition
        - G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
          - G10L15/065—Adaptation
            - G10L15/07—Adaptation to the speaker
- G—PHYSICS
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
      - G10L17/00—Speaker identification or verification
        - G10L17/26—Recognition of special voice characteristics, e.g. for use in lie detectors; Recognition of animal voices
- G—PHYSICS
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
      - G10L15/00—Speech recognition
        - G10L15/08—Speech classification or search
          - G10L2015/088—Word spotting
- G—PHYSICS
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
      - G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
        - G10L21/003—Changing voice quality, e.g. pitch or formants
          - G10L21/007—Changing voice quality, e.g. pitch or formants characterised by the process used
            - G10L21/013—Adapting to target pitch
- G—PHYSICS
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
      - G10L17/00—Speaker identification or verification
        - G10L17/04—Training, enrolment or model building
- G—PHYSICS
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
      - G10L15/00—Speech recognition
        - G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G—PHYSICS
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
      - G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
        - G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 characterised by the type of extracted parameters
- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06F—ELECTRICAL DIGITAL DATA PROCESSING
      - G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
        - G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
      - G06N99/00—Subject matter not provided for in other groups of this subclass
- G—PHYSICS
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
      - G10L13/00—Speech synthesis; Text to speech systems
Similar Documents
Publication | Title |
---|---|
Shahin et al. | Emotion recognition using hybrid Gaussian mixture model and deep neural network |
Nasir et al. | Multimodal and multiresolution depression detection from speech and facial landmark features |
Tirumala et al. | Speaker identification features extraction methods: A systematic review |
Alonso et al. | New approach in quantification of emotional intensity from the speech signal: emotional temperature |
Jin et al. | Speech emotion recognition with acoustic and lexical features |
Li et al. | An automated assessment framework for atypical prosody and stereotyped idiosyncratic phrases related to autism spectrum disorder |
Rohanian et al. | Detecting Depression with Word-Level Multimodal Fusion |
Sharma et al. | Acoustic model adaptation using in-domain background models for dysarthric speech recognition |
Levitan et al. | Combining Acoustic-Prosodic, Lexical, and Phonotactic Features for Automatic Deception Detection |
Dumpala et al. | Significance of speaker embeddings and temporal context for depression detection |
Temko et al. | Fuzzy integral based information fusion for classification of highly confusable non-speech sounds |
Dumpala et al. | Detecting depression with a temporal context of speaker embeddings |
Das et al. | A deep learning model for depression detection based on MFCC and CNN generated spectrogram features |
Farhadipour et al. | Dysarthric speaker identification with different degrees of dysarthria severity using deep belief networks |
Miao et al. | Fusing features of speech for depression classification based on higher-order spectral analysis |
Campbell et al. | Alzheimer's dementia detection from audio and text modalities |
Velichko et al. | Complex Paralinguistic Analysis of Speech: Predicting Gender, Emotions and Deception in a Hierarchical Framework |
Tao et al. | The androids corpus: A new publicly available benchmark for speech based depression detection |
Aldeneh et al. | You're Not You When You're Angry: Robust Emotion Features Emerge by Recognizing Speakers |
Rangra et al. | Emotional speech-based personality prediction using NPSO architecture in deep learning |
Rodellar‐Biarge et al. | Towards the search of detection in speech‐relevant features for stress |
Deb et al. | Classification of speech under stress using harmonic peak to energy ratio |
Yadav et al. | A filter-based feature selection approach for the prediction of Alzheimer's diseases through audio classification |
Campbell et al. | Alzheimer's Dementia Detection from Audio and Language Modalities in Spontaneous Speech |
Zourmand et al. | Gender classification in children based on speech characteristics: using fundamental and formant frequencies of Malay vowels |