Dávid Sztahó et al., 2021 - Google Patents

Deep learning solution for pathological voice detection using LSTM-based autoencoder hybrid with multi-task learning

Document ID: 10750870862611866017
Author: Dávid Sztahó K; Gábriel T
Publication year: 2021
Publication venue: 14th International Joint Conference on Biomedical Engineering Systems and Technologies

Snippet

In this paper, a deep learning approach is introduced to detect pathological voice disorders from continuous speech. Speech as a bio-signal is attracting growing attention as a discriminant for different diseases. To exploit information in speech, a long-short term …

Classifications

- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/065—Adaptation
- G10L15/07—Adaptation to the speaker
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/1822—Parsing for meaning understanding
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
- G10L17/26—Recognition of special voice characteristics, e.g. for use in lie detectors; Recognition of animal voices
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6267—Classification techniques
- G06K9/6268—Classification techniques relating to the classification paradigm, e.g. parametric or non-parametric approaches
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6217—Design or setup of recognition systems and techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
- G06K9/6232—Extracting features by transforming the feature space, e.g. multidimensional scaling; Mappings, e.g. subspace methods
- G06K9/6247—Extracting features by transforming the feature space, e.g. multidimensional scaling; Mappings, e.g. subspace methods based on an approximation criterion, e.g. principal component analysis
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use for comparison or discrimination
- G10L25/66—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use for comparison or discrimination for extracting parameters related to health condition
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L2015/088—Word spotting
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/04—Segmentation; Word boundary detection
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/28—Constructional details of speech recognition systems
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 characterised by the type of extracted parameters
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F19/00—Digital computing or data processing equipment or methods, specially adapted for specific applications
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/18—Digital computers in general; Data processing equipment in general in which a programme is changed according to experience gained by the computer itself during a complete run; Learning machines

Similar Documents

Publication	Publication Date	Title
Dávid Sztahó et al.	2021	Deep learning solution for pathological voice detection using LSTM-based autoencoder hybrid with multi-task learning
Alnuaim et al.	2022	Human‐computer interaction for recognizing speech emotions using multilayer perceptron classifier
Alonso et al.	2015	New approach in quantification of emotional intensity from the speech signal: emotional temperature
Madanian et al.	2023	Speech emotion recognition using machine learning—A systematic review
Fu et al.	2021	Sch-net: a deep learning architecture for automatic detection of schizophrenia
Venu	2022	IOT Based Speech Recognition System to Improve the Performance of Emotion Detection
Bhattacharjee et al.	2022	VoiceLens: A multi-view multi-class disease classification model through daily-life speech data
Kaushik et al.	2021	SLINet: Dysphasia detection in children using deep neural network
Lalitha et al.	2021	Mental Illness Disorder Diagnosis Using Emotion Variation Detection from Continuous English Speech
Junior et al.	2023	Multiple voice disorders in the same individual: investigating handcrafted features, multi-label classification algorithms, and base-learners
Feng	2022	Toward knowledge-driven speech-based models of depression: Leveraging spectrotemporal variations in speech vowels
Bhanja et al.	2022	Deep neural network based two-stage Indian language identification system using glottal closure instants as anchor points
Karan et al.	2022	An investigation about the relationship between dysarthria level of speech and the neurological state of Parkinson’s patients
Deepa et al.	2022	Speech technology in healthcare
Deshpande et al.	2021	COVID-19 biomarkers in speech: on source and filter components
Radha et al.	2023	Towards modeling raw speech in gender identification of children using sincNet over ERB scale
Rangra et al.	2023	Emotional speech-based personality prediction using NPSO architecture in deep learning
Klempíř et al.	2023	Evaluating the Performance of wav2vec Embedding for Parkinson's Disease Detection
Deb et al.	2016	Classification of speech under stress using harmonic peak to energy ratio
Hamza et al.	2023	Machine learning approaches for automated detection and classification of dysarthria severity
Bandela et al.	2023	Stressed Speech Emotion Recognition Using Teager Energy and Spectral Feature Fusion with Feature Optimization
Jenei et al.	2022	Detection of speech related disorders by pre-trained embedding models extracted biomarkers
Gaikwad et al.	2023	Speech recognition-based prediction for mental health and depression: a review
Kumar et al.	2023	Analysis and classification of electroglottography signals for the detection of speech disorders
Brueckner et al.	2024	Audio-Based Detection of Anxiety and Depression via Vocal Biomarkers