Dávid Sztahó et al., 2021 - Google Patents
Deep learning solution for pathological voice detection using LSTM-based autoencoder hybrid with multi-task learning
- Document ID
- 10750870862611866017
- Author
- Dávid Sztahó K
- Gábriel T
- Publication year
- 2021
- Publication venue
- 14th International Joint Conference on Biomedical Engineering Systems and Technologies
Snippet
In this paper, a deep learning approach is introduced to detect pathological voice disorders from continuous speech. Speech as a bio-signal is receiving increasing attention as a discriminant for different diseases. To exploit information in speech, a long-short term …
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/065—Adaptation
- G10L15/07—Adaptation to the speaker
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/1822—Parsing for meaning understanding
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
- G10L17/26—Recognition of special voice characteristics, e.g. for use in lie detectors; Recognition of animal voices
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6267—Classification techniques
- G06K9/6268—Classification techniques relating to the classification paradigm, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6217—Design or setup of recognition systems and techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
- G06K9/6232—Extracting features by transforming the feature space, e.g. multidimensional scaling; Mappings, e.g. subspace methods
- G06K9/6247—Extracting features by transforming the feature space, e.g. multidimensional scaling; Mappings, e.g. subspace methods based on an approximation criterion, e.g. principal component analysis
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use for comparison or discrimination
- G10L25/66—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use for comparison or discrimination for extracting parameters related to health condition
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L2015/088—Word spotting
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/04—Segmentation; Word boundary detection
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/28—Constructional details of speech recognition systems
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 characterised by the type of extracted parameters
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F19/00—Digital computing or data processing equipment or methods, specially adapted for specific applications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/18—Digital computers in general; Data processing equipment in general in which a programme is changed according to experience gained by the computer itself during a complete run; Learning machines
Similar Documents
Publication | Title | Publication Date |
---|---|---|
Dávid Sztahó et al. | Deep learning solution for pathological voice detection using LSTM-based autoencoder hybrid with multi-task learning | |
Alnuaim et al. | Human‐computer interaction for recognizing speech emotions using multilayer perceptron classifier | |
Alonso et al. | New approach in quantification of emotional intensity from the speech signal: emotional temperature | |
Madanian et al. | Speech emotion recognition using machine learning—A systematic review | |
Fu et al. | Sch-net: a deep learning architecture for automatic detection of schizophrenia | |
Venu | IOT Based Speech Recognition System to Improve the Performance of Emotion Detection | |
Bhattacharjee et al. | VoiceLens: A multi-view multi-class disease classification model through daily-life speech data | |
Kaushik et al. | SLINet: Dysphasia detection in children using deep neural network | |
Lalitha et al. | Mental Illness Disorder Diagnosis Using Emotion Variation Detection from Continuous English Speech. | |
Junior et al. | Multiple voice disorders in the same individual: investigating handcrafted features, multi-label classification algorithms, and base-learners | |
Feng | Toward knowledge-driven speech-based models of depression: Leveraging spectrotemporal variations in speech vowels | |
Bhanja et al. | Deep neural network based two-stage Indian language identification system using glottal closure instants as anchor points | |
Karan et al. | An investigation about the relationship between dysarthria level of speech and the neurological state of Parkinson’s patients | |
Deepa et al. | Speech technology in healthcare | |
Deshpande et al. | COVID-19 biomarkers in speech: on source and filter components | |
Radha et al. | Towards modeling raw speech in gender identification of children using sincNet over ERB scale | |
Rangra et al. | Emotional speech-based personality prediction using NPSO architecture in deep learning | |
Klempíř et al. | Evaluating the Performance of wav2vec Embedding for Parkinson's Disease Detection | |
Deb et al. | Classification of speech under stress using harmonic peak to energy ratio | |
Hamza et al. | Machine learning approaches for automated detection and classification of dysarthria severity | |
Bandela et al. | Stressed Speech Emotion Recognition Using Teager Energy and Spectral Feature Fusion with Feature Optimization | |
Jenei et al. | Detection of speech related disorders by pre-trained embedding models extracted biomarkers | |
Gaikwad et al. | Speech recognition-based prediction for mental health and depression: a review | |
Kumar et al. | Analysis and classification of electroglottography signals for the detection of speech disorders | |
Brueckner et al. | Audio-Based Detection of Anxiety and Depression via Vocal Biomarkers |