Gong et al., 2022 - Google Patents
Vocalsound: A dataset for improving human vocal sounds recognitionGong et al., 2022
View PDF- Document ID
- 6618456087060315486
- Author
- Gong Y
- Yu J
- Glass J
- Publication year
- Publication venue
- ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
External Links
Snippet
Recognizing human non-speech vocalizations is an important task and has broad applications such as automatic sound transcription and health condition monitoring. However, existing datasets have a relatively small number of vocal sound samples or noisy …
- 230000001755 vocal 0 title abstract description 49
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use for comparison or discrimination
- G10L25/66—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use for comparison or discrimination for extracting parameters related to health condition
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L2015/088—Word spotting
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/065—Adaptation
- G10L15/07—Adaptation to the speaker
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
- G10L17/26—Recognition of special voice characteristics, e.g. for use in lie detectors; Recognition of animal voices
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
- G10L17/04—Training, enrolment or model building
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/28—Constructional details of speech recognition systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N99/00—Subject matter not provided for in other groups of this subclass
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06Q—DATA PROCESSING SYSTEMS OR METHODS, SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL, SUPERVISORY OR FORECASTING PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL, SUPERVISORY OR FORECASTING PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Gong et al. | Vocalsound: A dataset for improving human vocal sounds recognition | |
CN110728997B (en) | Multi-modal depression detection system based on context awareness | |
Kourkounakis et al. | Fluentnet: End-to-end detection of stuttered speech disfluencies with deep learning | |
Koolagudi et al. | IITKGP-SESC: speech database for emotion analysis | |
CN106782615B (en) | Voice data emotion detection method, device and system | |
CN108648748A (en) | Acoustic events detection method under hospital noise environment | |
CN104903954A (en) | Speaker verification and identification using artificial neural network-based sub-phonetic unit discrimination | |
Lefter et al. | Automatic stress detection in emergency (telephone) calls | |
CN113314100B (en) | Method, device, equipment and storage medium for evaluating and displaying results of spoken language test | |
CN101609672B (en) | Speech recognition semantic confidence feature extraction method and device | |
Drygajlo | Automatic speaker recognition for forensic case assessment and interpretation | |
Kourkounakis et al. | FluentNet: end-to-end detection of speech disfluency with deep learning | |
CN108646914A (en) | A kind of multi-modal affection data collection method and device | |
Wagner et al. | Applying cooperative machine learning to speed up the annotation of social signals in large multi-modal corpora | |
Qadri et al. | A critical insight into multi-languages speech emotion databases | |
CN114220419A (en) | Voice evaluation method, device, medium and equipment | |
CN113691382A (en) | Conference recording method, conference recording device, computer equipment and medium | |
Chou et al. | Learning to Recognize Per-Rater's Emotion Perception Using Co-Rater Training Strategy with Soft and Hard Labels. | |
CN116884407A (en) | Lightweight personalized voice awakening method, device and equipment | |
Herms et al. | CoLoSS: Cognitive load corpus with speech and performance data from a symbol-digit dual-task | |
Zhou et al. | Speaker diarization system for autism children's real-life audio data | |
Alshammri | IoT‐Based Voice‐Controlled Smart Homes with Source Separation Based on Deep Learning | |
Kavitha et al. | Deep Learning based Audio Processing Speech Emotion Detection | |
Zheng | [Retracted] An Analysis and Research on Chinese College Students’ Psychological Barriers in Oral English Output from a Cross‐Cultural Perspective | |
Gosztolya et al. | Ensemble Bag-of-Audio-Words representation improves paralinguistic classification accuracy |