Gong et al., 2022 - Google Patents

VocalSound: A dataset for improving human vocal sounds recognition

Document ID: 6618456087060315486
Author: Gong Y; Yu J; Glass J
Publication year: 2022
Publication venue: ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Snippet

Recognizing human non-speech vocalizations is an important task and has broad applications such as automatic sound transcription and health condition monitoring. However, existing datasets have a relatively small number of vocal sound samples or noisy …

Classifications

- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use for comparison or discrimination
- G10L25/66—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use for comparison or discrimination for extracting parameters related to health condition
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L2015/088—Word spotting
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/065—Adaptation
- G10L15/07—Adaptation to the speaker
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
- G10L17/26—Recognition of special voice characteristics, e.g. for use in lie detectors; Recognition of animal voices
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
- G10L17/04—Training, enrolment or model building
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/28—Constructional details of speech recognition systems
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N99/00—Subject matter not provided for in other groups of this subclass
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06Q—DATA PROCESSING SYSTEMS OR METHODS, SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL, SUPERVISORY OR FORECASTING PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL, SUPERVISORY OR FORECASTING PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management

Similar Documents

Publication	Publication Date	Title
Gong et al.	2022	VocalSound: A dataset for improving human vocal sounds recognition
CN110728997B (en)	2022-03-22	Multi-modal depression detection system based on context awareness
Kourkounakis et al.	2021	Fluentnet: End-to-end detection of stuttered speech disfluencies with deep learning
Koolagudi et al.	2009	IITKGP-SESC: speech database for emotion analysis
CN106782615B (en)	2020-06-12	Voice data emotion detection method, device and system
CN108648748A (en)	2018-10-12	Acoustic events detection method under hospital noise environment
CN104903954A (en)	2015-09-09	Speaker verification and identification using artificial neural network-based sub-phonetic unit discrimination
Lefter et al.	2011	Automatic stress detection in emergency (telephone) calls
CN113314100B (en)	2021-10-08	Method, device, equipment and storage medium for evaluating and displaying results of spoken language test
CN101609672B (en)	2011-09-07	Speech recognition semantic confidence feature extraction method and device
Drygajlo	2012	Automatic speaker recognition for forensic case assessment and interpretation
Kourkounakis et al.	2020	FluentNet: end-to-end detection of speech disfluency with deep learning
CN108646914A (en)	2018-10-12	A kind of multi-modal affection data collection method and device
Wagner et al.	2018	Applying cooperative machine learning to speed up the annotation of social signals in large multi-modal corpora
Qadri et al.	2019	A critical insight into multi-languages speech emotion databases
CN114220419A (en)	2022-03-22	Voice evaluation method, device, medium and equipment
CN113691382A (en)	2021-11-23	Conference recording method, conference recording device, computer equipment and medium
Chou et al.	2020	Learning to Recognize Per-Rater's Emotion Perception Using Co-Rater Training Strategy with Soft and Hard Labels.
CN116884407A (en)	2023-10-13	Lightweight personalized voice awakening method, device and equipment
Herms et al.	2018	CoLoSS: Cognitive load corpus with speech and performance data from a symbol-digit dual-task
Zhou et al.	2016	Speaker diarization system for autism children's real-life audio data
Alshammri	2023	IoT‐Based Voice‐Controlled Smart Homes with Source Separation Based on Deep Learning
Kavitha et al.	2022	Deep Learning based Audio Processing Speech Emotion Detection
Zheng	2022	[Retracted] An Analysis and Research on Chinese College Students’ Psychological Barriers in Oral English Output from a Cross‐Cultural Perspective
Gosztolya et al.	2020	Ensemble Bag-of-Audio-Words representation improves paralinguistic classification accuracy