Journal of Natural Language Processing
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
General Paper (Peer-Reviewed)
Active Learning with Task Adaptation Pre-training for Speech Emotion Recognition
Dongyuan Li, Ying Zhang, Yusong Wang, Kotaro Funakoshi, Manabu Okumura

2024 Volume 31 Issue 3 Pages 825-867

Abstract

Speech emotion recognition (SER) has garnered increasing attention due to its wide range of applications in various fields, including human-machine interaction, virtual assistants, and mental health assistance. However, existing SER methods often overlook the information gap between the pre-training speech recognition task and the downstream SER task, resulting in sub-optimal performance. Moreover, current methods require substantial time for fine-tuning on each specific speech dataset, such as IEMOCAP, which limits their effectiveness in real-world scenarios with large-scale noisy data. To address these issues, we propose an active learning (AL)-based fine-tuning framework for SER, called After, that leverages task adaptation pre-training (TAPT) and AL methods to enhance performance and efficiency. Specifically, we first use TAPT to minimize the information gap between the pre-training speech recognition task and the downstream speech emotion recognition task. Then, AL methods are employed to iteratively select a subset of the most informative and diverse samples for fine-tuning, thereby reducing time consumption. Experiments demonstrate that our proposed method After, using only 20% of the samples, improves accuracy by 8.45% and reduces time consumption by 79%. An additional extension of After and ablation studies further confirm its effectiveness and applicability to various real-world scenarios.
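The abstract describes iteratively selecting the most informative unlabeled samples for fine-tuning. As a minimal illustration of one common acquisition heuristic for that selection step (entropy-based uncertainty sampling; the paper's actual acquisition function may differ, and the class probabilities below are hypothetical), the core idea can be sketched as:

```python
import math

def entropy(probs):
    """Shannon entropy of a predicted class distribution (higher = more uncertain)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_most_informative(unlabeled_probs, budget):
    """Return indices of the `budget` unlabeled samples whose predicted
    emotion distribution is most uncertain -- a standard active-learning
    acquisition heuristic (not necessarily the one used by After)."""
    scored = sorted(enumerate(unlabeled_probs),
                    key=lambda item: entropy(item[1]),
                    reverse=True)
    return [idx for idx, _ in scored[:budget]]

# Hypothetical model outputs over 4 emotion classes for 3 unlabeled utterances.
pool = [
    [0.97, 0.01, 0.01, 0.01],  # confident prediction -> low entropy
    [0.25, 0.25, 0.25, 0.25],  # maximally uncertain  -> highest entropy
    [0.70, 0.10, 0.10, 0.10],  # moderately uncertain
]
print(select_most_informative(pool, budget=2))  # -> [1, 2]
```

In a full AL loop, the selected samples would be labeled, added to the fine-tuning set, and the model retrained before the next selection round; diversity-aware criteria (also mentioned in the abstract) would additionally penalize selecting near-duplicate samples.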

© 2024 The Association for Natural Language Processing