Authors:
Toyoaki Kuwahara; Ryohei Orihara; Yuichi Sei; Yasuyuki Tahara and Akihiko Ohsuga
Affiliation:
Graduate School of Informatics and Engineering, University of Electro-Communications, Tokyo, Japan
Keyword(s):
Deep Learning, Cross Corpus, Virtual Adversarial Training, Emotion Recognition, Speech Processing, Spontaneity.
Abstract:
The accuracy of speech-based emotion estimation has improved with the development of deep learning. However, most deep-learning approaches to emotion estimation rely on supervised learning, and large labeled datasets for training are difficult to obtain. In addition, when the environment of the training data differs significantly from that of the actual data, the accuracy of emotion estimation decreases. To address these problems, we propose an emotion estimation model using virtual adversarial training (VAT), a semi-supervised learning method that improves the robustness of the model. Furthermore, research on the spontaneity of speech has advanced year by year, and recent studies have shown that emotion classification accuracy improves when spontaneity is taken into account. We therefore investigate the effect of spontaneity in a cross-language setting. First, the VAT hyperparameters were set through a preliminary experiment on a single corpus. Next, cross-corpus evaluation experiments demonstrated the robustness of the resulting model. Finally, we evaluated the accuracy of emotion estimation with spontaneity taken into account and showed that considering spontaneity improves the accuracy of the VAT-based model.
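Virtual adversarial training is only named in the abstract. The core computation it adds, the local distributional smoothness (LDS) penalty, can be illustrated with a minimal sketch. It is not the authors' model. It assumes a toy linear softmax classifier over hypothetical acoustic feature vectors in place of a deep network, and it uses the analytic gradient W^T(q - p), which holds only for that linear model. The names `vat_lds` and `total_loss` and the defaults for `eps`, `xi`, and `alpha` are illustrative assumptions, not values from the paper.

```python
import math
import random


def softmax(z):
    # Numerically stable softmax over a list of logits.
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def logits(W, b, x):
    # Hypothetical toy model: one linear layer, z = W x + b.
    return [sum(wi * xi for wi, xi in zip(row, x)) + bk for row, bk in zip(W, b)]


def kl(p, q):
    # KL(p || q) between two discrete distributions.
    return sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q) if pi > 0)


def normalize(v):
    n = math.sqrt(sum(a * a for a in v))
    return [a / n for a in v] if n > 0 else v


def grad_kl_wrt_r(W, b, x, r, p):
    # For the linear softmax model only: d KL(p || softmax(W(x + r) + b)) / dr = W^T (q - p),
    # with p held fixed as the target, as VAT does.
    q = softmax(logits(W, b, [xi + ri for xi, ri in zip(x, r)]))
    diff = [qk - pk for qk, pk in zip(q, p)]
    return [sum(W[k][j] * diff[k] for k in range(len(W))) for j in range(len(x))]


def vat_lds(W, b, x, eps=1.0, xi=1e-6, n_power=1, rng=None):
    rng = rng or random.Random(0)
    p = softmax(logits(W, b, x))  # current prediction, treated as a constant target
    d = normalize([rng.gauss(0, 1) for _ in x])  # random unit direction
    for _ in range(n_power):
        # The gradient is exactly zero at r = 0, so it is evaluated at a tiny step xi * d;
        # this is one power-iteration step toward the most sensitive direction.
        g = grad_kl_wrt_r(W, b, x, [xi * di for di in d], p)
        d = normalize(g)
    r_adv = [eps * di for di in d]  # virtual adversarial perturbation with norm eps
    q = softmax(logits(W, b, [a + r for a, r in zip(x, r_adv)]))
    return kl(p, q), r_adv


def total_loss(W, b, labeled, unlabeled, alpha=1.0, eps=1.0):
    # Supervised cross-entropy on labeled data plus alpha * LDS on all inputs.
    ce = sum(-math.log(softmax(logits(W, b, x))[y]) for x, y in labeled) / len(labeled)
    all_x = [x for x, _ in labeled] + list(unlabeled)
    lds = sum(vat_lds(W, b, x, eps=eps)[0] for x in all_x) / len(all_x)
    return ce + alpha * lds


# Usage: 3 hypothetical emotion classes, 2 hypothetical features.
W = [[1.0, -0.5], [-0.3, 0.8], [0.2, 0.1]]
b = [0.0, 0.0, 0.0]
lds, r = vat_lds(W, b, [0.5, -0.2], eps=0.5)
print(lds, r)
```

Because the LDS term needs no labels, unlabeled utterances contribute to `total_loss` alongside labeled ones, which is what makes VAT semi-supervised. Parameters such as `eps` and `alpha` are the kind of VAT hyperparameters the abstract says were set in a single-corpus preliminary experiment.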