
Multilingual Speech Emotion Recognition on Japanese, English, and German

  • Conference paper

In: Computational Linguistics and Intelligent Text Processing (CICLing 2019)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13452)

Abstract

The current study focuses on speech-based human emotion recognition, and in particular on multilingual speech emotion recognition using Japanese, English, and German emotional corpora. The proposed method exploits conditional random field (CRF) classifiers in a two-level classification scheme: in the first level, the spoken language is identified, and in the second level, speech emotion recognition is carried out using emotion models specific to the identified language. At both levels, CRF classifiers fed with acoustic features are applied. The CRF classifier is a popular probabilistic method for structured prediction and is widely applied in natural language processing, computer vision, and bioinformatics. The current study experimentally investigates the use of CRFs for speech emotion recognition when only limited training data are available. The results show that CRFs remain effective with small amounts of training data, where methods based on deep neural networks (DNNs) are less effective. Furthermore, the proposed method is compared with two popular classifiers, namely support vector machines (SVMs) and probabilistic linear discriminant analysis (PLDA), and achieves higher accuracy than both. For the classification of four emotions (neutral, happy, angry, sad), the proposed CRF-based method achieved classification rates of 93.8% for English, 95.0% for German, and 88.8% for Japanese. These results are very promising and superior to those obtained in other similar studies on multilingual, or even monolingual, speech emotion recognition.
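The two-level scheme described in the abstract can be sketched as follows. This is not the authors' implementation: the CRF classifiers are replaced here by simple nearest-centroid stand-ins, purely to illustrate the control flow of first identifying the language from acoustic features and then applying the emotion model trained for that language.

```python
import numpy as np

class CentroidClassifier:
    """Placeholder for the paper's CRF classifier: predicts the label
    of the nearest class mean in acoustic-feature space."""
    def fit(self, X, y):
        self.labels = sorted(set(y))
        self.centroids = np.array(
            [np.mean([x for x, lab in zip(X, y) if lab == label], axis=0)
             for label in self.labels])
        return self

    def predict(self, x):
        dists = np.linalg.norm(self.centroids - np.asarray(x), axis=1)
        return self.labels[int(np.argmin(dists))]

class TwoLevelRecognizer:
    """Two-level scheme: level 1 identifies the language,
    level 2 applies a language-specific emotion model."""
    def __init__(self):
        self.lang_clf = CentroidClassifier()   # level 1: language ID
        self.emotion_clfs = {}                 # level 2: one model per language

    def fit(self, X, langs, emotions):
        self.lang_clf.fit(X, langs)
        for lang in set(langs):
            X_l = [x for x, l in zip(X, langs) if l == lang]
            y_l = [e for e, l in zip(emotions, langs) if l == lang]
            self.emotion_clfs[lang] = CentroidClassifier().fit(X_l, y_l)
        return self

    def predict(self, x):
        lang = self.lang_clf.predict(x)                # identify language first
        emotion = self.emotion_clfs[lang].predict(x)   # then language-specific emotion
        return lang, emotion
```

In the paper both levels use CRFs over acoustic features; any classifier with `fit`/`predict` could be dropped into this skeleton in place of the centroid stand-in.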



Author information

Corresponding author: Panikos Heracleous


Copyright information

© 2023 Springer Nature Switzerland AG

About this paper


Cite this paper

Heracleous, P., Yasuda, K., Yoneyama, A. (2023). Multilingual Speech Emotion Recognition on Japanese, English, and German. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2019. Lecture Notes in Computer Science, vol 13452. Springer, Cham. https://doi.org/10.1007/978-3-031-24340-0_27


  • DOI: https://doi.org/10.1007/978-3-031-24340-0_27

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-24339-4

  • Online ISBN: 978-3-031-24340-0

  • eBook Packages: Computer Science (R0)
