Abstract
Emotion recognition is a relevant task in human-computer interaction. Several pattern recognition and machine learning techniques have been applied to assign input audio and/or video sequences to specific emotional classes. This paper introduces a novel approach to the problem that is also suitable for more general sequence recognition tasks. The approach combines the recurrent reservoir of an echo state network with a connectionist density estimation module. The reservoir encodes each input sequence into a fixed-dimensional pattern of neuron activations. The density estimator, a constrained radial basis function network, evaluates the likelihood of the echo state given the input. Unsupervised training is carried out within a maximum-likelihood framework. The architecture can then be used to estimate class-conditional probabilities for emotion classification within a Bayesian setup. Preliminary experiments on emotion recognition from speech signals in the WaSeP© dataset show that the proposed approach is effective and may outperform state-of-the-art classifiers.
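The pipeline outlined in the abstract (reservoir encoding, maximum-likelihood density estimation, Bayesian classification) can be illustrated with a minimal sketch. It is not the authors' implementation: a single diagonal Gaussian per class stands in for the constrained radial basis function network, the final reservoir state is assumed to serve as the fixed-dimensional encoding, and all function names, sizes, and parameter values are hypothetical.

```python
import numpy as np

def make_reservoir(n_inputs, n_units=50, spectral_radius=0.9, seed=0):
    """Draw random, untrained reservoir weights (sizes are illustrative)."""
    rng = np.random.default_rng(seed)
    w_in = rng.uniform(-0.5, 0.5, size=(n_units, n_inputs))
    w = rng.uniform(-0.5, 0.5, size=(n_units, n_units))
    # Rescale so the largest eigenvalue magnitude equals spectral_radius.
    w *= spectral_radius / np.max(np.abs(np.linalg.eigvals(w)))
    return w_in, w

def encode(sequence, w_in, w):
    """Drive the reservoir with a (T, n_inputs) sequence; return the final state."""
    state = np.zeros(w.shape[0])
    for frame in sequence:
        state = np.tanh(w_in @ frame + w @ state)
    return state

def fit_gaussian(states, var_floor=1e-3):
    """Closed-form maximum-likelihood fit of a diagonal Gaussian.

    Simplification: replaces the constrained RBF network of the paper.
    The variance floor is an assumption added for numerical stability.
    """
    return states.mean(axis=0), states.var(axis=0) + var_floor

def log_likelihood(state, mean, var):
    """Log density of a diagonal Gaussian evaluated at one reservoir state."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (state - mean) ** 2 / var)

def classify(sequence, w_in, w, models, log_priors):
    """Bayes decision rule: maximize log p(state | class) + log P(class)."""
    state = encode(sequence, w_in, w)
    scores = {c: log_likelihood(state, *models[c]) + log_priors[c]
              for c in models}
    return max(scores, key=scores.get)

if __name__ == "__main__":
    # Synthetic stand-in data: two classes of random sequences whose
    # frames differ in mean. This is not speech and not the WaSeP data.
    rng = np.random.default_rng(1)
    w_in, w = make_reservoir(n_inputs=3)
    train = {c: [rng.normal(mu, 1.0, size=(20, 3)) for _ in range(30)]
             for c, mu in (("class_a", -1.0), ("class_b", 1.0))}
    models = {c: fit_gaussian(np.stack([encode(s, w_in, w) for s in seqs]))
              for c, seqs in train.items()}
    log_priors = {c: np.log(0.5) for c in models}
    test_sequence = rng.normal(1.0, 1.0, size=(20, 3))
    print(classify(test_sequence, w_in, w, models, log_priors))
```

One density model is fitted per class on the reservoir states of that class only, so the estimation step itself uses no labels beyond the grouping of the training data. The class decision then follows from Bayes' rule with the chosen priors.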
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Trentin, E., Scherer, S., Schwenker, F. (2010). Maximum Echo-State-Likelihood Networks for Emotion Recognition. In: Schwenker, F., El Gayar, N. (eds) Artificial Neural Networks in Pattern Recognition. ANNPR 2010. Lecture Notes in Computer Science, vol 5998. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12159-3_6
DOI: https://doi.org/10.1007/978-3-642-12159-3_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-12158-6
Online ISBN: 978-3-642-12159-3
eBook Packages: Computer Science (R0)