DOI: 10.1109/ICASSP.2017.7953132

Effective emotion recognition in movie audio tracks

Published: 05 March 2017

Abstract

This paper addresses the problem of speech emotion recognition from movie audio tracks. The recently collected Acted Facial Expressions in the Wild 5.0 database is used. The aim is to discriminate among the angry, happy, and neutral emotional states. We extract a relatively small number of features, a subset of which is not commonly used for the emotion recognition task. These features are fed as input to an ensemble classifier that combines random forests with support vector machines. An accuracy of 65.63% is reported, outperforming a baseline system that uses the K-nearest neighbor classifier and achieves an accuracy of 56.88%. To verify the suitability of the exploited features, the same ensemble classification schema is applied to a feature set similar to the one employed in the Audio/Visual Emotion Challenge 2011. In the latter case, an accuracy of 61.25% is achieved using a large set of 1582 features, as opposed to just 86 features in our case; the smaller feature set thus yields a relative improvement of 7.15% in accuracy ((65.63 − 61.25) / 61.25 ≈ 7.15%).
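
The abstract names the classifier combination (random forests with support vector machines) but not the fusion rule. Below is a minimal sketch of one plausible realization, assuming a soft-voting ensemble built with scikit-learn; the synthetic 86-dimensional feature matrix, the three-class labels, and all hyperparameters are illustrative placeholders rather than the authors' configuration.

# Hedged sketch: a random-forest + SVM ensemble for 3-class emotion recognition.
# The feature matrix (86 features per utterance) and all hyperparameters are
# placeholders, not the setup reported in the paper.
import numpy as np
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(320, 86))    # 320 utterances x 86 audio features (synthetic)
y = rng.integers(0, 3, size=320)  # 0 = angry, 1 = happy, 2 = neutral (synthetic)

rf = RandomForestClassifier(n_estimators=200, random_state=0)
svm = make_pipeline(StandardScaler(), SVC(kernel="rbf", probability=True, random_state=0))

# Soft voting averages the two models' class probabilities; the paper's actual
# fusion rule may differ.
ensemble = VotingClassifier(estimators=[("rf", rf), ("svm", svm)], voting="soft")

scores = cross_val_score(ensemble, X, y, cv=5)
print(f"Mean cross-validated accuracy: {scores.mean():.4f}")

Soft voting is only one way to combine the two learners; a hard-voting or stacking scheme would be equally consistent with the abstract's description.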


Published In

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Mar 2017, 6527 pages

Publisher: IEEE Press