Combining Acoustic Features for Improved Emotion Recognition in Mandarin Speech

Tsang-Long Pao, Yu-Te Chen, Jun-Heng Yeh & Wen-Yuan Liao

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3784)

Included in the following conference series:

International Conference on Affective Computing and Intelligent Interaction

Abstract

Combining different feature streams to obtain more accurate results is a well-known technique. The basic argument is that if the recognition errors of systems using the individual streams occur at different points, there is at least a chance that a combined system will be able to correct some of these errors by reference to the other streams. In emotional speech recognition systems, there are many ways in which this general principle can be applied. In this paper, we propose using feature selection and feature combination to improve speaker-dependent emotion recognition in Mandarin speech. Five basic emotions are investigated: anger, boredom, happiness, neutral, and sadness. Combining multiple feature streams proves highly beneficial in our system. The best accuracy in recognizing the five emotions, 99.44%, is achieved using the MFCC, LPCC, RastaPLP, and LFPC feature streams with the nearest class mean classifier.
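The abstract names feature-stream combination and a nearest class mean classifier but gives no implementation details. The following is a minimal sketch of the general idea, not the authors' implementation. It assumes that streams are combined by simple concatenation of per-utterance vectors and that "nearest" means smallest Euclidean distance; the function names and the toy numbers are hypothetical and do not represent real MFCC, LPCC, RastaPLP, or LFPC values.

```python
from collections import defaultdict
from math import dist


def combine_streams(*streams):
    """Concatenate per-utterance feature vectors from several streams.

    Assumption: combination is plain concatenation; the paper may differ.
    """
    return [sum((list(v) for v in vectors), []) for vectors in zip(*streams)]


def fit_class_means(features, labels):
    """Return {label: mean feature vector} computed from training data."""
    grouped = defaultdict(list)
    for vector, label in zip(features, labels):
        grouped[label].append(vector)
    return {
        label: [sum(column) / len(vectors) for column in zip(*vectors)]
        for label, vectors in grouped.items()
    }


def predict(class_means, vector):
    """Assign the label whose class mean is nearest (Euclidean, assumed)."""
    return min(class_means, key=lambda label: dist(class_means[label], vector))


# Toy, made-up numbers standing in for two feature streams.
stream_a = [[1.0, 1.2], [0.9, 1.1], [3.0, 3.1], [3.2, 2.9]]
stream_b = [[0.1], [0.2], [0.9], [1.0]]
labels = ["anger", "anger", "sadness", "sadness"]

train = combine_streams(stream_a, stream_b)
means = fit_class_means(train, labels)
print(predict(means, [1.0, 1.0, 0.15]))
```

In a speaker-dependent setting such as the one described, the class means would presumably be estimated from utterances of the same speaker whose speech is later classified.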

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Tatung University,
Tsang-Long Pao, Yu-Te Chen, Jun-Heng Yeh & Wen-Yuan Liao

Editor information

Editors and Affiliations

National Laboratory of Pattern Recognition (NLPR), Institute of Automation, Chinese Academy of Sciences,
Jianhua Tao
National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, China
Tieniu Tan
MIT Media Laboratory, 20 Ames Street, 02139, Cambridge, MA, USA
Rosalind W. Picard

About this paper

Cite this paper

Pao, TL., Chen, YT., Yeh, JH., Liao, WY. (2005). Combining Acoustic Features for Improved Emotion Recognition in Mandarin Speech. In: Tao, J., Tan, T., Picard, R.W. (eds) Affective Computing and Intelligent Interaction. ACII 2005. Lecture Notes in Computer Science, vol 3784. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11573548_36

DOI: https://doi.org/10.1007/11573548_36
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29621-8
Online ISBN: 978-3-540-32273-3
eBook Packages: Computer Science, Computer Science (R0)