
Ensemble methods for spoken emotion recognition in call-centres

Published: 01 February 2007

Abstract

Machine-based emotional intelligence is a requirement for more natural interaction between humans and computer interfaces, and a basic level of accurate emotion perception is needed for computer systems to respond adequately to human emotion. Humans convey emotional information both intentionally and unintentionally through speech patterns, which listeners perceive and interpret during conversation. This research aims to improve the automatic perception of vocal emotion in two ways. First, we compare two sources of emotional speech data: natural, spontaneous emotional speech and acted (portrayed) emotional speech. This comparison demonstrates the advantages and disadvantages of each acquisition method and how they affect the end application of vocal emotion recognition. Second, we apply two classification methods that have not previously been used in this field: stacked generalisation and unweighted voting. We show how these techniques can yield an improvement over traditional classification methods.
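The two combination schemes named in the abstract are standard ensemble techniques. The paper's own implementation is not reproduced here, but a minimal illustrative sketch (with hypothetical base-classifier outputs and emotion labels, not taken from the paper) might look like:

```python
from collections import Counter

# Hypothetical level-0 predictions from three base classifiers
# for a single utterance (labels are illustrative only).
base_predictions = ["anger", "anger", "neutral"]

def unweighted_vote(predictions):
    """Unweighted (majority) vote: each base classifier casts one
    equal vote, and the most frequent label wins."""
    return Counter(predictions).most_common(1)[0][0]

def stack_features(per_class_scores):
    """Stacked generalisation, level-0 to level-1 hand-off: each base
    classifier's per-class scores are concatenated into one feature
    vector on which a level-1 (meta) learner is trained. The
    meta-learner itself is omitted from this sketch."""
    return [s for scores in per_class_scores for s in scores]

print(unweighted_vote(base_predictions))         # anger
print(stack_features([[0.7, 0.3], [0.6, 0.4]]))  # [0.7, 0.3, 0.6, 0.4]
```

The key difference: unweighted voting combines hard label decisions with no trained combiner, while stacking treats the base classifiers' outputs as features and learns the combination itself.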




Published In

Speech Communication, Volume 49, Issue 2
February 2007
76 pages

Publisher

Elsevier Science Publishers B.V., Netherlands


Author Tags

  1. Affect recognition
  2. Emotion recognition
  3. Ensemble methods
  4. Speech databases
  5. Speech processing


Cited By

  • (2024) PCQ: Emotion Recognition in Speech via Progressive Channel Querying. Advanced Intelligent Computing Technology and Applications, pp. 264-275. doi: 10.1007/978-981-97-5588-2_23. Published 5-Aug-2024.
  • (2023) Using Voice and Biofeedback to Predict User Engagement during Product Feedback Interviews. ACM Transactions on Software Engineering and Methodology 33(4), pp. 1-36. doi: 10.1145/3635712. Published 6-Dec-2023.
  • (2023) CCTG-NET: Contextualized Convolutional Transformer-GRU Network for speech emotion recognition. International Journal of Speech Technology 26(4), pp. 1099-1116. doi: 10.1007/s10772-023-10080-7. Published 1-Dec-2023.
  • (2023) An efficient speech emotion recognition based on a dual-stream CNN-transformer fusion network. International Journal of Speech Technology 26(2), pp. 541-557. doi: 10.1007/s10772-023-10035-y. Published 6-Jul-2023.
  • (2023) A review of natural language processing in contact centre automation. Pattern Analysis & Applications 26(3), pp. 823-846. doi: 10.1007/s10044-023-01182-8. Published 1-Aug-2023.
  • (2023) Improved Speech Emotion Recognition Using Channel-wise Global Head Pooling (CwGHP). Circuits, Systems, and Signal Processing 42(9), pp. 5500-5522. doi: 10.1007/s00034-023-02367-6. Published 6-Apr-2023.
  • (2023) Investigating Acoustic Cues of Emotional Valence in Mandarin Speech Prosody - A Corpus Approach. Chinese Lexical Semantics, pp. 316-330. doi: 10.1007/978-981-97-0586-3_25. Published 19-May-2023.
  • (2022) Ensemble Learning by High-Dimensional Acoustic Features for Emotion Recognition from Speech Audio Signal. Security and Communication Networks, vol. 2022. doi: 10.1155/2022/8777026. Published 1-Jan-2022.
  • (2022) Neural network-based blended ensemble learning for speech emotion recognition. Multidimensional Systems and Signal Processing 33(4), pp. 1323-1348. doi: 10.1007/s11045-022-00845-9. Published 1-Dec-2022.
  • (2022) Handling high dimensional features by ensemble learning for emotion identification from speech signal. International Journal of Speech Technology 25(4), pp. 837-851. doi: 10.1007/s10772-021-09916-x. Published 1-Dec-2022.
