Emotion spotting: discovering regions of evidence in audio-visual emotion expressions

Published: 31 October 2016
DOI: 10.1145/2993148.2993151

Abstract

Research has demonstrated that humans require different amounts of information, accumulated over time, to accurately perceive emotion expressions, and that this amount varies across emotion classes. For example, recognizing happiness requires a longer stimulus than recognizing anger. However, previous automatic emotion recognition systems have often overlooked these differences. In this work, we propose a data-driven framework for discovering the patterns (timings and durations) of emotion evidence that are specific to individual emotion classes. We demonstrate that these patterns vary with the modality examined (lower face, upper face, or speech) and that consistent patterns emerge across experimental folds. We also show similar patterns across emotion corpora (IEMOCAP and MSP-IMPROV). In addition, we show that our proposed method, which uses only a portion of the data (59% for IEMOCAP), achieves accuracy comparable to a system that uses all of the data within each utterance, and higher accuracy than a baseline that randomly chooses a portion of the data. The performance gain comes mostly from prototypical emotion expressions (expressions with rater consensus). The innovation of this study lies in its account of how multimodal cues reveal emotion over time.
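
The framework described above classifies from only a portion of each utterance (59% of the data for IEMOCAP) rather than the full signal. The abstract does not spell out how that portion is selected, so the following is only a minimal, hypothetical Python sketch of the general idea: score each frame for emotion evidence (for example, with frame-level classifier posteriors) and keep the single highest-scoring contiguous span. The function names spot_evidence_region and utterance_score, the 0.59 default, and the synthetic scores are illustrative assumptions, not the authors' method.

    import numpy as np

    def spot_evidence_region(frame_scores, keep_fraction=0.59):
        """Return (start, end) of the contiguous span, covering roughly
        keep_fraction of the frames, whose mean evidence score is highest.
        (Hypothetical selection rule; the paper learns class- and
        modality-specific timings instead.)"""
        n = len(frame_scores)
        span = max(1, int(round(keep_fraction * n)))
        # Prefix sums let every candidate span be scored in O(n) total.
        csum = np.concatenate(([0.0], np.cumsum(frame_scores)))
        span_means = (csum[span:] - csum[:-span]) / span
        start = int(np.argmax(span_means))
        return start, start + span

    def utterance_score(frame_scores, keep_fraction=0.59):
        """Utterance-level evidence score computed only from the spotted span."""
        start, end = spot_evidence_region(frame_scores, keep_fraction)
        return float(np.mean(frame_scores[start:end]))

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        # Synthetic 300-frame utterance with evidence concentrated in the middle.
        scores = rng.normal(0.2, 0.05, 300)
        scores[100:200] += 0.5
        print(spot_evidence_region(scores), round(utterance_score(scores), 3))

In the paper's setting, the spotted timings and durations are learned per emotion class and per modality (lower face, upper face, speech), so a real system would run a selection of this kind separately for each class/modality stream before fusing decisions.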

    Published In

    ICMI '16: Proceedings of the 18th ACM International Conference on Multimodal Interaction
    October 2016
    605 pages
    ISBN:9781450345569
    DOI:10.1145/2993148

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Audio-Visual
    2. Emotion
    3. Emotion Classification
    4. Emotion Spotting
    5. Temporal Evidence
