research-article
DOI: 10.1145/3242969.3242992

Generating fMRI-Enriched Acoustic Vectors using a Cross-Modality Adversarial Network for Emotion Recognition

Published: 02 October 2018

Abstract

Automatic emotion recognition has long been developed by concentrating on modeling human expressive behavior. At the same time, neuroscientific evidence has shown that varied neuro-responses (i.e., blood oxygen level-dependent (BOLD) signals measured from functional magnetic resonance imaging (fMRI)) are also a function of the type of emotion perceived. While past research has indicated that fusing acoustic features and fMRI improves overall speech emotion recognition performance, obtaining fMRI data is not feasible in real-world applications. In this work, we propose a cross-modality adversarial network that jointly models the bi-directional generative relationship between the acoustic features of speech samples and the fMRI signals of human perceptual responses by leveraging a parallel dataset. We encode the acoustic descriptors of a speech sample using the learned cross-modality adversarial network to generate fMRI-enriched acoustic vectors, which are then used in the emotion classifier. The generated fMRI-enriched acoustic vector is evaluated not only on the parallel dataset but also on an additional dataset without fMRI scanning. Our proposed framework significantly outperforms using acoustic features only in a four-class emotion recognition task on both datasets, and the use of a cyclic loss in learning the bi-directional mapping is also shown to be crucial in achieving the improved recognition rates.
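
The pipeline described in the abstract can be made concrete with a minimal sketch, written here in PyTorch purely as an illustration rather than the authors' implementation: two generators learn the bi-directional acoustic-fMRI mapping under adversarial and cyclic losses on the parallel dataset, and the fMRI-enriched acoustic vector is formed by concatenating a sample's acoustic features with its generated fMRI response before classification. The dimensions (ACOUSTIC_DIM, FMRI_DIM), network sizes, and loss weight lambda_cyc are hypothetical placeholders.

    # Minimal illustrative sketch (PyTorch), NOT the authors' implementation:
    # two generators map between acoustic descriptors and fMRI responses, two
    # discriminators provide the adversarial signal, a cyclic (reconstruction)
    # loss ties the bi-directional mapping together, and the fMRI-enriched
    # acoustic vector is the concatenation of the acoustic input with its
    # generated fMRI representation. All sizes and weights are assumed values.
    import torch
    import torch.nn as nn

    ACOUSTIC_DIM, FMRI_DIM, NUM_EMOTIONS = 384, 116, 4   # assumed dimensions

    def mlp(d_in, d_out, d_hidden=256):
        return nn.Sequential(nn.Linear(d_in, d_hidden), nn.ReLU(),
                             nn.Linear(d_hidden, d_out))

    G_a2f = mlp(ACOUSTIC_DIM, FMRI_DIM)     # acoustic -> fMRI generator
    G_f2a = mlp(FMRI_DIM, ACOUSTIC_DIM)     # fMRI -> acoustic generator
    D_f   = mlp(FMRI_DIM, 1)                # discriminator on the fMRI domain
    D_a   = mlp(ACOUSTIC_DIM, 1)            # discriminator on the acoustic domain
    clf   = mlp(ACOUSTIC_DIM + FMRI_DIM, NUM_EMOTIONS)   # 4-class emotion classifier

    bce, l1 = nn.BCEWithLogitsLoss(), nn.L1Loss()

    def generator_loss(a, f, lambda_cyc=10.0):
        """Adversarial + cyclic loss on a parallel (acoustic, fMRI) mini-batch."""
        fake_f, fake_a = G_a2f(a), G_f2a(f)
        pred_f, pred_a = D_f(fake_f), D_a(fake_a)
        adv = bce(pred_f, torch.ones_like(pred_f)) + bce(pred_a, torch.ones_like(pred_a))
        cyc = l1(G_f2a(fake_f), a) + l1(G_a2f(fake_a), f)   # reconstruct both domains
        return adv + lambda_cyc * cyc

    def enrich(a):
        """fMRI-enriched acoustic vector for speech without any fMRI scanning."""
        with torch.no_grad():
            return torch.cat([a, G_a2f(a)], dim=-1)

    # Toy usage on random tensors (stand-ins for real features).
    a, f = torch.randn(8, ACOUSTIC_DIM), torch.randn(8, FMRI_DIM)
    loss = generator_loss(a, f)              # generators' training objective
    logits = clf(enrich(a))                  # shape: (8, NUM_EMOTIONS)

In an actual system the discriminators would be trained in alternation with the generators, and the classifier would be trained on the enriched vectors; the sketch only shows how the cyclic loss and the enrichment step fit together.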

    Published In

    ICMI '18: Proceedings of the 20th ACM International Conference on Multimodal Interaction
    October 2018
    687 pages
    ISBN:9781450356923
    DOI:10.1145/3242969
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Sponsors

    • SIGCHI: Special Interest Group on Computer-Human Interaction of the ACM

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 02 October 2018

    Author Tags

    1. acoustic representation
    2. cross-modality adversarial network
    3. fmri
    4. speech emotion recognition

    Qualifiers

    • Research-article

    Conference

    ICMI '18
    Sponsor:
    • SIGCHI

    Acceptance Rates

    ICMI '18 Paper Acceptance Rate: 63 of 149 submissions, 42%
    Overall Acceptance Rate: 453 of 1,080 submissions, 42%

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months): 31
    • Downloads (Last 6 weeks): 6
    Reflects downloads up to 13 Dec 2024

    Citations

    Cited By

    • (2025) Generative technology for human emotion recognition: A scoping review. Information Fusion, Vol. 115, 102753. DOI: 10.1016/j.inffus.2024.102753. Online publication date: Mar-2025.
    • (2023) EEG To FMRI Synthesis: Is Deep Learning a Candidate? Proceedings of the 31st International Conference on Information Systems Development. DOI: 10.62036/ISD.2023.26. Online publication date: 2023.
    • (2019) Effect of Feedback on Users' Immediate Emotions: Analysis of Facial Expressions during a Simulated Target Detection Task. 2019 International Conference on Multimodal Interaction, 49-58. DOI: 10.1145/3340555.3353732. Online publication date: 14-Oct-2019.
    • (2019) CorrFeat: Correlation-based Feature Extraction Algorithm using Skin Conductance and Pupil Diameter for Emotion Recognition. 2019 International Conference on Multimodal Interaction, 404-408. DOI: 10.1145/3340555.3353716. Online publication date: 14-Oct-2019.
