DOI: 10.1145/3395035.3425358

MEMOS: A Multi-modal Emotion Stream Database for Temporal Spontaneous Emotional State Detection

Published: 27 December 2020

Abstract

Mental health applications are increasingly interested in using audio-visual and physiological measurements to detect a person's emotional state, and much of the existing research aims to detect episodic emotional states. The availability of wearable devices and richer signals is attracting researchers to explore the detection of a continuous sequence of emotion categories, referred to as an emotion stream, for understanding mental health. Currently, there are no established databases for experimenting with emotion streams. In this paper, we make two contributions. First, we collect a Multi-modal EMOtion Stream (MEMOS) database in a social-game scenario. Audio-video recordings of the players are captured with mobile phones, and aligned electrocardiogram (ECG) signals are collected by wearable sensors. In total, 40 multi-modal sessions were recorded, each lasting between 25 and 70 minutes. Emotional states with their time boundaries are self-reported and annotated by the participants while watching the video recordings. Second, we propose a two-step emotional state detection framework that automatically determines emotion categories and their time boundaries along the video recordings. Experiments on the MEMOS database provide a baseline for temporal emotional state detection, with an average mean-average-precision (mAP) of 8.109% for detecting five emotions (happiness, sadness, anger, surprise, and other negative emotions) in the videos. This is higher than the 5.47% obtained when emotions are detected by averaging frame-level confidence scores (from the Face++ emotion recognition API) over segments produced by a sliding window. We expect this paper to introduce a novel research problem and provide a database for related research.
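As a rough illustration of the sliding-window comparison baseline mentioned above, the sketch below generates fixed-length segments and averages precomputed frame-level emotion confidences over each one. The per-frame scores, window length, stride, and helper names are assumptions made for the example, not the authors' implementation; the output segments are the kind of candidate detections that a temporal-detection metric such as mAP would score against the self-reported emotion boundaries.

```python
# Minimal sketch of a sliding-window averaging baseline (assumptions):
# per-frame emotion confidence scores (e.g., from an external face-analysis
# API such as Face++) are assumed to be precomputed; window length, stride,
# and the emotion list below are illustrative, not taken from the paper.
import numpy as np

EMOTIONS = ["happiness", "sadness", "anger", "surprise", "other_negative"]


def sliding_windows(num_frames, fps, win_sec=10.0, stride_sec=5.0):
    """Yield (start_frame, end_frame) pairs for fixed-length windows."""
    win, stride = int(win_sec * fps), int(stride_sec * fps)
    for start in range(0, max(num_frames - win, 0) + 1, stride):
        yield start, min(start + win, num_frames)


def detect_by_averaging(frame_scores, fps, **win_kwargs):
    """Average frame-level confidences within each window.

    frame_scores: (num_frames, num_emotions) array of per-frame confidences.
    Returns a list of (start_sec, end_sec, mean_scores) candidate detections.
    """
    detections = []
    for s, e in sliding_windows(len(frame_scores), fps, **win_kwargs):
        detections.append((s / fps, e / fps, frame_scores[s:e].mean(axis=0)))
    return detections


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fake_scores = rng.random((3000, len(EMOTIONS)))  # e.g., 2 minutes at 25 fps
    dets = detect_by_averaging(fake_scores, fps=25.0)
    happy = EMOTIONS.index("happiness")
    dets.sort(key=lambda d: d[2][happy], reverse=True)
    for start, end, scores in dets[:3]:
        print(f"[{start:6.1f}s, {end:6.1f}s] happiness={scores[happy]:.3f}")
```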


Cited By

  • (2025) AVES: An Audio-Visual Emotion Stream Dataset for Temporal Emotion Detection. IEEE Transactions on Affective Computing, 16(1), 438-450. DOI: 10.1109/TAFFC.2024.3440924. Online publication date: Jan 2025.
  • (2024) Investigation of Imbalanced Sentiment Analysis in Voice Data: A Comparative Study of Machine Learning Algorithms. ICST Transactions on Scalable Information Systems. DOI: 10.4108/eetsis.4805. Online publication date: 22 Apr 2024.
  • (2024) How do emotions evolve? The effect of emotions in the interactive scenes of a new product trial. The Design Journal, 27(5), 934-953. DOI: 10.1080/14606925.2024.2372178. Online publication date: 10 Jul 2024.


            Published In

            ICMI '20 Companion: Companion Publication of the 2020 International Conference on Multimodal Interaction
            October 2020
            548 pages
            ISBN:9781450380027
            DOI:10.1145/3395035
            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Publication History

            Published: 27 December 2020


            Author Tags

            1. emotion stream
            2. emotional state detection
            3. multi-modal database

            Qualifiers

            • Research-article

            Conference

            ICMI '20
            Sponsor:
            ICMI '20: INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION
            October 25 - 29, 2020
            Virtual Event, Netherlands

            Acceptance Rates

            Overall Acceptance Rate 453 of 1,080 submissions, 42%


            Bibliometrics & Citations

            Bibliometrics

            Article Metrics

            • Downloads (last 12 months): 32
            • Downloads (last 6 weeks): 2
            Reflects downloads up to 01 Mar 2025

