DOI: 10.1145/3678957.3685717
Research article | Open access

Integrating Multimodal Affective Signals for Stress Detection from Audio-Visual Data

Published: 04 November 2024

Abstract

Stress detection in real-world settings presents significant challenges because human emotional expression is shaped by biological, psychological, and social factors. While traditional methods such as EEG, ECG, and EDA sensors provide direct measures of physiological responses, their intrusive nature makes them unsuitable for everyday environments. Detecting stress with non-contact, commonly available sensors such as cameras and microphones is therefore desirable. In this work, we use stress indicators from four key affective modalities extracted from audio-visual data: facial expressions, vocal prosody, textual sentiment, and physical fidgeting. To this end, we first labeled 353 video clips of individuals in monologue scenarios discussing personal experiences, indicating for each clip whether the individual is stressed according to our four modalities. We then extract stress signals from the audio-visual data using unimodal classifiers for each modality. Finally, to explore how the different modalities interact in predicting whether a person is stressed, we compare the performance of three multimodal fusion methods: intermediate fusion, voting-based late fusion, and learning-based late fusion. Results indicate that combining multiple modes of information effectively leverages the strengths of the different modalities, achieving an F1 score of 0.85 for binary stress detection. Moreover, an ablation study shows that integrating more modalities yields a higher F1 score across all fusion techniques, demonstrating that our selected modalities provide complementary stress indicators.
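
To make the fusion comparison concrete, the short Python sketch below illustrates one way voting-based and learning-based late fusion could combine per-clip stress probabilities from four unimodal classifiers. It is an illustrative sketch, not the paper's implementation: the random placeholder data, the 0.5 decision threshold, the train/test split, and the logistic-regression meta-classifier are assumptions made here for demonstration only.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score

# Hypothetical per-clip stress probabilities from four unimodal classifiers
# (columns: face, voice, text, fidgeting); random placeholders stand in for
# real model outputs, and labels are random binary stressed/not-stressed tags.
rng = np.random.default_rng(0)
n_clips = 353
unimodal_probs = rng.uniform(size=(n_clips, 4))
labels = rng.integers(0, 2, size=n_clips)

# Voting-based late fusion: threshold each modality, then take a majority vote.
votes = (unimodal_probs >= 0.5).astype(int)
voting_pred = (votes.sum(axis=1) >= 2).astype(int)

# Learning-based late fusion: stack a meta-classifier on the unimodal scores.
train = np.arange(n_clips) < 280
test = ~train
meta = LogisticRegression().fit(unimodal_probs[train], labels[train])
stacked_pred = meta.predict(unimodal_probs[test])

print("voting F1 :", f1_score(labels[test], voting_pred[test]))
print("stacked F1:", f1_score(labels[test], stacked_pred))

The design difference the sketch highlights: simple voting treats every modality equally, whereas a learned (stacked) late-fusion model can weight more reliable modalities before making the final stress decision.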



    Published In

    ICMI '24: Proceedings of the 26th International Conference on Multimodal Interaction
    November 2024
    725 pages
    ISBN:9798400704628
    DOI:10.1145/3678957
    This work is licensed under a Creative Commons Attribution International 4.0 License.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 04 November 2024


    Author Tags

    1. affective computing
    2. multimodal fusion
    3. stress detection

    Qualifiers

    • Research-article
    • Research
    • Refereed limited


    Conference

    ICMI '24
    ICMI '24: INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION
    November 4 - 8, 2024
    San Jose, Costa Rica

    Acceptance Rates

    Overall Acceptance Rate 453 of 1,080 submissions, 42%
