Online Affect Tracking with Multimodal Kalman Filters

Research article (Public Access)
DOI: 10.1145/2988257.2988259
Published: 16 October 2016

Abstract

Arousal and valence have been widely used to represent emotions dimensionally and measure them continuously in time. In this paper, we introduce a computational framework for tracking these affective dimensions from multimodal data as an entry to the Multimodal Affect Recognition Sub-Challenge of the 2016 Audio/Visual Emotion Challenge and Workshop (AVEC2016). We propose a linear dynamical system approach with a late fusion method that accounts for the dynamics of the affective state evolution (i.e., arousal or valence). To this end, single-modality predictions are modeled as observations in a Kalman filter formulation in order to continuously track each affective dimension. Leveraging the inter-correlations between arousal and valence, we use the predicted arousal as an additional feature to improve valence predictions. Furthermore, we propose a conditional framework to select Kalman filters of different modalities while tracking. This framework employs voicing probability and facial posture cues to detect the absence or presence of each input modality. Our multimodal fusion results on the development and the test set provide a statistically significant improvement over the baseline system from AVEC2016. The proposed approach can be potentially extended to other multimodal tasks with inter-correlated behavioral dimensions.
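The core idea above — treating single-modality arousal or valence predictions as noisy observations of a latent affective state and smoothing them with a Kalman filter — can be sketched as follows. This is a minimal scalar illustration, not the paper's implementation: the dynamics `A`, observation model `C`, and noise variances `Q` and `R` are illustrative placeholders (the paper estimates its linear-dynamical-system parameters from data), and `kalman_track` is a hypothetical helper name.

```python
import numpy as np

def kalman_track(observations, A=1.0, C=1.0, Q=1e-3, R=1e-1, x0=0.0, P0=1.0):
    """Track a 1-D affective dimension (e.g., arousal) over time.

    Each element of `observations` is a single-modality prediction
    treated as a noisy measurement of the latent affective state.
    All parameters here are illustrative, not values from the paper.
    """
    x, P = x0, P0
    estimates = []
    for z in observations:
        # Predict: propagate state and uncertainty through the dynamics model.
        x_pred = A * x
        P_pred = A * P * A + Q
        # Update: correct the prediction with the observed modality output.
        K = P_pred * C / (C * P_pred * C + R)   # Kalman gain
        x = x_pred + K * (z - C * x_pred)
        P = (1.0 - K * C) * P_pred
        estimates.append(x)
    return np.array(estimates)

# Hypothetical usage: smooth noisy per-frame arousal predictions in [-1, 1].
noisy_predictions = 0.5 + 0.3 * np.random.default_rng(0).standard_normal(100)
smoothed = kalman_track(noisy_predictions)
```

With a small process variance `Q` relative to the observation variance `R`, the filter behaves like an adaptive exponential smoother, which matches the intuition that affective state evolves slowly relative to frame-level prediction noise. Late fusion in this framework amounts to running one such filter per modality (or stacking the modality predictions into a vector-valued observation) and letting the gain weight each modality by its reliability.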




    Published In

    AVEC '16: Proceedings of the 6th International Workshop on Audio/Visual Emotion Challenge
    October 2016, 114 pages
    ISBN: 9781450345163
    DOI: 10.1145/2988257

    Publisher

    Association for Computing Machinery, New York, NY, United States


    Author Tags

    1. arousal
    2. Kalman filters
    3. linear dynamical systems
    4. multimodal affective computing
    5. valence

    Qualifiers

    • Research-article

    Conference

    MM '16: ACM Multimedia Conference
    October 16, 2016, Amsterdam, The Netherlands

    Acceptance Rates

    AVEC '16 paper acceptance rate: 12 of 14 submissions, 86%
    Overall acceptance rate: 52 of 98 submissions, 53%

    Article Metrics

    • Downloads (last 12 months): 59
    • Downloads (last 6 weeks): 9
    Reflects downloads up to 13 Dec 2024

    Cited By
    • (2023) Prediction of Continuous Emotional Measures through Physiological and Visual Data. Sensors, 23:12 (5613). DOI: 10.3390/s23125613. Online publication date: 15-Jun-2023.
    • (2023) A Bayesian Filtering Framework for Continuous Affect Recognition From Facial Images. IEEE Transactions on Multimedia, 25 (3709-3722). DOI: 10.1109/TMM.2022.3164248. Online publication date: 1-Jan-2023.
    • (2022) Continuous Emotion Recognition for Long-Term Behavior Modeling through Recurrent Neural Networks. Technologies, 10:3 (59). DOI: 10.3390/technologies10030059. Online publication date: 12-May-2022.
    • (2021) Modeling Emotion in Complex Stories: The Stanford Emotional Narratives Dataset. IEEE Transactions on Affective Computing, 12:3 (579-594). DOI: 10.1109/TAFFC.2019.2955949. Online publication date: 1-Jul-2021.
    • (2021) Computational Media Intelligence: Human-Centered Machine Analysis of Media. Proceedings of the IEEE, 109:5 (891-910). DOI: 10.1109/JPROC.2020.3047978. Online publication date: May-2021.
    • (2020) Causal Inference in Generalizable Environments: Systematic Representative Design. Psychological Inquiry, 30:4 (173-202). DOI: 10.1080/1047840X.2019.1693866. Online publication date: 4-Jan-2020.
    • (2020) An efficient model-level fusion approach for continuous affect recognition from audiovisual signals. Neurocomputing, 376:C (42-53). DOI: 10.1016/j.neucom.2019.09.037. Online publication date: 1-Feb-2020.
    • (2019) Research on Robustness of Emotion Recognition Under Environmental Noise Conditions. IEEE Access, 7 (142009-142021). DOI: 10.1109/ACCESS.2019.2944386. Online publication date: 2019.
    • (2019) Continuous affect recognition with weakly supervised learning. Multimedia Tools and Applications, 78:14 (19387-19412). DOI: 10.1007/s11042-019-7313-1. Online publication date: 2-Aug-2019.
    • (2018) Dynamic Multi-Rater Gaussian Mixture Regression Incorporating Temporal Dependencies of Emotion Uncertainty Using Kalman Filters. 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (4929-4933). DOI: 10.1109/ICASSP.2018.8461321. Online publication date: Apr-2018.
