Online Affect Tracking with Multimodal Kalman Filters

Research article (Public Access)
DOI: 10.1145/2988257.2988259
Published: 16 October 2016

Abstract

Arousal and valence have been widely used to represent emotions dimensionally and measure them continuously in time. In this paper, we introduce a computational framework for tracking these affective dimensions from multimodal data as an entry to the Multimodal Affect Recognition Sub-Challenge of the 2016 Audio/Visual Emotion Challenge and Workshop (AVEC2016). We propose a linear dynamical system approach with a late fusion method that accounts for the dynamics of the affective state evolution (i.e., arousal or valence). To this end, single-modality predictions are modeled as observations in a Kalman filter formulation in order to continuously track each affective dimension. Leveraging the inter-correlations between arousal and valence, we use the predicted arousal as an additional feature to improve valence predictions. Furthermore, we propose a conditional framework to select Kalman filters of different modalities while tracking. This framework employs voicing probability and facial posture cues to detect the absence or presence of each input modality. Our multimodal fusion results on the development and the test set provide a statistically significant improvement over the baseline system from AVEC2016. The proposed approach can be potentially extended to other multimodal tasks with inter-correlated behavioral dimensions.
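The core idea above — treating single-modality arousal or valence predictions as noisy observations of a latent affective state and smoothing them with a Kalman filter — can be sketched as follows. This is a minimal scalar illustration, not the paper's implementation: the dynamics `A`, observation model `C`, and noise variances `Q` and `R` are illustrative placeholders (the paper estimates its linear-dynamical-system parameters from data), and `kalman_track` is a hypothetical helper name.

```python
import numpy as np

def kalman_track(observations, A=1.0, C=1.0, Q=1e-3, R=1e-1, x0=0.0, P0=1.0):
    """Track a 1-D affective dimension (e.g., arousal) over time.

    Each element of `observations` is a single-modality prediction
    treated as a noisy measurement of the latent affective state.
    All parameters here are illustrative, not values from the paper.
    """
    x, P = x0, P0
    estimates = []
    for z in observations:
        # Predict: propagate state and uncertainty through the dynamics model.
        x_pred = A * x
        P_pred = A * P * A + Q
        # Update: correct the prediction with the observed modality output.
        K = P_pred * C / (C * P_pred * C + R)   # Kalman gain
        x = x_pred + K * (z - C * x_pred)
        P = (1.0 - K * C) * P_pred
        estimates.append(x)
    return np.array(estimates)

# Hypothetical usage: smooth noisy per-frame arousal predictions in [-1, 1].
noisy_predictions = 0.5 + 0.3 * np.random.default_rng(0).standard_normal(100)
smoothed = kalman_track(noisy_predictions)
```

With a small process variance `Q` relative to the observation variance `R`, the filter behaves like an adaptive exponential smoother, which matches the intuition that affective state evolves slowly relative to frame-level prediction noise. Late fusion in this framework amounts to running one such filter per modality (or stacking the modality predictions into a vector-valued observation) and letting the gain weight each modality by its reliability.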




    Published In

    AVEC '16: Proceedings of the 6th International Workshop on Audio/Visual Emotion Challenge
    October 2016, 114 pages
    ISBN: 9781450345163
    DOI: 10.1145/2988257

    Publisher

    Association for Computing Machinery, New York, NY, United States


    Author Tags

    1. arousal
    2. Kalman filters
    3. linear dynamical systems
    4. multimodal affective computing
    5. valence

    Qualifiers

    • Research-article

    Conference

    MM '16: ACM Multimedia Conference
    October 16, 2016, Amsterdam, The Netherlands

    Acceptance Rates

    AVEC '16 paper acceptance rate: 12 of 14 submissions, 86%
    Overall acceptance rate: 52 of 98 submissions, 53%

    Article Metrics

    • Downloads (last 12 months): 59
    • Downloads (last 6 weeks): 9
    Reflects downloads up to 13 Dec 2024

    Cited By
    • (2023) Prediction of Continuous Emotional Measures through Physiological and Visual Data. Sensors, 23:12 (5613). DOI: 10.3390/s23125613. Online publication date: 15-Jun-2023.
    • (2023) A Bayesian Filtering Framework for Continuous Affect Recognition From Facial Images. IEEE Transactions on Multimedia, 25 (3709-3722). DOI: 10.1109/TMM.2022.3164248. Online publication date: 1-Jan-2023.
    • (2022) Continuous Emotion Recognition for Long-Term Behavior Modeling through Recurrent Neural Networks. Technologies, 10:3 (59). DOI: 10.3390/technologies10030059. Online publication date: 12-May-2022.
    • (2021) Modeling Emotion in Complex Stories: The Stanford Emotional Narratives Dataset. IEEE Transactions on Affective Computing, 12:3 (579-594). DOI: 10.1109/TAFFC.2019.2955949. Online publication date: 1-Jul-2021.
    • (2021) Computational Media Intelligence: Human-Centered Machine Analysis of Media. Proceedings of the IEEE, 109:5 (891-910). DOI: 10.1109/JPROC.2020.3047978. Online publication date: May-2021.
    • (2020) Causal Inference in Generalizable Environments: Systematic Representative Design. Psychological Inquiry, 30:4 (173-202). DOI: 10.1080/1047840X.2019.1693866. Online publication date: 4-Jan-2020.
    • (2020) An efficient model-level fusion approach for continuous affect recognition from audiovisual signals. Neurocomputing, 376:C (42-53). DOI: 10.1016/j.neucom.2019.09.037. Online publication date: 1-Feb-2020.
    • (2019) Research on Robustness of Emotion Recognition Under Environmental Noise Conditions. IEEE Access, 7 (142009-142021). DOI: 10.1109/ACCESS.2019.2944386. Online publication date: 2019.
    • (2019) Continuous affect recognition with weakly supervised learning. Multimedia Tools and Applications, 78:14 (19387-19412). DOI: 10.1007/s11042-019-7313-1. Online publication date: 2-Aug-2019.
    • (2018) Dynamic Multi-Rater Gaussian Mixture Regression Incorporating Temporal Dependencies of Emotion Uncertainty Using Kalman Filters. 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (4929-4933). DOI: 10.1109/ICASSP.2018.8461321. Online publication date: Apr-2018.
