DOI: 10.1145/2522848.2531741

Multiple kernel learning for emotion recognition in the wild

Published: 09 December 2013

Abstract

We propose a method to automatically detect emotions in unconstrained settings as part of the 2013 Emotion Recognition in the Wild Challenge [16], organized in conjunction with the ACM International Conference on Multimodal Interaction (ICMI 2013). Our method combines multiple visual descriptors with paralinguistic audio features for multimodal classification of video clips. Extracted features are combined using Multiple Kernel Learning and the clips are classified using an SVM into one of the seven emotion categories: Anger, Disgust, Fear, Happiness, Neutral, Sadness and Surprise. The proposed method achieves competitive results, with an accuracy gain of approximately 10% above the challenge baseline.
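The pipeline the abstract outlines -- one kernel per modality, fused by a weighted sum and fed to an SVM with a precomputed kernel -- can be sketched as follows. This is an illustration on synthetic data, not the authors' implementation: the feature dimensions, the RBF bandwidth, and the fixed mixing weights `beta` are placeholders for values a Multiple Kernel Learning solver would choose jointly with the classifier.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic stand-ins for the two modalities (visual and audio features).
n = 60
X_vis = rng.normal(size=(n, 32))
X_aud = rng.normal(size=(n, 16))
y = rng.integers(0, 7, size=n)  # seven emotion classes

def rbf_gram(X, gamma):
    """Pairwise RBF Gram matrix: K[i, j] = exp(-gamma * ||x_i - x_j||^2)."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq)

K_vis = rbf_gram(X_vis, gamma=0.05)
K_aud = rbf_gram(X_aud, gamma=0.05)

# MKL learns the mixing weights beta jointly with the SVM; here they are
# fixed to illustrative values, since a full MKL solver is beyond this sketch.
beta = np.array([0.6, 0.4])
K = beta[0] * K_vis + beta[1] * K_aud  # fused Gram matrix

clf = SVC(kernel="precomputed").fit(K, y)
pred = clf.predict(K)  # rows index test samples, columns training samples
```

Because the combined kernel is a convex combination of valid RBF kernels, it remains a valid kernel, so the standard SVM machinery applies unchanged; only the weight-learning step distinguishes MKL from fixed-weight fusion.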

References

[1]
M. Aly, M. Munich, and P. Perona. Multiple dictionaries for bag of words large scale image search. Probe, 1, 2011.
[2]
H. Atassi, A. Esposito, and Z. Smekal. Analysis of high-level features for vocal emotion recognition. In Proc. of the 34th International Conference on Telecommunications and Signal Processing (TSP), pages 361--366, 2011.
[3]
T. Balomenos, A. Raouzaiou, S. Ioannou, A. Drosopoulos, K. Karpouzis, and S. Kollias. Emotion analysis in man-machine interaction systems. In Proc. of MLMI, pages 318--328, 2004.
[4]
M. Black, D. Fleet, and Y. Yacoob. A framework for modeling appearance change in image sequences. In Proc. of the Sixth International Conference on Computer Vision (ICCV), pages 660--667, 1998.
[5]
J. Blitzer, K. Q. Weinberger, and L. K. Saul. Distance metric learning for large margin nearest neighbor classification. In Advances in Neural Information Processing Systems, pages 1473--1480, 2005.
[6]
A. Bosch, A. Zisserman, and X. Munoz. Representing shape with a spatial pyramid kernel. In ACM International Conference on Image and Video Retrieval, 2007.
[7]
A. Bosch, A. Zisserman, and X. Munoz. Representing shape with a spatial pyramid kernel. In Proc. of the 6th ACM International Conference on Image and Video Retrieval, pages 401--408. ACM, 2007.
[8]
S. Bucak, R. Jin, and A. K. Jain. Multi-label multiple kernel learning by stochastic approximation: Application to visual object recognition. In Advances in Neural Information Processing Systems, pages 325--333, 2010.
[9]
C. Busso, Z. Deng, S. Yildirim, M. Bulut, C. M. Lee, A. Kazemzadeh, S. Lee, U. Neumann, and S. Narayanan. Analysis of emotion recognition using facial expressions, speech and multimodal information. In Proceedings of the 6th international conference on Multimodal interfaces, ICMI '04, pages 205--211, New York, NY, USA, 2004. ACM.
[10]
C. Busso, Z. Deng, S. Yildirim, M. Bulut, C. M. Lee, A. Kazemzadeh, S. Lee, U. Neumann, and S. Narayanan. Analysis of emotion recognition using facial expressions, speech and multimodal information. In Proceedings of the 6th international conference on Multimodal interfaces, pages 205--211. ACM, 2004.
[11]
K. Chatfield, V. Lempitsky, A. Vedaldi, and A. Zisserman. The devil is in the details: an evaluation of recent feature encoding methods. In BMVC, 2011.
[12]
D. Cristinacce and T. F. Cootes. Feature detection and tracking with constrained local models. In BMVC, volume 17, pages 929--938, 2006.
[13]
N. Dalal and B. Triggs. Histograms of oriented gradients for human detection. In Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), volume 1, pages 886--893, 2005.
[14]
L. C. De Silva, T. Miyasato, and R. Nakatsu. Facial emotion recognition using multi-modal information. In Proc. of the IEEE International Conference on Information, Communications and Signal Processing (ICICS), volume 1, pages 397--401. IEEE, 1997.
[15]
A. Dhall, A. Asthana, R. Goecke, and T. Gedeon. Emotion recognition using PHOG and LPQ features. In Proc. of the IEEE International Conference on Automatic Face & Gesture Recognition and Workshops (FG 2011), pages 878--883. IEEE, 2011.
[16]
A. Dhall, R. Goecke, J. Joshi, M. Wagner, and T. Gedeon. Emotion recognition in the wild challenge 2013. In ACM International Conference on Multimodal Interaction, 2013.
[17]
A. Dhall, R. Goecke, S. Lucey, and T. Gedeon. A semi-automatic method for collecting richly labelled large facial expression databases from movies. IEEE Multimedia, 19:34--41, 2012.
[18]
G. Donato, M. S. Bartlett, J. C. Hager, P. Ekman, and T. J. Sejnowski. Classifying facial actions. IEEE Trans. Pattern Anal. Mach. Intell., 21(10):974--989, Oct. 1999.
[19]
P. Ekman and W. Friesen. Facial Action Coding System: A Technique for the Measurement of Facial Movement. Consulting Psychologists Press, Palo Alto, 1978.
[20]
L. S.-H. Chen. Joint processing of audio-visual information for the recognition of emotional expressions in human-computer interaction. Technical report, University of Illinois at Urbana-Champaign, 2000.
[21]
G. R. Lanckriet, N. Cristianini, P. Bartlett, L. E. Ghaoui, and M. I. Jordan. Learning the kernel matrix with semidefinite programming. The Journal of Machine Learning Research, 5:27--72, 2004.
[22]
S. Lazebnik, C. Schmid, and J. Ponce. Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), volume 2, pages 2169--2178, 2006.
[23]
D. G. Lowe. Object recognition from local scale-invariant features. In Proc. of the Seventh IEEE International Conference on Computer Vision (ICCV), volume 2, pages 1150--1157. IEEE, 1999.
[24]
P. Lucey, J. F. Cohn, T. Kanade, J. Saragih, Z. Ambadar, and I. Matthews. The extended Cohn-Kanade dataset (CK+): a complete dataset for action unit and emotion-specified expression. In IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pages 94--101. IEEE, 2010.
[25]
S. Lucey, I. Matthews, C. Hu, Z. Ambadar, F. De La Torre, and J. Cohn. AAM derived face representations for robust facial action recognition. In Proc. of the 7th International Conference on Automatic Face and Gesture Recognition (FGR 2006), pages 155--160, 2006.
[26]
M. Lyons, S. Akamatsu, M. Kamachi, and J. Gyoba. Coding facial expressions with Gabor wavelets. In Proc. of the Third IEEE International Conference on Automatic Face and Gesture Recognition, pages 200--205. IEEE, 1998.
[27]
I. Matthews and S. Baker. Active appearance models revisited. International Journal of Computer Vision, 60(2):135--164, 2004.
[28]
T. Ojala, M. Pietikainen, and T. Maenpaa. Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(7):971--987, 2002.
[29]
V. Ojansivu and J. Heikkilä. Blur insensitive texture classification using local phase quantization. In A. Elmoataz, O. Lezoray, F. Nouboud, and D. Mammass, editors, Image and Signal Processing, volume 5099 of Lecture Notes in Computer Science, pages 236--243. Springer Berlin Heidelberg, 2008.
[30]
A. Oliva and A. Torralba. Building the gist of a scene: The role of global image features in recognition. Progress in Brain Research, 155:23--36, 2006.
[31]
J. Päivärinta, E. Rahtu, and J. Heikkilä. Volume local phase quantization for blur-insensitive dynamic texture classification. In Image Analysis, volume 6688, pages 360--369. Springer Berlin Heidelberg, 2011.
[32]
M. Pantic, I. Patras, and L. Rothkruntz. Facial action recognition in face profile image sequences. In Proc. of the IEEE International Conference on Multimedia and Expo (ICME), volume 1, pages 37--40, 2002.
[33]
M. Pantic, I. Patras, and M. F. Valstar. Learning spatio-temporal models of facial expressions, 2005.
[34]
M. Pantic, M. Valstar, R. Rademaker, and L. Maat. Web-based database for facial expression analysis. In Proc. of the IEEE International Conference on Multimedia and Expo (ICME 2005). IEEE, 2005.
[35]
J. A. Russell, J.-A. Bachorowski, and J.-M. Fernández-Dols. Facial and vocal expressions of emotion. Annual Review of Psychology, 54(1):329--349, 2003.
[36]
K. R. Scherer. Adding the affective dimension: A new look in speech analysis and synthesis, 1996.
[37]
B. Schuller, S. Steidl, A. Batliner, F. Burkhardt, L. Devillers, C. A. Müller, and S. S. Narayanan. The INTERSPEECH 2010 paralinguistic challenge. In INTERSPEECH, pages 2794--2797, 2010.
[38]
N. Sebe, I. Cohen, T. Gevers, and T. S. Huang. Multimodal approaches for emotion recognition: a survey. In S. Santini, R. Schettini, and T. Gevers, editors, Society of Photo-Optical Instrumentation Engineers (SPIE) Conference Series, volume 5670 of Society of Photo-Optical Instrumentation Engineers (SPIE) Conference Series, pages 56--67, Dec. 2004.
[39]
K. Sikka, A. Dhall, and M. Bartlett. Weakly supervised pain localization using multiple instance learning. In IEEE International Conference on Automatic Face and Gesture Recognition, 2013.
[40]
K. Sikka, T. Wu, J. Susskind, and M. Bartlett. Exploring bag of words architectures in the facial expression domain. In A. Fusiello, V. Murino, and R. Cucchiara, editors, Computer Vision -- ECCV Workshops and Demonstrations, pages 250--259. Springer Berlin Heidelberg, 2012.
[41]
J. A. Suykens and J. Vandewalle. Least squares support vector machine classifiers. Neural Processing Letters, 9(3):293--300, 1999.
[42]
A. Tawari and M. Trivedi. Audio-visual data association for face expression analysis. In Proc. of the 21st International Conference on Pattern Recognition (ICPR), pages 1120--1123, 2012.
[43]
M. Valstar and M. Pantic. Fully automatic facial action unit detection and temporal analysis. In IEEE Conference on Computer Vision and Pattern Recognition Workshop (CVPRW), pages 149--149, 2006.
[44]
M. Varma and B. R. Babu. More generality in efficient multiple kernel learning. In Proc. of the 26th Annual International Conference on Machine Learning, pages 1065--1072. ACM, 2009.
[45]
A. Vedaldi and B. Fulkerson. VLFeat: an open and portable library of computer vision algorithms. In Proceedings of the ACM International Conference on Multimedia, pages 1469--1472. ACM, 2010.
[46]
P. Viola and M. J. Jones. Robust real-time face detection. International Journal of Computer Vision, 57(2):137--154, 2004.
[47]
J. Wang, J. Yang, K. Yu, F. Lv, T. Huang, and Y. Gong. Locality-constrained linear coding for image classification. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 3360--3367. IEEE, 2010.
[48]
M. Wimmer, B. Schuller, D. Arsic, G. Rigoll, and B. Radig. Low-level fusion of audio, video feature for multi-modal emotion recognition. In Proc. 3rd International Conference on Computer Vision Theory and Applications, pages 145--151, 2008.
[49]
X. Xiong and F. De la Torre. Supervised descent method and its applications to face alignment. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013.
[50]
Y. Yacoob and L. Davis. Computing spatio-temporal representations of human faces. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 70--75, 1994.
[51]
Z. Zeng, M. Pantic, G. Roisman, and T. Huang. A survey of affect recognition methods: Audio, visual, and spontaneous expressions. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(1):39--58, 2009.
[52]
W. Zhang, A. Surve, X. Fern, and T. Dietterich. Learning non-redundant codebooks for classifying complex objects. In Proc. of the 26th Annual International Conference on Machine Learning, pages 1241--1248. ACM, 2009.
[53]
G. Zhao and M. Pietikainen. Dynamic texture recognition using local binary patterns with an application to facial expressions. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(6):915--928, 2007.
[54]
X. Zhu and D. Ramanan. Face detection, pose estimation, and landmark localization in the wild. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 2879--2886. IEEE, 2012.



    Published In

ICMI '13: Proceedings of the 15th ACM International Conference on Multimodal Interaction
    December 2013
    630 pages
    ISBN:9781450321297
    DOI:10.1145/2522848
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. bag of words
    2. feature fusion
    3. multimodal
    4. multiple kernel learning
    5. support vector machine

    Qualifiers

    • Research-article

    Conference

    ICMI '13

    Acceptance Rates

    ICMI '13 Paper Acceptance Rate 49 of 133 submissions, 37%;
    Overall Acceptance Rate 453 of 1,080 submissions, 42%


    Cited By

    • (2024) Temporal Enhancement for Video Affective Content Analysis. Proceedings of the 32nd ACM International Conference on Multimedia, pages 642--650. DOI: 10.1145/3664647.3681631.
    • (2024) EMERSK - Explainable Multimodal Emotion Recognition With Situational Knowledge. IEEE Transactions on Multimedia, 26:2785--2794. DOI: 10.1109/TMM.2023.3304015.
    • (2024) Enhancing Multimodal Cooperation via Sample-Level Modality Valuation. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 27328--27337. DOI: 10.1109/CVPR52733.2024.02581.
    • (2024) MART: Masked Affective RepresenTation Learning via Masked Temporal Distribution Distillation. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 12830--12840. DOI: 10.1109/CVPR52733.2024.01219.
    • (2024) A novel hybrid deep learning IChOA-CNN-LSTM model for modality-enriched and multilingual emotion recognition in social media. Scientific Reports, 14(1). DOI: 10.1038/s41598-024-73452-2.
    • (2024) Fusing pairwise modalities for emotion recognition in conversations. Information Fusion, 106:102306. DOI: 10.1016/j.inffus.2024.102306.
    • (2023) Feature Refinement via Canonical Correlation Analysis for Multimodal Emotion Recognition. 23rd International Conference on Control, Automation and Systems (ICCAS), pages 838--841. DOI: 10.23919/ICCAS59377.2023.10316831.
    • (2023) Weakly Supervised Video Emotion Detection and Prediction via Cross-Modal Temporal Erasing Network. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 18888--18897. DOI: 10.1109/CVPR52729.2023.01811.
    • (2023) A recent survey on perceived group sentiment analysis. Journal of Visual Communication and Image Representation, 97:103988. DOI: 10.1016/j.jvcir.2023.103988.
    • (2023) Prediction of Face Emotion with Labelled Selective Transfer Machine as a Generalized Emotion Classifier. Advanced Computing, pages 294--307. DOI: 10.1007/978-3-031-35644-5_23.
