DOI: 10.1145/2522848.2531741

Multiple kernel learning for emotion recognition in the wild

Published: 09 December 2013

Abstract

We propose a method to automatically detect emotions in unconstrained settings as part of the 2013 Emotion Recognition in the Wild Challenge [16], organized in conjunction with the ACM International Conference on Multimodal Interaction (ICMI 2013). Our method combines multiple visual descriptors with paralinguistic audio features for multimodal classification of video clips. Extracted features are combined using Multiple Kernel Learning and the clips are classified using an SVM into one of the seven emotion categories: Anger, Disgust, Fear, Happiness, Neutral, Sadness and Surprise. The proposed method achieves competitive results, with an accuracy gain of approximately 10% above the challenge baseline.
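The pipeline the abstract outlines -- one kernel per modality, fused by a weighted sum and fed to an SVM with a precomputed kernel -- can be sketched as follows. This is an illustration on synthetic data, not the authors' implementation: the feature dimensions, the RBF bandwidth, and the fixed mixing weights `beta` are placeholders for values a Multiple Kernel Learning solver would choose jointly with the classifier.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic stand-ins for the two modalities (visual and audio features).
n = 60
X_vis = rng.normal(size=(n, 32))
X_aud = rng.normal(size=(n, 16))
y = rng.integers(0, 7, size=n)  # seven emotion classes

def rbf_gram(X, gamma):
    """Pairwise RBF Gram matrix: K[i, j] = exp(-gamma * ||x_i - x_j||^2)."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq)

K_vis = rbf_gram(X_vis, gamma=0.05)
K_aud = rbf_gram(X_aud, gamma=0.05)

# MKL learns the mixing weights beta jointly with the SVM; here they are
# fixed to illustrative values, since a full MKL solver is beyond this sketch.
beta = np.array([0.6, 0.4])
K = beta[0] * K_vis + beta[1] * K_aud  # fused Gram matrix

clf = SVC(kernel="precomputed").fit(K, y)
pred = clf.predict(K)  # rows index test samples, columns training samples
```

Because the combined kernel is a convex combination of valid RBF kernels, it remains a valid kernel, so the standard SVM machinery applies unchanged; only the weight-learning step distinguishes MKL from fixed-weight fusion.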

References

[1]
M. Aly, M. Munich, and P. Perona. Multiple dictionaries for bag of words large scale image search. Probe, 1, 2011.
[2]
H. Atassi, A. Esposito, and Z. Smekal. Analysis of high-level features for vocal emotion recognition. In Proc. of the 34th International Conference on Telecommunications and Signal Processing (TSP), pages 361--366, 2011.
[3]
T. Balomenos, A. Raouzaiou, S. Ioannou, A. Drosopoulos, K. Karpouzis, and S. Kollias. Emotion analysis in man-machine interaction systems. In Proc. of MLMI, pages 318--328, 2004.
[4]
M. Black, D. Fleet, and Y. Yacoob. A framework for modeling appearance change in image sequences. In Proc. of the Sixth International Conference on Computer Vision (ICCV), pages 660--667, 1998.
[5]
J. Blitzer, K. Q. Weinberger, and L. K. Saul. Distance metric learning for large margin nearest neighbor classification. In Advances in Neural Information Processing Systems, pages 1473--1480, 2005.
[6]
A. Bosch, A. Zisserman, and X. Munoz. Representing shape with a spatial pyramid kernel. In ACM International Conference on Image and Video Retrieval, 2007.
[7]
A. Bosch, A. Zisserman, and X. Munoz. Representing shape with a spatial pyramid kernel. In Proc. of the 6th ACM International Conference on Image and Video Retrieval, pages 401--408. ACM, 2007.
[8]
S. Bucak, R. Jin, and A. K. Jain. Multi-label multiple kernel learning by stochastic approximation: Application to visual object recognition. In Advances in Neural Information Processing Systems, pages 325--333, 2010.
[9]
C. Busso, Z. Deng, S. Yildirim, M. Bulut, C. M. Lee, A. Kazemzadeh, S. Lee, U. Neumann, and S. Narayanan. Analysis of emotion recognition using facial expressions, speech and multimodal information. In Proceedings of the 6th international conference on Multimodal interfaces, ICMI '04, pages 205--211, New York, NY, USA, 2004. ACM.
[10]
C. Busso, Z. Deng, S. Yildirim, M. Bulut, C. M. Lee, A. Kazemzadeh, S. Lee, U. Neumann, and S. Narayanan. Analysis of emotion recognition using facial expressions, speech and multimodal information. In Proceedings of the 6th international conference on Multimodal interfaces, pages 205--211. ACM, 2004.
[11]
K. Chatfield, V. Lempitsky, A. Vedaldi, and A. Zisserman. The devil is in the details: an evaluation of recent feature encoding methods. In BMVC, 2011.
[12]
D. Cristinacce and T. F. Cootes. Feature detection and tracking with constrained local models. In BMVC, volume 17, pages 929--938, 2006.
[13]
N. Dalal and B. Triggs. Histograms of oriented gradients for human detection. In Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), volume 1, pages 886--893, 2005.
[14]
L. C. De Silva, T. Miyasato, and R. Nakatsu. Facial emotion recognition using multi-modal information. In Proc. of the IEEE International Conference on Information, Communications and Signal Processing (ICICS), volume 1, pages 397--401. IEEE, 1997.
[15]
A. Dhall, A. Asthana, R. Goecke, and T. Gedeon. Emotion recognition using PHOG and LPQ features. In Proc. of the IEEE International Conference on Automatic Face & Gesture Recognition and Workshops (FG 2011), pages 878--883. IEEE, 2011.
[16]
A. Dhall, R. Goecke, J. Joshi, M. Wagner, and T. Gedeon. Emotion recognition in the wild challenge 2013. In ACM International Conference on Multimodal Interaction, 2013.
[17]
A. Dhall, R. Goecke, S. Lucey, and T. Gedeon. A semi-automatic method for collecting richly labelled large facial expression databases from movies. IEEE Multimedia, 19:34--41, 2012.
[18]
G. Donato, M. S. Bartlett, J. C. Hager, P. Ekman, and T. J. Sejnowski. Classifying facial actions. IEEE Trans. Pattern Anal. Mach. Intell., 21(10):974--989, Oct. 1999.
[19]
P. Ekman and W. Friesen. Facial Action Coding System: A Technique for the Measurement of Facial Movement. Consulting Psychologists Press, Palo Alto, 1978.
[20]
L. S.-H. Chen. Joint processing of audio-visual information for the recognition of emotional expressions in human-computer interaction. Technical report, University of Illinois at Urbana-Champaign, 2000.
[21]
G. R. Lanckriet, N. Cristianini, P. Bartlett, L. E. Ghaoui, and M. I. Jordan. Learning the kernel matrix with semidefinite programming. The Journal of Machine Learning Research, 5:27--72, 2004.
[22]
S. Lazebnik, C. Schmid, and J. Ponce. Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), volume 2, pages 2169--2178, 2006.
[23]
D. G. Lowe. Object recognition from local scale-invariant features. In Proc. of the Seventh IEEE International Conference on Computer Vision (ICCV), volume 2, pages 1150--1157. IEEE, 1999.
[24]
P. Lucey, J. F. Cohn, T. Kanade, J. Saragih, Z. Ambadar, and I. Matthews. The extended Cohn-Kanade dataset (CK+): a complete dataset for action unit and emotion-specified expression. In IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pages 94--101. IEEE, 2010.
[25]
S. Lucey, I. Matthews, C. Hu, Z. Ambadar, F. De La Torre, and J. Cohn. AAM derived face representations for robust facial action recognition. In Proc. of the 7th International Conference on Automatic Face and Gesture Recognition (FGR 2006), pages 155--160, 2006.
[26]
M. Lyons, S. Akamatsu, M. Kamachi, and J. Gyoba. Coding facial expressions with Gabor wavelets. In Proc. of the Third IEEE International Conference on Automatic Face and Gesture Recognition, pages 200--205. IEEE, 1998.
[27]
I. Matthews and S. Baker. Active appearance models revisited. International Journal of Computer Vision, 60(2):135--164, 2004.
[28]
T. Ojala, M. Pietikainen, and T. Maenpaa. Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(7):971--987, 2002.
[29]
V. Ojansivu and J. Heikkilä. Blur insensitive texture classification using local phase quantization. In A. Elmoataz, O. Lezoray, F. Nouboud, and D. Mammass, editors, Image and Signal Processing, volume 5099 of Lecture Notes in Computer Science, pages 236--243. Springer Berlin Heidelberg, 2008.
[30]
A. Oliva and A. Torralba. Building the gist of a scene: The role of global image features in recognition. Progress in Brain Research, 155:23--36, 2006.
[31]
J. Päivärinta, E. Rahtu, and J. Heikkilä. Volume local phase quantization for blur-insensitive dynamic texture classification. In Image Analysis, volume 6688, pages 360--369. Springer Berlin Heidelberg, 2011.
[32]
M. Pantic, I. Patras, and L. Rothkruntz. Facial action recognition in face profile image sequences. In Proc. of the IEEE International Conference on Multimedia and Expo (ICME), volume 1, pages 37--40, 2002.
[33]
M. Pantic, I. Patras, and M. F. Valstar. Learning spatio-temporal models of facial expressions, 2005.
[34]
M. Pantic, M. Valstar, R. Rademaker, and L. Maat. Web-based database for facial expression analysis. In Proc. of the IEEE International Conference on Multimedia and Expo (ICME 2005). IEEE, 2005.
[35]
J. A. Russell, J.-A. Bachorowski, and J.-M. Fernández-Dols. Facial and vocal expressions of emotion. Annual Review of Psychology, 54(1):329--349, 2003.
[36]
K. R. Scherer. Adding the affective dimension: A new look in speech analysis and synthesis, 1996.
[37]
B. Schuller, S. Steidl, A. Batliner, F. Burkhardt, L. Devillers, C. A. Müller, and S. S. Narayanan. The INTERSPEECH 2010 paralinguistic challenge. In INTERSPEECH, pages 2794--2797, 2010.
[38]
N. Sebe, I. Cohen, T. Gevers, and T. S. Huang. Multimodal approaches for emotion recognition: a survey. In S. Santini, R. Schettini, and T. Gevers, editors, Society of Photo-Optical Instrumentation Engineers (SPIE) Conference Series, volume 5670 of Society of Photo-Optical Instrumentation Engineers (SPIE) Conference Series, pages 56--67, Dec. 2004.
[39]
K. Sikka, A. Dhall, and M. Bartlett. Weakly supervised pain localization using multiple instance learning. In IEEE International Conference on Automatic Face and Gesture Recognition, 2013.
[40]
K. Sikka, T. Wu, J. Susskind, and M. Bartlett. Exploring bag of words architectures in the facial expression domain. In A. Fusiello, V. Murino, and R. Cucchiara, editors, Computer Vision -- ECCV Workshops and Demonstrations, pages 250--259. Springer Berlin Heidelberg, 2012.
[41]
J. A. Suykens and J. Vandewalle. Least squares support vector machine classifiers. Neural Processing Letters, 9(3):293--300, 1999.
[42]
A. Tawari and M. Trivedi. Audio-visual data association for face expression analysis. In Proc. of the 21st International Conference on Pattern Recognition (ICPR), pages 1120--1123, 2012.
[43]
M. Valstar and M. Pantic. Fully automatic facial action unit detection and temporal analysis. In IEEE Conference on Computer Vision and Pattern Recognition Workshop (CVPRW), pages 149--149, 2006.
[44]
M. Varma and B. R. Babu. More generality in efficient multiple kernel learning. In Proc. of the 26th Annual International Conference on Machine Learning, pages 1065--1072. ACM, 2009.
[45]
A. Vedaldi and B. Fulkerson. VLFeat: an open and portable library of computer vision algorithms. In Proceedings of the ACM International Conference on Multimedia, pages 1469--1472. ACM, 2010.
[46]
P. Viola and M. J. Jones. Robust real-time face detection. International Journal of Computer Vision, 57(2):137--154, 2004.
[47]
J. Wang, J. Yang, K. Yu, F. Lv, T. Huang, and Y. Gong. Locality-constrained linear coding for image classification. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 3360--3367. IEEE, 2010.
[48]
M. Wimmer, B. Schuller, D. Arsic, G. Rigoll, and B. Radig. Low-level fusion of audio, video feature for multi-modal emotion recognition. In Proc. 3rd International Conference on Computer Vision Theory and Applications, pages 145--151, 2008.
[49]
X. Xiong and F. De la Torre. Supervised descent method and its applications to face alignment. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013.
[50]
Y. Yacoob and L. Davis. Computing spatio-temporal representations of human faces. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 70--75, 1994.
[51]
Z. Zeng, M. Pantic, G. Roisman, and T. Huang. A survey of affect recognition methods: Audio, visual, and spontaneous expressions. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(1):39--58, 2009.
[52]
W. Zhang, A. Surve, X. Fern, and T. Dietterich. Learning non-redundant codebooks for classifying complex objects. In Proc. of the 26th Annual International Conference on Machine Learning, pages 1241--1248. ACM, 2009.
[53]
G. Zhao and M. Pietikainen. Dynamic texture recognition using local binary patterns with an application to facial expressions. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(6):915--928, 2007.
[54]
X. Zhu and D. Ramanan. Face detection, pose estimation, and landmark localization in the wild. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 2879--2886. IEEE, 2012.



    Published In

ICMI '13: Proceedings of the 15th ACM International Conference on Multimodal Interaction
    December 2013
    630 pages
    ISBN:9781450321297
    DOI:10.1145/2522848
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. bag of words
    2. feature fusion
    3. multimodal
    4. multiple kernel learning
    5. support vector machine

    Qualifiers

    • Research-article

    Conference

    ICMI '13

    Acceptance Rates

    ICMI '13 Paper Acceptance Rate 49 of 133 submissions, 37%;
    Overall Acceptance Rate 453 of 1,080 submissions, 42%


    Cited By

    • (2024) Temporal Enhancement for Video Affective Content Analysis. Proceedings of the 32nd ACM International Conference on Multimedia, pages 642--650. DOI: 10.1145/3664647.3681631.
    • (2024) EMERSK - Explainable Multimodal Emotion Recognition With Situational Knowledge. IEEE Transactions on Multimedia, 26:2785--2794. DOI: 10.1109/TMM.2023.3304015.
    • (2024) Enhancing Multimodal Cooperation via Sample-Level Modality Valuation. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 27328--27337. DOI: 10.1109/CVPR52733.2024.02581.
    • (2024) MART: Masked Affective RepresenTation Learning via Masked Temporal Distribution Distillation. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 12830--12840. DOI: 10.1109/CVPR52733.2024.01219.
    • (2024) A novel hybrid deep learning IChOA-CNN-LSTM model for modality-enriched and multilingual emotion recognition in social media. Scientific Reports, 14(1). DOI: 10.1038/s41598-024-73452-2.
    • (2024) Fusing pairwise modalities for emotion recognition in conversations. Information Fusion, 106:102306. DOI: 10.1016/j.inffus.2024.102306.
    • (2023) Feature Refinement via Canonical Correlation Analysis for Multimodal Emotion Recognition. 23rd International Conference on Control, Automation and Systems (ICCAS), pages 838--841. DOI: 10.23919/ICCAS59377.2023.10316831.
    • (2023) Weakly Supervised Video Emotion Detection and Prediction via Cross-Modal Temporal Erasing Network. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 18888--18897. DOI: 10.1109/CVPR52729.2023.01811.
    • (2023) A recent survey on perceived group sentiment analysis. Journal of Visual Communication and Image Representation, 97:103988. DOI: 10.1016/j.jvcir.2023.103988.
    • (2023) Prediction of Face Emotion with Labelled Selective Transfer Machine as a Generalized Emotion Classifier. Advanced Computing, pages 294--307. DOI: 10.1007/978-3-031-35644-5_23.
