
Combining Multimodal Features with Hierarchical Classifier Fusion for Emotion Recognition in the Wild

Published: 12 November 2014

Abstract

Emotion recognition in the wild is a very challenging task. In this paper, we investigate a variety of multimodal features extracted from video and audio and evaluate their discriminative power for human emotion analysis. For each clip, we extract SIFT, LBP-TOP, PHOG, LPQ-TOP and audio features. We train a separate classifier for each feature type on the dataset from the EmotiW 2014 Challenge, and we propose a novel hierarchical fusion method that combines the classifiers over all extracted features. Our final recognition rate on the test set is 47.17%, well above the best baseline rate of 33.7%.
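The abstract describes training one classifier per feature type and then fusing their outputs hierarchically. The paper's actual fusion tree and weights are not given here, so the following is only a minimal sketch of the general idea under stated assumptions: each per-feature classifier emits a probability distribution over the seven EmotiW emotion classes, the four visual descriptors are merged first, and the result is then merged with audio. All scores, node structure, and weights below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    """Turn raw scores into a probability distribution."""
    e = np.exp(x - x.max())
    return e / e.sum()

EMOTIONS = ["angry", "disgust", "fear", "happy", "neutral", "sad", "surprise"]

# Hypothetical per-feature classifier outputs for one clip: each entry is a
# probability distribution over the 7 emotion classes (stand-ins for the
# real SVM outputs the paper would produce).
scores = {name: softmax(rng.normal(size=len(EMOTIONS)))
          for name in ["SIFT", "LBP-TOP", "PHOG", "LPQ-TOP", "audio"]}

def fuse(children, weights=None):
    """One fusion node: weighted average of child probability vectors."""
    m = np.stack(children)
    if weights is None:
        weights = np.full(len(children), 1.0 / len(children))
    return np.average(m, axis=0, weights=weights)

# Level 1: merge the four visual descriptors into a single visual score.
visual = fuse([scores[k] for k in ["SIFT", "LBP-TOP", "PHOG", "LPQ-TOP"]])
# Level 2: merge visual and audio (the 0.7/0.3 split is illustrative only).
final = fuse([visual, scores["audio"]], weights=[0.7, 0.3])
prediction = EMOTIONS[int(np.argmax(final))]
```

Because each fusion node averages valid probability vectors, the fused output remains a valid distribution at every level of the hierarchy, which is what makes this kind of staged score-level fusion composable.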


        Published In

        ICMI '14: Proceedings of the 16th International Conference on Multimodal Interaction
        November 2014
        558 pages
        ISBN:9781450328852
        DOI:10.1145/2663204

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Author Tags

        1. emotion recognition
        2. feature fusion
        3. hierarchical classifier
        4. multimodal
        5. support vector machine

        Qualifiers

        • Research-article

        Funding Sources

        • The Fundamental Research Funds for the Central Universities of China

        Conference

        ICMI '14

        Acceptance Rates

ICMI '14 paper acceptance rate: 51 of 127 submissions, 40%.
Overall acceptance rate: 321 of 785 submissions, 41%.


        Cited By

        View all
        • (2024)Online multi-hypergraph fusion learning for cross-subject emotion recognitionInformation Fusion10.1016/j.inffus.2024.102338108(102338)Online publication date: Aug-2024
        • (2023)Multimodal Physiological Signals Fusion for Online Emotion RecognitionProceedings of the 31st ACM International Conference on Multimedia10.1145/3581783.3612555(5879-5888)Online publication date: 26-Oct-2023
        • (2023)A Survey on Artificial Intelligence-Based Acoustic Source IdentificationIEEE Access10.1109/ACCESS.2023.328398211(60078-60108)Online publication date: 2023
        • (2022)A Survey on Databases for Multimodal Emotion Recognition and an Introduction to the VIRI (Visible and InfraRed Image) DatabaseMultimodal Technologies and Interaction10.3390/mti60600476:6(47)Online publication date: 17-Jun-2022
        • (2022)Applied Affective ComputingundefinedOnline publication date: 25-Jan-2022
        • (2021)A Mental Health Assessment Model of College Students Using Intelligent TechnologyWireless Communications and Mobile Computing10.1155/2021/74857962021(1-10)Online publication date: 21-Aug-2021
        • (2021)Fine-Grained Facial Expression Recognition in the WildIEEE Transactions on Information Forensics and Security10.1109/TIFS.2020.300732716(482-494)Online publication date: 2021
        • (2021)Automatic Recognition of Facial Displays of Unfelt EmotionsIEEE Transactions on Affective Computing10.1109/TAFFC.2018.287499612:2(377-390)Online publication date: 1-Apr-2021
        • (2021)A novel approach for facial expression recognition based on Gabor filters and genetic algorithmEvolving Systems10.1007/s12530-021-09393-213:2(331-345)Online publication date: 9-Jul-2021
        • (2020)C-FaceProceedings of the 33rd Annual ACM Symposium on User Interface Software and Technology10.1145/3379337.3415879(112-125)Online publication date: 20-Oct-2020
        • Show More Cited By
