
Combining Multimodal Features with Hierarchical Classifier Fusion for Emotion Recognition in the Wild

Published: 12 November 2014

Abstract

Emotion recognition in the wild is a very challenging task. In this paper, we investigate a variety of multimodal features extracted from video and audio and evaluate their discriminative power for human emotion analysis. For each clip, we extract SIFT, LBP-TOP, PHOG, LPQ-TOP and audio features. We train a separate classifier for each feature type on the dataset from the EmotiW 2014 Challenge, and we propose a novel hierarchical fusion method that combines the classifiers over all extracted features. Our final recognition rate on the test set is 47.17%, well above the best baseline rate of 33.7%.
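The abstract describes training one classifier per feature type and then fusing their outputs hierarchically. The paper's actual fusion tree and weights are not given here, so the following is only a minimal sketch of the general idea under stated assumptions: each per-feature classifier emits a probability distribution over the seven EmotiW emotion classes, the four visual descriptors are merged first, and the result is then merged with audio. All scores, node structure, and weights below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    """Turn raw scores into a probability distribution."""
    e = np.exp(x - x.max())
    return e / e.sum()

EMOTIONS = ["angry", "disgust", "fear", "happy", "neutral", "sad", "surprise"]

# Hypothetical per-feature classifier outputs for one clip: each entry is a
# probability distribution over the 7 emotion classes (stand-ins for the
# real SVM outputs the paper would produce).
scores = {name: softmax(rng.normal(size=len(EMOTIONS)))
          for name in ["SIFT", "LBP-TOP", "PHOG", "LPQ-TOP", "audio"]}

def fuse(children, weights=None):
    """One fusion node: weighted average of child probability vectors."""
    m = np.stack(children)
    if weights is None:
        weights = np.full(len(children), 1.0 / len(children))
    return np.average(m, axis=0, weights=weights)

# Level 1: merge the four visual descriptors into a single visual score.
visual = fuse([scores[k] for k in ["SIFT", "LBP-TOP", "PHOG", "LPQ-TOP"]])
# Level 2: merge visual and audio (the 0.7/0.3 split is illustrative only).
final = fuse([visual, scores["audio"]], weights=[0.7, 0.3])
prediction = EMOTIONS[int(np.argmax(final))]
```

Because each fusion node averages valid probability vectors, the fused output remains a valid distribution at every level of the hierarchy, which is what makes this kind of staged score-level fusion composable.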


        Published In

        ICMI '14: Proceedings of the 16th International Conference on Multimodal Interaction
        November 2014
        558 pages
        ISBN:9781450328852
        DOI:10.1145/2663204

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Author Tags

        1. emotion recognition
        2. feature fusion
        3. hierarchical classifier
        4. multimodal
        5. support vector machine

        Qualifiers

        • Research-article

        Funding Sources

        • The Fundamental Research Funds for the Central Universities of China

        Conference

        ICMI '14

        Acceptance Rates

ICMI '14 paper acceptance rate: 51 of 127 submissions, 40%.
Overall acceptance rate: 321 of 785 submissions, 41%.


        Cited By

        View all
        • (2024)Online multi-hypergraph fusion learning for cross-subject emotion recognitionInformation Fusion10.1016/j.inffus.2024.102338108(102338)Online publication date: Aug-2024
        • (2023)Multimodal Physiological Signals Fusion for Online Emotion RecognitionProceedings of the 31st ACM International Conference on Multimedia10.1145/3581783.3612555(5879-5888)Online publication date: 26-Oct-2023
        • (2023)A Survey on Artificial Intelligence-Based Acoustic Source IdentificationIEEE Access10.1109/ACCESS.2023.328398211(60078-60108)Online publication date: 2023
        • (2022)A Survey on Databases for Multimodal Emotion Recognition and an Introduction to the VIRI (Visible and InfraRed Image) DatabaseMultimodal Technologies and Interaction10.3390/mti60600476:6(47)Online publication date: 17-Jun-2022
        • (2022)Applied Affective ComputingundefinedOnline publication date: 25-Jan-2022
        • (2021)A Mental Health Assessment Model of College Students Using Intelligent TechnologyWireless Communications and Mobile Computing10.1155/2021/74857962021(1-10)Online publication date: 21-Aug-2021
        • (2021)Fine-Grained Facial Expression Recognition in the WildIEEE Transactions on Information Forensics and Security10.1109/TIFS.2020.300732716(482-494)Online publication date: 2021
        • (2021)Automatic Recognition of Facial Displays of Unfelt EmotionsIEEE Transactions on Affective Computing10.1109/TAFFC.2018.287499612:2(377-390)Online publication date: 1-Apr-2021
        • (2021)A novel approach for facial expression recognition based on Gabor filters and genetic algorithmEvolving Systems10.1007/s12530-021-09393-213:2(331-345)Online publication date: 9-Jul-2021
        • (2020)C-FaceProceedings of the 33rd Annual ACM Symposium on User Interface Software and Technology10.1145/3379337.3415879(112-125)Online publication date: 20-Oct-2020
        • Show More Cited By
