DOI: 10.1145/3242969.3264990
Short paper · Public Access

Group-Level Emotion Recognition Using Hybrid Deep Models Based on Faces, Scenes, Skeletons and Visual Attentions

Published: 02 October 2018

Abstract

This paper presents a hybrid deep learning network submitted to the 6th Emotion Recognition in the Wild (EmotiW 2018) Grand Challenge [9], in the category of group-level emotion recognition. Advanced deep learning models trained individually on faces, scenes, skeletons and salient regions using visual attention mechanisms are fused to classify the emotion of a group of people in an image as positive, neutral or negative. Experimental results show that the proposed hybrid network achieves 78.98% and 68.08% classification accuracy on the validation and testing sets, respectively. These results outperform the baselines of 64% and 61%, and won first place in the challenge.
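To make the fusion step concrete, the sketch below shows one common way such a hybrid is assembled at inference time: each cue-specific model (faces, scene, skeletons, attention-selected salient regions) outputs class probabilities, and a weighted average of those scores picks the group-level label. This is an illustrative sketch only, not the authors' released code; the model names, probability values, and fusion weights are hypothetical placeholders (weights of this kind are typically tuned on the validation set).

```python
# Illustrative sketch of score-level fusion -- not the authors' released code.
# Model names, probabilities, and weights below are hypothetical placeholders.
import numpy as np

CLASSES = ["positive", "neutral", "negative"]

def fuse(model_probs, weights):
    """Weighted average of per-model class-probability vectors.

    model_probs: (num_models, 3) array-like, one softmax output per cue model.
    weights:     per-model fusion weights, e.g. tuned on a validation set.
    """
    probs = np.asarray(model_probs, dtype=float)   # (num_models, 3)
    w = np.asarray(weights, dtype=float)[:, None]  # (num_models, 1)
    fused = (w * probs).sum(axis=0) / w.sum()      # (3,) fused distribution
    return CLASSES[int(np.argmax(fused))], fused

# Hypothetical softmax outputs from four cue-specific models for one image.
face_probs     = [0.70, 0.20, 0.10]   # face-level CNN, averaged over faces
scene_probs    = [0.55, 0.30, 0.15]   # whole-image scene CNN
skeleton_probs = [0.40, 0.35, 0.25]   # CNN on rendered pose skeletons
salient_probs  = [0.60, 0.25, 0.15]   # attention-selected salient regions

label, dist = fuse(
    [face_probs, scene_probs, skeleton_probs, salient_probs],
    weights=[0.4, 0.3, 0.15, 0.15],   # hypothetical fusion weights
)
print(label, dist)   # -> positive [0.595 0.26 0.145]
```

Score-level (late) fusion of this kind keeps each cue model independently trainable and makes it cheap to add or drop a cue; the paper's actual fusion weighting may differ from these placeholders.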

References

[1] Peter Anderson, Xiaodong He, Chris Buehler, Damien Teney, Mark Johnson, Stephen Gould, and Lei Zhang. 2017. Bottom-Up and Top-Down Attention for Image Captioning and VQA. CoRR abs/1707.07998 (2017). http://arxiv.org/abs/1707.07998
[2] J. Bullington. 2005. Affective computing and emotion recognition systems: the future of biometric surveillance? In Proceedings of the 2nd Annual Conference on Information Security Curriculum Development. ACM, 95--99.
[3] Qiong Cao, Li Shen, Weidi Xie, Omkar M. Parkhi, and Andrew Zisserman. 2017. VGGFace2: A dataset for recognising faces across pose and age. CoRR abs/1710.08092 (2017). http://arxiv.org/abs/1710.08092
[4] Z. Cao, T. Simon, S. Wei, and Y. Sheikh. 2016. Realtime multi-person 2D pose estimation using part affinity fields. arXiv preprint arXiv:1611.08050 (2016).
[5] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. 2009. ImageNet: A Large-Scale Hierarchical Image Database. In CVPR.
[6] A. Dhall, A. Asthana, and R. Goecke. 2010. Facial expression based automatic album creation. In International Conference on Neural Information Processing. Springer, 485--492.
[7] A. Dhall, R. Goecke, and T. Gedeon. 2015. Automatic group happiness intensity analysis. IEEE Transactions on Affective Computing 6, 1 (2015), 13--26.
[8] A. Dhall, J. Joshi, K. Sikka, R. Goecke, and N. Sebe. 2015. The more the merrier: Analysing the affect of a group of people in images. In IEEE International Conference and Workshops on Automatic Face and Gesture Recognition, Vol. 1. IEEE, 1--8.
[9] Abhinav Dhall, Amanjot Kaur, Roland Goecke, and Tom Gedeon. 2018. EmotiW 2018: Audio-Video, Student Engagement and Group-Level Affect Prediction. In ACM International Conference on Multimodal Interaction (in press). ACM.
[10] I. J. Goodfellow et al. 2013. Challenges in representation learning: A report on three machine learning contests. In International Conference on Neural Information Processing. Springer, 117--124.
[11] X. Guo, L. F. Polanía, and K. E. Barner. 2017. Group-level emotion recognition using deep models on image scene, faces, and skeletons. In Proceedings of the 19th ACM International Conference on Multimodal Interaction. ACM, 603--608.
[12] Xin Guo, Luisa F. Polanía, and Kenneth E. Barner. 2018. Smile detection in the wild based on transfer learning.
[13] Y. Guo, L. Zhang, Y. Hu, X. He, and J. Gao. 2016. MS-Celeb-1M: A Dataset and Benchmark for Large Scale Face Recognition. In European Conference on Computer Vision. Springer.
[14] K. He, X. Zhang, S. Ren, and J. Sun. 2016. Deep residual learning for image recognition. In CVPR. 770--778.
[15] S. Hochreiter and J. Schmidhuber. 1997. Long Short-Term Memory. Neural Computation 9, 8 (Nov. 1997), 1735--1780.
[16] Jie Hu, Li Shen, and Gang Sun. 2017. Squeeze-and-Excitation Networks. CoRR abs/1709.01507 (2017). http://arxiv.org/abs/1709.01507
[17] Xiaohua Huang, Abhinav Dhall, Guoying Zhao, Roland Goecke, and Matti Pietikäinen. 2015. Riesz-based Volume Local Binary Pattern and A Novel Group Expression Model for Group Happiness Intensity Analysis. In BMVC. 1--9.
[18] A. Krizhevsky, I. Sutskever, and G. E. Hinton. 2012. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems. 1097--1105.
[19] J. Li, S. Roy, J. Feng, and T. Sim. 2016. Happiness level prediction with sequential inputs via multiple regressions. In Proceedings of the 18th ACM International Conference on Multimodal Interaction. ACM, 487--493.
[20] Weiyang Liu, Yandong Wen, Zhiding Yu, and Meng Yang. 2016. Large-Margin Softmax Loss for Convolutional Neural Networks. In Proceedings of the 33rd International Conference on Machine Learning. 507--516.
[21] Jiasen Lu, Caiming Xiong, Devi Parikh, and Richard Socher. 2017. Knowing When to Look: Adaptive Attention via A Visual Sentinel for Image Captioning. In CVPR.
[22] Volodymyr Mnih, Nicolas Heess, Alex Graves, and Koray Kavukcuoglu. 2014. Recurrent Models of Visual Attention. In Advances in Neural Information Processing Systems 27. Curran Associates, Inc., 2204--2212. http://papers.nips.cc/paper/5542-recurrent-models-of-visual-attention.pdf
[23] W. Mou, O. Celiktutan, and H. Gunes. 2015. Group-level arousal and valence recognition in static images: Face, body and context. In IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG), Vol. 5. IEEE, 1--6.
[24] P. M. Niedenthal and M. Brauer. 2012. Social functionality of human emotion. Annual Review of Psychology 63 (2012), 259--285.
[25] O. M. Parkhi, A. Vedaldi, and A. Zisserman. 2015. Deep Face Recognition. In British Machine Vision Conference.
[26] F. E. Pollick, H. M. Paterson, A. Bruderlin, and A. J. Sanford. 2001. Perceiving affect from arm movement. Cognition 82, 2 (2001), B51--B61.
[27] Steven J. Rennie, Etienne Marcheret, Youssef Mroueh, Jarret Ross, and Vaibhava Goel. 2016. Self-critical Sequence Training for Image Captioning. CoRR abs/1612.00563 (2016). http://arxiv.org/abs/1612.00563
[28] K. Simonyan and A. Zisserman. 2014. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014).
[29] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna. 2016. Rethinking the inception architecture for computer vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2818--2826.
[30] L. Tan, K. Zhang, K. Wang, X. Zeng, X. Peng, and Y. Qiao. 2017. Group emotion recognition with individual facial emotion CNNs and global image based CNNs. In Proceedings of the 19th ACM International Conference on Multimodal Interaction. ACM, 549--552.
[31] T. Simon, H. Joo, I. Matthews, and Y. Sheikh. 2017. Hand Keypoint Detection in Single Images using Multiview Bootstrapping. In CVPR.
[32] T. Vandal, D. McDuff, and R. El Kaliouby. 2015. Event detection: Ultra large-scale clustering of facial expressions. In IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG), Vol. 1. IEEE, 1--8.
[33] Fei Wang, Mengqing Jiang, Chen Qian, Shuo Yang, Cheng Li, Honggang Zhang, Xiaogang Wang, and Xiaoou Tang. 2017. Residual Attention Network for Image Classification. CoRR abs/1704.06904 (2017). http://arxiv.org/abs/1704.06904
[34] S. Wei, V. Ramakrishna, T. Kanade, and Y. Sheikh. 2016. Convolutional pose machines. In CVPR.
[35] J. Whitehill, G. Littlewort, I. Fasel, M. Bartlett, and J. Movellan. 2009. Toward practical smile detection. IEEE Transactions on Pattern Analysis and Machine Intelligence 31, 11 (2009), 2106--2111.
[36] J. Wu and J. M. Rehg. 2011. CENTRIST: A Visual Descriptor for Scene Categorization. IEEE Transactions on Pattern Analysis and Machine Intelligence 33, 8 (2011), 1489--1501.
[37] Huijuan Xu and Kate Saenko. 2016. Ask, Attend and Answer: Exploring Question-Guided Spatial Attention for Visual Question Answering. In Computer Vision -- ECCV 2016, 14th European Conference, Proceedings, Part VII. 451--466.
[38] Zichao Yang, Xiaodong He, Jianfeng Gao, Li Deng, and Alexander J. Smola. 2015. Stacked Attention Networks for Image Question Answering. CoRR abs/1511.02274 (2015). http://arxiv.org/abs/1511.02274
[39] K. Zhang, Z. Zhang, Z. Li, and Y. Qiao. 2016. Joint Face Detection and Alignment Using Multitask Cascaded Convolutional Networks. IEEE Signal Processing Letters 23, 10 (Oct. 2016), 1499--1503.



Published In

ICMI '18: Proceedings of the 20th ACM International Conference on Multimodal Interaction
October 2018
687 pages
ISBN:9781450356923
DOI:10.1145/3242969
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

  • SIGCHI: Special Interest Group on Computer-Human Interaction of the ACM

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. emotiw 2018
  2. group-level emotion recognition
  3. multi-model
  4. scene understanding
  5. visual attention

Qualifiers

  • Short-paper

Conference

ICMI '18
Sponsor:
  • SIGCHI

Acceptance Rates

ICMI '18 Paper Acceptance Rate: 63 of 149 submissions, 42%.
Overall Acceptance Rate: 453 of 1,080 submissions, 42%.
