DOI: 10.1145/3242969.3264990
Short paper · Public Access

Group-Level Emotion Recognition Using Hybrid Deep Models Based on Faces, Scenes, Skeletons and Visual Attentions

Published: 02 October 2018

Abstract

This paper presents a hybrid deep learning network submitted to the 6th Emotion Recognition in the Wild (EmotiW 2018) Grand Challenge [9], in the category of group-level emotion recognition. Advanced deep learning models trained individually on faces, scenes, skeletons and salient regions using visual attention mechanisms are fused to classify the emotion of a group of people in an image as positive, neutral or negative. Experimental results show that the proposed hybrid network achieves 78.98% and 68.08% classification accuracy on the validation and testing sets, respectively. These results outperform the baselines of 64% and 61%, and won first place in the challenge.
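To make the fusion step concrete, the sketch below shows one common way such a hybrid is assembled at inference time: each cue-specific model (faces, scene, skeletons, attention-selected salient regions) outputs class probabilities, and a weighted average of those scores picks the group-level label. This is an illustrative sketch only, not the authors' released code; the model names, probability values, and fusion weights are hypothetical placeholders (weights of this kind are typically tuned on the validation set).

```python
# Illustrative sketch of score-level fusion -- not the authors' released code.
# Model names, probabilities, and weights below are hypothetical placeholders.
import numpy as np

CLASSES = ["positive", "neutral", "negative"]

def fuse(model_probs, weights):
    """Weighted average of per-model class-probability vectors.

    model_probs: (num_models, 3) array-like, one softmax output per cue model.
    weights:     per-model fusion weights, e.g. tuned on a validation set.
    """
    probs = np.asarray(model_probs, dtype=float)   # (num_models, 3)
    w = np.asarray(weights, dtype=float)[:, None]  # (num_models, 1)
    fused = (w * probs).sum(axis=0) / w.sum()      # (3,) fused distribution
    return CLASSES[int(np.argmax(fused))], fused

# Hypothetical softmax outputs from four cue-specific models for one image.
face_probs     = [0.70, 0.20, 0.10]   # face-level CNN, averaged over faces
scene_probs    = [0.55, 0.30, 0.15]   # whole-image scene CNN
skeleton_probs = [0.40, 0.35, 0.25]   # CNN on rendered pose skeletons
salient_probs  = [0.60, 0.25, 0.15]   # attention-selected salient regions

label, dist = fuse(
    [face_probs, scene_probs, skeleton_probs, salient_probs],
    weights=[0.4, 0.3, 0.15, 0.15],   # hypothetical fusion weights
)
print(label, dist)   # -> positive [0.595 0.26 0.145]
```

Score-level (late) fusion of this kind keeps each cue model independently trainable and makes it cheap to add or drop a cue; the paper's actual fusion weighting may differ from these placeholders.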

References

[1] Peter Anderson, Xiaodong He, Chris Buehler, Damien Teney, Mark Johnson, Stephen Gould, and Lei Zhang. 2017. Bottom-Up and Top-Down Attention for Image Captioning and VQA. CoRR abs/1707.07998 (2017). http://arxiv.org/abs/1707.07998
[2] J. Bullington. 2005. Affective computing and emotion recognition systems: the future of biometric surveillance? In Proceedings of the 2nd Annual Conference on Information Security Curriculum Development. ACM, 95--99.
[3] Qiong Cao, Li Shen, Weidi Xie, Omkar M. Parkhi, and Andrew Zisserman. 2017. VGGFace2: A dataset for recognising faces across pose and age. CoRR abs/1710.08092 (2017). http://arxiv.org/abs/1710.08092
[4] Z. Cao, T. Simon, S. Wei, and Y. Sheikh. 2016. Realtime multi-person 2D pose estimation using part affinity fields. arXiv preprint arXiv:1611.08050 (2016).
[5] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. 2009. ImageNet: A Large-Scale Hierarchical Image Database. In CVPR.
[6] A. Dhall, A. Asthana, and R. Goecke. 2010. Facial expression based automatic album creation. In International Conference on Neural Information Processing. Springer, 485--492.
[7] A. Dhall, R. Goecke, and T. Gedeon. 2015. Automatic group happiness intensity analysis. IEEE Transactions on Affective Computing 6, 1 (2015), 13--26.
[8] A. Dhall, J. Joshi, K. Sikka, R. Goecke, and N. Sebe. 2015. The more the merrier: Analysing the affect of a group of people in images. In IEEE International Conference and Workshops on Automatic Face and Gesture Recognition, Vol. 1. IEEE, 1--8.
[9] Abhinav Dhall, Amanjot Kaur, Roland Goecke, and Tom Gedeon. 2018. EmotiW 2018: Audio-Video, Student Engagement and Group-Level Affect Prediction. In ACM International Conference on Multimodal Interaction (in press). ACM.
[10] I. J. Goodfellow et al. 2013. Challenges in representation learning: A report on three machine learning contests. In International Conference on Neural Information Processing. Springer, 117--124.
[11] X. Guo, L. F. Polanía, and K. E. Barner. 2017. Group-level emotion recognition using deep models on image scene, faces, and skeletons. In Proceedings of the 19th ACM International Conference on Multimodal Interaction. ACM, 603--608.
[12] Xin Guo, Luisa F. Polanía, and Kenneth E. Barner. 2018. Smile detection in the wild based on transfer learning.
[13] Y. Guo, L. Zhang, Y. Hu, X. He, and J. Gao. 2016. MS-Celeb-1M: A Dataset and Benchmark for Large Scale Face Recognition. In European Conference on Computer Vision. Springer.
[14] K. He, X. Zhang, S. Ren, and J. Sun. 2016. Deep residual learning for image recognition. In CVPR. 770--778.
[15] S. Hochreiter and J. Schmidhuber. 1997. Long Short-Term Memory. Neural Computation 9, 8 (Nov. 1997), 1735--1780.
[16] Jie Hu, Li Shen, and Gang Sun. 2017. Squeeze-and-Excitation Networks. CoRR abs/1709.01507 (2017). http://arxiv.org/abs/1709.01507
[17] Xiaohua Huang, Abhinav Dhall, Guoying Zhao, Roland Goecke, and Matti Pietikäinen. 2015. Riesz-based Volume Local Binary Pattern and A Novel Group Expression Model for Group Happiness Intensity Analysis. In BMVC. 1--9.
[18] A. Krizhevsky, I. Sutskever, and G. E. Hinton. 2012. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems. 1097--1105.
[19] J. Li, S. Roy, J. Feng, and T. Sim. 2016. Happiness level prediction with sequential inputs via multiple regressions. In Proceedings of the 18th ACM International Conference on Multimodal Interaction. ACM, 487--493.
[20] Weiyang Liu, Yandong Wen, Zhiding Yu, and Meng Yang. 2016. Large-Margin Softmax Loss for Convolutional Neural Networks. In Proceedings of the 33rd International Conference on Machine Learning. 507--516.
[21] Jiasen Lu, Caiming Xiong, Devi Parikh, and Richard Socher. 2017. Knowing When to Look: Adaptive Attention via A Visual Sentinel for Image Captioning. In CVPR.
[22] Volodymyr Mnih, Nicolas Heess, Alex Graves, and Koray Kavukcuoglu. 2014. Recurrent Models of Visual Attention. In Advances in Neural Information Processing Systems 27. Curran Associates, Inc., 2204--2212. http://papers.nips.cc/paper/5542-recurrent-models-of-visual-attention.pdf
[23] W. Mou, O. Celiktutan, and H. Gunes. 2015. Group-level arousal and valence recognition in static images: Face, body and context. In IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG), Vol. 5. IEEE, 1--6.
[24] P. M. Niedenthal and M. Brauer. 2012. Social functionality of human emotion. Annual Review of Psychology 63 (2012), 259--285.
[25] O. M. Parkhi, A. Vedaldi, and A. Zisserman. 2015. Deep Face Recognition. In British Machine Vision Conference.
[26] F. E. Pollick, H. M. Paterson, A. Bruderlin, and A. J. Sanford. 2001. Perceiving affect from arm movement. Cognition 82, 2 (2001), B51--B61.
[27] Steven J. Rennie, Etienne Marcheret, Youssef Mroueh, Jarret Ross, and Vaibhava Goel. 2016. Self-critical Sequence Training for Image Captioning. CoRR abs/1612.00563 (2016). http://arxiv.org/abs/1612.00563
[28] K. Simonyan and A. Zisserman. 2014. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014).
[29] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna. 2016. Rethinking the inception architecture for computer vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2818--2826.
[30] L. Tan, K. Zhang, K. Wang, X. Zeng, X. Peng, and Y. Qiao. 2017. Group emotion recognition with individual facial emotion CNNs and global image based CNNs. In Proceedings of the 19th ACM International Conference on Multimodal Interaction. ACM, 549--552.
[31] T. Simon, H. Joo, I. Matthews, and Y. Sheikh. 2017. Hand Keypoint Detection in Single Images using Multiview Bootstrapping. In CVPR.
[32] T. Vandal, D. McDuff, and R. El Kaliouby. 2015. Event detection: Ultra large-scale clustering of facial expressions. In IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG), Vol. 1. IEEE, 1--8.
[33] Fei Wang, Mengqing Jiang, Chen Qian, Shuo Yang, Cheng Li, Honggang Zhang, Xiaogang Wang, and Xiaoou Tang. 2017. Residual Attention Network for Image Classification. CoRR abs/1704.06904 (2017). http://arxiv.org/abs/1704.06904
[34] S. Wei, V. Ramakrishna, T. Kanade, and Y. Sheikh. 2016. Convolutional pose machines. In CVPR.
[35] J. Whitehill, G. Littlewort, I. Fasel, M. Bartlett, and J. Movellan. 2009. Toward practical smile detection. IEEE Transactions on Pattern Analysis and Machine Intelligence 31, 11 (2009), 2106--2111.
[36] J. Wu and J. M. Rehg. 2011. CENTRIST: A Visual Descriptor for Scene Categorization. IEEE Transactions on Pattern Analysis and Machine Intelligence 33, 8 (2011), 1489--1501.
[37] Huijuan Xu and Kate Saenko. 2016. Ask, Attend and Answer: Exploring Question-Guided Spatial Attention for Visual Question Answering. In Computer Vision -- ECCV 2016, 14th European Conference, Proceedings, Part VII. 451--466.
[38] Zichao Yang, Xiaodong He, Jianfeng Gao, Li Deng, and Alexander J. Smola. 2015. Stacked Attention Networks for Image Question Answering. CoRR abs/1511.02274 (2015). http://arxiv.org/abs/1511.02274
[39] K. Zhang, Z. Zhang, Z. Li, and Y. Qiao. 2016. Joint Face Detection and Alignment Using Multitask Cascaded Convolutional Networks. IEEE Signal Processing Letters 23, 10 (Oct. 2016), 1499--1503.



Published In

ICMI '18: Proceedings of the 20th ACM International Conference on Multimodal Interaction
October 2018
687 pages
ISBN:9781450356923
DOI:10.1145/3242969
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

  • SIGCHI: Special Interest Group on Computer-Human Interaction of the ACM

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. emotiw 2018
  2. group-level emotion recognition
  3. multi-model
  4. scene understanding
  5. visual attention

Qualifiers

  • Short-paper

Conference

ICMI '18
Sponsor:
  • SIGCHI

Acceptance Rates

ICMI '18 Paper Acceptance Rate: 63 of 149 submissions, 42%.
Overall Acceptance Rate: 453 of 1,080 submissions, 42%.
