DOI: 10.1145/3581783.3612836
Research Article
Open Access

MER 2023: Multi-label Learning, Modality Robustness, and Semi-Supervised Learning

Published: 27 October 2023

Abstract

The first Multimodal Emotion Recognition Challenge (MER 2023)1 was successfully held at ACM Multimedia. The challenge focuses on system robustness and consists of three distinct tracks: (1) MER-MULTI, where participants are required to recognize both discrete and dimensional emotions; (2) MER-NOISE, in which noise is added to test videos to evaluate modality robustness; (3) MER-SEMI, which provides a large number of unlabeled samples for semi-supervised learning. In this paper, we introduce the motivation behind the challenge, describe the benchmark dataset, and report statistics on the participants. To continue using this dataset after MER 2023, please sign a new End User License Agreement2 and send it to our official email address3. We believe this high-quality dataset can become a new benchmark in multimodal emotion recognition, especially for the Chinese research community.
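
To make the MER-MULTI setup concrete, the sketch below shows one way a participating system might be trained on both label types at once, pairing a cross-entropy loss over the discrete emotion categories with a mean-squared-error loss on the dimensional (valence) annotation. The feature dimension, category count, loss weight, and module names are illustrative assumptions and not part of the official baseline; in the challenge, the fused features would come from audio, visual, and text encoders.

# Illustrative sketch (not the official MER 2023 baseline): a multi-task head
# that jointly predicts discrete emotion categories and a continuous valence
# score from a fused multimodal feature vector. Dimensions and the weight
# `alpha` are assumptions chosen for clarity.
import torch
import torch.nn as nn

class EmotionMultiTaskHead(nn.Module):
    def __init__(self, feat_dim: int = 768, num_emotions: int = 6):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(feat_dim, 256), nn.ReLU())
        self.cls_head = nn.Linear(256, num_emotions)  # discrete emotion categories
        self.reg_head = nn.Linear(256, 1)             # dimensional emotion (valence)

    def forward(self, fused_features: torch.Tensor):
        h = self.shared(fused_features)
        return self.cls_head(h), self.reg_head(h).squeeze(-1)

def mer_multi_loss(logits, valence_pred, emotion_ids, valence_true, alpha=1.0):
    """Joint objective: cross-entropy for discrete labels plus MSE for valence."""
    ce = nn.functional.cross_entropy(logits, emotion_ids)
    mse = nn.functional.mse_loss(valence_pred, valence_true)
    return ce + alpha * mse

if __name__ == "__main__":
    head = EmotionMultiTaskHead()
    feats = torch.randn(4, 768)  # stand-in for fused audio/video/text features
    logits, valence = head(feats)
    loss = mer_multi_loss(logits, valence,
                          emotion_ids=torch.randint(0, 6, (4,)),
                          valence_true=torch.randn(4))
    loss.backward()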





    Information

    Published In

    MM '23: Proceedings of the 31st ACM International Conference on Multimedia
    October 2023
    9913 pages
    ISBN: 9798400701085
    DOI: 10.1145/3581783
    This work is licensed under a Creative Commons Attribution 4.0 International License.


    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 27 October 2023


    Author Tags

    1. modality robustness
    2. multi-label learning
    3. multimodal emotion recognition challenge (mer 2023)
    4. semi-supervised learning

    Qualifiers

    • Research-article

    Funding Sources

    • CCF-Baidu Open Fund
    • Open Research Projects of Zhejiang Lab
    • National Natural Science Foundation of China (NSFC)
    • Beijing Municipal Science & Technology Commission, Administrative Commission of Zhongguancun Science Park

    Conference

    MM '23: The 31st ACM International Conference on Multimedia
    October 29 - November 3, 2023
    Ottawa, ON, Canada

    Acceptance Rates

    Overall acceptance rate: 2,145 of 8,556 submissions, 25%

    Bibliometrics & Citations

    Article Metrics

    • Downloads (last 12 months): 1,586
    • Downloads (last 6 weeks): 143

    Reflects downloads up to 03 Jan 2025

    Cited By
    • (2024) Development of multimodal sentiment recognition and understanding. Journal of Image and Graphics, 29(6), 1607-1627. DOI: 10.11834/jig.240017. Online publication date: 2024.
    • (2024) MRAC'24 Track 2: 2nd International Workshop on Multimodal and Responsible Affective Computing. Proceedings of the 2nd International Workshop on Multimodal and Responsible Affective Computing, 39-40. DOI: 10.1145/3689092.3696103. Online publication date: 28-Oct-2024.
    • (2024) MER 2024: Semi-Supervised Learning, Noise Robustness, and Open-Vocabulary Multimodal Emotion Recognition. Proceedings of the 2nd International Workshop on Multimodal and Responsible Affective Computing, 41-48. DOI: 10.1145/3689092.3689959. Online publication date: 28-Oct-2024.
    • (2024) Robust Representation Learning for Multimodal Emotion Recognition with Contrastive Learning and Mixup. Proceedings of the 2nd International Workshop on Multimodal and Responsible Affective Computing, 93-97. DOI: 10.1145/3689092.3689418. Online publication date: 28-Oct-2024.
    • (2024) Early Joint Learning of Emotion Information Makes MultiModal Model Understand You Better. Proceedings of the 2nd International Workshop on Multimodal and Responsible Affective Computing, 54-61. DOI: 10.1145/3689092.3689415. Online publication date: 28-Oct-2024.
    • (2024) Audio-Guided Fusion Techniques for Multimodal Emotion Analysis. Proceedings of the 2nd International Workshop on Multimodal and Responsible Affective Computing, 62-66. DOI: 10.1145/3689092.3689414. Online publication date: 28-Oct-2024.
    • (2024) Leveraging Contrastive Learning and Self-Training for Multimodal Emotion Recognition with Limited Labeled Samples. Proceedings of the 2nd International Workshop on Multimodal and Responsible Affective Computing, 72-77. DOI: 10.1145/3689092.3689412. Online publication date: 28-Oct-2024.
    • (2024) Improving Multimodal Emotion Recognition by Leveraging Acoustic Adaptation and Visual Alignment. Proceedings of the 2nd International Workshop on Multimodal and Responsible Affective Computing, 67-71. DOI: 10.1145/3689092.3689407. Online publication date: 28-Oct-2024.
    • (2024) SZTU-CMU at MER2024: Improving Emotion-LLaMA with Conv-Attention for Multimodal Emotion Recognition. Proceedings of the 2nd International Workshop on Multimodal and Responsible Affective Computing, 78-87. DOI: 10.1145/3689092.3689404. Online publication date: 28-Oct-2024.
    • (2024) Multimodal Blockwise Transformer for Robust Sentiment Recognition. Proceedings of the 2nd International Workshop on Multimodal and Responsible Affective Computing, 88-92. DOI: 10.1145/3689092.3689399. Online publication date: 28-Oct-2024.
