DOI: 10.1145/3581783.3612836
Research Article
Open Access

MER 2023: Multi-label Learning, Modality Robustness, and Semi-Supervised Learning

Published: 27 October 2023

Abstract

The first Multimodal Emotion Recognition Challenge (MER 2023)1 was successfully held at ACM Multimedia. The challenge focuses on system robustness and consists of three distinct tracks: (1) MER-MULTI, where participants are required to recognize both discrete and dimensional emotions; (2) MER-NOISE, in which noise is added to test videos to evaluate modality robustness; (3) MER-SEMI, which provides a large number of unlabeled samples for semi-supervised learning. In this paper, we introduce the motivation behind the challenge, describe the benchmark dataset, and report statistics on the participants. To continue using this dataset after MER 2023, please sign a new End User License Agreement2 and send it to our official email address3. We believe this high-quality dataset can become a new benchmark in multimodal emotion recognition, especially for the Chinese research community.
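
To make the MER-MULTI setup concrete, the sketch below shows one way a participating system might be trained on both label types at once, pairing a cross-entropy loss over the discrete emotion categories with a mean-squared-error loss on the dimensional (valence) annotation. The feature dimension, category count, loss weight, and module names are illustrative assumptions and not part of the official baseline; in the challenge, the fused features would come from audio, visual, and text encoders.

# Illustrative sketch (not the official MER 2023 baseline): a multi-task head
# that jointly predicts discrete emotion categories and a continuous valence
# score from a fused multimodal feature vector. Dimensions and the weight
# `alpha` are assumptions chosen for clarity.
import torch
import torch.nn as nn

class EmotionMultiTaskHead(nn.Module):
    def __init__(self, feat_dim: int = 768, num_emotions: int = 6):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(feat_dim, 256), nn.ReLU())
        self.cls_head = nn.Linear(256, num_emotions)  # discrete emotion categories
        self.reg_head = nn.Linear(256, 1)             # dimensional emotion (valence)

    def forward(self, fused_features: torch.Tensor):
        h = self.shared(fused_features)
        return self.cls_head(h), self.reg_head(h).squeeze(-1)

def mer_multi_loss(logits, valence_pred, emotion_ids, valence_true, alpha=1.0):
    """Joint objective: cross-entropy for discrete labels plus MSE for valence."""
    ce = nn.functional.cross_entropy(logits, emotion_ids)
    mse = nn.functional.mse_loss(valence_pred, valence_true)
    return ce + alpha * mse

if __name__ == "__main__":
    head = EmotionMultiTaskHead()
    feats = torch.randn(4, 768)  # stand-in for fused audio/video/text features
    logits, valence = head(feats)
    loss = mer_multi_loss(logits, valence,
                          emotion_ids=torch.randint(0, 6, (4,)),
                          valence_true=torch.randn(4))
    loss.backward()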





    Information

    Published In

    MM '23: Proceedings of the 31st ACM International Conference on Multimedia
    October 2023
    9913 pages
    ISBN: 9798400701085
    DOI: 10.1145/3581783
    This work is licensed under a Creative Commons Attribution 4.0 International License.


    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 27 October 2023


    Author Tags

    1. modality robustness
    2. multi-label learning
    3. multimodal emotion recognition challenge (mer 2023)
    4. semi-supervised learning

    Qualifiers

    • Research-article

    Funding Sources

    • CCF-Baidu Open Fund
    • Open Research Projects of Zhejiang Lab
    • National Natural Science Foundation of China (NSFC)
    • Beijing Municipal Science & Technology Commission, Administrative Commission of Zhongguancun Science Park

    Conference

    MM '23: The 31st ACM International Conference on Multimedia
    October 29 - November 3, 2023
    Ottawa, ON, Canada

    Acceptance Rates

    Overall acceptance rate: 2,145 of 8,556 submissions, 25%

    Bibliometrics & Citations

    Article Metrics

    • Downloads (last 12 months): 1,586
    • Downloads (last 6 weeks): 143

    Reflects downloads up to 03 Jan 2025

    Cited By
    • (2024) Development of multimodal sentiment recognition and understanding. Journal of Image and Graphics, 29(6), 1607-1627. DOI: 10.11834/jig.240017. Online publication date: 2024.
    • (2024) MRAC'24 Track 2: 2nd International Workshop on Multimodal and Responsible Affective Computing. Proceedings of the 2nd International Workshop on Multimodal and Responsible Affective Computing, 39-40. DOI: 10.1145/3689092.3696103. Online publication date: 28-Oct-2024.
    • (2024) MER 2024: Semi-Supervised Learning, Noise Robustness, and Open-Vocabulary Multimodal Emotion Recognition. Proceedings of the 2nd International Workshop on Multimodal and Responsible Affective Computing, 41-48. DOI: 10.1145/3689092.3689959. Online publication date: 28-Oct-2024.
    • (2024) Robust Representation Learning for Multimodal Emotion Recognition with Contrastive Learning and Mixup. Proceedings of the 2nd International Workshop on Multimodal and Responsible Affective Computing, 93-97. DOI: 10.1145/3689092.3689418. Online publication date: 28-Oct-2024.
    • (2024) Early Joint Learning of Emotion Information Makes MultiModal Model Understand You Better. Proceedings of the 2nd International Workshop on Multimodal and Responsible Affective Computing, 54-61. DOI: 10.1145/3689092.3689415. Online publication date: 28-Oct-2024.
    • (2024) Audio-Guided Fusion Techniques for Multimodal Emotion Analysis. Proceedings of the 2nd International Workshop on Multimodal and Responsible Affective Computing, 62-66. DOI: 10.1145/3689092.3689414. Online publication date: 28-Oct-2024.
    • (2024) Leveraging Contrastive Learning and Self-Training for Multimodal Emotion Recognition with Limited Labeled Samples. Proceedings of the 2nd International Workshop on Multimodal and Responsible Affective Computing, 72-77. DOI: 10.1145/3689092.3689412. Online publication date: 28-Oct-2024.
    • (2024) Improving Multimodal Emotion Recognition by Leveraging Acoustic Adaptation and Visual Alignment. Proceedings of the 2nd International Workshop on Multimodal and Responsible Affective Computing, 67-71. DOI: 10.1145/3689092.3689407. Online publication date: 28-Oct-2024.
    • (2024) SZTU-CMU at MER2024: Improving Emotion-LLaMA with Conv-Attention for Multimodal Emotion Recognition. Proceedings of the 2nd International Workshop on Multimodal and Responsible Affective Computing, 78-87. DOI: 10.1145/3689092.3689404. Online publication date: 28-Oct-2024.
    • (2024) Multimodal Blockwise Transformer for Robust Sentiment Recognition. Proceedings of the 2nd International Workshop on Multimodal and Responsible Affective Computing, 88-92. DOI: 10.1145/3689092.3689399. Online publication date: 28-Oct-2024.
