DOI: 10.1145/3581783.3612835
Research article

The ACM Multimedia 2023 Computational Paralinguistics Challenge: Emotion Share & Requests

Published: 27 October 2023

Abstract

The ACM Multimedia 2023 Computational Paralinguistics Challenge addresses two different problems for the first time in a research competition under well-defined conditions: in the Emotion Share Sub-Challenge, a regression on speech has to be performed; in the Requests Sub-Challenge, requests and complaints need to be detected. We describe the Sub-Challenges, baseline feature extraction, and classifiers based on the 'usual' ComParE features, the auDeep toolkit, and deep feature extraction from pre-trained CNNs using the DeepSpectrum toolkit; in addition, wav2vec2 models are used.
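The Emotion Share task is a regression on speech, and regression sub-challenges in this challenge series are commonly scored with Spearman's rank correlation ρ. A minimal sketch of that metric follows; the helper names and the gold/prediction values are illustrative assumptions, not challenge data or the official scoring code:

```python
def ranks(values):
    """1-based ranks of a sequence, with ties sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(values):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(values) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0  # average of the tied 1-based positions
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    num = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    den = (sum((a - mx) ** 2 for a in rx) *
           sum((b - my) ** 2 for b in ry)) ** 0.5
    return num / den

# Illustrative values only: gold emotion-share proportions vs. model predictions.
gold = [0.10, 0.35, 0.20, 0.80, 0.55]
pred = [0.15, 0.30, 0.40, 0.70, 0.60]
print(round(spearman_rho(gold, pred), 3))  # → 0.9
```

Because ρ depends only on ranks, it rewards predictions that order the test utterances correctly even when their absolute values are biased, which suits regression targets on an arbitrary scale.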



Published In
      MM '23: Proceedings of the 31st ACM International Conference on Multimedia
      October 2023
      9913 pages
      ISBN:9798400701085
      DOI:10.1145/3581783

Publisher

Association for Computing Machinery, New York, NY, United States


      Author Tags

      1. benchmark
      2. challenge
      3. complaints
      4. computational paralinguistics
      5. emotion share
      6. requests

      Funding Sources

• DFG project (ParaStiChaD)
• DFG's Reinhart Koselleck project (AUDI0NOMOUS)

      Conference

MM '23: The 31st ACM International Conference on Multimedia
October 29 - November 3, 2023
Ottawa, ON, Canada

      Acceptance Rates

      Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

Cited By

• (2024) Comparison of performance of automatic recognizers for stutters in speech trained with event or interval markers. Frontiers in Psychology, 15. DOI: 10.3389/fpsyg.2024.1155285 (27 Feb 2024)
• (2024) From Raw Speech to Fixed Representations: A Comprehensive Evaluation of Speech Embedding Techniques. IEEE/ACM Transactions on Audio, Speech and Language Processing, 32, 3546-3560. DOI: 10.1109/TASLP.2024.3426301 (12 Jul 2024)
• (2024) Cascaded cross-modal transformer for audio–textual classification. Artificial Intelligence Review, 57:9. DOI: 10.1007/s10462-024-10869-1 (2 Aug 2024)
• (2023) Effect of Attention and Self-Supervised Speech Embeddings on Non-Semantic Speech Tasks. Proc. 31st ACM International Conference on Multimedia, 9511-9515. DOI: 10.1145/3581783.3612855 (26 Oct 2023)
• (2023) Multi-Layer Acoustic & Linguistic Feature Fusion for ComParE-23 Emotion and Requests Challenge. Proc. 31st ACM International Conference on Multimedia, 9492-9495. DOI: 10.1145/3581783.3612851 (26 Oct 2023)
• (2023) Automatic Audio Augmentation for Requests Sub-Challenge. Proc. 31st ACM International Conference on Multimedia, 9482-9486. DOI: 10.1145/3581783.3612849 (26 Oct 2023)
• (2023) Advancing Audio Emotion and Intent Recognition with Large Pre-Trained Models and Bayesian Inference. Proc. 31st ACM International Conference on Multimedia, 9477-9481. DOI: 10.1145/3581783.3612848 (26 Oct 2023)
• (2023) Cascaded Cross-Modal Transformer for Request and Complaint Detection. Proc. 31st ACM International Conference on Multimedia, 9467-9471. DOI: 10.1145/3581783.3612846 (26 Oct 2023)
• (2023) MuSe 2023 Challenge: Multimodal Prediction of Mimicked Emotions, Cross-Cultural Humour, and Personalised Recognition of Affects. Proc. 31st ACM International Conference on Multimedia, 9723-9725. DOI: 10.1145/3581783.3610943 (26 Oct 2023)
