DOI: 10.1145/3581783.3612835
Research article

The ACM Multimedia 2023 Computational Paralinguistics Challenge: Emotion Share & Requests

Published: 27 October 2023

Abstract

The ACM Multimedia 2023 Computational Paralinguistics Challenge addresses two different problems for the first time in a research competition under well-defined conditions: in the Emotion Share Sub-Challenge, a regression on speech has to be performed; in the Requests Sub-Challenge, requests and complaints need to be detected. We describe the Sub-Challenges, baseline feature extraction, and classifiers based on the 'usual' ComParE features, the auDeep toolkit, and deep feature extraction from pre-trained CNNs using the DeepSpectrum toolkit; in addition, wav2vec2 models are used.
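The Emotion Share task is a regression on speech, and regression sub-challenges in this challenge series are commonly scored with Spearman's rank correlation ρ. A minimal sketch of that metric follows; the helper names and the gold/prediction values are illustrative assumptions, not challenge data or the official scoring code:

```python
def ranks(values):
    """1-based ranks of a sequence, with ties sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(values):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(values) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0  # average of the tied 1-based positions
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    num = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    den = (sum((a - mx) ** 2 for a in rx) *
           sum((b - my) ** 2 for b in ry)) ** 0.5
    return num / den

# Illustrative values only: gold emotion-share proportions vs. model predictions.
gold = [0.10, 0.35, 0.20, 0.80, 0.55]
pred = [0.15, 0.30, 0.40, 0.70, 0.60]
print(round(spearman_rho(gold, pred), 3))  # → 0.9
```

Because ρ depends only on ranks, it rewards predictions that order the test utterances correctly even when their absolute values are biased, which suits regression targets on an arbitrary scale.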



Published In
      MM '23: Proceedings of the 31st ACM International Conference on Multimedia
      October 2023
      9913 pages
      ISBN:9798400701085
      DOI:10.1145/3581783

Publisher

Association for Computing Machinery, New York, NY, United States


      Author Tags

      1. benchmark
      2. challenge
      3. complaints
      4. computational paralinguistics
      5. emotion share
      6. requests

      Funding Sources

• DFG project (ParaStiChaD)
• DFG's Reinhart Koselleck project (AUDI0NOMOUS)

      Conference

MM '23: The 31st ACM International Conference on Multimedia
October 29 - November 3, 2023
Ottawa, ON, Canada

      Acceptance Rates

      Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

Cited By

• (2024) Comparison of performance of automatic recognizers for stutters in speech trained with event or interval markers. Frontiers in Psychology, 15. DOI: 10.3389/fpsyg.2024.1155285 (27 Feb 2024)
• (2024) From Raw Speech to Fixed Representations: A Comprehensive Evaluation of Speech Embedding Techniques. IEEE/ACM Transactions on Audio, Speech and Language Processing, 32, 3546-3560. DOI: 10.1109/TASLP.2024.3426301 (12 Jul 2024)
• (2024) Cascaded cross-modal transformer for audio–textual classification. Artificial Intelligence Review, 57:9. DOI: 10.1007/s10462-024-10869-1 (2 Aug 2024)
• (2023) Effect of Attention and Self-Supervised Speech Embeddings on Non-Semantic Speech Tasks. Proc. 31st ACM International Conference on Multimedia, 9511-9515. DOI: 10.1145/3581783.3612855 (26 Oct 2023)
• (2023) Multi-Layer Acoustic & Linguistic Feature Fusion for ComParE-23 Emotion and Requests Challenge. Proc. 31st ACM International Conference on Multimedia, 9492-9495. DOI: 10.1145/3581783.3612851 (26 Oct 2023)
• (2023) Automatic Audio Augmentation for Requests Sub-Challenge. Proc. 31st ACM International Conference on Multimedia, 9482-9486. DOI: 10.1145/3581783.3612849 (26 Oct 2023)
• (2023) Advancing Audio Emotion and Intent Recognition with Large Pre-Trained Models and Bayesian Inference. Proc. 31st ACM International Conference on Multimedia, 9477-9481. DOI: 10.1145/3581783.3612848 (26 Oct 2023)
• (2023) Cascaded Cross-Modal Transformer for Request and Complaint Detection. Proc. 31st ACM International Conference on Multimedia, 9467-9471. DOI: 10.1145/3581783.3612846 (26 Oct 2023)
• (2023) MuSe 2023 Challenge: Multimodal Prediction of Mimicked Emotions, Cross-Cultural Humour, and Personalised Recognition of Affects. Proc. 31st ACM International Conference on Multimedia, 9723-9725. DOI: 10.1145/3581783.3610943 (26 Oct 2023)
