DOI: 10.1145/3395035.3425182
short-paper

Combining Clustering and Functionals based Acoustic Feature Representations for Classification of Baby Sounds

Published: 27 December 2020

Abstract

This paper investigates different fusion strategies and provides insights into their effectiveness alongside standalone classifiers in the paralinguistic analysis of infant vocalizations. We explore combinations of classifiers based on Support Vector Machines (SVM) and Extreme Learning Machines (ELM), as well as a weighted kernel variant, training the systems on different acoustic feature representations and applying weighted score-level fusion to their predictions. The proposed framework is tested on the INTERSPEECH ComParE-2019 Baby Sounds corpus, a collection of HomeBank infant vocalization corpora annotated for five classes. Adhering to the challenge protocol and using a single test set submission, we outperform the challenge baseline Unweighted Average Recall (UAR) score and achieve a result comparable to the state of the art.
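
To make the two technical terms in the abstract concrete, the following is a minimal Python sketch of UAR (the mean of per-class recalls) and of weighted score-level fusion of two classifiers. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the fusion weights, the score normalization, or how the weights were chosen, so the convex combination, the grid search on a development set, and all function names here are hypothetical.

```python
import numpy as np


def unweighted_average_recall(y_true, y_pred, n_classes):
    """Mean of per-class recalls; classes absent from y_true are skipped."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    recalls = []
    for c in range(n_classes):
        mask = y_true == c
        if mask.any():
            recalls.append(np.mean(y_pred[mask] == c))
    return float(np.mean(recalls))


def fuse_scores(scores_a, scores_b, alpha):
    """Convex combination of two (n_samples, n_classes) score matrices.

    Assumption: both score matrices are already on a comparable scale.
    """
    return alpha * np.asarray(scores_a) + (1.0 - alpha) * np.asarray(scores_b)


def pick_fusion_weight(scores_a, scores_b, y_dev, n_classes, grid=None):
    """Grid-search alpha on a development set to maximise UAR (hypothetical)."""
    if grid is None:
        grid = np.linspace(0.0, 1.0, 11)
    best_alpha, best_uar = 0.0, -1.0
    for alpha in grid:
        pred = fuse_scores(scores_a, scores_b, alpha).argmax(axis=1)
        uar = unweighted_average_recall(y_dev, pred, n_classes)
        if uar > best_uar:  # strict '>' keeps the first best alpha on ties
            best_alpha, best_uar = float(alpha), uar
    return best_alpha, best_uar


# Toy data (made up): each classifier alone always predicts a single class,
# while an equal-weight fusion separates the two samples.
toy_a = [[0.9, 0.1], [0.6, 0.4]]
toy_b = [[0.2, 0.8], [0.1, 0.9]]
toy_y = [0, 1]
alpha, uar = pick_fusion_weight(toy_a, toy_b, toy_y, n_classes=2)
```

UAR is used rather than plain accuracy because it weights every class equally regardless of how many samples it has: predicting the majority class for every sample of an imbalanced set can give high accuracy, but its UAR stays at chance level.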

Cited By

  • (2021) Measuring Frequency of Child-directed WH-Question Words for Alternate Preschool Locations using Speech Recognition and Location Tracking Technologies. Companion Publication of the 2021 International Conference on Multimodal Interaction, 414-418. DOI: 10.1145/3461615.3485440. Online publication date: 18-Oct-2021
  • (2021) Describing Vocalizations in Young Children: A Big Data Approach Through Citizen Science Annotation. Journal of Speech, Language, and Hearing Research 64:7, 2401-2416. DOI: 10.1044/2021_JSLHR-20-00661. Online publication date: 16-Jul-2021

      Published In

      ICMI '20 Companion: Companion Publication of the 2020 International Conference on Multimodal Interaction
      October 2020
      548 pages
      ISBN: 9781450380027
      DOI: 10.1145/3395035
      Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. baby sounds classification
      2. computational paralinguistics
      3. extreme learning machines
      4. information fusion
      5. support vector machines

      Qualifiers

      • Short-paper

      Funding Sources

      • Russian Science Foundation

      Conference

      ICMI '20
      ICMI '20: INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION
      October 25 - 29, 2020
      Virtual Event, Netherlands

      Acceptance Rates

      Overall Acceptance Rate 453 of 1,080 submissions, 42%
