BELMASK—An Audiovisual Dataset of Adversely Produced Speech for Auditory Cognition Research
Figure 1. Lexical frequency of the unique words (n = 399) contained in the BELMASK matrix sentences, based on the 7-point logarithmic frequency scale (0: rare–6: frequent) of the German digital dictionary “Digitales Wörterbuch der deutschen Sprache” (DWDS) (https://www.dwds.de/d/api), accessed on 21 July 2024. Note: cumulative calculation of frequencies for inflected/uninflected word forms.
Figure 2. Phonemic distribution of the BELMASK matrix sentences (blue) and the Oldenburg sentence test (green), digitized and extracted from [49], compared to the average phoneme distribution for written German (red), as reported in [55] (see Table “100.000 sound count”), and conversational German (yellow), based on the extended phone monogram statistics [56] (https://www.bas.uni-muenchen.de/forschung/Bas/BasPHONSTATeng.html) for the Verbmobil 1+2, SmartKom and RVG1 databases, accessed on 21 July 2024.
Figure 3. Relationship between pseudo log likelihood (PLL) scores of the BELMASK matrix and sentence length (number of tokens), including correlation analysis (Pearson’s r = −0.77). Shaded area of regression line corresponds to the 95% confidence interval. Each dot represents a sentence. The red dot represents the highly predictable reference sentence “The rocket flies into space”, not contained in the BELMASK set.
Figure 4. Relationship between pseudo log likelihood (PLL) scores of the BELMASK matrix words and DWDS word log frequency, including correlation analysis (Pearson’s r = 0.56). Shaded area of regression line corresponds to the 95% confidence interval. Each dot represents a unique word.
Figure 5. Frequency response of the face mask used during recordings with subsequent 1/12 octave band smoothing, measured reciprocally using a 3D-printed head [63].
Figure 6. Experimental setup of the recording sessions. Display of keywords on screen in speaker booth not depicted.
Figure 7. Example of annotation layers in the Praat TextGrid object as a result of the G2P → MAUS → PHO2SYL pipeline.
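For readers unfamiliar with the PLL scores mentioned in the captions above, the commonly used definition from masked language model scoring (Salazar et al., listed in the references) can be sketched as follows. This is general background only: the captions do not say which scoring variant was used for BELMASK, and the symbols W, w_t and θ are notation chosen here for illustration.

```latex
\mathrm{PLL}(W) = \sum_{t=1}^{|W|} \log P_{\mathrm{MLM}}\left(w_t \mid W_{\setminus t};\, \theta\right)
```

Here W = (w_1, …, w_{|W|}) is the token sequence, W_{\setminus t} is the same sequence with token t masked, and θ denotes the parameters of the masked language model. Each summand is a log-probability and therefore at most zero, so sentences with more tokens tend to receive lower PLL scores, which is consistent with the negative correlation with sentence length reported above.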
Abstract
1. Introduction
1.1. Summary of Research Findings on Acoustic and Perceptual Effects of Face Masks
1.2. Datasets of Face-Masked Speech
2. Data Description
3. Methods
3.1. Construction of the Test Material
3.2. Validation of Predictability
3.3. Speakers
3.4. Audio and Video Recordings
4. Post-Processing
4.1. Linear Mapping
4.2. Corrections and Errors
4.3. File Structure
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
Abbreviation | Definition |
---|---|
ASR | Automatic Speech/Speaker Recognition |
BAS | Bavarian Archive for Speech Signals |
BELMASK | Berlin Database of Lombard and Masked Speech |
BERT | Bidirectional Encoder Representations from Transformers |
CI | Cochlear Implant |
ComParE | Computational Paralinguistics Challenge |
COVID-19 | Coronavirus Disease 2019 |
DWDS | Digitales Wörterbuch der deutschen Sprache |
EULA | End User License Agreement |
FFP2 | Filtering Facepiece P2 |
HMM | Hidden Markov Model |
LLM | Large Language Model |
MASC | Mask Augsburg Speech Corpus |
MDC | Mobile Device Communication |
MLM | Masked Language Model |
MSC | Mask Sub-Challenge |
OLSA | Oldenburg sentence test |
PLL | Pseudo Log Likelihood |
SD | Standard Deviation |
SNR | Signal-to-Noise Ratio |
SPL | Sound Pressure Level |
SRT | Speech Reception Threshold |
Appendix A
No. | Sentence | No. | Sentence |
---|---|---|---|
s01 | Coco zählt dreizehn grüne Wände. | s49 | Die Hofbesitzerin schnippelt acht exotische Kiwis. | |
s02 | Liane trägt drei bunte Tücher. | s50 | Die Cousine verkostet neun krosse Zwiebeln. | |
s03 | Oma beobachtet zwei spielende Kinder. | s51 | Peter verputzt zwei volle Schüsseln. | |
s04 | Ludwig tritt fünf kaputte Dosen. | s52 | Rudolf trifft acht freudige Gruppen. | |
s05 | Die Familie betrachtet sieben nasse Tauben. | s53 | Beate bewundert sechs reife Pflaumen. | |
s06 | Lisa überwindet sechs krachende Wellen. | s54 | Der Schuster bearbeitet zehn braune Holzbretter. | |
s07 | Sarah verkauft acht rote Smoothies. | s55 | Joseph pflückt drei gute Papayas. | |
s08 | Timo besitzt neun rosa Fahnen. | s56 | Ruth kaut zwei zähe Stangen. | |
s09 | Die Dame hält zehn spitze Steine. | s57 | Alex erschreckt vier schlafende Ziegen. | |
s10 | Udo sieht elf gelbe Muscheln. | s58 | Der Geologe trimmt fünf dornige Büsche. | |
s11 | Elisa registriert zwölf dunkle Biere. | s59 | Emilia fährt sieben klapprige Taxis. | |
s12 | Verena findet zwei plumpe Gänse. | s60 | Andreas schlemmt dreizehn fettige Bretzeln. | |
s13 | Opa hört drei sanfte Bässe. | s61 | Adam feuert zwölf imaginäre Schüsse. | |
s14 | Vater genießt vier kalte Erdbeeren. | s62 | Georg isst drei knackige Rüben. | |
s15 | Leon produziert fünf schrille Töne. | s63 | Gisela schneidet fünf trockene Tulpen. | |
s16 | Susi bekommt sechs prima Forellen. | s64 | Ramona erntet dreizehn breite Zucchinis. | |
s17 | Dieter besucht acht große Boote. | s65 | Der Erzieher füttert sechs einsame Spatzen. | |
s18 | Theodor prüft sieben salzige Käse. | s66 | Hugo wäscht zwölf türkise Hemden. | |
s19 | Ole studiert neun tolle Zeitungen. | s67 | Der Hofjunge pflanzt vier seltene Kräuter. | |
s20 | Tamara bestellt zehn süße Weine. | s68 | Der Ladenbesitzer erhält zehn schimmelnde Datteln. | |
s21 | Das Mädchen klaut elf glasierte Muffins. | s69 | Der Lieferant hebt acht strahlende Segel. | |
s22 | Biene nimmt zwölf delikate Puten. | s70 | Gertrud entwirft sieben edle Gewänder. | |
s23 | Egon bemerkt zwei schöne Fische. | s71 | Margarete befestigt elf hölzerne Schilder. | |
s24 | Ute gibt drei lustige Tipps. | s72 | Doro frühstückt vier kleine Gurken. | |
s25 | Der Badegast sammelt fünf weiße Perlen. | s73 | Der Lehrling bäckt elf frische Pasteten. | |
s26 | Maria bastelt vier schimmernde Ketten. | s74 | Simon bringt zwei wichtige Seiten. | |
s27 | Die Urlauberin holt sechs riesige Donuts. | s75 | Helene mietet elf alte Fahrräder. | |
s28 | Doris bezahlt acht sprudelnde Brausen. | s76 | Der Apotheker verschreibt vier günstige Wickel. | |
s29 | Gerald fängt dreizehn graue Tische. | s77 | Mara präsentiert neun wertvolle Spiele. | |
s30 | Olaf schwimmt neun flotte Runden. | s78 | Das Schulkind lernt zehn komplexe Fächer. | |
s31 | Der Kurgast reserviert zehn sonnige Liegen. | s79 | Der Seemann verleiht sieben eigene Bücher. | |
s32 | Renate entsorgt elf dreckige Teller. | s80 | Der Musiker überzeugt drei kranke Touristen. | |
s33 | Der Bote empfängt zwölf zerstörte Kartons. | s81 | Lola singt sechs blöde Balladen. | |
s34 | Ulrike betreut zwei blaue Delfine. | s82 | Der Fotograf beleuchtet dreizehn antike Städte. | |
s35 | Der Ehemann spendiert drei feurige Schnäpse. | s83 | Lina radelt sechs hügelige Strecken. | |
s36 | Rosa baut vier sandige Schlösser. | s84 | Der Lehrer verabschiedet vier jodelnde Burschen. | |
s37 | Lena wirft fünf japanische Vasen. | s85 | Der Blumenhändler züchtet zehn feine Rosen. | |
s38 | Samuel malt sieben grelle Pfaue. | s86 | Der Kioskbesitzer überfliegt neun witzige Titel. | |
s39 | Der Ureinwohner macht dreizehn abstrakte Bilder. | s87 | Nena näht sieben pinke Jacken. | |
s40 | Fine erwähnt acht schlaue Witze. | s88 | Die Baronin verspielt dreizehn goldene Bänder. | |
s41 | Der Bube ruft neun laute Sprüche. | s89 | Martha ergattert acht dünne Hosen. | |
s42 | Der Maler zeichnet zehn tiefe Gewässer. | s90 | Der Poet erzählt elf spannende Sagen. | |
s43 | Ina liefert neun teure Gläser. | s91 | Uwe schleppt neun flauschige Katzen. | |
s44 | Der Sohn riecht zwölf saure Pfirsiche. | s92 | Die Diva verteilt zwei gewonnene Chips. | |
s45 | Frau Huber ermahnt fünf tobende Punks. | s93 | Thomas unterschreibt fünf zerkratzte Platten. | |
s46 | Ariane erwirbt sieben duftende Gräser. | s94 | Der Pfadfinder zeigt sechs fixe Schritte. | |
s47 | Der Kumpane jagt drei wilde Schafe. | s95 | Der Studi verinnerlicht zwölf schwere Gedichte. | |
s48 | Die Ärztin verfolgt dreizehn blasse Ferkel. | s96 | Lara transportiert zwölf köstliche Äpfel. |
1 | For all annotation and pipeline abbreviations consult the BAS Webmaus documentation available at: https://clarin.phonetik.uni-muenchen.de/BASWebServices/help/tutorial, accessed on 21 July 2024. |
2 | “Uploaded material is automatically deleted [from the BAS servers] after 24 h. Uploaded data are not forwarded to third parties, except in the case of the service ‘ASR’, which forwards user data to a third-party, commercial webservice provider”, see: https://clarin.phonetik.uni-muenchen.de/BASWebServices/help, accessed on 21 July 2024. |
References
1. Geng, P.; Lu, Q.; Guo, H.; Zeng, J. The effects of face mask on speech production and its implication for forensic speaker identification-A cross-linguistic study. PLoS ONE 2023, 18, e0283724.
2. Li, X.; Ni, K.; Huang, Y. Effect of Face Masks on Automatic Speech Recognition Accuracy for Mandarin. Appl. Sci. 2024, 14, 3273.
3. Ritchie, K.; Carragher, D.; Davis, J.; Read, K.; Jenkins, R.E.; Noyes, E.; Gray, K.L.H.; Hancock, P.J.B. Face masks and fake masks: The effect of real and superimposed masks on face matching with super-recognisers, typical observers, and algorithms. Cogn. Res. 2024, 9, 5.
4. Badh, G.; Knowles, T. Acoustic and perceptual impact of face masks on speech: A scoping review. PLoS ONE 2023, 18, e0285009.
5. Pörschmann, C.; Lübeck, T.; Arend, J.M. Impact of face masks on voice radiation. J. Acoust. Soc. Am. 2020, 148, 3663–3670.
6. Bandaru, S.V.; Augustine, A.M.; Lepcha, A.; Sebastian, S.; Gowri, M.; Philip, A.; Mammen, M.D. The effects of N95 mask and face shield on speech perception among healthcare workers in the coronavirus disease 2019 pandemic scenario. J. Laryngol. Otol. 2020, 134, 895–898.
7. Bottalico, P.; Murgia, S.; Puglisi, G.E.; Astolfi, A.; Kirk, K.I. Effect of masks on speech intelligibility in auralized classrooms. J. Acoust. Soc. Am. 2020, 148, 2878–2884.
8. Brown, V.A.; Van Engen, K.J.; Peelle, J.E. Face mask type affects audiovisual speech intelligibility and subjective listening effort in young and older adults. Cogn. Res. Princ. Implic. 2021, 6, 49.
9. Smiljanic, R.; Keerstock, S.; Meemann, K.; Ransom, S.M. Face masks and speaking style affect audio-visual word recognition and memory of native and non-native speech. J. Acoust. Soc. Am. 2021, 149, 4013–4023.
10. Toscano, J.C.; Toscano, C.M. Effects of face masks on speech recognition in multi-talker babble noise. PLoS ONE 2021, 16, e0246842.
11. Mendel, L.L.; Gardino, J.A.; Atcherson, S.R. Speech Understanding Using Surgical Masks: A Problem in Health Care? J. Am. Acad. Audiol. 2008, 19, 686–695.
12. Magee, M.; Lewis, C.; Noffs, G.; Reece, H.; Chan, J.C.S.; Zaga, C.J.; Paynter, C.; Birchall, O.; Rojas Azocar, S.; Ediriweera, A.; et al. Effects of face masks on acoustic analysis and speech perception: Implications for peri-pandemic protocols. J. Acoust. Soc. Am. 2020, 148, 3562–3568.
13. Das, S.; Sarkar, S.; Das, A.; Das, S.; Chakraborty, P.; Sarkar, J. A comprehensive review of various categories of face masks resistant to COVID-19. Clin. Epidemiol. Glob. Health 2021, 12, 100835.
14. Martarelli, M.; Montalto, L.; Chiariotti, P.; Simoni, S.; Castellini, P.; Battista, G.; Paone, N. Acoustic Attenuation of COVID-19 Face Masks: Correlation to Fibrous Material Porosity, Mask Breathability and Bacterial Filtration Efficiency. Acoustics 2022, 4, 123–138.
15. Atcherson, S.R.; Mendel, L.L.; Baltimore, W.J.; Patro, C.; Lee, S.; Pousson, M.; Spann, M.J. The Effect of Conventional and Transparent Surgical Masks on Speech Understanding in Individuals with and without Hearing Loss. J. Am. Acad. Audiol. 2017, 28, 58–67.
16. Sönnichsen, R.; Tó, G.L.; Hohmann, V.; Hochmuth, S.; Radeloff, A. Challenging Times for Cochlear Implant Users - Effect of Face Masks on Audiovisual Speech Understanding during the COVID-19 Pandemic. Trends Hear. 2022, 26, 23312165221134378.
17. Rahne, T.; Fröhlich, L.; Plontke, S.; Wagner, L. Influence of surgical and N95 face masks on speech perception and listening effort in noise. PLoS ONE 2021, 16, e0253874.
18. Giovanelli, E.; Valzolgher, C.; Gessa, E.; Todeschini, M.; Pavani, F. Unmasking the Difficulty of Listening to Talkers With Masks: Lessons from the COVID-19 pandemic. i-Perception 2021, 12, 204166952199839.
19. Pichora-Fuller, M.K.; Kramer, S.E.; Eckert, M.A.; Edwards, B.; Hornsby, B.W.Y.; Humes, L.E.; Lemke, U.; Lunner, T.; Matthen, M.; Mackersie, C.L.; et al. Hearing Impairment and Cognitive Energy: The Framework for Understanding Effortful Listening (FUEL). Ear Hear. 2016, 37, 5S.
20. Ribeiro, V.V.; Dassie-Leite, A.P.; Pereira, E.C.; Santos, A.D.N.; Martins, P.; Irineu, R.d.A. Effect of Wearing a Face Mask on Vocal Self-Perception during a Pandemic. J. Voice 2020, 37, 878.
21. Gama, R.; Castro, M.E.; van Lith-Bijl, J.T.; Desuter, G. Does the wearing of masks change voice and speech parameters? Eur. Arch. Oto-Rhino 2021, 279, 1701–1708.
22. McKenna, V.S.; Patel, T.H.; Kendall, C.L.; Howell, R.J.; Gustin, R.L. Voice Acoustics and Vocal Effort in Mask-Wearing Healthcare Professionals: A Comparison Pre- and Post-Workday. J. Voice 2021, 37, 802.e15–802.e23.
23. Gutz, S.E.; Rowe, H.P.; Tilton-Bolowsky, V.E.; Green, J.R. Speaking with a KN95 face mask: A within-subjects study on speaker adaptation and strategies to improve intelligibility. Cogn. Res. Princ. Implic. 2022, 7, 73.
24. Lombard, E. Le signe de l’élévation de la voix [The sign of raising the voice]. Ann. Mal. Oreille Larynx Nez Pharynx 1911, 37, 101–119.
25. Bottalico, P.; Passione, I.I.; Graetzer, S.; Hunter, E.J. Evaluation of the starting point of the Lombard Effect. Acta Acust. United Acust. 2017, 103, 169–172.
26. Hampton, T.; Crunkhorn, R.; Lowe, N.; Bhat, J.; Hogg, E.; Afifi, W.; De, S.; Street, I.; Sharma, R.; Krishnan, M.; et al. The negative impact of wearing personal protective equipment on communication during coronavirus disease 2019. J. Laryngol. Otol. 2020, 134, 577–581.
27. Cohn, M.; Pycha, A.; Zellou, G. Intelligibility of face-masked speech depends on speaking style: Comparing casual, clear, and emotional speech. Cognition 2021, 210, 104570.
28. Karagkouni, O. The Effects of the Use of Protective Face Mask on the Voice and Its Relation to Self-Perceived Voice Changes. J. Voice 2021, 37, 802.e1–802.e14.
29. Schiller, I.S.; Aspöck, L.; Schlittmeier, S.J. The impact of a speaker’s voice quality on auditory perception and cognition: A behavioral and subjective approach. Front. Psychol. 2023, 14, 1243249.
30. Moshona, C.; Fiebig, A. Effects of face-masked speech on short-term memory. In Proceedings of the 10th Convention of the European Acoustics Association Forum Acusticum 2023, Turin, Italy, 11–15 September 2023; pp. 4697–4704.
31. Truong, T.L.; Beck, S.D.; Weber, A. The impact of face masks on the recall of spoken sentences. J. Acoust. Soc. Am. 2021, 149, 142–144.
32. Truong, T.L.; Weber, A. Intelligibility and recall of sentences spoken by adult and child talkers wearing face masks. J. Acoust. Soc. Am. 2021, 150, 1674–1681.
33. Son, L.K.; Schwartz, B.L. The relation between metacognitive monitoring and control. In Applied Metacognition; Perfect, T.J., Schwartz, B.L., Eds.; Cambridge University Press: Cambridge, UK, 2002; pp. 15–38.
34. Carbon, C.C. Wearing Face Masks Strongly Confuses Counterparts in Reading Emotions. Front. Psychol. 2020, 11, 566886.
35. Fitousi, D.; Rotschild, N.; Pnini, C.; Azizi, O. Understanding the Impact of Face Masks on the Processing of Facial Identity, Emotion, Age, and Gender. Front. Psychol. 2021, 12, 743793.
36. Vitevitch, M.S.; Chan, K.Y.; Goldstein, R. Using English as a ‘Model Language’ to Understand Language Processing. In Motor Speech Disorders: A Cross-Language Perspective; Miller, N., Lowit, A., Eds.; Multilingual Matters: Bristol, UK, 2014; pp. 58–73.
37. Blasi, D.E.; Henrich, J.; Adamou, E.; Kemmerer, D.; Majid, A. Over-reliance on English hinders cognitive science. Trends Cogn. Sci. 2022, 26, 1153–1170.
38. Mohamed, M.M.; Nessiem, M.A.; Batliner, A.; Bergler, C.; Hantke, S.; Schmitt, M.; Baird, A.; Mallol-Ragolta, A.; Karas, V.; Amiriparian, S.; et al. Face mask recognition from audio: The MASC database and an overview on the mask challenge. Pattern Recognit. 2022, 122, 108361.
39. Schuller, B.W.; Batliner, A.; Bergler, C.; Messner, E.M.; Hamilton, A.; Amiriparian, S.; Baird, A.; Rizos, G.; Schmitt, M.; Stappen, L.; et al. The INTERSPEECH 2020 Computational Paralinguistics Challenge: Elderly Emotion, Breathing and Masks. In Proceedings of the Interspeech 2020, Shanghai, China, 25–29 October 2020; pp. 2042–2046.
40. Mallol-Ragolta, A.; Urbach, N.; Liu, S.; Batliner, A.; Schuller, B.W. The MASCFLICHT Corpus: Face Mask Type and Coverage Area Recognition from Speech. In Proceedings of the INTERSPEECH 2023, Dublin, Ireland, 20–24 August 2023; pp. 2358–2362.
41. Awan, S.N.; Shaikh, M.A.; Awan, J.A.; Abdalla, I.; Lim, K.O.; Misono, S. Smartphone Recordings are Comparable to “Gold Standard” Recordings for Acoustic Measurements of Voice. J. Voice 2023, in press.
42. Maryn, Y.; Ysenbaert, F.; Zarowski, A.; Vanspauwen, R. Mobile Communication Devices, Ambient Noise, and Acoustic Voice Measures. J. Voice 2017, 31, 248.e11–248.e23.
43. Jannetts, S.; Schaeffler, F.; Beck, J.; Cowen, S. Assessing voice health using smartphones: Bias and random error of acoustic voice parameters captured by different smartphone types. Int. J. Lang. Commun. Disord. 2019, 54, 292–305.
44. Alghamdi, N.; Maddock, S.; Marxer, R.; Barker, J.; Brown, G.J. A corpus of audio-visual Lombard speech with frontal and profile views. J. Acoust. Soc. Am. 2018, 143, EL523–EL529.
45. Marcoux, K.; Ernestus, M. Acoustic characteristics of non-native Lombard speech in the DELNN corpus. J. Phon. 2024, 102, 101281.
46. Folk, L.; Schiel, F. The Lombard Effect in Spontaneous Dialog Speech. In Proceedings of the INTERSPEECH 2011, Florence, Italy, 27–31 August 2011; pp. 2701–2704.
47. Sołoducha, M.; Raake, A.; Kettler, F.; Voigt, P. Lombard speech database for German language. In Proceedings of the 42nd Annual Conference on Acoustics - DAGA 2016, Aachen, Germany, 14–17 March 2016.
48. Trujillo, J.; Özyürek, A.; Holler, J.; Drijvers, P. Speakers exhibit a multimodal Lombard effect in noise. Sci. Rep. 2021, 11, 16721.
49. Wagener, K.; Kühnel, V.; Kollmeier, B. Entwicklung und Evaluation eines Satztests für die deutsche Sprache I: Design des Oldenburger Satztests [Development and evaluation of a sentence test for the German language I: Design of the Oldenburg sentence test]. Z. Für Audiol. 1999, 38, 1–32.
50. Poirier, M.; Saint-Aubin, J. Word frequency effects in immediate serial recall: Item familiarity and item co-occurrence have the same effect. Memory 2005, 13, 325–332.
51. Hunter, C.R. Dual-task accuracy and response time index effects of spoken sentence predictability and cognitive load on listening effort. Trends Hear. 2021, 25, 1–15.
52. Roverud, E.; Bradlow, A.; Kidd, G.J. Examining the sentence superiority effect for sentences presented and reported in forwards or backwards order. Appl. Psycholinguist. 2020, 41, 381–400.
53. Kowialiewski, B.; Krasnoff, J.; Mizrak, E.; Oberauer, K. The semantic relatedness effect in serial recall: Deconfounding encoding and recall order. J. Mem. Lang. 2022, 127, 104377.
54. Baddeley, A.D.; Thomson, N.; Buchanan, M. Word length and the structure of short-term memory. J. Verb. Learn. Verb. Behav. 1975, 14, 575–589.
55. Best, K.H. Laut- und Phonemhäufigkeiten im Deutschen [Sound and phoneme frequencies in German]. Göttinger Beiträge Zur Sprachwiss. 2005, 10/11, 21–32.
56. Schiel, F. BAStat: New statistical resources at the Bavarian Archive for speech signals. In Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC’10), Valletta, Malta, 17–23 May 2010.
57. Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv 2019.
58. MDZ Digital Library Team (dbmdz) at the Bavarian State Library. Bert-Base-German-Dbmdz-Cased. Available online: https://huggingface.co/dbmdz/bert-base-german-cased (accessed on 21 July 2024).
59. Schuster, M.; Nakajima, K. Japanese and Korean voice search. In Proceedings of the 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Kyoto, Japan, 25–30 March 2012; pp. 5149–5152.
60. Salazar, J.; Liang, D.; Nguyen, T.Q.; Kirchhoff, K. Masked Language Model Scoring. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, 5–10 July 2020.
61. Kauf, C.; Ivanova, A. A Better Way to Do Masked Language Model Scoring. arXiv 2023, arXiv:2305.10588.
62. Misra, K. minicons: Enabling Flexible Behavioral and Representational Analyses of Transformer Language Models. arXiv 2022, arXiv:2203.13112.
63. Moshona, C.; Hofmann, J.; Fiebig, A.; Sarradj, E. Bestimmung des Übertragungsverlustes von Atemschutzmasken mittels eines 3D-Kopfmodells unter Berücksichtigung des Ansatzrohres [Determination of the transmission loss of respiratory masks using a 3D head model considering the vocal tract]. In Proceedings of the 49th Annual Conference on Acoustics - DAGA 2023, Hamburg, Germany, 6–9 March 2023; pp. 178–181.
64. Mooshammer, C. Korpus Gelesener Geschlechtergerechter Sprache (KGGS) [Corpus of Read Gender-Inclusive Language (KGGS)]. 2020. Available online: https://rs.cms.hu-berlin.de/phon (accessed on 21 July 2024).
65. Nakamura, M.; Iwano, K.; Furui, S. Differences between acoustic characteristics of spontaneous and read speech and their effects on speech recognition performance. Comput. Speech Lang. 2008, 22, 171–184.
66. Boersma, P.; Weenink, D. Praat, a system for doing phonetics by computer. Glot Int. 2001, 5, 341–345.
67. Kisler, T.; Reichel, U.; Schiel, F. Multilingual processing of speech via web services. Comput. Speech Lang. 2017, 45, 326–347.
68. Schiel, F. A Statistical Model for Predicting Pronunciation. In Proceedings of the 18th International Congress of Phonetic Sciences, ICPhS 2015, Glasgow, UK, 10–14 August 2015; p. 195.
Property | Value |
---|---|
Name | Berlin Dataset of Lombard and Masked Speech |
Abbreviation | BELMASK |
Version | 1.0 |
License | End User License Agreement (EULA) |
Speech material (total) | 3840 matrix sentences à ∼ |
Speech material (per speaker) | 96 matrix sentences à ∼ in 4 conditions |
Speech conditions | , , , |
Total duration | ∼ (∼ per speaker) |
Register | Cued, uninstructed speech |
Speakers | 10 (4 female, 6 male) |
Language | German |
Modality | Audio, video |
Annotation mode | Automated (webMAUS + Pipeline: G2P → MAUS → PHO2SYL) |
Annotation layers | Tokenized word segmentation based on MAUS (ORT-MAU) |
 | Canonical pronunciation encoded in X-SAMPA (KAN-MAU) |
 | Phonological syllable segmentation based on G2P (KAS-MAU) |
 | Phonetic segmentation in X-SAMPA (MAU) |
 | Phonetic syllable segmentation based on MAUS (MAS) |
Available data formats | .mkv, .wav, .txt, .TextGrid |
Audio bitrate / Sampling Frequency | 16 bit, |
Video compression codec | H.264 |
Size | (compressed: ) |
ID | Gender | Age | L1 | P1 | P2 |
---|---|---|---|---|---|
VP01 | m | 38 | DE | DE-BE | DE-BE |
VP02 | f | 38 | DE | GR | GR |
VP03 | f | 36 | DE | DE-BE | DE-BE |
VP04 | m | 34 | DE | DE-BB | DE-SN |
VP05 | m | 24 | DE | CN | CN |
VP06 | m | 27 | DE | DE-NW | DE-HE |
VP07 | f | 26 | DE | DE-BY | DE-BY |
VP08 | m | 33 | DE | DE-ST | DE-ST |
VP09 | f | 21 | DE | RU | DE-NW |
VP10 | m | 25 | DE | DE-HE | DE-BW |
VP | [dB] | Factor x | Condition Order |
---|---|---|---|
VP01 | dB | 0.501 | →→→ |
VP02 | 0 dB | 1.000 | →→→ |
VP03 | dB | 0.708 | →→→ |
VP04 | dB | 1.122 | →→→ |
VP05 | dB | 0.398 | →→→ |
VP06 | dB | 0.398 | →→→ |
VP07 | dB | 0.708 | →→→ |
VP08 | 0 dB | 1.000 | →→→ |
VP09 | 0 dB | 1.000 | →→→ |
VP10 | dB | 1.122 | →→→ |
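The relationship between a level offset in dB and a linear gain factor, as paired in the table above, can be sketched as follows. This is a minimal illustration that assumes Factor x is an amplitude gain, so that x = 10^(dB/20); the table itself does not state the convention, and under a power convention the divisor would be 10 instead. The function names are hypothetical and not part of the dataset's tooling.

```python
import math


def db_to_gain(level_db: float) -> float:
    """Convert a level offset in dB to a linear amplitude factor.

    Assumes the amplitude (20 * log10) convention, which is an
    assumption made for this sketch, not a statement from the table.
    """
    return 10.0 ** (level_db / 20.0)


def gain_to_db(factor: float) -> float:
    """Convert a linear amplitude factor back to a level offset in dB."""
    if factor <= 0:
        raise ValueError("gain factor must be positive")
    return 20.0 * math.log10(factor)


if __name__ == "__main__":
    # Distinct factors listed in the table above.
    for x in (0.398, 0.501, 0.708, 1.000, 1.122):
        print(f"x = {x:.3f} -> {gain_to_db(x):+.2f} dB")
```

Under either convention a factor of 1.000 corresponds to 0 dB, which matches the rows where the level is given as 0 dB.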
VP | Condition | Sentence |
---|---|---|
VP03 | | s45 Frau Huber ermahnt neun (richtig: fünf) tobende Punks. s45 Mrs. Huber warns nine (correct: five) raging punks. |
VP03 | | s58 Der Geologe trimmt vier (richtig: fünf) dornige Büsche. s58 The geologist trims four (correct: five) thorny bushes. |
VP04 | | s70 Gertrud entwirft sieben strahlende (richtig: edle) Gewänder. s70 Gertrud designs seven radiant (correct: noble) robes. |
VP06 | | s89 Martha ergattert acht dünne Rosen (richtig: Hosen). s89 Martha snags eight thin roses (correct: pants). |
VP08 | | s05 Die Familie beobachtet (richtig: betrachtet) sieben nasse Tauben. s05 The family observes (correct: views) seven wet pigeons. |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Moshona, C.C.; Rudawski, F.; Fiebig, A.; Sarradj, E. BELMASK—An Audiovisual Dataset of Adversely Produced Speech for Auditory Cognition Research. Data 2024, 9, 92. https://doi.org/10.3390/data9080092