Abstract
The automatic classification of a speaker’s dialect can enrich many applications, e.g. in human-machine interaction (HMI) or natural language processing (NLP), but also in specific areas such as pronunciation tutoring, forensic analysis or the personalization of call-center conversations. Although much HMI/NLP-related research has been dedicated to affective computing, emotion recognition, semantic understanding and other advanced topics, there seems to be a lack of methods for automated dialect analysis that do not rely on transcriptions, in particular for languages such as German. For other languages such as English, Mandarin or Arabic, a multitude of feature combinations and classification methods have already been tried, which provides a starting point for our study. We describe selected experiments to train suitable classifiers on German dialect varieties in the corpus “Regional Variants of German 1” (RVG1). Our article starts with a systematic choice of appropriate spectral features. In a second step, these features are post-processed with different methods, and one Gaussian Mixture Model (GMM) is trained per feature combination as a Universal Background Model (UBM). The resulting UBMs are then adapted to a varied selection of dialects by maximum-a-posteriori (MAP) adaptation. Our preliminary results on German show that dialect discrimination and classification are possible. The unweighted recognition accuracy ranges from 32.4 to 54.9% in a 3-dialect test and from 19.6 to 31.4% in a 9-dialect classification. Some dialects are more easily distinguished using purely spectral features, while others require a different feature set or more sophisticated classification methods, which we will explore in future experiments.
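The abstract does not say which post-processing methods were compared. Purely as an illustration, the sketch below applies two common choices for cepstral features to a frame matrix of shape (frames, coefficients): per-utterance cepstral mean and variance normalization (CMVN) and regression-based delta coefficients. The function names and the delta window width of 2 are assumptions, not details taken from the paper.

```python
import numpy as np


def cmvn(features: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Per-utterance cepstral mean and variance normalization (CMVN).

    `features` has shape (num_frames, num_coeffs), e.g. MFCCs of one utterance.
    Each coefficient track is shifted to zero mean and scaled to unit variance.
    """
    return (features - features.mean(axis=0)) / (features.std(axis=0) + eps)


def deltas(features: np.ndarray, width: int = 2) -> np.ndarray:
    """Regression-based delta coefficients over +/- `width` neighboring frames.

    d_t = sum_{n=1..width} n * (c_{t+n} - c_{t-n}) / (2 * sum_{n=1..width} n^2),
    with the first and last frames repeated at the edges so the output keeps
    the input shape.
    """
    num_frames = features.shape[0]
    padded = np.pad(features, ((width, width), (0, 0)), mode="edge")
    denom = 2 * sum(n * n for n in range(1, width + 1))
    out = np.zeros(features.shape, dtype=float)
    for n in range(1, width + 1):
        ahead = padded[width + n : width + n + num_frames]    # frames t + n
        behind = padded[width - n : width - n + num_frames]   # frames t - n
        out += n * (ahead - behind)
    return out / denom


def postprocess(features: np.ndarray) -> np.ndarray:
    """Normalize the static features and append their first-order deltas."""
    static = cmvn(features)
    return np.hstack([static, deltas(static)])
```

Stacking static and delta coefficients doubles the feature dimension, so each such combination would need its own UBM, which matches training one GMM per feature combination.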
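The GMM-UBM pipeline summarized in the abstract (train one UBM, MAP-adapt it per dialect, then classify) can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It uses diagonal covariances, adapts only the means with classical relevance-MAP (relevance factor 16), and labels an utterance with the dialect whose adapted model yields the highest average frame log-likelihood. All function names, the number of components and the toy data are hypothetical.

```python
import numpy as np


def log_gauss_diag(x, means, variances):
    """Log-density of every frame under every diagonal Gaussian, shape (N, K)."""
    diff = x[:, None, :] - means[None, :, :]  # (N, K, D)
    return -0.5 * (np.sum(diff**2 / variances, axis=2)
                   + np.sum(np.log(2 * np.pi * variances), axis=1))


def posteriors(x, weights, means, variances):
    """Component posteriors (N, K) and per-frame log-likelihoods (N,)."""
    log_p = log_gauss_diag(x, means, variances) + np.log(weights)
    peak = log_p.max(axis=1, keepdims=True)  # log-sum-exp for numerical stability
    log_like = peak[:, 0] + np.log(np.exp(log_p - peak).sum(axis=1))
    return np.exp(log_p - log_like[:, None]), log_like


def train_ubm(x, num_components, num_iters=20, seed=0, var_floor=1e-3):
    """Maximum-likelihood EM training of a diagonal-covariance GMM as the UBM."""
    rng = np.random.default_rng(seed)
    n = x.shape[0]
    means = x[rng.choice(n, num_components, replace=False)]
    variances = np.tile(x.var(axis=0), (num_components, 1))
    weights = np.full(num_components, 1.0 / num_components)
    for _ in range(num_iters):
        gamma, _ = posteriors(x, weights, means, variances)  # E-step
        nk = gamma.sum(axis=0) + 1e-10  # soft frame counts per component
        weights = nk / nk.sum()  # M-step
        means = (gamma.T @ x) / nk[:, None]
        variances = np.maximum((gamma.T @ x**2) / nk[:, None] - means**2, var_floor)
    return weights, means, variances


def map_adapt_means(ubm, x, relevance=16.0):
    """Relevance-MAP adaptation of the UBM means to one dialect's frames.

    Weights and variances are copied unchanged from the UBM.
    """
    weights, means, variances = ubm
    gamma, _ = posteriors(x, weights, means, variances)
    nk = gamma.sum(axis=0)
    first_moment = (gamma.T @ x) / np.maximum(nk, 1e-10)[:, None]
    alpha = (nk / (nk + relevance))[:, None]  # data-dependent adaptation weight
    return weights, alpha * first_moment + (1.0 - alpha) * means, variances


def classify(dialect_models, x):
    """Return the dialect whose adapted GMM gives the highest mean log-likelihood."""
    def score(name):
        weights, means, variances = dialect_models[name]
        return posteriors(x, weights, means, variances)[1].mean()
    return max(dialect_models, key=score)


if __name__ == "__main__":
    # Toy data: two hypothetical "dialects" with shifted 2-D feature means.
    rng = np.random.default_rng(1)
    train = {"A": rng.normal(0.0, 1.0, (500, 2)), "B": rng.normal(2.0, 1.0, (500, 2))}
    ubm = train_ubm(np.vstack(list(train.values())), num_components=4)
    models = {name: map_adapt_means(ubm, frames) for name, frames in train.items()}
    print(classify(models, rng.normal(2.0, 1.0, (300, 2))))
```

Because every dialect model shares the UBM's weights and variances, subtracting the UBM log-likelihood (a log-likelihood-ratio score) would not change which dialect wins. The relevance factor controls how far components with few frames move away from the UBM.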
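The abstract reports an unweighted recognition accuracy. Assuming this means the unweighted average recall, i.e. the mean of the per-dialect recalls, so that every dialect counts equally regardless of its number of test samples, it can be computed as below. The function names and toy labels are illustrative.

```python
from collections import defaultdict


def unweighted_accuracy(true_labels, predicted_labels):
    """Mean of per-class recalls (unweighted average recall).

    Assumes at least one label; every class counts equally in the average.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for true, pred in zip(true_labels, predicted_labels):
        total[true] += 1
        correct[true] += int(true == pred)
    return sum(correct[c] / total[c] for c in total) / len(total)


def chance_level(num_classes):
    """Expected unweighted accuracy of a random guesser over `num_classes` classes."""
    return 1.0 / num_classes


if __name__ == "__main__":
    y_true = ["A"] * 8 + ["B"] * 2
    y_pred = ["A"] * 8 + ["A", "B"]
    print(unweighted_accuracy(y_true, y_pred))  # 0.75
```

With these toy labels, plain accuracy would be 9/10 = 0.9, whereas the unweighted value is (8/8 + 1/2)/2 = 0.75. Under the same reading, chance level is 1/3 ≈ 33.3% for the 3-dialect test and 1/9 ≈ 11.1% for the 9-dialect test, which gives a reference point for the reported ranges.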
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Dobbriner, J., Jokisch, O. (2019). Towards a Dialect Classification in German Speech Samples. In: Salah, A., Karpov, A., Potapova, R. (eds) Speech and Computer. SPECOM 2019. Lecture Notes in Computer Science, vol 11658. Springer, Cham. https://doi.org/10.1007/978-3-030-26061-3_7
DOI: https://doi.org/10.1007/978-3-030-26061-3_7
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-26060-6
Online ISBN: 978-3-030-26061-3
eBook Packages: Computer Science, Computer Science (R0)