Abstract
This paper explores a better way to learn word vector representations for language identification (LID). We have focused on a phonotactic approach using phoneme sequences in order to make phonotactic units (phone-grams) to incorporate context information. In order to take into consideration the morphology of phone-grams, we have considered the use of sub-word information (lower-order n-grams) to learn phone-grams embeddings using FastText. These embeddings are used as input to an i-Vector framework to train a multiclass logistic classifier. Our approach has been compared with a LID system that uses phone-gram embeddings learned through Skipgram that do not implement sub-word information, using Cavg as a metric for our experiments. Our approach to LID to incorporate sub-word information in phone-grams embeddings significantly improves the results obtained by using embeddings that are learned ignoring the structure of phone-grams. Furthermore, we have shown that our system provides complementary information to an acoustic system, improving it through the fusion of both systems.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
References
Ace P, Schwarz P, Ace V (2009) Phoneme recognition based on long temporal context
Barbaresi A (2017) Discriminating between similar languages using weighted subword features. In: Fourth workshop on NLP for similar languages, pp 184–189
Berkling K, Arai T, Barnard E (1994) Analysis of phoneme-based features for language identification. In: Proceedings of the international conference on acoustics, speech and signal processing. IEEE, pp 289–292
Bojanowski P, Grave E, Joulin A, Mikolov T (2017) Enriching word vectors with subword information. arXiv:1607.04606v2
Chaudhary A, Zhou C, Levin L, Neubig G, Mortensen D, Carbonell J (2018) Adapting word embeddings to new languages with morphological and phonological subword representations. In: Proceedings of the 2018 conference on empirical methods in natural language processing, pp 3285–3295
D’Haro L, Glembek O, Plchot O, Matejka P, Soufifar M, Córdoba R, Cernocky J (2012) Phonotactic language recognition using i-vectors and phoneme posteriogram counts. In: ISCA 13th annual conference, Proceedings of the INTERSPEECH, pp 42–45
Karpathy A, Fei-Fei L (2017) Deep visual-semantic alignments for generating image description. IEEE Trans Pattern Anal Mach Intell 39(4):664–676
Kulmizev A, Blankers B, Bjerva J, Nissim M, Noord G, Plank B, Wieling M (2017) The power of character n-grams in native language identification. In: Proceedings of the 12th workshop on innovative use of NLP for building educational applications, pp 382–389
Livescu K, Fosler-Lussier E, Metze F (2012) Sub-word modeling for automatic speech recognition. IEEE Signal Process Mag 29:44–57
Matejka P, Schwarz P, Cernock J, Chytil P (2005) Phonotactic language identification using high quality phonome recognition. In: Proceedings of the IberSPEECH, pp 2237–2240
Martin A, Greenberg C (2010) The 2009 NIST language recognition evaluation. In: Odyssey, p 30
Mager M, Cetinoglu O, Kann K (2019) Subword-level language identification for intra-word code-switching. arXiv:1904.01989v1
Mikolov T, Sutskever I, Deoras A, Le H, Kombrink S, Cernocky J (2011) Subword language modeling with neural networks
Palaskar S, Raunak V, Metze F (2019) Learned in speech recognition: contextual acoustic word embeddings. arXiv:1902.06833v1
Qi Y, Sachan D, Felix M, Padmanabhan S, Neubig G (2018) When and why are pre-trained word embeddings useful for neural machine translation? arXiv:1804.06323v2
Rodriguez L, Penagarikano M, Varona A, Diez M, Bordel G (2016) KALAKA-3: a database for the assessment of spoken language recognition technology on YouTube audios. Lang Resour Eval 50(2):221–243
Salamea C, Córdoba R, D’Haro L, Segundo R, Ferreiros J (2018) On the use of phone-based embeddings for language recognition. In: Proceedings of the IberSPEECH, pp 55–59
Singh R, Raj B, Stern R (2002) Automatic generation of subword units for speech recognition systems. IEEE Trans Speech Audio Process 10(2):89–99
Xia M (2016) Codeswitching language identification using subword information enriched word vectors. In: Proceedings of the second workshop on computational approaches to code switching, pp 132–136
Zhang Z, Huang Y, Zhu P, Zhao H (2018) Effective character-augmented word embedding for machine reading comprehension. arXiv:1808.02772v1
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2021 The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this chapter
Cite this chapter
Romero, D., Salamea, C. (2021). On the Use of Phonotactic Vector Representations with FastText for Language Identification. In: D'Haro, L.F., Callejas, Z., Nakamura, S. (eds) Conversational Dialogue Systems for the Next Decade. Lecture Notes in Electrical Engineering, vol 704. Springer, Singapore. https://doi.org/10.1007/978-981-15-8395-7_25
Download citation
DOI: https://doi.org/10.1007/978-981-15-8395-7_25
Published:
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-8394-0
Online ISBN: 978-981-15-8395-7
eBook Packages: EngineeringEngineering (R0)