
Approaches for Multilingual Phone Recognition in Code-switched and Non-code-switched Scenarios Using Indian Languages

Published: 20 July 2021

Abstract

In this study, we evaluate and compare two approaches to multilingual phone recognition in code-switched and non-code-switched scenarios. The first approach uses a front-end Language Identification (LID) system to switch among monolingual phone recognizers (LID-Mono), each trained individually on one of the languages present in the multilingual dataset. In the second approach, a common multilingual phone-set derived from the International Phonetic Alphabet (IPA) transcription of the multilingual dataset is used to develop a Multilingual Phone Recognition System (Multi-PRS). The bilingual code-switching experiments are conducted using Kannada and Urdu. In the first approach, LID is performed using state-of-the-art i-vectors. Both the monolingual and multilingual phone recognition systems are trained using Deep Neural Networks. The performance of the LID-Mono and Multi-PRS approaches is compared and analysed in detail. The Multi-PRS approach is found to outperform the more conventional LID-Mono approach in both code-switched and non-code-switched scenarios. For code-switched speech, the effect of the length of the segments used for LID on the performance of the LID-Mono system is studied by varying the window size from 500 ms to 5.0 s, and up to the full utterance. The LID-Mono approach depends heavily on the accuracy of the LID system, and LID errors cannot be recovered. In contrast, the Multi-PRS system, because it requires no front-end LID switching and is designed around a common multilingual phone-set derived from several languages, is not constrained by LID accuracy; it therefore performs effectively on both code-switched and non-code-switched speech, offering lower Phone Error Rates than the LID-Mono system.
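
To make the contrast between the two decoding strategies concrete, the following Python sketch illustrates how they differ; it is not the authors' implementation. The functions identify_language, the per-language recognizers, and multilingual_recognizer are hypothetical stand-ins for the trained i-vector LID and DNN phone-recognition models described above, and only phone_error_rate is a concrete (standard edit-distance) computation.

```python
# Minimal sketch of the two decoding strategies compared in the abstract.
# All model functions are hypothetical placeholders, not the authors' code.
from typing import Callable, Dict, List, Sequence


def lid_mono_decode(
    frames: Sequence,
    window_size: int,
    identify_language: Callable[[Sequence], str],
    monolingual_recognizers: Dict[str, Callable[[Sequence], List[str]]],
) -> List[str]:
    """LID-Mono: run language identification on fixed-length windows
    (e.g. 500 ms to 5.0 s of frames) and route each window to the
    monolingual phone recognizer of the detected language."""
    phones: List[str] = []
    for start in range(0, len(frames), window_size):
        segment = frames[start:start + window_size]
        lang = identify_language(segment)  # e.g. "kannada" or "urdu"
        phones.extend(monolingual_recognizers[lang](segment))
    return phones


def multi_prs_decode(
    frames: Sequence,
    multilingual_recognizer: Callable[[Sequence], List[str]],
) -> List[str]:
    """Multi-PRS: decode the whole utterance with a single recognizer
    trained on a common IPA-derived phone-set, so no front-end LID
    decision is needed."""
    return multilingual_recognizer(frames)


def phone_error_rate(reference: List[str], hypothesis: List[str]) -> float:
    """Phone Error Rate = (substitutions + deletions + insertions) / len(reference),
    computed with standard Levenshtein edit distance over phone sequences."""
    n, m = len(reference), len(hypothesis)
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i
    for j in range(m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,          # deletion
                             dist[i][j - 1] + 1,          # insertion
                             dist[i - 1][j - 1] + cost)   # substitution
    return dist[n][m] / max(n, 1)
```

Under this sketch, a single misclassification in lid_mono_decode sends an entire window to the wrong monolingual recognizer, which is the unrecoverable LID error the abstract refers to; multi_prs_decode avoids that front-end decision entirely.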


Cited By

  • (2024) Code-Mixed Street Address Recognition and Accent Adaptation for Voice-Activated Navigation Services. IEEE Access, 12, 168393–168411. DOI: 10.1109/ACCESS.2024.3496617. Online publication date: 2024.
  • (2023) Comparative Analysis of Automatic Speech Recognition Techniques. Proceedings of the International Conference on Applications of Machine Intelligence and Data Analytics (ICAMIDA 2022), 897–904. DOI: 10.2991/978-94-6463-136-4_79. Online publication date: 1 May 2023.


    Published In

ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 20, Issue 4
    July 2021
    419 pages
    ISSN:2375-4699
    EISSN:2375-4702
    DOI:10.1145/3465463
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 20 July 2021
    Accepted: 01 November 2020
    Revised: 01 November 2020
    Received: 01 March 2019
    Published in TALLIP Volume 20, Issue 4


    Author Tags

    1. Indian languages
    2. multilingual phone recognition
    3. LID-switched monolingual PRS
    4. code-switching
    5. common multilingual phone-set

    Qualifiers

    • Research-article
    • Refereed

