
Automatic recognition of oral vowels in tone language: Experiments with fuzzy logic and neural network models

Published: 01 January 2011

Abstract

Automatic recognition of tone language speech is a complex problem in that it involves two parallel recognition tasks. A recognition system must be able to recognise the tone and phone components of the acoustic signal simultaneously. The acoustic cue for the tones is the fundamental frequency (F0), while the first and second formant frequencies (F1 and F2) are the acoustic cues for the phones. In this study, we experiment with two soft-computing techniques, namely artificial neural networks (ANN) and fuzzy logic (FL), for the recognition of oral vowels in a tone language, using Standard Yorùbá (SY) as our case study. The ANN and FL speech recognition systems were developed in MATLAB. The results showed that the ANN-based model performed better on the training data, while the FL-based model performed better on the test set. This implies that the ANN system was able to interpolate, or approximate, the data more accurately, whereas the FL system was better at extrapolating from the data. In addition, it was observed that the ANN system required a larger amount of data for its development, whereas developing the FL system required some expert knowledge. In conclusion, the FL-based approach appears to be the better choice for developing practical automatic speech recognition (ASR) systems for languages such as SY, where language resources are limited.
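The comparison described above can be made concrete with a small sketch. The Python fragment below is a minimal illustration, not the authors' MATLAB implementation: it classifies vowel tokens from the three acoustic cues named in the abstract (F0, F1 and F2). The tiny data set, the vowel labels and the triangular membership corners are all hypothetical values chosen only to make the example runnable; a small scikit-learn MLP stands in for the ANN route, and a min-rule over triangular memberships stands in for the FL route.

```python
# Illustrative sketch only (the authors' systems were built in MATLAB).
# All feature values and prototypes below are hypothetical.

import numpy as np
from sklearn.neural_network import MLPClassifier

# Hypothetical training data: rows of [F0, F1, F2] in Hz, labels are vowels.
X_train = np.array([
    [120, 300, 2300],   # something like /i/
    [110, 700, 1200],   # something like /a/
    [130, 350,  800],   # something like /u/
    [125, 320, 2250],
    [115, 680, 1150],
    [128, 360,  820],
], dtype=float)
y_train = ["i", "a", "u", "i", "a", "u"]

# ANN route: a small multilayer perceptron over the three acoustic cues.
ann = MLPClassifier(hidden_layer_sizes=(10,), max_iter=2000, random_state=0)
ann.fit(X_train, y_train)

# FL route: triangular membership functions over F1 and F2, one per vowel.
def tri(x, left, centre, right):
    """Triangular membership: 1 at the centre, 0 outside [left, right]."""
    if x <= left or x >= right:
        return 0.0
    return (x - left) / (centre - left) if x <= centre else (right - x) / (right - centre)

VOWEL_PROTOTYPES = {            # hypothetical (left, centre, right) corners in Hz
    "i": {"F1": (200, 300, 450), "F2": (1900, 2300, 2700)},
    "a": {"F1": (550, 700, 900), "F2": ( 900, 1200, 1500)},
    "u": {"F1": (250, 350, 500), "F2": ( 600,  800, 1100)},
}

def fuzzy_classify(f1, f2):
    """Pick the vowel whose combined F1/F2 membership (min-rule) is largest."""
    scores = {
        v: min(tri(f1, *p["F1"]), tri(f2, *p["F2"]))
        for v, p in VOWEL_PROTOTYPES.items()
    }
    return max(scores, key=scores.get), scores

# Compare the two routes on one unseen token.
token = np.array([[118, 330, 2200]])          # [F0, F1, F2] of a test vowel
print("ANN says:  ", ann.predict(token)[0])
print("Fuzzy says:", fuzzy_classify(330, 2200)[0])
```

The sketch also mirrors the trade-off reported in the abstract: the ANN route needs labelled training examples, whereas the FL route needs expert-supplied membership functions rather than data.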

Published In

Applied Soft Computing, Volume 11, Issue 1
January 2011
1490 pages

Publisher

Elsevier Science Publishers B. V.

Netherlands

Author Tags

  1. Soft-computing
  2. Speech recognition
  3. Tone language

Qualifiers

  • Article
