Abstract
In this paper, we investigate the contribution of tone in Hidden Markov Model (HMM)-based speech synthesis of Ibibio (ISO 639-3: nic; Ethnologue: IBB), an under-resourced language. We review the language’s speech characteristics required for building the front-end components of the design, and propose a finite state transducer (FST) for modelling the language’s tonetactics. The existing speech database of Ibibio is also studied, and the quality of the synthetic speech is examined through a spectral analysis of voices obtained from two synthesis experiments, with and without tone feature labels. A confusion matrix classifying the results of a controlled listening test for both experiments is constructed, and statistics comparing their performance are presented. The results reveal that synthesis systems with tone feature labels outperformed those without, as listeners perceived more tone confusions in the latter.
Notes
The [u] sound, however, occurs word-finally in very few words owing to dialectal (free) variation. Some dialects of Ibibio have the word for rat as ‘’ while others have it as ‘’. This is just one exception in the distribution of Ibibio central vowels.
Acknowledgements
We thank Dr. Okokon Akpan, a lecturer in the Department of Linguistics and Nigerian Languages, University of Uyo, for agreeing to our request to record and use his voice for the Ibibio speech database. We also thank the anonymous reviewers for their excellent comments.
Cite this article
Ekpenyong, M.E., Udoh, E. Tone modelling in Ibibio speech synthesis. Int J Speech Technol 17, 145–159 (2014). https://doi.org/10.1007/s10772-013-9216-2
DOI: https://doi.org/10.1007/s10772-013-9216-2