Tone modelling in Ibibio speech synthesis

Moses E. Ekpenyong¹ &
EmemObong Udoh²

Abstract

In this paper, we investigate the contribution of tone to Hidden Markov Model (HMM)-based speech synthesis of Ibibio (ISO 639-3: nic; Ethnologue: IBB), an under-resourced language. We review the speech characteristics of the language that are required for building the front-end components of the design, and propose a finite state transducer (FST) for modelling the language’s tonetactics. We also study the existing Ibibio speech database and examine the quality of the synthetic speech through a spectral analysis of voices obtained from two synthesis experiments, one with and one without tone feature labels. A confusion matrix classifying the results of a controlled listening test for both experiments is constructed, and statistics comparing their performance are presented. The results show that synthesis with tone feature labels outperformed synthesis without them: listeners perceived more tone confusions in the latter.
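The listening-test evaluation described in the abstract can be illustrated with a minimal sketch of a tone confusion matrix. Everything in it is an assumption made for illustration: the two-label tone set (`H`, `L`), the function names, and the listener responses are invented and are not the paper's tone inventory, code, or results.

```python
from collections import Counter

def confusion_matrix(pairs, labels):
    """Count (intended, perceived) tone pairs into a nested dict.

    Rows are the intended tones, columns the tones listeners reported.
    """
    counts = Counter(pairs)
    return {i: {p: counts[(i, p)] for p in labels} for i in labels}

def accuracy(matrix):
    """Share of responses on the diagonal (tone perceived as intended)."""
    total = sum(sum(row.values()) for row in matrix.values())
    correct = sum(matrix[t][t] for t in matrix)
    return correct / total if total else 0.0

# Hypothetical tone labels and made-up listener responses (not real data).
LABELS = ["H", "L"]
with_tone = [("H", "H"), ("H", "H"), ("L", "L"), ("L", "H")]
without_tone = [("H", "L"), ("H", "H"), ("L", "H"), ("L", "L")]

for name, responses in [("with tone labels", with_tone),
                        ("without tone labels", without_tone)]:
    matrix = confusion_matrix(responses, LABELS)
    print(name, matrix, f"accuracy={accuracy(matrix):.2f}")
```

Off-diagonal cells count tone confusions, so comparing the two matrices cell by cell shows which intended tones listeners misheard under each condition.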

Notes

The [u] sound, however, occurs word-finally in very few words, as a result of dialectal (free) variation. Some dialects of Ibibio have the word for "rat" as ‘’ while others have it as ‘’. This is just one exception in the distribution of Ibibio central vowels.

Acknowledgements

We thank Dr. Okokon Akpan, a lecturer in the Department of Linguistics and Nigerian Languages, University of Uyo, for agreeing to our request to record and use his voice for the Ibibio speech database. We also thank the anonymous reviewers for their excellent comments.

Author information

Authors and Affiliations

Department of Computer Science, University of Uyo, P.M.B. 1017, 520003, Uyo, Akwa Ibom State, Nigeria
Moses E. Ekpenyong
Department of Linguistics and Nigerian Languages, University of Uyo, P.M.B. 1017, 520003, Uyo, Akwa Ibom State, Nigeria
EmemObong Udoh

Corresponding author

Correspondence to Moses E. Ekpenyong.

About this article

Cite this article

Ekpenyong, M.E., Udoh, E. Tone modelling in Ibibio speech synthesis. Int J Speech Technol 17, 145–159 (2014). https://doi.org/10.1007/s10772-013-9216-2

Received: 27 June 2013
Accepted: 21 November 2013
Published: 04 December 2013
Issue Date: June 2014
DOI: https://doi.org/10.1007/s10772-013-9216-2
