Abstract
In this paper we present the process of designing an efficient speech corpus for the first unit selection speech synthesis system for Bulgarian, along with some significant preliminary results regarding the quality of the resulting system. Because the initial corpus is a crucial factor in the quality delivered by a Text-to-Speech (TTS) system, special effort went into designing a complete and efficient corpus for use in a unit selection TTS system. The targeted domain of the TTS system, and hence of the corpus, is news reports; although this domain is restricted, it is characterized by an unlimited vocabulary. The paper focuses on issues regarding the design of an optimal corpus for such a framework and on the ideas on which our approach was based. A novel multi-stage approach is presented, with special attention given to language- and speaker-dependent issues, as they affect the entire process. The paper concludes with the presentation of our results and the evaluation experiments, which provide clear evidence of the quality level achieved.
© 2011 Springer-Verlag Berlin Heidelberg
Cite this paper
Chalamandaris, A., Tsiakoulis, P., Raptis, S., Karabetsos, S. (2011). Corpus Design for a Unit Selection TtS System with Application to Bulgarian. In: Vetulani, Z. (eds) Human Language Technology. Challenges for Computer Science and Linguistics. LTC 2009. Lecture Notes in Computer Science, vol 6562. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20095-3_4
DOI: https://doi.org/10.1007/978-3-642-20095-3_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-20094-6
Online ISBN: 978-3-642-20095-3
eBook Packages: Computer Science (R0)