Abstract
This paper describes techniques for finding an optimal data set for building high-quality unit-selection speech synthesis inventories. Because the quality of unit-selection speech synthesis depends on the coverage of the database used in selection, it is important to select the right data to record. We describe some simple techniques as well as a more complex acoustic modeling technique based on the database speaker's acoustic characteristics. Results of a simple evaluation procedure are presented, justifying the technique.
Cite this article
Black, A.W., Lenzo, K. Optimal Utterance Selection for Unit Selection Speech Synthesis Databases. International Journal of Speech Technology 6, 357–363 (2003). https://doi.org/10.1023/A:1025704800086
DOI: https://doi.org/10.1023/A:1025704800086