Optimal Utterance Selection for Unit Selection Speech Synthesis Databases

Alan W. Black¹ & Kevin Lenzo¹


Abstract

This paper describes techniques for finding an optimal data set for building high-quality unit-selection speech synthesis inventories. Because the quality of unit-selection speech synthesis depends on the coverage of the database used in selection, it is important to select the right data to record. In this paper we describe some simple techniques as well as a more complex acoustic modeling technique based on the database speaker's acoustic characteristics. Results of a simple evaluation procedure are presented, justifying the technique.
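The abstract does not spell out the selection algorithms. As a point of reference, a common baseline for the "simple techniques" end of this problem is greedy text selection: repeatedly pick the candidate utterance that adds the most not-yet-covered units. The Python sketch below assumes that units are diphones (adjacent phone pairs) and that every candidate utterance has already been converted to a phone sequence. The function names `diphones` and `greedy_select`, the utterance ids, and the phone labels are all hypothetical. It illustrates greedy coverage selection in general, not the authors' method, and it does not model the acoustic-modeling technique the abstract mentions.

```python
from typing import Dict, List, Optional, Set, Tuple

# Illustrative sketch only: greedy diphone-coverage selection, not the paper's method.


def diphones(phones: List[str]) -> Set[Tuple[str, str]]:
    """Return the set of adjacent phone pairs (diphones) in a phone sequence."""
    return {(a, b) for a, b in zip(phones, phones[1:])}


def greedy_select(corpus: Dict[str, List[str]], max_utts: int) -> List[str]:
    """Greedily pick utterances that add the most not-yet-covered diphones.

    corpus maps a (hypothetical) utterance id to its phone sequence.
    Stops after max_utts picks, or earlier if no remaining utterance
    contributes any new diphone.
    """
    covered: Set[Tuple[str, str]] = set()
    chosen: List[str] = []
    remaining = dict(corpus)
    while remaining and len(chosen) < max_utts:
        # Score each candidate by how many new diphones it would add;
        # ties go to the candidate seen first (dict insertion order).
        best_id: Optional[str] = None
        best_gain = 0
        for utt_id, phones in remaining.items():
            gain = len(diphones(phones) - covered)
            if gain > best_gain:
                best_id, best_gain = utt_id, gain
        if best_id is None:  # nothing left improves coverage
            break
        covered |= diphones(remaining.pop(best_id))
        chosen.append(best_id)
    return chosen


# Toy corpus with made-up phone labels.
corpus = {
    "u1": ["pau", "h", "ax", "l", "ow", "pau"],
    "u2": ["pau", "h", "ax", "l", "ow", "pau"],  # duplicate of u1, adds nothing new
    "u3": ["pau", "b", "ay", "pau"],
}
print(greedy_select(corpus, max_utts=10))  # -> ['u1', 'u3']
```

Greedy selection is a standard approximation for coverage problems of this kind and is inexpensive enough to run over a large candidate text corpus. Richer variants weight units by frequency or add prosodic context to the unit definition, which this sketch does not attempt.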




Author information

Authors and Affiliations

Language Technologies Institute, Carnegie Mellon University, 4525 NSH/LTI, 5000 Forbes Ave., Pittsburgh, PA, 15213, USA
Alan W. Black & Kevin Lenzo



About this article

Cite this article

Black, A.W., Lenzo, K. Optimal Utterance Selection for Unit Selection Speech Synthesis Databases. International Journal of Speech Technology 6, 357–363 (2003). https://doi.org/10.1023/A:1025704800086


Issue Date: October 2003
DOI: https://doi.org/10.1023/A:1025704800086
