Abstract
This paper describes a set of experiments aimed at building and evaluating a new phrasing module for European Portuguese text-to-speech synthesis, using classification and regression trees learned from hand-labelled texts. When boundary predictions are matched against the corresponding hand-labelled boundaries, the best solution achieves an overall performance of 91.9%, with 86.3% of breaks correctly assigned and 4.3% false insertions. Although such scores may be considered surprisingly good in absolute terms given the size of the training set, the proportion of exact matches at the sentence level is much lower (22%). This motivated a more formal experiment to test the acceptability of the predicted phrasing to human evaluators. Because the model was trained on hand-labelled texts rather than on a labelled speech corpus, the reference phrasing also needed to be assessed. The evaluation experiment involved 90 participants, who were asked to grade both the automatic and the reference phrasings and to indicate where they thought the breaks should be placed. As expected, the results showed large variability among subjects in their acceptance of a specific sentence partition, and criteria had to be defined to summarise the data from the different evaluators. Under the adopted criteria, human evaluators rate the sentence-level performance of the automatic assignment procedure much higher than simple matching with the reference corpus does (78% vs. 22%, respectively).
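The matching-based scores quoted in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so the definitions below are assumptions based on common practice in phrase-break evaluation: overall performance is taken as the share of word junctures labelled identically, correct breaks as the share of reference breaks that were predicted, and false insertions as predicted breaks with no reference break, relative to all junctures. The function name `phrasing_scores`, the 0/1 encoding of junctures, and the toy data are hypothetical.

```python
from typing import Dict, List


def _ratio(numerator: int, denominator: int) -> float:
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def phrasing_scores(
    reference: List[List[int]], predicted: List[List[int]]
) -> Dict[str, float]:
    """Compare predicted and reference phrasings.

    Each sentence is a list of 0/1 flags, one per word juncture
    (1 = phrase break, 0 = no break). Metric definitions are assumed,
    not taken from the paper.
    """
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same number of sentences")

    junctures = ref_breaks = same = hits = insertions = exact = 0
    for ref, pred in zip(reference, predicted):
        if len(ref) != len(pred):
            raise ValueError("each sentence pair must have the same number of junctures")
        junctures += len(ref)
        ref_breaks += sum(ref)
        same += sum(r == p for r, p in zip(ref, pred))
        hits += sum(r == 1 and p == 1 for r, p in zip(ref, pred))
        insertions += sum(r == 0 and p == 1 for r, p in zip(ref, pred))
        exact += int(ref == pred)  # the whole sentence must match

    return {
        "junctures_correct": _ratio(same, junctures),
        "breaks_correct": _ratio(hits, ref_breaks),
        "false_insertions": _ratio(insertions, junctures),
        "sentence_exact_match": _ratio(exact, len(reference)),
    }


# Toy data: two sentences, the second with one inserted and one missed break.
reference = [[0, 1, 0, 0], [0, 0, 1, 0, 1]]
predicted = [[0, 1, 0, 0], [0, 1, 1, 0, 0]]
scores = phrasing_scores(reference, predicted)
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

Because a single mismatched juncture makes a whole sentence count as a miss, juncture-level scores can be high while the sentence-level exact-match rate stays low, which is the gap the abstract reports.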
Cite this article
Viana, M.C., Oliveira, L.C. & Mata, A.I. Prosodic Phrasing: Machine and Human Evaluation. International Journal of Speech Technology 6, 83–94 (2003). https://doi.org/10.1023/A:1021060308216