Prosodic Phrasing: Machine and Human Evaluation

M. Céu Viana¹,
Luís C. Oliveira² &
Ana I. Mata¹

Abstract

This paper describes a set of experiments aimed at building and evaluating a new phrasing module for European Portuguese text-to-speech synthesis, using classification and regression trees learned from hand-labelled texts. When boundary predictions are matched against the corresponding labelled boundaries, the best solution achieves an overall performance of 91.9%, with 86.3% of breaks correctly assigned and 4.3% false insertions. Although in absolute terms such scores may be considered surprisingly good given the size of the training set, the proportion of exact matches at the sentence level is much lower (22%). This discrepancy motivated a more formal experiment to test how acceptable the predicted phrasing is in the judgement of human evaluators. As the model was trained not on a labelled speech corpus but on hand-labelled texts, the reference phrasing also needed to be assessed. The evaluation experiment involved 90 participants, who were asked to grade both the automatic and the reference phrasings and to indicate where they thought the breaks should be placed. As expected, the results showed large variability among subjects in their acceptance of a specific sentence partition, so criteria had to be defined to summarise the data from the different evaluators. With the adopted criteria, the performance of the automatic assignment procedure at the sentence level is rated better by human evaluators than by simple matching against the reference corpus (78% vs. 22%, respectively).
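The gap between the juncture-level scores and the sentence-level exact-match rate can be illustrated with a minimal Python sketch. It is not the authors' evaluation code. The data layout (one boolean per word juncture, `True` meaning a break), the names `PhrasingScores` and `score_phrasing`, and the choice of denominators are all assumptions made for illustration; in particular, the abstract does not state what the false-insertion percentage is measured against, so the sketch assumes all junctures.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class PhrasingScores:
    overall: float           # junctures where prediction equals reference / all junctures
    breaks_correct: float    # reference breaks that were also predicted / reference breaks
    false_insertions: float  # predicted breaks absent from reference / all junctures (assumed)
    sentence_exact: float    # sentences with every juncture matching / all sentences


def _ratio(numerator: int, denominator: int) -> float:
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def score_phrasing(reference: Sequence[Sequence[bool]],
                   predicted: Sequence[Sequence[bool]]) -> PhrasingScores:
    """Compare predicted break labels with reference labels, sentence by sentence."""
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must contain the same sentences")

    junctures = matches = ref_breaks = hit_breaks = insertions = exact = 0
    for ref_sent, pred_sent in zip(reference, predicted):
        if len(ref_sent) != len(pred_sent):
            raise ValueError("each sentence needs one label per juncture in both inputs")
        sentence_ok = True
        for ref, pred in zip(ref_sent, pred_sent):
            junctures += 1
            if ref == pred:
                matches += 1
            else:
                sentence_ok = False
            if ref:
                ref_breaks += 1
                if pred:
                    hit_breaks += 1
            elif pred:
                insertions += 1
        if sentence_ok:
            exact += 1

    return PhrasingScores(
        overall=_ratio(matches, junctures),
        breaks_correct=_ratio(hit_breaks, ref_breaks),
        false_insertions=_ratio(insertions, junctures),
        sentence_exact=_ratio(exact, len(reference)),
    )


# Toy data (invented for illustration): two sentences, nine junctures in total.
reference = [[False, True, False, False],
             [False, False, True, False, False]]
predicted = [[False, True, False, False],
             [False, True, False, False, False]]  # break placed one juncture early

scores = score_phrasing(reference, predicted)
print(scores)
```

In the toy data a single break shifted by one juncture leaves seven of nine junctures correct, yet it makes the whole second sentence count as a mismatch. The same mechanism explains how a high juncture-level score can coexist with a low sentence-level exact-match rate.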

Author information

Authors and Affiliations

Centro de Linguística da Universidade de Lisboa (CLUL), Av. Prof. Gama Pinto, no. 2, Lisboa, Portugal
M. Céu Viana & Ana I. Mata
Instituto de Engenharia de Sistemas e Computadores, Investigação e Desenvolvimento em Lisboa (INESC ID/IST), Rua Alves Redol no. 9, 1000-029, Lisboa, Portugal
Luís C. Oliveira

About this article

Cite this article

Viana, M.C., Oliveira, L.C. & Mata, A.I. Prosodic Phrasing: Machine and Human Evaluation. International Journal of Speech Technology 6, 83–94 (2003). https://doi.org/10.1023/A:1021060308216

Issue Date: January 2003
DOI: https://doi.org/10.1023/A:1021060308216
