Abstract
Text-to-speech (TTS) is currently a mature technology used in many areas such as education and accessibility. Some modules of a TTS system are language dependent and, while many public resources exist for some languages (e.g., English and Japanese), the resources for Brazilian Portuguese (BP) are still limited. This work describes the development of a complete hidden Markov model (HMM) based TTS system for BP that can be used in desktop environments. It also releases a set of natural language processing tools for BP, which expands the publicly available resources and supports new research for academic or industrial purposes. Subjective and objective performance tests are presented, comparing the proposed TTS system with other software currently available for BP.
Notes
1. The syllable is a unit that is relatively easy to identify and segment if the splitting rules stipulated by the language orthography are followed. However, as a phonological unit, there is no consensus about its basic structure, as discussed in [9]. For most authors, a syllable is defined so that its nucleus, canonically a vowel, constitutes a peak in the sonority curve that is preceded (onset) and/or followed (coda) by a sequence of segments (zero or more consonants) whose sonority decreases progressively away from the nucleus. The nucleus and coda are sometimes grouped together to form what is called the rhyme. Under these principles, the syllable is a speech unit of rhythmic organization, although other authors disagree, stating that the syllable should be seen as a whole rather than in parts.
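The onset/nucleus/coda decomposition described in this note can be illustrated with a minimal sketch. It is not the syllabification algorithm used by the authors: it assumes the input is a single, already separated orthographic syllable, treats any run of vowel letters as the nucleus (so glides in diphthongs are counted as part of the nucleus), and ignores digraphs. The names `VOWELS`, `Syllable` and `split_syllable` are hypothetical and introduced here only for illustration.

```python
from dataclasses import dataclass

# Assumption: orthographic vowel letters of Portuguese, including accented forms.
VOWELS = set("aeiouáàâãéêíóôõúü")


@dataclass
class Syllable:
    onset: str    # consonants before the nucleus (may be empty)
    nucleus: str  # vowel run, taken here as the sonority peak
    coda: str     # consonants after the nucleus (may be empty)

    @property
    def rhyme(self) -> str:
        # The rhyme groups the nucleus and the coda together.
        return self.nucleus + self.coda


def split_syllable(syllable: str) -> Syllable:
    """Split one orthographic syllable into onset, nucleus and coda."""
    text = syllable.lower()
    # Find the first vowel letter; everything before it is the onset.
    start = next((i for i, ch in enumerate(text) if ch in VOWELS), None)
    if start is None:
        raise ValueError(f"no vowel found in syllable: {syllable!r}")
    # Extend the nucleus over the contiguous run of vowel letters.
    end = start
    while end < len(text) and text[end] in VOWELS:
        end += 1
    return Syllable(onset=text[:start], nucleus=text[start:end], coda=text[end:])


if __name__ == "__main__":
    # "trans" as in "trans-por-te": onset "tr", nucleus "a", coda "ns".
    print(split_syllable("trans"))
```

A phonologically faithful version would operate on phones rather than letters and would need an explicit sonority scale; this sketch only shows how the three parts and the rhyme relate to each other.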
References
Dicionário Online de Português. (2018). http://www.dicio.com.br/
Grupo falabrasil (2018). https://goo.gl/EWcfdg
HTS (2018). http://hts.sp.nitech.ac.jp/
HTS Engine (2018). http://hts-engine.sourceforge.net/
Alcaim, A., Solewicz, J.A., de Morais, J.A.: Frequência de ocorrência dos fones e listas de frases foneticamente balanceadas para o português falado no Rio de Janeiro. Revista da Sociedade Brasileira de Telecomunicações 7(1), 23–41 (1992)
Braga, D., Coelho, L., Resende Jr., F.G.V.: A rule-based grapheme-to-phone converter for TTS systems in European Portuguese, pp. 141–156 (2007)
Braga, D., Silva, P., Ribeiro, M., Dias, M.S., Campillo, F., García-Mateo, C.: Hélia, Heloisa and Helena: new HTS systems in European Portuguese, Brazilian Portuguese and Galician. In: International Conference on Computational Processing of the Portuguese Language, PROPOR 2010 (2010)
Cirigliano, R.J.R., Monteiro, C., Barbosa, F.L., Resende Jr., F.G.V.R., Couto, L.R., de Morais, J.A.: Um conjunto de 1000 frases foneticamente balanceadas para o português brasileiro obtido utilizando a abordagem de algoritmos genéticos. Anais do Simpósio Brasileiro de Telecomunicações (SBrT) (2005)
Collischonn, G.: Introdução a Estudos de Fonologia do Português Brasileiro. Porto Alegre: EDIPUCRS, pp. 95–126 (2005)
Costa, E., Monte, A., Neto, N., Klautau, A.: Um Framework para Desenvolvimento de Sistemas TTS Personalizados no Português do Brasil. In: XXX Simpósio Brasileiro de Telecomunicações (2012)
Couto, I., Neto, N., Tadaiesky, V., Klautau, A., Maia, R.: An open source HMM-based text-to-speech system for Brazilian Portuguese. In: 7th International Telecommunications Symposium (2010)
Dutoit, T., Pagel, V., Pierret, N., Bataille, F., Vrecken, O.V.D.: The MBROLA project: towards a set of high-quality speech synthesizers free of use for non-commercial purposes. In: Proceedings of ICSLP 1996, Philadelphia, vol. 3, pp. 1393–1396 (1996)
Faria, A.: Applied Phonetics: Portuguese Text-to-Speech. Technical report, University of California (2003)
Maciel, A., Carvalho, E.: Integration and evaluation of an HMM-based text-to-speech system to FIVE. In: 19th International Conference on Systems, Signals and Image Processing, IWSSIP 2012 (2012)
Maia, R., Zen, H., Tokuda, K., Kitamura, T., Resende, F.: An HMM-based Brazilian Portuguese speech synthetiser and its characteristics. J. Commun. Inf. Syst. 21, 58–71 (2006)
Monte, A., Ribeiro, D., Neto, N., Cruz, R., Klautau, A.: A rule-based syllabification algorithm with stress determination for Brazilian Portuguese natural language processing. In: 17th International Congress of Phonetic Sciences, pp. 1418–1421 (2011)
Barbosa, P., et al.: Aiuruete: a high-quality concatenative text-to-speech system for Brazilian Portuguese with demisyllabic analysis-based units and hierarchical model of rhythm production. In: Proceedings of the Eurospeech 1999, pp. 2059–2062 (1999)
Schröder, M., Trouvain, J.: The German text-to-speech synthesis system MARY: a tool for research, development and teaching. Int. J. Speech Technol. 6, 365–377 (2001)
Silva, D., de Lima, A., Maia, R., Braga, D., de Moraes, J.F., de Moraes, J.A., Resende Jr., F.G.: A rule-based grapheme-phone converter and stress determination for Brazilian Portuguese natural language processing. In: VI International Telecommunications Symposium (2006)
Silva, D.C., Braga, D., Resende Jr., F.G.V.: Separação das Silabas e Determinação da Tonicidade no Português Brasileiro. In: XXVI Simpósio Brasileiro de Telecomunicações, SBrT 2008 (2008)
Siravenha, A., Neto, N., Macedo, V., Klautau, A.: Uso de Regras Fonológicas com Determinação de Vogal Tônica para Conversão Grafema-Fone em Português Brasileiro. In: 7th International Information and Telecommunication Technologies Symposium (2008)
Souza, D., Saturnino, L., Maciel, A.: A portability evaluation of Brazilian Portuguese voice produced with MARY TTS. In: 2014 International Conference on Systems, Signals and Image Processing (IWSSIP) (2014)
Taylor, P.: Text-To-Speech Synthesis. Cambridge University Press, Cambridge (2009)
Turunen, M.: Speech application design and development. Technical report (2004)
Yoshimura, T., Tokuda, K., Masuko, T., Kobayashi, T., Kitamura, T.: Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis. In: Proceedings of EUROSPEECH, vol. 5, no. 98, pp. 2347–2350 (1999)
Zen, H., Tokuda, K., Black, A.W.: Statistical parametric speech synthesis. Speech Commun. 51(11), 1039–1064 (2009)
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Costa, E., Neto, N. (2018). Free Tools and Resources for HMM-Based Brazilian Portuguese Speech Synthesis. In: Simari, G.R., Fermé, E., Gutiérrez Segura, F., Rodríguez Melquiades, J.A. (eds) Advances in Artificial Intelligence – IBERAMIA 2018. IBERAMIA 2018. Lecture Notes in Computer Science, vol 11238. Springer, Cham. https://doi.org/10.1007/978-3-030-03928-8_30
DOI: https://doi.org/10.1007/978-3-030-03928-8_30
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-03927-1
Online ISBN: 978-3-030-03928-8
eBook Packages: Computer Science, Computer Science (R0)