Phoneme Duration Prediction for Kazakh Language

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11096)

Included in the following conference series:

International Conference on Speech and Computer


Abstract

Our research team set the goal of creating a modern speech synthesis system for the Kazakh language. One of the most important components of such a system is phoneme duration prediction. In this article, we present our work on building such a classifier. We developed a detector based on a deep neural network that uses a minimal set of input linguistic and phonetic parameters. According to the training results, the proposed detector predicts phoneme durations on test data with an average deviation of 20–25 ms.
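As a rough illustration of what an average deviation in milliseconds means, the sketch below computes the mean absolute difference between predicted and reference phoneme durations. The abstract does not say exactly how the deviation was computed, so the use of mean absolute error is an assumption. The function name and the duration values are hypothetical and are not taken from the paper.

```python
def mean_absolute_deviation_ms(predicted_ms, reference_ms):
    """Average absolute difference between predicted and reference durations (ms)."""
    if len(predicted_ms) != len(reference_ms):
        raise ValueError("duration lists must have the same length")
    if not predicted_ms:
        raise ValueError("duration lists must not be empty")
    # Sum the per-phoneme absolute errors, then divide by the phoneme count.
    total = sum(abs(p - r) for p, r in zip(predicted_ms, reference_ms))
    return total / len(predicted_ms)


# Hypothetical durations in ms for five phonemes (illustrative only).
reference = [80.0, 120.0, 60.0, 95.0, 150.0]
predicted = [100.0, 100.0, 85.0, 70.0, 130.0]

deviation = mean_absolute_deviation_ms(predicted, reference)
print(f"mean absolute deviation: {deviation:.1f} ms")
```

With these made-up values the per-phoneme errors are 20, 20, 25, 25 and 20 ms, so the average falls inside the 20–25 ms range quoted in the abstract.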

Acknowledgments

This work was supported in part by the Government of the Russian Federation (Grant 08-08) and by initial funding from ITMO University.

Author information

Authors and Affiliations

ITMO University, Saint Petersburg, Russia
Arman Kaliyev, Sergey V. Rybin & Yuri N. Matveev

Corresponding author

Correspondence to Arman Kaliyev.

Editor information

Editors and Affiliations

SPIIRAS, St. Petersburg, Russia
Alexey Karpov
Leipzig University of Telecommunications, Leipzig, Germany
Oliver Jokisch
Moscow State Linguistic University, Moscow, Russia
Rodmonga Potapova

About this paper

Cite this paper

Kaliyev, A., Rybin, S.V., Matveev, Y.N. (2018). Phoneme Duration Prediction for Kazakh Language. In: Karpov, A., Jokisch, O., Potapova, R. (eds) Speech and Computer. SPECOM 2018. Lecture Notes in Computer Science, vol 11096. Springer, Cham. https://doi.org/10.1007/978-3-319-99579-3_29

DOI: https://doi.org/10.1007/978-3-319-99579-3_29
Published: 25 August 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-99578-6
Online ISBN: 978-3-319-99579-3
eBook Packages: Computer Science, Computer Science (R0)
