Sibilant Consonants Classification with Deep Neural Networks

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11805)

Included in the following conference series:

EPIA Conference on Artificial Intelligence

Abstract

Many children with speech sound disorders cannot pronounce the sibilant consonants correctly. We have developed a serious game that is controlled by the children’s voices in real time and lets them practice the European Portuguese sibilant consonants. To do this, the game uses a sibilant consonant classifier. Because the game does not require adult supervision, children can practice producing these sounds more often, which may lead to faster improvements in their speech.

Recently, deep neural networks have brought considerable improvements in classification across a variety of use cases, from image classification to speech and language processing. Here we propose using deep convolutional neural networks to classify the sibilant phonemes of European Portuguese in our serious game for speech and language therapy.

We compared the performance of several artificial neural networks that took either Mel frequency cepstral coefficients or log Mel filterbanks as input. Our best deep learning model, a 2D convolutional network with log Mel filterbanks as input features, achieves a classification score of \(95.48\%\).
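To make the two input representations concrete, the following is a minimal NumPy sketch of how log Mel filterbank features and Mel frequency cepstral coefficients are commonly computed. It is not the authors' pipeline: the sampling rate, frame length, hop size, FFT size, number of filters, and number of coefficients are all illustrative assumptions, and the function names are hypothetical.

```python
import numpy as np

def hz_to_mel(hz):
    # Common Mel-scale formula (O'Shaughnessy variant).
    return 2595.0 * np.log10(1.0 + np.asarray(hz, dtype=float) / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters evenly spaced on the Mel scale.

    Returns an array of shape (n_filters, n_fft // 2 + 1).
    """
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    hz_points = mel_to_hz(mel_points)
    fft_freqs = np.linspace(0.0, sample_rate / 2.0, n_fft // 2 + 1)
    bank = np.zeros((n_filters, fft_freqs.size))
    for i in range(n_filters):
        left, center, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        bank[i] = np.maximum(0.0, np.minimum(rising, falling))
    return bank

def log_mel_features(signal, sample_rate=16000, frame_len=400, hop=160,
                     n_fft=512, n_filters=40):
    """Log Mel filterbank energies, shape (n_frames, n_filters).

    All default values are assumptions chosen for illustration
    (25 ms frames, 10 ms hop at 16 kHz), not values from the paper.
    """
    signal = np.asarray(signal, dtype=float)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Index matrix that slices the signal into overlapping frames.
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2 / n_fft
    mel_energy = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    return np.log(mel_energy + 1e-10)  # small offset avoids log(0)

def mfcc_from_log_mel(log_mel, n_coeffs=13):
    """Unnormalized DCT-II along the filter axis, keeping the first n_coeffs."""
    n_filters = log_mel.shape[1]
    n = np.arange(n_filters)
    k = np.arange(n_coeffs)[:, None]
    basis = np.cos(np.pi * k * (2 * n + 1) / (2 * n_filters))
    return log_mel @ basis.T

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    one_second = rng.standard_normal(16000)  # stand-in for a recorded sibilant
    feats = log_mel_features(one_second)
    print(feats.shape, mfcc_from_log_mel(feats).shape)
```

The log Mel output is a time-by-frequency matrix, which is why it can be fed to a 2D convolutional model much like an image. The cepstral coefficients are obtained from it by a discrete cosine transform over the filter axis, which compacts and decorrelates the filterbank energies.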

Acknowledgements

This work was supported by the Portuguese Foundation for Science and Technology under projects BioVisualSpeech (CMUP-ERI/TIC/0033/2014) and NOVA-LINCS (PEest/UID/CEC/04516/2019). We thank Mariana Ascensão and the postgraduate SLP students from Escola Superior de Saúde do Alcoitão who collaborated in the data collection task. Finally, we thank Agrupamento de Escolas de Almeida Garrett, and the children who participated in the recordings.

Author information

Authors and Affiliations

NOVA LINCS, Department of Computer Science, Faculdade de Ciências e Tecnologia, Universidade NOVA de Lisboa, 2829-516, Caparica, Portugal
Ivo Anjos, Nuno Marques, João Magalhães & Sofia Cavaco
Escola Superior de Saúde do Alcoitão, Rua Conde Barão, Alcoitão, 2649-506, Alcabideche, Portugal
Margarida Grilo & Isabel Guimarães

Corresponding author

Correspondence to Ivo Anjos.

Editor information

Editors and Affiliations

INESC-TEC, University of Trás-os-Montes and Alto Douro, Vila Real, Portugal
Paulo Moura Oliveira
University of Minho, Braga, Portugal
Paulo Novais
LIACC/UP, University of Porto, Porto, Portugal
Luís Paulo Reis

Cite this paper

Anjos, I., Marques, N., Grilo, M., Guimarães, I., Magalhães, J., Cavaco, S. (2019). Sibilant Consonants Classification with Deep Neural Networks. In: Moura Oliveira, P., Novais, P., Reis, L. (eds) Progress in Artificial Intelligence. EPIA 2019. Lecture Notes in Computer Science, vol 11805. Springer, Cham. https://doi.org/10.1007/978-3-030-30244-3_36

DOI: https://doi.org/10.1007/978-3-030-30244-3_36
Published: 30 August 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-30243-6
Online ISBN: 978-3-030-30244-3
eBook Packages: Computer Science, Computer Science (R0)