
Sibilant Consonants Classification with Deep Neural Networks

  • Conference paper
  • Progress in Artificial Intelligence (EPIA 2019)

Abstract

Many children with speech sound disorders cannot pronounce sibilant consonants correctly. We have developed a serious game that children control with their voices in real time and that lets them practice the European Portuguese sibilant consonants. To do this, the game relies on a sibilant consonant classifier. Because the game does not require adult supervision, children can practice these sounds more often, which may lead to faster improvement in their speech.
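The abstract does not say how the classifier's output drives the game in real time. One plausible scheme is sketched below purely as an illustration, not as the authors' implementation. It buffers incoming audio samples, classifies each completed analysis window, and smooths the per-window labels with a majority vote so the game does not flicker between classes. The function name `stream_decisions`, the `classify` callback, the 400-sample window and 160-sample hop (25 ms and 10 ms at an assumed 16 kHz sampling rate), and the 5-window vote are all hypothetical choices.

```python
from collections import Counter, deque
from typing import Callable, Iterable, Iterator, Sequence


def stream_decisions(
    samples: Iterable[float],
    classify: Callable[[Sequence[float]], str],  # hypothetical per-window classifier
    window: int = 400,  # assumed: 25 ms at 16 kHz
    hop: int = 160,     # assumed: 10 ms at 16 kHz
    vote: int = 5,      # assumed: number of recent window decisions to vote over
) -> Iterator[str]:
    """Yield a smoothed label every `hop` samples once a full window is buffered."""
    buffer: deque = deque(maxlen=window)  # sliding window of the latest samples
    recent: deque = deque(maxlen=vote)    # latest per-window labels
    since_last = 0
    for x in samples:
        buffer.append(x)
        since_last += 1
        if len(buffer) == window and since_last >= hop:
            since_last = 0
            recent.append(classify(list(buffer)))
            # Report the most frequent label among the recent window decisions.
            yield Counter(recent).most_common(1)[0][0]
```

A longer vote gives steadier visual feedback but adds latency: a new sound must become the most frequent label among the last five window decisions before the game reacts to it, which costs a few hops of delay.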

Deep neural networks have recently brought considerable improvements in classification across a variety of use cases, from image classification to speech and language processing. Here we propose using deep convolutional neural networks to classify the sibilant phonemes of European Portuguese in our serious game for speech and language therapy.
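To make the idea of a 2D convolutional classifier concrete, here is a dependency-free forward pass of a toy CNN over a (time frames × Mel bands) feature patch. Each kernel is applied as a valid 2D convolution, followed by ReLU and global average pooling, and a dense layer with softmax produces class probabilities. The architecture, the sizes, the random weights, and the four-label output `CLASSES` (one per European Portuguese sibilant /s/, /z/, /ʃ/, /ʒ/) are illustrative assumptions. The abstract does not describe the authors' network, and no training is shown.

```python
import math
import random
from typing import List

Matrix = List[List[float]]

# Hypothetical label order: one class per European Portuguese sibilant.
CLASSES = ["s", "z", "ʃ", "ʒ"]


def conv2d_valid(x: Matrix, k: Matrix, bias: float = 0.0) -> Matrix:
    """2D cross-correlation without padding (what CNN libraries call convolution)."""
    kh, kw = len(k), len(k[0])
    out_h, out_w = len(x) - kh + 1, len(x[0]) - kw + 1
    return [
        [
            bias + sum(x[i + a][j + b] * k[a][b] for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(m: Matrix) -> Matrix:
    return [[max(0.0, v) for v in row] for row in m]


def global_avg_pool(m: Matrix) -> float:
    return sum(sum(row) for row in m) / (len(m) * len(m[0]))


def softmax(z: List[float]) -> List[float]:
    top = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - top) for v in z]
    total = sum(e)
    return [v / total for v in e]


def tiny_cnn(features: Matrix, kernels: List[Matrix], dense: Matrix) -> List[float]:
    """Conv -> ReLU -> global average pool per kernel, then dense -> softmax."""
    pooled = [global_avg_pool(relu(conv2d_valid(features, k))) for k in kernels]
    logits = [sum(w * p for w, p in zip(row, pooled)) for row in dense]
    return softmax(logits)


if __name__ == "__main__":
    rng = random.Random(0)
    patch = [[rng.gauss(0, 1) for _ in range(40)] for _ in range(20)]  # 20 frames x 40 bands
    kernels = [[[rng.gauss(0, 0.1) for _ in range(3)] for _ in range(3)] for _ in range(8)]
    dense = [[rng.gauss(0, 0.1) for _ in range(len(kernels))] for _ in CLASSES]
    probs = tiny_cnn(patch, kernels, dense)
    print(dict(zip(CLASSES, (round(p, 3) for p in probs))))  # untrained weights: not meaningful
```

Convolving over both time and frequency lets each kernel respond to local spectro-temporal patterns, such as where in frequency a sibilant's frication noise is concentrated. A practical model would stack several such layers and learn the weights with a deep learning library such as PyTorch or TensorFlow.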

We compared the performance of several artificial neural networks whose input features were either Mel frequency cepstral coefficients (MFCCs) or log Mel filterbanks. Our best deep learning model, a 2D convolutional model with log Mel filterbanks as input features, achieves a classification score of 95.48%.
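The two input representations compared here are closely related. Log Mel filterbank energies are the logarithm of a power spectrum pooled by triangular filters spaced on the Mel scale. MFCCs are a discrete cosine transform (DCT-II) of those log energies, truncated to the first few coefficients. The dependency-free sketch below computes both for a single frame. Several settings are common defaults assumed here, not values reported in the abstract: the 16 kHz sampling rate, 512-point FFT, Hamming window, 40 Mel bands, 13 coefficients, and the 2595·log10(1 + f/700) Mel formula. The naive DFT is written for clarity rather than speed.

```python
import math
from typing import List, Tuple


def hz_to_mel(f: float) -> float:
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m: float) -> float:
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def power_spectrum(frame: List[float], n_fft: int) -> List[float]:
    """|DFT|^2 / n_fft of a Hamming-windowed, zero-padded frame (bins 0..n_fft//2)."""
    n = len(frame)
    x = [frame[i] * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1))) for i in range(n)]
    x += [0.0] * (n_fft - n)
    spec = []
    for k in range(n_fft // 2 + 1):
        re = sum(x[t] * math.cos(2 * math.pi * k * t / n_fft) for t in range(n_fft))
        im = -sum(x[t] * math.sin(2 * math.pi * k * t / n_fft) for t in range(n_fft))
        spec.append((re * re + im * im) / n_fft)
    return spec


def mel_filterbank(n_mels: int, n_fft: int, sr: int) -> List[List[float]]:
    """Triangular filters with centres equally spaced on the Mel scale, 0 Hz to sr/2."""
    top = hz_to_mel(sr / 2)
    edges = [math.floor((n_fft + 1) * mel_to_hz(top * i / (n_mels + 1)) / sr)
             for i in range(n_mels + 2)]
    fb = [[0.0] * (n_fft // 2 + 1) for _ in range(n_mels)]
    for j in range(n_mels):
        left, centre, right = edges[j], edges[j + 1], edges[j + 2]
        for k in range(left, centre):  # rising slope
            fb[j][k] = (k - left) / (centre - left)
        for k in range(centre, right):  # falling slope, peak of 1 at the centre bin
            fb[j][k] = (right - k) / (right - centre)
    return fb


def log_mel_and_mfcc(frame: List[float], sr: int = 16000, n_fft: int = 512,
                     n_mels: int = 40, n_mfcc: int = 13) -> Tuple[List[float], List[float]]:
    spec = power_spectrum(frame, n_fft)
    fb = mel_filterbank(n_mels, n_fft, sr)
    # Log Mel filterbank energies; the small constant avoids log(0) for empty bands.
    log_mel = [math.log(sum(f * p for f, p in zip(filt, spec)) + 1e-10) for filt in fb]
    # MFCCs: orthonormal DCT-II of the log Mel energies, keeping the first n_mfcc terms.
    mfcc = []
    for q in range(n_mfcc):
        scale = math.sqrt((1.0 if q == 0 else 2.0) / n_mels)
        mfcc.append(scale * sum(log_mel[m] * math.cos(math.pi * q * (m + 0.5) / n_mels)
                                for m in range(n_mels)))
    return log_mel, mfcc


if __name__ == "__main__":
    sr = 16000
    tone = [math.sin(2 * math.pi * 4000 * t / sr) for t in range(400)]  # 25 ms at 4 kHz
    log_mel, mfcc = log_mel_and_mfcc(tone, sr=sr)
    print(len(log_mel), len(mfcc))  # 40 13
```

Because the DCT mixes all bands into every coefficient, MFCCs lose the band-to-band locality that a 2D convolution can exploit in log Mel features. This is one common explanation for why filterbank inputs often suit CNNs better. It is consistent with the result reported above but not proven by it.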




Acknowledgements

This work was supported by the Portuguese Foundation for Science and Technology under projects BioVisualSpeech (CMUP-ERI/TIC/0033/2014) and NOVA-LINCS (PEest/UID/CEC/04516/2019). We thank Mariana Ascensão and the postgraduate SLP students from Escola Superior de Saúde do Alcoitão who collaborated in the data collection task. Finally, we thank Agrupamento de Escolas de Almeida Garrett and the children who participated in the recordings.

Author information

Correspondence to Ivo Anjos.


Copyright information

© 2019 Springer Nature Switzerland AG

About this paper


Cite this paper

Anjos, I., Marques, N., Grilo, M., Guimarães, I., Magalhães, J., Cavaco, S. (2019). Sibilant Consonants Classification with Deep Neural Networks. In: Moura Oliveira, P., Novais, P., Reis, L. (eds) Progress in Artificial Intelligence. EPIA 2019. Lecture Notes in Computer Science, vol 11805. Springer, Cham. https://doi.org/10.1007/978-3-030-30244-3_36


  • DOI: https://doi.org/10.1007/978-3-030-30244-3_36


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-30243-6

  • Online ISBN: 978-3-030-30244-3

  • eBook Packages: Computer Science, Computer Science (R0)
