Abstract
Spectrograms provide a visual representation of the time-frequency variations of a speech signal. Furthermore, color scales can be used as a pre-processing normalization step. In this study, we investigated the suitability of different color scales for the reconstruction of spectrograms, together with bottleneck features extracted from Convolutional AutoEncoders (CAEs). We trained several CAEs considering different parameters such as the number of channels, wideband/narrowband spectrograms, and different color scales. Additionally, we tested the suitability of the proposed CAE architecture for predicting the severity of Parkinson’s Disease (PD) and the nasality level in children with Cleft Lip and Palate (CLP). The results showed that it is possible to estimate the neurological state in PD with Spearman’s correlations of up to 0.71 using the grayscale color scale, and the nasality level in CLP with F-scores of up to 0.58 using the raw spectrogram. Although the color scales improved performance in some cases, it is not clear which color scale is the most suitable for the selected application, as we did not find significant differences among the results obtained with the different color scales.
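As a rough illustration of how a color scale can act as a normalization step, the sketch below computes a small log-magnitude spectrogram, min-max normalizes it to 8-bit gray levels, and then maps the gray levels through a two-color scale. It is not the pipeline used in the study: the frame length, hop size, dB floor, test tone, and the `cold`/`hot` colors are all assumptions chosen for illustration, and the naive DFT stands in for whatever spectrogram routine one would normally use.

```python
import cmath
import math


def spectrogram_db(signal, frame_len=64, hop=32, floor_db=-80.0):
    """Naive STFT magnitude in dB (Hann window, plain DFT); rows are frames."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        row = []
        for k in range(frame_len // 2 + 1):  # non-negative frequency bins
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            mag = abs(acc)
            db = 20.0 * math.log10(mag) if mag > 0 else floor_db
            row.append(max(db, floor_db))  # clip very small values
        frames.append(row)
    return frames


def to_grayscale(spec_db):
    """Min-max normalize to [0, 1], then quantize to 8-bit gray levels."""
    lo = min(min(row) for row in spec_db)
    hi = max(max(row) for row in spec_db)
    span = (hi - lo) or 1.0  # avoid division by zero on a constant image
    return [[round(255 * (v - lo) / span) for v in row] for row in spec_db]


def gray_to_rgb(gray, cold=(0, 0, 128), hot=(255, 255, 0)):
    """Hypothetical two-color scale: linear blend from `cold` to `hot`."""
    return [[tuple(round(c + (h - c) * g / 255) for c, h in zip(cold, hot))
             for g in row] for row in gray]


# Toy input (assumed): a 1 kHz tone sampled at 8 kHz.
sr = 8000
tone = [math.sin(2 * math.pi * 1000 * n / sr) for n in range(512)]
spec = spectrogram_db(tone)   # list of frames, each with 33 frequency bins
gray = to_grayscale(spec)     # single-channel image, values in 0..255
rgb = gray_to_rgb(gray)       # three-channel image from the same gray levels
```

Here the gray image is a single-channel input and the RGB image a three-channel one, which is one way to read the "number of channels" parameter mentioned above. Both carry the same normalized information, only encoded differently.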
T. Arias-Vergara—Work done during Ph.D. studies.
Acknowledgements
This work was funded by the European Union’s Horizon 2020 research and innovation programme under Marie Sklodowska-Curie grant agreement No. 766287, and partially funded by CODI at UdeA under grant # PRG2020-34068. T. Arias-Vergara is supported by the Convocatoria Doctorado Nacional-785 grant financed by COLCIENCIAS.
Copyright information
© 2022 Springer Nature Switzerland AG
About this paper
Cite this paper
Pérez-Toro, P.A. et al. (2022). 50 Shades of Gray: Effect of the Color Scale for the Assessment of Speech Disorders. In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds) Text, Speech, and Dialogue. TSD 2022. Lecture Notes in Computer Science, vol 13502. Springer, Cham. https://doi.org/10.1007/978-3-031-16270-1_29
DOI: https://doi.org/10.1007/978-3-031-16270-1_29
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-16269-5
Online ISBN: 978-3-031-16270-1
eBook Packages: Computer Science, Computer Science (R0)