Automated ASD detection in children from raw speech using customized STFT-CNN model

Kurma Venkata Keerthana Sai¹,
Rompicharla Thanmayee Krishna¹,
Kodali Radha¹,² (ORCID: orcid.org/0000-0002-8064-0440),
Dhulipalla Venkata Rao¹ &
…
Abdul Muneera¹


Abstract

Autism spectrum disorder (ASD), a prevalent neurodevelopmental condition impacting cognitive, communicative, and behavioral aspects, typically manifests in early childhood due to genetic, environmental, and immunological factors. Employing a novel dataset termed children’s ASD speech corpus (CASD-SC), the research makes use of short-time Fourier transform (STFT) layered convolutional neural networks (CNN), incorporating an image input layer and a sequence input layer. The analysis encompasses data both with and without augmentation, exploring various CNN configurations. Results showcase that the log spectrogram-based STFT layered CNN model achieves 86.6% accuracy for the raw data, while the pre-emphasis filter (PEF) with learnables-based STFT layered CNN model attains 99.1% accuracy for the data with augmentation for detecting ASD in children. This investigation bridges the literature gap by evaluating child-specific raw speech data. The study underscores the significance of processing and training efficiency in ASD diagnosis and promotes early intervention techniques by improving ASD detection in children.
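
The two front-end operations named in the abstract, pre-emphasis filtering and a log-spectrogram STFT, can be illustrated with a minimal sketch. This is not the authors' implementation: the fixed coefficient `alpha=0.97` (the abstract describes a learnable pre-emphasis filter), the Hann window, the frame length, the hop size, and all function names are assumptions chosen for illustration, and a naive DFT stands in for an FFT to keep the example dependency-free.

```python
import cmath
import math


def pre_emphasis(signal, alpha=0.97):
    """First-order high-pass filter: y[n] = x[n] - alpha * x[n-1].

    alpha is fixed here (assumption); a learnable variant would train it.
    """
    if not signal:
        return []
    return [signal[0]] + [
        signal[n] - alpha * signal[n - 1] for n in range(1, len(signal))
    ]


def hann(length):
    """Periodic Hann window of the given length."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / length) for n in range(length)]


def log_spectrogram(signal, frame_len=64, hop=32, eps=1e-10):
    """Log-power STFT: one list of frame_len // 2 + 1 bins per frame."""
    window = hann(frame_len)
    n_bins = frame_len // 2 + 1
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        spectrum = []
        for k in range(n_bins):
            # Naive DFT for clarity; a real pipeline would use an FFT.
            acc = sum(
                frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len)
            )
            spectrum.append(math.log(abs(acc) ** 2 + eps))
        frames.append(spectrum)
    return frames


# Toy input: a 1 kHz tone sampled at 8 kHz (illustrative values only).
sample_rate = 8000
tone = [math.sin(2.0 * math.pi * 1000.0 * n / sample_rate) for n in range(512)]
spec = log_spectrogram(pre_emphasis(tone))
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
print(len(spec), len(spec[0]), peak_bin)
```

The resulting time-frequency array is the kind of representation a CNN could consume as an image-like input; how the paper wires the STFT into the network as a layer is not specified in the abstract and is not reproduced here.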

Data availability

The dataset is available upon request for research purposes, with the aim of supporting the scientific and research communities.

Acknowledgements

The authors thank Maddi Subba Rao English Medium High School, Vijayawada, for enabling the collection of the healthy children's datasets under the supervision of the Principal, Mrs. Potnuri Syamala. The collection of the ASD children's datasets was facilitated by Home Occupational Therapy Services, Bharathi Nagar, Vijayawada, under the direction of Dr. Sushil Kumar, and by the Autism Child Guidance Center, Vijayawada, under the guidance of its Director, Mr. Praveen Kumar. Their cooperation and support were invaluable to the completion of this research project.

Funding

There was no external funding for this research.

Author information

Authors and Affiliations

Department of Electronics & Communication Engineering, Velagapudi Ramakrishna Siddhartha Engineering College, Kanuru, Vijayawada, Andhra Pradesh, 520007, India
Kurma Venkata Keerthana Sai, Rompicharla Thanmayee Krishna, Kodali Radha, Dhulipalla Venkata Rao & Abdul Muneera
Division of Pediatric Neurology, Department of Pediatrics, University of Tennessee Health Science Center, Memphis, TN, 38163, USA
Kodali Radha

Corresponding author

Correspondence to Kodali Radha.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Sai, K.V.K., Krishna, R.T., Radha, K. et al. Automated ASD detection in children from raw speech using customized STFT-CNN model. Int J Speech Technol 27, 701–716 (2024). https://doi.org/10.1007/s10772-024-10131-7

Received: 26 May 2024
Accepted: 15 July 2024
Published: 26 July 2024
Issue Date: September 2024
DOI: https://doi.org/10.1007/s10772-024-10131-7
