Article

Explainable Stuttering Recognition Using Axial Attention

Authors:

Björn W. Schuller,

Yoshiharu Yamamoto

Advanced Intelligent Computing Technology and Applications: 19th International Conference, ICIC 2023, Zhengzhou, China, August 10–13, 2023, Proceedings, Part III

Pages 209 - 220

https://doi.org/10.1007/978-981-99-4749-2_18

Published: 10 August 2023

Abstract

Stuttering is a complex speech disorder that disrupts the flow of speech. Recognizing persons who stutter (PWS) and understanding the significant struggles they face is crucial. With advances in computer vision, deep neural networks offer the potential to recognize stuttering events from image-based features. In this paper, we extract image features from audio signals using the Wavelet Transformation (WT) and Histograms of Oriented Gradients (HOG). We also generate explainable images using Gradient-weighted Class Activation Mapping (Grad-CAM) and use them as input to our final recognition model, an axial attention-based EfficientNetV2, which is trained on the Kassel State of Fluency Dataset (KSoF) to perform 8-class recognition. In our experiments, the model achieved a relative increase of 4.4% in unweighted average recall (UAR) over the ComParE 2022 baseline, demonstrating that the axial attention-based EfficientNetV2, combined with the explainable input, is capable of detecting and recognizing multiple types of stuttering.
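The abstract reports its result as a relative increase in unweighted average recall (UAR). As a minimal sketch of what that metric measures, the Python below computes UAR as the mean of per-class recalls and expresses a change against a baseline as a relative percentage. The function names, class labels, and numbers are illustrative assumptions, not values or code from the paper.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls: each class counts equally, whatever its size."""
    total = defaultdict(int)    # number of samples per true class
    correct = defaultdict(int)  # correctly predicted samples per true class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


def relative_increase_percent(baseline, new):
    """Relative (not absolute) change of `new` over `baseline`, in percent."""
    return 100.0 * (new - baseline) / baseline


# Toy, imbalanced two-class example (hypothetical labels, not KSoF classes).
y_true = ["class_a", "class_a", "class_a", "class_a", "class_b", "class_b"]
y_pred = ["class_a", "class_a", "class_a", "class_b", "class_b", "class_a"]

uar = unweighted_average_recall(y_true, y_pred)
print(uar)  # recalls are 3/4 and 1/2, so UAR is 0.625

# Hypothetical UAR values chosen only to illustrate a relative 4.4% gain.
print(relative_increase_percent(0.5, 0.522))
```

Because UAR averages recall over classes rather than over samples, a model cannot score well simply by favouring the most frequent class, which matters when classes are imbalanced. Note also that a relative increase of 4.4% means the new UAR is about 1.044 times the baseline, not that it is 4.4 percentage points higher.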

Published In

Advanced Intelligent Computing Technology and Applications: 19th International Conference, ICIC 2023, Zhengzhou, China, August 10–13, 2023, Proceedings, Part III

Aug 2023

834 pages

ISBN: 978-981-99-4748-5

DOI: 10.1007/978-981-99-4749-2

Editors:
De-Shuang Huang, Department of Computer Science, Eastern Institute of Technology, Zhejiang, China
Prashan Premaratne, University of Wollongong, North Wollongong, NSW, Australia
Baohua Jin, Zhengzhou University of Light Industry, Zhengzhou, China
Boyang Qu, Zhong Yuan University of Technology, Zhengzhou, China
Kang-Hyun Jo, University of Ulsan, Ulsan, Korea (Republic of)
Abir Hussain, Department of Computer Science, Liverpool John Moores University, Liverpool, UK

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023.

Publisher

Springer-Verlag

Berlin, Heidelberg
