short-paper

Speech-based Diagnosis of Autism Spectrum Condition by Generative Adversarial Network Representations

Authors:

Nicholas Cummins,

Maximilian Schmitt,

Fabien Ringeval,

Björn Schuller

DH '17: Proceedings of the 2017 International Conference on Digital Health

Pages 53-57

https://doi.org/10.1145/3079452.3079492

Published: 02 July 2017

Abstract

Machine learning paradigms based on child vocalisations show great promise as an objective marker of developmental disorders such as autism. In conventional detection systems, hand-crafted acoustic features are usually fed into a discriminative classifier (e.g., Support Vector Machines); however, it is well known that the accuracy and robustness of such systems are limited by the size of the associated training data. This paper explores, for the first time, the use of feature representations learnt by a deep Generative Adversarial Network (GAN) for classifying children's speech affected by developmental disorders. A comparative evaluation of the proposed system against different acoustic feature sets is performed on the Child Pathological and Emotional Speech database. Key experimental results demonstrate that GAN-based methods perform competitively with the conventional paradigms in terms of the unweighted average recall (UAR) metric.
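The unweighted average recall named in the abstract is the mean of the per-class recalls, so every class counts equally regardless of how many recordings it contains; this matters for diagnostic speech data, where class sizes are typically unbalanced. The following is a minimal sketch in plain Python, assuming the true and predicted labels are given as equal-length sequences of hashable class labels. The function name `unweighted_average_recall` and the example labels are hypothetical illustrations, not taken from the paper.

```python
from collections import Counter


def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true.

    Unlike plain accuracy, each class contributes equally, so a
    majority class cannot dominate the score.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")
    support = Counter(y_true)  # number of true samples per class
    # Number of correctly predicted samples per true class.
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    recalls = [hits[c] / support[c] for c in support]
    return sum(recalls) / len(recalls)


if __name__ == "__main__":
    # Hypothetical labels: 6 typically developing (TD) and 2 autism (ASC) recordings.
    y_true = ["TD"] * 6 + ["ASC"] * 2
    y_pred = ["TD"] * 6 + ["TD", "ASC"]
    print(unweighted_average_recall(y_true, y_pred))  # 0.75
```

On these hypothetical labels plain accuracy would be 7/8 = 0.875, which hides the fact that half of the minority-class recordings were missed; the UAR of 0.75 (recall 1.0 for TD, 0.5 for ASC) exposes it, which is why UAR suits class-imbalanced diagnostic data.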

Index Terms

Speech-based Diagnosis of Autism Spectrum Condition by Generative Adversarial Network Representations
1. Applied computing
  1. Life and medical sciences
    1. Health informatics
2. Computing methodologies
  1. Machine learning
    1. Machine learning approaches
      1. Neural networks

Published In

DH '17: Proceedings of the 2017 International Conference on Digital Health

July 2017

256 pages

ISBN:9781450352499

DOI:10.1145/3079452

General Chair:
Patty Kostkova, University College London

Program Chairs:
Floriana Grasso, University of Liverpool
Carlos Castillo, Eurecat
Yelena Mejova, QCRI
Arnold Bosman, Transmissible

Copyright © 2017 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

DH '17

DH '17: International Conference on Digital Health

July 2-5, 2017

London, United Kingdom

Cited By

Lee C, Chaspari T, Provost E, Narayanan S (2023). An Engineering View on Emotions and Speech: From Analysis and Predictive Models to Responsible Human-Centered Applications. Proceedings of the IEEE, 111(10):1142-1158. Online publication date: Oct-2023. https://doi.org/10.1109/JPROC.2023.3276209

Sahu S, Gupta R, Espy-Wilson C (2022). Modeling Feature Representations for Affective Speech Using Generative Adversarial Networks. IEEE Transactions on Affective Computing, 13(2):1098-1110. Online publication date: 1-Apr-2022. https://doi.org/10.1109/TAFFC.2020.2998118

Li T, Zhao H, Huang J, Li K (2022). Cross-domain image translation with a novel style-guided diversity loss design. Knowledge-Based Systems, 255:109731. Online publication date: Nov-2022. https://doi.org/10.1016/j.knosys.2022.109731

Karpagam C, Rohini S (2022). Autism Detection Using Machine Learning Approach: A Review. Machine Intelligence and Smart Systems, pages 179-197. Online publication date: 24-May-2022. https://doi.org/10.1007/978-981-16-9650-3_14

Kalikar S, Sinha A, Srivastava S, Aggarwal G (2022). Early Detection of Autism Spectrum Disorder (ASD) Using Machine Learning Techniques: A Review. Proceedings of Third International Conference on Communication, Computing and Electronics Systems, pages 1015-1027. Online publication date: 20-Mar-2022. https://doi.org/10.1007/978-981-16-8862-1_66

Cummins N, Schuller B (2022). Latest Advances in Computational Speech Analysis for Mobile Sensing. Digital Phenotyping and Mobile Sensing, pages 209-228. Online publication date: 23-Jul-2022. https://doi.org/10.1007/978-3-030-98546-2_12

Jayasree T, Shia S (2021). Combined Signal Processing Based Techniques and Feed Forward Neural Networks for Pathological Voice Detection and Classification. Sound & Vibration, 55(2):141-161. Online publication date: 2021. https://doi.org/10.32604/sv.2021.011734

Kruyt J, Beňuš Š (2021). Prosodic entrainment in individuals with autism spectrum disorder. Topics in Linguistics, 22(2):47-61. Online publication date: 30-Dec-2021. https://doi.org/10.2478/topling-2021-0010

Latif S, Qadir J, Qayyum A, Usama M, Younis S (2021). Speech Technology for Healthcare: Opportunities, Challenges, and State of the Art. IEEE Reviews in Biomedical Engineering, 14:342-356. Online publication date: 2021. https://doi.org/10.1109/RBME.2020.3006860

Alizadeh M, Tabibian S (2021). A Persian speaker-independent dataset to diagnose autism infected children based on speech processing techniques. 2021 7th International Conference on Signal Processing and Intelligent Systems (ICSPIS), pages 1-5. Online publication date: 29-Dec-2021. https://doi.org/10.1109/ICSPIS54653.2021.9729345
