
Detecting Depression in Less Than 10 Seconds: Impact of Speaking Time on Depression Detection Sensitivity

Published: 22 October 2020

Abstract

This article investigates whether it is possible to detect depression using less than 10 seconds of speech. The experiments involved 59 participants (including 29 diagnosed with depression by a professional psychiatrist) and are based on a multimodal approach that jointly models linguistic (what people say) and acoustic (how people say it) aspects of speech, using four different strategies for the fusion of multiple data streams. On average, each interview lasted 242.2 seconds, but the results show that 10 seconds or less are sufficient to achieve the same level of recall (roughly 70%) observed when using the entire interview of every participant. In other words, it is possible to maintain the same level of sensitivity (the clinical name for recall) while reducing, on average, the amount of time required to collect the necessary data by 95%.
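The two quantities the abstract relies on are easy to make concrete. Below is a minimal sketch (not the authors' code; the labels are hypothetical) of how sensitivity is computed and why trimming interviews from 242.2 seconds to 10 seconds amounts to a roughly 95% reduction in data-collection time.

```python
# Minimal illustration, not the paper's implementation.
# "Sensitivity" is the clinical name for recall: the fraction of truly
# depressed participants that the detector correctly flags.

def recall(y_true, y_pred, positive=1):
    """Recall / sensitivity over binary labels (1 = depressed)."""
    true_pos = sum(1 for t, p in zip(y_true, y_pred)
                   if t == positive and p == positive)
    actual_pos = sum(1 for t in y_true if t == positive)
    return true_pos / actual_pos if actual_pos else 0.0

# Hypothetical predictions for 10 participants (not data from the study):
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 0, 0, 1, 0, 0, 0]
print(recall(y_true, y_pred))  # 0.8 (4 of 5 depressed participants flagged)

# Time saving reported in the abstract: 10 s of speech instead of the
# 242.2 s average interview length.
print(1 - 10 / 242.2)  # ≈ 0.959, i.e. roughly a 95% reduction
```

Note that a false positive (a non-depressed participant flagged as depressed) does not lower recall; it would lower precision, which is why the abstract's claim is specifically about sensitivity.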

Supplementary Material

MP4 File (3382507.3418875.mp4)
Video presentation of the article.


Cited By

View all
  • (2024) Machine Learning for Multimodal Mental Health Detection: A Systematic Review of Passive Sensing Approaches. Sensors 24:2 (348). DOI: 10.3390/s24020348. Online publication date: 6-Jan-2024.
  • (2024) KWHO-CNN: A Hybrid Metaheuristic Algorithm Based Optimzed Attention-Driven CNN for Automatic Clinical Depression Recognition. International Journal of Computational and Experimental Science and Engineering 10:3. DOI: 10.22399/ijcesen.359. Online publication date: 27-Sep-2024.
  • (2024) On the effects of obfuscating speaker attributes in privacy-aware depression detection. Pattern Recognition Letters 186 (300-305). DOI: 10.1016/j.patrec.2024.10.016. Online publication date: Oct-2024.
  • (2024) Cross-Cultural Automatic Depression Detection Based on Audio Signals. Speech and Computer (309-323). DOI: 10.1007/978-3-031-77961-9_23. Online publication date: 22-Nov-2024.
  • (2022) Depressed Mood Prediction of Elderly People with a Wearable Band. Sensors 22:11 (4174). DOI: 10.3390/s22114174. Online publication date: 31-May-2022.
  • (2022) Multimodal Depression Classification using Articulatory Coordination Features and Hierarchical Attention Based text Embeddings. ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (6252-6256). DOI: 10.1109/ICASSP43922.2022.9747462. Online publication date: 23-May-2022.
  • (2021) A Privacy-Oriented Approach for Depression Signs Detection Based on Speech Analysis. Electronics 10:23 (2986). DOI: 10.3390/electronics10232986. Online publication date: 1-Dec-2021.



    Published In

    ICMI '20: Proceedings of the 2020 International Conference on Multimodal Interaction
    October 2020
    920 pages
    ISBN:9781450375818
    DOI:10.1145/3382507

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. computational paralinguistics
    2. depression detection
    3. social signal processing
    4. speech
    5. language

    Qualifiers

    • Research-article

    Funding Sources

    • UKRI
    • EPSRC
    • V:ALERE 2019

    Conference

    ICMI '20: International Conference on Multimodal Interaction
    October 25 - 29, 2020
    Virtual Event, Netherlands

    Acceptance Rates

    Overall Acceptance Rate 453 of 1,080 submissions, 42%

    Article Metrics

    • Downloads (Last 12 months)56
    • Downloads (Last 6 weeks)7
    Reflects downloads up to 21 Dec 2024
