Short paper. DOI: 10.1145/3452940.3453038

Autoencoder Based on Cepstrum Separation to Detect Depression from Speech

Published: 17 May 2021

Abstract

Depression is a common mental disorder that affects a growing number of people. This paper studies a method for predicting the degree of depression from speech signals, to help clinicians assess depression severity in patients. We propose an autoencoder model based on the Bidirectional Gated Recurrent Unit (BiGRU) to extract deep speech features: the original speech is the network input, and the signal obtained after cepstrum separation (the homomorphic speech) is the training target. The long-term deep features extracted by the model and the short-time shallow features extracted with the openSMILE toolkit are each fed into a Random Forest (RF), and Support Vector Regression (SVR) is then used for decision fusion. Experiments on the DAIC-WOZ dataset yield a Root Mean Square Error (RMSE) of 5.68 and a Mean Absolute Error (MAE) of 4.64.
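The abstract does not spell out how the cepstrum separation is performed, so the following is only a minimal sketch of one common form of homomorphic processing: computing the real cepstrum of a frame and splitting it with a low-quefrency lifter. The function name `cepstrum_separate`, the `cutoff` value, and the synthetic test frame are all assumptions made for illustration; which component the authors use as the training target, and how they choose the boundary, is not stated in the source.

```python
import numpy as np


def cepstrum_separate(frame, cutoff):
    """Split a frame's log-magnitude spectrum into two additive parts.

    frame  : 1-D real array (one windowed speech frame).
    cutoff : hypothetical lifter boundary in quefrency samples; coefficients
             below it are treated as the slowly varying (envelope) part.
    Returns (envelope_log, excitation_log, log_mag).
    """
    n = len(frame)
    # Log-magnitude spectrum; the small constant avoids log(0).
    log_mag = np.log(np.abs(np.fft.fft(frame)) + 1e-10)
    # Real cepstrum: inverse FFT of the log-magnitude spectrum.
    cepstrum = np.fft.ifft(log_mag).real

    # Symmetric low-quefrency lifter (the real cepstrum is even).
    lifter = np.zeros(n)
    lifter[:cutoff] = 1.0
    lifter[n - cutoff + 1:] = 1.0

    low = cepstrum * lifter            # low-quefrency part
    high = cepstrum * (1.0 - lifter)   # high-quefrency part

    # Transform each part back to the log-spectral domain.
    envelope_log = np.fft.fft(low).real
    excitation_log = np.fft.fft(high).real
    return envelope_log, excitation_log, log_mag


# Synthetic voiced-like frame (an assumption, not DAIC-WOZ data):
# a pulse train filtered by a decaying impulse response, then windowed.
n = 512
pulses = np.zeros(n)
pulses[::100] = 1.0
impulse_response = 0.9 ** np.arange(50)
frame = np.convolve(pulses, impulse_response)[:n] * np.hamming(n)

envelope_log, excitation_log, log_mag = cepstrum_separate(frame, cutoff=30)
# The two parts add back up to the original log-magnitude spectrum.
print(np.allclose(envelope_log + excitation_log, log_mag))
```

Because the lifter only partitions the cepstral coefficients, the two returned log spectra always sum to the original one. In a pipeline like the one described, either part could in principle be turned back into a signal and used as the autoencoder target, but that choice is not confirmed by the abstract.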


Cited By

  • (2022) Prediction of Depression Severity Based on Transformer Encoder and CNN Model. 2022 13th International Symposium on Chinese Spoken Language Processing (ISCSLP), pp. 339-343. DOI: 10.1109/ISCSLP57327.2022.10038064. Online publication date: 11-Dec-2022
  • (2022) Deep learning for depression recognition with audiovisual cues. Information Fusion 80:C, pp. 56-86. DOI: 10.1016/j.inffus.2021.10.012. Online publication date: 1-Apr-2022
  • (2021) Learning from Limited Data for Speech-based Traumatic Brain Injury (TBI) Detection. 2021 20th IEEE International Conference on Machine Learning and Applications (ICMLA), pp. 1482-1486. DOI: 10.1109/ICMLA52953.2021.00239. Online publication date: Dec-2021


Published In

ICITEE '20: Proceedings of the 3rd International Conference on Information Technologies and Electrical Engineering
December 2020
687 pages
ISBN: 9781450388665
DOI: 10.1145/3452940

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

Publisher

Association for Computing Machinery, New York, NY, United States


Author Tags

1. BiGRU
2. Cepstrum separation
3. Depression recognition
4. Original speech

Qualifiers

• Short-paper
• Research
• Refereed limited

Conference

ICITEE2020
