article

Exploring similarity-based classification of larynx disorders from human voice

Authors:

Evaldas Vaiciukynas,

Antanas Verikas,

Marija Bacauskiene,

Virgilijus Uloza

Speech Communication, Volume 54, Issue 5

Pages 601 - 610

https://doi.org/10.1016/j.specom.2011.04.004

Published: 01 June 2012

Abstract

This paper investigates the identification of laryngeal disorders using cepstral parameters of the human voice. Mel-frequency cepstral coefficients (MFCCs), extracted from audio recordings of a patient's voice, are further approximated using one of several strategies: sampling, averaging, or clustering by a Gaussian mixture model (GMM). The effectiveness of similarity-based classification techniques in categorizing such pre-processed data into normal voice, nodular, and diffuse vocal fold lesion classes is explored, and schemes for combining the binary decisions of support vector machines (SVMs) are evaluated. The widely used RBF kernel was compared to several custom kernels: (i) a sequence kernel, defined over a pair of matrices rather than a pair of vectors, which calculates the kernelized principal angle (KPA) between subspaces; (ii) a simple supervector kernel using only the means of a patient's GMM; (iii) two distance kernels, specifically tailored to exploit the covariance matrices of the GMM, which use as similarity metrics the Kullback-Leibler divergence approximated by Monte Carlo sampling (KL-MCS) and the Kullback-Leibler divergence combined with the Earth mover's distance (KL-EMD). The sequence kernel and the distance kernels both outperformed the popular RBF kernel, but the difference was statistically significant only for the distance kernels. When tested on voice recordings collected from 410 subjects (130 with normal voice, 140 with diffuse, and 140 with nodular vocal fold lesions), the KL-MCS kernel, using GMMs with full covariance matrices, and the KL-EMD kernel, using GMMs with diagonal covariance matrices, provided the best overall performance. In most cases, the SVM reached higher accuracy than the least squares SVM, except for common binary classification using distance kernels.
The results indicate that features modeled with GMMs, together with kernel methods exploiting this information, form an interesting fusion of generative (probabilistic) and discriminative (hyperplane) models for similarity-based classification.
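
To make the KL-MCS idea from the abstract concrete, the following is a minimal sketch of a Monte Carlo estimate of the Kullback-Leibler divergence between two GMMs, turned into a similarity value. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not give the exact kernel form, so the symmetrised divergence, the exponential mapping, the `gamma` scale, the sample count, and all function names here are hypothetical. For brevity it uses diagonal covariances only, whereas the abstract reports the best KL-MCS results with full covariance matrices.

```python
import math
import random

def gmm_logpdf(x, weights, means, variances):
    """Log-density of a diagonal-covariance GMM at point x (via log-sum-exp)."""
    logs = []
    for w, mu, var in zip(weights, means, variances):
        ll = math.log(w)
        for xi, m, v in zip(x, mu, var):
            ll += -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        logs.append(ll)
    top = max(logs)
    return top + math.log(sum(math.exp(l - top) for l in logs))

def gmm_sample(rng, weights, means, variances):
    """Draw one point: pick a component by weight, then sample each dimension."""
    k = rng.choices(range(len(weights)), weights=weights)[0]
    return [rng.gauss(m, math.sqrt(v)) for m, v in zip(means[k], variances[k])]

def kl_mc(p, q, n_samples=2000, seed=0):
    """Monte Carlo estimate of KL(p || q): mean of log p(x) - log q(x), x ~ p."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        x = gmm_sample(rng, *p)
        total += gmm_logpdf(x, *p) - gmm_logpdf(x, *q)
    return total / n_samples

def distance_kernel(p, q, gamma=0.1, n_samples=2000, seed=0):
    """Similarity exp(-gamma * symmetrised KL). Assumed form; gamma is hypothetical."""
    d = kl_mc(p, q, n_samples, seed) + kl_mc(q, p, n_samples, seed + 1)
    # The estimate can dip slightly below zero for near-identical models.
    return math.exp(-gamma * max(d, 0.0))

# Each GMM is a tuple (weights, means, variances); values are made up for illustration.
gmm_a = ([0.5, 0.5], [[0.0, 0.0], [3.0, 3.0]], [[1.0, 1.0], [1.0, 1.0]])
gmm_b = ([0.3, 0.7], [[0.5, -0.5], [4.0, 2.0]], [[1.5, 0.8], [1.0, 2.0]])

similarity = distance_kernel(gmm_a, gmm_b)
```

In a setting like the one described, one such GMM would be fitted per patient and the pairwise similarities collected into a matrix for a kernel classifier. A kernel built this way is not guaranteed to be positive semidefinite, so that property would need to be checked or enforced before use with an SVM.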

Cited By

Gidaye G, Nirmal J, Ezzine K, Frikha M (2020). Wavelet sub-band features for voice disorder detection and classification. Multimedia Tools and Applications 79:39-40 (28499-28523). DOI: 10.1007/s11042-020-09424-1. Online publication date: 1-Oct-2020
https://dl.acm.org/doi/10.1007/s11042-020-09424-1
Vaiciukynas E, Verikas A, Gelzinis A, Bacauskiene M, Minelga J, Hållander M, Padervinskis E, Uloza V (2015). Fusing voice and query data for non-invasive detection of laryngeal disorders. Expert Systems with Applications: An International Journal 42:22 (8445-8453). DOI: 10.1016/j.eswa.2015.07.001. Online publication date: 1-Dec-2015
https://dl.acm.org/doi/10.1016/j.eswa.2015.07.001

Published In

Speech Communication Volume 54, Issue 5

June, 2012

98 pages

ISSN:0167-6393

Copyright © 2011 Elsevier B.V.

Publisher

Elsevier Science Publishers B. V.

Netherlands
