research-article

Fusion of voice signal information for detection of mild laryngeal pathology

Published: 01 May 2014

Abstract

Highlights

  • Diverse information is extracted from a voice signal for the task of detecting mild disorders of the human larynx.
  • Feature-level and decision-level fusion are explored using SVM and RF classifiers; wrapper-based AMDE feature selection is also investigated.
  • We contribute to feature set decomposition approaches by introducing a simple feature-oriented technique for ensemble design, based on partitioning the feature set by k-means clustering.
  • Results indicate that an ensemble of RF classifiers, induced using the proposed technique, significantly improves decision-level fusion and outperforms feature-level fusion.

The main objective of this study is the detection of mild laryngeal disorders using acoustic parameters of the human voice. Observations of sustained phonation (audio recordings of vocalized /a/) are labeled by clinical diagnosis and rated by severity (from 0 to 3). The research is restricted to healthy (severity 0) and mildly pathological (severity 1) cases, the two classes that are most difficult to distinguish.

The approach adopted here combines comprehensive voice signal characterization with information fusion. Characterization is obtained through a diverse feature set, containing 26 feature subsets of varying size, extracted from the voice signal. The usefulness of feature-level and decision-level fusion is explored using support vector machine (SVM) and random forest (RF) as base classifiers. For both types of fusion we also investigate the influence of feature selection on model accuracy. To improve decision-level fusion we introduce a simple unsupervised technique for ensemble design, based on partitioning the feature set by k-means clustering, where the parameter k controls the size and diversity of the prospective ensemble.

All types of fusion resulted in an evident improvement over the best individual feature subset. However, none of the types, including fusion setups comprising feature selection, proved to be significantly superior to the rest. The proposed ensemble design by feature set decomposition discernibly enhanced decision-level fusion and significantly outperformed feature-level fusion. An ensemble of RF classifiers, induced from a cluster-based partitioning of the feature set, achieved an equal error rate of 13.1 ± 1.8% in the detection of mildly pathological larynx. This is a very encouraging result, considering that detection of mild laryngeal disorder is a more challenging task than the common discrimination between healthy cases and a wide spectrum of pathological cases.
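The cluster-based partitioning of the feature set can be illustrated with a minimal sketch. The abstract does not specify how features are represented or compared, so the details here are assumptions: each feature is represented by its standardized column of values across recordings, distance is Euclidean, and the function name `partition_features` is hypothetical. The sketch uses plain Lloyd-style k-means from the standard library and is not the authors' implementation.

```python
import random
from math import sqrt


def standardize(column):
    """Scale one feature column to zero mean and unit variance."""
    n = len(column)
    mean = sum(column) / n
    std = sqrt(sum((v - mean) ** 2 for v in column) / n) or 1.0
    return [(v - mean) / std for v in column]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def partition_features(X, k, n_iter=100, seed=0):
    """Cluster the columns of X (rows = recordings) into at most k groups.

    Returns a list of feature-index lists, one per non-empty cluster.
    """
    n_features = len(X[0])
    # Assumption: a feature is described by its standardized column.
    points = [standardize([row[j] for row in X]) for j in range(n_features)]
    rng = random.Random(seed)
    centroids = [list(points[j]) for j in rng.sample(range(n_features), k)]
    labels = None
    for _ in range(n_iter):
        # Assign every feature to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Move each centroid to the mean of its member features.
        for c in range(k):
            members = [points[j] for j in range(n_features) if labels[j] == c]
            if members:  # an emptied cluster keeps its old centroid
                centroids[c] = [sum(v) / len(members) for v in zip(*members)]
    groups = [[j for j in range(n_features) if labels[j] == c] for c in range(k)]
    return [g for g in groups if g]


# Toy data: 6 recordings, 4 features; columns 0/1 and 2/3 move together.
X = [[1, 2, 6, 12], [2, 4, 1, 2], [3, 6, 5, 10],
     [4, 8, 2, 4], [5, 10, 4, 8], [6, 12, 3, 6]]
groups = partition_features(X, k=2)
```

Under this reading, each returned group of feature indices would train one base classifier (for example an RF), and the classifier outputs would then be combined at the decision level. A larger k yields more, smaller groups, which is how k would control the size and diversity of the ensemble.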

Cited By

  • (2016) Voice data mining for laryngeal pathology assessment, Computers in Biology and Medicine, 69:C (270-276), doi:10.1016/j.compbiomed.2015.07.026. Online publication date: 1-Feb-2016
  • (2015) Fusing voice and query data for non-invasive detection of laryngeal disorders, Expert Systems with Applications: An International Journal, 42:22 (8445-8453), doi:10.1016/j.eswa.2015.07.001. Online publication date: 1-Dec-2015
  • (2014) A system for ubiquitous distributed acquisition of voice alteration samples through a mobile application, Proceedings of the 5th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics (813-818), doi:10.1145/2649387.2660853. Online publication date: 20-Sep-2014

Information & Contributors

Information

Published In

Applied Soft Computing, Volume 18, Issue C
May 2014
338 pages

Publisher

Elsevier Science Publishers B. V.

Netherlands

Author Tags

  1. Angle modulated differential evolution
  2. Ensemble of classifiers
  3. Feature selection
  4. Pathological voice
  5. Random forest
  6. SVM

Qualifiers

  • Research-article
