An optimal two stage feature selection for speech emotion recognition using acoustic features

Published: 01 December 2016

Abstract

Feature fusion plays an important role in speech emotion recognition: it improves classification accuracy by combining the most popular acoustic features for the task, such as energy, pitch and mel-frequency cepstral coefficients (MFCCs). However, fusion yields a high-dimensional, correlated feature set, and the resulting computational complexity prevents the system from performing optimally. In this paper, a two-stage feature selection method is proposed. In the first stage, appropriate features are selected and fused for speech emotion recognition. In the second stage, optimal feature subset selection techniques, namely sequential forward selection (SFS) and sequential floating forward selection (SFFS), are used to overcome the curse of dimensionality caused by the high-dimensional fused feature vector. Finally, the emotions are classified with several classifiers: linear discriminant analysis (LDA), regularized discriminant analysis (RDA), support vector machine (SVM) and k-nearest neighbor (KNN). The performance of the overall emotion recognition system is validated on the Berlin and Spanish databases in terms of classification rate. An optimal uncorrelated feature set is obtained with SFS and SFFS individually. The results reveal that SFFS is the better choice for feature subset selection because SFS suffers from the nesting problem: once a feature is retained in the set, it is difficult to discard it. SFFS avoids this problem because the set is not fixed at any stage; it floats up and down during the selection according to the objective function. Experimental results show that the two-stage feature selection method improves classifier efficiency by 15-20 % compared with the classifier applied to the fused features alone.
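
The floating search that makes SFFS preferable can be illustrated with a short sketch. The code below is a minimal, self-contained implementation of the second stage (SFFS with a wrapper criterion), not the authors' implementation: the use of scikit-learn, the KNN wrapper classifier and the helper names wrapper_score and sffs are assumptions made for illustration only. X is taken to be the fused utterance-level feature matrix (energy, pitch and MFCC statistics) and y the emotion labels.

# Illustrative sketch only (assumed libraries: numpy, scikit-learn).
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def wrapper_score(X, y, subset, cv=5):
    """Criterion J(subset): mean cross-validated accuracy of a classifier
    trained only on the candidate feature subset (wrapper evaluation)."""
    clf = KNeighborsClassifier(n_neighbors=5)
    return cross_val_score(clf, X[:, subset], y, cv=cv).mean()

def sffs(X, y, k):
    """Sequential floating forward selection up to k features.

    Unlike plain SFS, every inclusion step is followed by conditional
    exclusion steps: a previously selected feature is dropped again whenever
    doing so beats the best score already recorded for the smaller subset
    size, so the subset floats up and down instead of growing monotonically.
    This is what removes the nesting problem mentioned in the abstract."""
    selected, remaining = [], list(range(X.shape[1]))
    best = {}  # best criterion value seen for each subset size
    while len(selected) < k:
        # Inclusion: add the single feature that maximises the criterion.
        score, feat = max((wrapper_score(X, y, selected + [f]), f) for f in remaining)
        selected.append(feat)
        remaining.remove(feat)
        best[len(selected)] = max(score, best.get(len(selected), -np.inf))
        # Conditional exclusion: keep dropping the least useful feature while
        # that improves on the best subset of the smaller size.
        while len(selected) > 2:
            score, feat = max((wrapper_score(X, y, [g for g in selected if g != f]), f)
                              for f in selected)
            if score > best.get(len(selected) - 1, -np.inf):
                selected.remove(feat)
                remaining.append(feat)
                best[len(selected)] = score
            else:
                break
    return selected

Plain SFS corresponds to the same loop with the conditional exclusion block removed, which is exactly why a feature, once included, can never leave the set; the floating step is what lifts that restriction.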

        Published In

International Journal of Speech Technology, Volume 19, Issue 4
        December 2016
        298 pages

        Publisher

        Springer-Verlag

        Berlin, Heidelberg

        Author Tags

        1. Classification
        2. Feature fusion
        3. Optimal feature set selection
        4. Speech emotion recognition

        Qualifiers

        • Article

        Cited By

        • (2024) Machine learning approach of speech emotions recognition using feature fusion technique. Multimedia Tools and Applications, 83(3), 8663-8688. DOI: 10.1007/s11042-023-16036-y. Online publication date: 1-Jan-2024.
        • (2023) An automatic speech recognition system in Indian and foreign languages. Intelligent Decision Technologies, 17(2), 505-526. DOI: 10.3233/IDT-220228. Online publication date: 1-Jan-2023.
        • (2023) Speech emotion recognition using multimodal feature fusion with machine learning approach. Multimedia Tools and Applications, 82(27), 42763-42781. DOI: 10.1007/s11042-023-15275-3. Online publication date: 21-Apr-2023.
        • (2023) Trends in speech emotion recognition: a comprehensive survey. Multimedia Tools and Applications, 82(19), 29307-29351. DOI: 10.1007/s11042-023-14656-y. Online publication date: 22-Feb-2023.
        • (2022) Impact of Feature Extraction and Feature Selection Algorithms on Punjabi Speech Emotion Recognition Using Convolutional Neural Network. ACM Transactions on Asian and Low-Resource Language Information Processing, 21(5), 1-23. DOI: 10.1145/3511888. Online publication date: 29-Apr-2022.
        • (2022) Neural network-based blended ensemble learning for speech emotion recognition. Multidimensional Systems and Signal Processing, 33(4), 1323-1348. DOI: 10.1007/s11045-022-00845-9. Online publication date: 1-Dec-2022.
        • (2022) Speech emotion recognition using optimized genetic algorithm-extreme learning machine. Multimedia Tools and Applications, 81(17), 23963-23989. DOI: 10.1007/s11042-022-12747-w. Online publication date: 1-Jul-2022.
        • (2022) Speech Emotion Recognition using Time Distributed 2D-Convolution layers for CAPSULENETS. Multimedia Tools and Applications, 81(12), 16945-16966. DOI: 10.1007/s11042-022-12112-x. Online publication date: 1-May-2022.
        • (2020) Recognition of emotion from speech using evolutionary cepstral coefficients. Multimedia Tools and Applications, 79(47-48), 35739-35759. DOI: 10.1007/s11042-020-09591-1. Online publication date: 1-Dec-2020.
        • (2020) An efficient state detection of a person by fusion of acoustic and alcoholic features using various classification algorithms. International Journal of Speech Technology, 23(3), 625-632. DOI: 10.1007/s10772-020-09726-7. Online publication date: 1-Sep-2020.
