An optimal two stage feature selection for speech emotion recognition using acoustic features

Published: 01 December 2016

Abstract

Feature fusion plays an important role in speech emotion recognition: it improves classification accuracy by combining the most popular acoustic features for the task, such as energy, pitch and mel-frequency cepstral coefficients (MFCCs). However, fusion yields a high-dimensional, correlated feature set, and the resulting computational complexity prevents the system from performing optimally. In this paper, a two-stage feature selection method is proposed. In the first stage, appropriate features are selected and fused for speech emotion recognition. In the second stage, optimal feature subset selection techniques, namely sequential forward selection (SFS) and sequential floating forward selection (SFFS), are used to overcome the curse of dimensionality caused by the high-dimensional fused feature vector. Finally, the emotions are classified with several classifiers: linear discriminant analysis (LDA), regularized discriminant analysis (RDA), support vector machine (SVM) and k-nearest neighbor (KNN). The performance of the overall emotion recognition system is validated on the Berlin and Spanish databases in terms of classification rate. An optimal uncorrelated feature set is obtained with SFS and SFFS individually. The results reveal that SFFS is the better choice for feature subset selection because SFS suffers from the nesting problem: once a feature is retained in the set, it is difficult to discard it. SFFS avoids this problem because the set is not fixed at any stage; it floats up and down during the selection according to the objective function. Experimental results show that the two-stage feature selection method improves classifier efficiency by 15-20 % compared with the classifier applied to the fused features alone.
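
The floating search that makes SFFS preferable can be illustrated with a short sketch. The code below is a minimal, self-contained implementation of the second stage (SFFS with a wrapper criterion), not the authors' implementation: the use of scikit-learn, the KNN wrapper classifier and the helper names wrapper_score and sffs are assumptions made for illustration only. X is taken to be the fused utterance-level feature matrix (energy, pitch and MFCC statistics) and y the emotion labels.

# Illustrative sketch only (assumed libraries: numpy, scikit-learn).
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def wrapper_score(X, y, subset, cv=5):
    """Criterion J(subset): mean cross-validated accuracy of a classifier
    trained only on the candidate feature subset (wrapper evaluation)."""
    clf = KNeighborsClassifier(n_neighbors=5)
    return cross_val_score(clf, X[:, subset], y, cv=cv).mean()

def sffs(X, y, k):
    """Sequential floating forward selection up to k features.

    Unlike plain SFS, every inclusion step is followed by conditional
    exclusion steps: a previously selected feature is dropped again whenever
    doing so beats the best score already recorded for the smaller subset
    size, so the subset floats up and down instead of growing monotonically.
    This is what removes the nesting problem mentioned in the abstract."""
    selected, remaining = [], list(range(X.shape[1]))
    best = {}  # best criterion value seen for each subset size
    while len(selected) < k:
        # Inclusion: add the single feature that maximises the criterion.
        score, feat = max((wrapper_score(X, y, selected + [f]), f) for f in remaining)
        selected.append(feat)
        remaining.remove(feat)
        best[len(selected)] = max(score, best.get(len(selected), -np.inf))
        # Conditional exclusion: keep dropping the least useful feature while
        # that improves on the best subset of the smaller size.
        while len(selected) > 2:
            score, feat = max((wrapper_score(X, y, [g for g in selected if g != f]), f)
                              for f in selected)
            if score > best.get(len(selected) - 1, -np.inf):
                selected.remove(feat)
                remaining.append(feat)
                best[len(selected)] = score
            else:
                break
    return selected

Plain SFS corresponds to the same loop with the conditional exclusion block removed, which is exactly why a feature, once included, can never leave the set; the floating step is what lifts that restriction.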

        Published In

International Journal of Speech Technology, Volume 19, Issue 4
        December 2016
        298 pages

        Publisher

        Springer-Verlag

        Berlin, Heidelberg

        Author Tags

        1. Classification
        2. Feature fusion
        3. Optimal feature set selection
        4. Speech emotion recognition

        Qualifiers

        • Article

        Cited By

        • (2024) Machine learning approach of speech emotions recognition using feature fusion technique. Multimedia Tools and Applications, 83(3), 8663-8688. DOI: 10.1007/s11042-023-16036-y. Online publication date: 1-Jan-2024.
        • (2023) An automatic speech recognition system in Indian and foreign languages. Intelligent Decision Technologies, 17(2), 505-526. DOI: 10.3233/IDT-220228. Online publication date: 1-Jan-2023.
        • (2023) Speech emotion recognition using multimodal feature fusion with machine learning approach. Multimedia Tools and Applications, 82(27), 42763-42781. DOI: 10.1007/s11042-023-15275-3. Online publication date: 21-Apr-2023.
        • (2023) Trends in speech emotion recognition: a comprehensive survey. Multimedia Tools and Applications, 82(19), 29307-29351. DOI: 10.1007/s11042-023-14656-y. Online publication date: 22-Feb-2023.
        • (2022) Impact of Feature Extraction and Feature Selection Algorithms on Punjabi Speech Emotion Recognition Using Convolutional Neural Network. ACM Transactions on Asian and Low-Resource Language Information Processing, 21(5), 1-23. DOI: 10.1145/3511888. Online publication date: 29-Apr-2022.
        • (2022) Neural network-based blended ensemble learning for speech emotion recognition. Multidimensional Systems and Signal Processing, 33(4), 1323-1348. DOI: 10.1007/s11045-022-00845-9. Online publication date: 1-Dec-2022.
        • (2022) Speech emotion recognition using optimized genetic algorithm-extreme learning machine. Multimedia Tools and Applications, 81(17), 23963-23989. DOI: 10.1007/s11042-022-12747-w. Online publication date: 1-Jul-2022.
        • (2022) Speech Emotion Recognition using Time Distributed 2D-Convolution layers for CAPSULENETS. Multimedia Tools and Applications, 81(12), 16945-16966. DOI: 10.1007/s11042-022-12112-x. Online publication date: 1-May-2022.
        • (2020) Recognition of emotion from speech using evolutionary cepstral coefficients. Multimedia Tools and Applications, 79(47-48), 35739-35759. DOI: 10.1007/s11042-020-09591-1. Online publication date: 1-Dec-2020.
        • (2020) An efficient state detection of a person by fusion of acoustic and alcoholic features using various classification algorithms. International Journal of Speech Technology, 23(3), 625-632. DOI: 10.1007/s10772-020-09726-7. Online publication date: 1-Sep-2020.
