Abstract
Speech-based emotional state recognition stands to have a significant impact on artificial intelligence as machine learning advances. Proper feature selection is critical for emotion recognition. Accordingly, this work presents feature fusion as a means of achieving high prediction accuracy, rather than relying on any single feature in isolation. Mel Frequency Cepstral Coefficients (MFCC), Zero Crossing Rate (ZCR), Mel spectrogram, Short-Time Fourier Transform (STFT) and Root Mean Square (RMS) features are extracted, and four different feature fusion techniques are applied to five standard machine learning classifiers: XGBoost, Support Vector Machine (SVM), Random Forest, Decision Tree (D-Tree), and K-Nearest Neighbor (KNN). Applying feature fusion to the proposed classifiers yields satisfactory recognition rates of 99.64% on TESS (a female-only dataset), 91% on SAVEE (a male-only dataset) and 86% on CREMA-D (containing both male and female speakers). The proposed model shows that effective feature fusion improves the accuracy and applicability of emotion detection systems.
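To make the fusion idea concrete, the following is a minimal sketch of concatenation-based (early) feature fusion of the kind the abstract describes. The per-feature dimensionalities and the random placeholder values are assumptions for illustration only, not values taken from the paper; in practice each vector would come from an audio feature extractor.

```python
import numpy as np

# Hypothetical per-utterance feature vectors (dimensionalities are assumed,
# not from the paper). The work extracts MFCC, ZCR, Mel spectrogram,
# STFT and RMS features from each speech clip.
rng = np.random.default_rng(0)
mfcc = rng.standard_normal(40)   # e.g. 40 MFCC coefficients (assumed)
zcr  = rng.standard_normal(1)    # mean zero-crossing rate
mel  = rng.standard_normal(128)  # time-averaged Mel-spectrogram bands (assumed)
stft = rng.standard_normal(257)  # time-averaged STFT magnitude bins (assumed)
rms  = rng.standard_normal(1)    # root-mean-square energy

# Early fusion: concatenate the individual feature vectors into a single
# descriptor that a classifier such as SVM, Random Forest or KNN can consume.
fused = np.concatenate([mfcc, zcr, mel, stft, rms])
print(fused.shape)  # (427,)
```

The fused descriptor would then be fed, one per utterance, to any of the five classifiers named above; the paper's other fusion variants differ only in how the constituent feature sets are combined before classification.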
Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.
Code availability
Not applicable.
Author information
Authors and Affiliations
Contributions
The authors’ contributions are summarized below. Sandeep Kumar Panda made substantial contributions to the conception and design of the study and was involved in drafting the manuscript. Ajay Kumar Jena and Mohit Ranjan Panda acquired and analysed the data and interpreted the results. Susmita Panda revised the manuscript for critically important intellectual content. All authors read and approved the final manuscript.
Corresponding author
Ethics declarations
Conflicts of interest/Competing interests
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Panda, S.K., Jena, A.K., Panda, M.R. et al. Speech emotion recognition using multimodal feature fusion with machine learning approach. Multimed Tools Appl 82, 42763–42781 (2023). https://doi.org/10.1007/s11042-023-15275-3