Abstract
Speech-based emotional state recognition stands to have a significant impact on artificial intelligence as machine learning advances. Proper feature selection is critical for emotion recognition. Accordingly, this work presents feature fusion as a means of achieving high prediction accuracy, rather than relying on any single feature in isolation. Mel Frequency Cepstral Coefficients (MFCC), Zero Crossing Rate (ZCR), Mel spectrogram, Short-Time Fourier Transform (STFT) and Root Mean Square (RMS) features are extracted, and four different feature fusion techniques are applied to five standard machine learning classifiers: XGBoost, Support Vector Machine (SVM), Random Forest, Decision Tree (D-Tree), and K-Nearest Neighbor (KNN). Applying feature fusion to the proposed classifiers yields satisfactory recognition rates of 99.64% on TESS (a female-only dataset), 91% on SAVEE (a male-only dataset) and 86% on CREMA-D (containing both male and female speakers). The proposed model shows that effective feature fusion improves the accuracy and applicability of emotion detection systems.
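To make the fusion idea concrete, the following is a minimal sketch of concatenation-based (early) feature fusion of the kind the abstract describes. The per-feature dimensionalities and the random placeholder values are assumptions for illustration only, not values taken from the paper; in practice each vector would come from an audio feature extractor.

```python
import numpy as np

# Hypothetical per-utterance feature vectors (dimensionalities are assumed,
# not from the paper). The work extracts MFCC, ZCR, Mel spectrogram,
# STFT and RMS features from each speech clip.
rng = np.random.default_rng(0)
mfcc = rng.standard_normal(40)   # e.g. 40 MFCC coefficients (assumed)
zcr  = rng.standard_normal(1)    # mean zero-crossing rate
mel  = rng.standard_normal(128)  # time-averaged Mel-spectrogram bands (assumed)
stft = rng.standard_normal(257)  # time-averaged STFT magnitude bins (assumed)
rms  = rng.standard_normal(1)    # root-mean-square energy

# Early fusion: concatenate the individual feature vectors into a single
# descriptor that a classifier such as SVM, Random Forest or KNN can consume.
fused = np.concatenate([mfcc, zcr, mel, stft, rms])
print(fused.shape)  # (427,)
```

The fused descriptor would then be fed, one per utterance, to any of the five classifiers named above; the paper's other fusion variants differ only in how the constituent feature sets are combined before classification.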
Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.
Code availability
Not applicable.
Author information
Authors and Affiliations
Contributions
The authors’ contributions are summarized below. Sandeep Kumar Panda made substantial contributions to the conception and design of the study and was involved in drafting the manuscript. Ajay Kumar Jena and Mohit Ranjan Panda acquired and analysed the data and interpreted the results. Susmita Panda revised the manuscript for critically important intellectual content. All authors read and approved the final manuscript.
Corresponding author
Ethics declarations
Conflicts of interest/Competing interests
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Panda, S.K., Jena, A.K., Panda, M.R. et al. Speech emotion recognition using multimodal feature fusion with machine learning approach. Multimed Tools Appl 82, 42763–42781 (2023). https://doi.org/10.1007/s11042-023-15275-3