Search Results (375)

Search Parameters:
Keywords = emotion recognition system

22 pages, 1564 KiB  
Article
Gait-To-Gait Emotional Human–Robot Interaction Utilizing Trajectories-Aware and Skeleton-Graph-Aware Spatial–Temporal Transformer
by Chenghao Li, Kah Phooi Seng and Li-Minn Ang
Sensors 2025, 25(3), 734; https://doi.org/10.3390/s25030734 (registering DOI) - 25 Jan 2025
Viewed by 278
Abstract
The emotional response of robotics is crucial for promoting the socially intelligent level of human–robot interaction (HRI). The development of machine learning has extensively stimulated research on emotional recognition for robots. Our research focuses on emotional gaits, a type of simple modality that stores a series of joint coordinates and is easy for humanoid robots to execute. However, a limited amount of research investigates emotional HRI systems based on gaits, indicating an existing gap in human emotion gait recognition and robotic emotional gait response. To address this challenge, we propose a Gait-to-Gait Emotional HRI system, emphasizing the development of an innovative emotion classification model. In our system, the humanoid robot NAO can recognize emotions from human gaits through our Trajectories-Aware and Skeleton-Graph-Aware Spatial–Temporal Transformer (TS-ST) and respond with pre-set emotional gaits that reflect the same emotion as the human presented. Our TS-ST outperforms the current state-of-the-art human-gait emotion recognition model applied to robots on the Emotion-Gait dataset. Full article
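As a minimal sketch of the general idea in this entry (not the authors' TS-ST, which adds trajectory-aware and skeleton-graph-aware components), the snippet below classifies emotions from a gait clip given as per-frame 3D joint coordinates using a plain temporal transformer encoder. Joint count (25), clip length (60 frames), and the 4-class output are assumptions for illustration.

```python
# Minimal sketch (not the authors' TS-ST): a transformer encoder that classifies
# emotions from a gait clip given as per-frame 3D joint coordinates.
# Assumed for illustration: 25 joints, 60 frames per clip, 4 emotion classes.
import torch
import torch.nn as nn

class GaitEmotionTransformer(nn.Module):
    def __init__(self, num_joints=25, coords=3, d_model=128, num_classes=4):
        super().__init__()
        # Flatten each frame's skeleton into a token, then project to d_model.
        self.embed = nn.Linear(num_joints * coords, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.head = nn.Linear(d_model, num_classes)

    def forward(self, x):                               # x: (batch, frames, joints, 3)
        b, t, j, c = x.shape
        tokens = self.embed(x.reshape(b, t, j * c))     # (batch, frames, d_model)
        encoded = self.encoder(tokens)                  # temporal self-attention
        return self.head(encoded.mean(dim=1))           # pool over time -> class logits

logits = GaitEmotionTransformer()(torch.randn(2, 60, 25, 3))
print(logits.shape)  # torch.Size([2, 4])
```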
17 pages, 3422 KiB  
Article
TheraSense: Deep Learning for Facial Emotion Analysis in Mental Health Teleconsultation
by Hayette Hadjar, Binh Vu and Matthias Hemmje
Electronics 2025, 14(3), 422; https://doi.org/10.3390/electronics14030422 - 22 Jan 2025
Viewed by 354
Abstract
Background: This paper presents TheraSense, a system developed within the Supporting Mental Health in Young People: Integrated Methodology for cLinical dEcisions and evidence (Smile) and Sensor Enabled Affective Computing for Enhancing Medical Care (SenseCare) projects. TheraSense is designed to enhance teleconsultation services by leveraging deep learning for real-time emotion recognition through facial expressions. It integrates with the Knowledge Management-Ecosystem Portal (SenseCare KM-EP) platform to provide mental health practitioners with valuable emotional insights during remote consultations. Method: We describe the conceptual design of TheraSense, including its use case contexts, architectural structure, and user interface layout. The system’s interoperability is discussed in detail, highlighting its seamless integration within the teleconsultation workflow. The evaluation methods include both quantitative assessments of the video-based emotion recognition system’s performance and qualitative feedback through heuristic evaluation and survey analysis. Results: The performance evaluation shows that TheraSense effectively recognizes emotions in video streams, with positive user feedback on its usability and integration. The system’s real-time emotion detection capabilities provide valuable support for mental health practitioners during remote sessions. Conclusions: TheraSense demonstrates its potential as an innovative tool for enhancing teleconsultation services. By providing real-time emotional insights, it supports better-informed decision-making in mental health care, making it an effective addition to remote telehealth platforms. Full article
Show Figures

Graphical abstract
Figure 1. Process of facial emotion recognition using CNNs.
Figure 2. TheraSense system: use cases.
Figure 3. UC 1.3 diagram of TheraSense teleconsultation with emotion detection.
Figure 4. TheraSense user interface diagram.
Figure 5. TheraSense implementation architecture.
Figure 6. Real-time WebSocket architecture for TheraSense.
Figure 7. Patient emotion recognition process in the browser during teleconsultation.
Figure 8. Patient emotions during consultation.
Figure 9. Comparison of real-time face detection models.
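The entry above describes frame-level facial emotion recognition applied to a video stream during teleconsultation. The following hedged sketch (not TheraSense code) shows that general pattern: detect a face in a frame with an OpenCV Haar cascade and pass the crop to a CNN classifier. The 7-class label list and the `model` argument (assumed to be a Keras-style CNN over 48×48 grayscale crops) are hypothetical placeholders.

```python
# Hedged sketch of per-frame facial emotion inference (not TheraSense code).
# `model` is assumed to be a Keras-style CNN over 48x48 grayscale face crops.
import cv2
import numpy as np

LABELS = ["angry", "disgust", "fear", "happy", "sad", "surprise", "neutral"]  # assumed
detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def classify_frame(frame_bgr, model):
    """Return (label, confidence) for every face found in one video frame."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    faces = detector.detectMultiScale(gray, scaleFactor=1.3, minNeighbors=5)
    results = []
    for (x, y, w, h) in faces:
        crop = cv2.resize(gray[y:y + h, x:x + w], (48, 48)) / 255.0
        probs = model.predict(crop[np.newaxis, ..., np.newaxis], verbose=0)[0]
        results.append((LABELS[int(np.argmax(probs))], float(np.max(probs))))
    return results
```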
15 pages, 259 KiB  
Article
Challenges of Religious and Cultural Diversity in the Child Protection System with Children Migrating “Alone” in Catalonia and Melilla
by Montserrat Freixa Niella, Francisca Ruiz Garzón, Angelina Sánchez-Martí and Ruth Vilà Baños
Religions 2025, 16(2), 109; https://doi.org/10.3390/rel16020109 - 22 Jan 2025
Viewed by 545
Abstract
Cultural and religious diversity in Spain, driven by recent decades of migratory flows, has not been exempt from generating social tensions and, unfortunately, an increasing stigmatization of migrant children. This article examines how power dynamics and exclusion impact the identity construction of these young people, particularly within the child protection system. Through interviews and focus groups with young people and professionals in Barcelona and Melilla, this study highlights the resistance strategies these young individuals employ to counteract stigmatizing narratives. The findings indicate that, despite inclusion policies, imposed labels reinforce their vulnerability and limit their social and community participation. Although interfaith dialogue is proposed as a tool to mitigate these tensions, professionals working with these children emphasize the lack of institutional support and insufficient training in socio-cultural diversity, which hinders their efforts. The study underscores the importance of developing interfaith competencies that foster mutual respect and recognition, concluding with a critique of the current protection system. It advocates for a comprehensive approach to addressing these young people’s emotional, social, and spiritual needs beyond solely legal and educational aspects. Full article
(This article belongs to the Section Religions and Health/Psychology/Social Sciences)
15 pages, 4304 KiB  
Article
Face and Voice Recognition-Based Emotion Analysis System (EAS) to Minimize Heterogeneity in the Metaverse
by Surak Son and Yina Jeong
Appl. Sci. 2025, 15(2), 845; https://doi.org/10.3390/app15020845 - 16 Jan 2025
Viewed by 644
Abstract
The metaverse, where users interact through avatars, is evolving to closely mirror the real world, requiring realistic object responses based on users’ emotions. While technologies like eye-tracking and hand-tracking transfer physical movements into virtual spaces, accurate emotion detection remains challenging. This study proposes the “Face and Voice Recognition-based Emotion Analysis System (EAS)” to bridge this gap, assessing emotions through both voice and facial expressions. EAS utilizes a microphone and camera to gauge emotional states, combining these inputs for a comprehensive analysis. It comprises three neural networks: the Facial Emotion Analysis Model (FEAM), which classifies emotions using facial landmarks; the Voice Sentiment Analysis Model (VSAM), which detects vocal emotions even in noisy environments using MCycleGAN; and the Metaverse Emotion Recognition Model (MERM), which integrates FEAM and VSAM outputs to infer overall emotional states. EAS’s three primary modules—Facial Emotion Recognition, Voice Emotion Recognition, and User Emotion Analysis—analyze facial features and vocal tones to detect emotions, providing a holistic emotional assessment for realistic interactions in the metaverse. The system’s performance is validated through dataset testing, and future directions are suggested based on simulation outcomes. Full article
Show Figures

Figure 1. The operation process of the facial expression-based emotion recognition and voice-based emotion recognition models used in EAS.
Figure 2. The neural network models A, B, C, and D configured for the FEAM.
Figure 3. Architecture of MCycleGAN.
Figure 4. The architecture of MERM.
Figure 5. The training and validation accuracy for FEAM models A–D across epochs during the training process.
Figure 6. A graph comparing the accuracy of each model using test data after 500 iterations of training.
Figure 7. Accuracy of existing CNN models confirmed with test data.
Figure 8. The F1 scores for each model.
Figure 9. The loss of the generator and the loss of the discriminator during the training of each model over 40 epochs.
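The abstract above fuses facial (FEAM) and vocal (VSAM) outputs in a third model (MERM). As a minimal late-fusion sketch under that idea (not the authors' architecture), the snippet below concatenates the two class-probability vectors and maps them to a final emotion estimate; the 7-class count is an assumption.

```python
# Minimal late-fusion sketch in the spirit of MERM (not the authors' code):
# concatenate facial (FEAM-like) and vocal (VSAM-like) probability vectors and
# map them to a final emotion estimate. The class count (7) is assumed.
import torch
import torch.nn as nn

class FusionHead(nn.Module):
    def __init__(self, num_classes=7, hidden=64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * num_classes, hidden),   # facial + vocal probabilities
            nn.ReLU(),
            nn.Linear(hidden, num_classes),
        )

    def forward(self, face_probs, voice_probs):
        return self.mlp(torch.cat([face_probs, voice_probs], dim=-1))

face = torch.softmax(torch.randn(1, 7), dim=-1)    # stand-in FEAM output
voice = torch.softmax(torch.randn(1, 7), dim=-1)   # stand-in VSAM output
print(FusionHead()(face, voice).shape)             # torch.Size([1, 7])
```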
40 pages, 7115 KiB  
Article
Emotion Recognition from EEG Signals Using Advanced Transformations and Deep Learning
by Jonathan Axel Cruz-Vazquez, Jesús Yaljá Montiel-Pérez, Rodolfo Romero-Herrera and Elsa Rubio-Espino
Mathematics 2025, 13(2), 254; https://doi.org/10.3390/math13020254 - 14 Jan 2025
Viewed by 1185
Abstract
Affective computing aims to develop systems capable of effectively interacting with people through emotion recognition. Neuroscience and psychology have established models that classify universal human emotions, providing a foundational framework for developing emotion recognition systems. Brain activity related to emotional states can be captured through electroencephalography (EEG), enabling the creation of models that classify emotions even in uncontrolled environments. In this study, we propose an emotion recognition model based on EEG signals using deep learning techniques on a proprietary database. To improve the separability of emotions, we explored various data transformation techniques, including Fourier Neural Networks and quantum rotations. The convolutional neural network model, combined with quantum rotations, achieved a 95% accuracy in emotion classification, particularly in distinguishing sad emotions. The integration of these transformations can further enhance overall emotion recognition performance. Full article
(This article belongs to the Special Issue Deep Neural Networks: Theory, Algorithms and Applications)
Show Figures

Figure 1. Emotion classification processing workflow with EEG signals.
Figure 2. Recording protocol and stimulus exposure at different times of the day.
Figure 3. Recording room for the experiment.
Figure 4. (a) Diagram of electrode positions on the scalp according to the 10–20 system; (b) representation of brain regions corresponding to the electrodes.
Figure 5. Graph of raw EEG signals from 14 channels over time.
Figure 6. Notch filter application: (a) before applying the notch filter, (b) after applying the notch filter.
Figure 7. Spatial distribution of independent components (ICA).
Figure 8. Detailed analysis of ICA004 component: (a) topographic map, (b) segment image and ERP/ERF, (c) frequency spectrum, (d) dropped segments.
Figure 9. Compares EEG signals before and after cleaning artifacts using ICA.
Figure 10. Recording of EEG signals from 14 channels during ERP segmentation.
Figure 11. Average ERP signals across the 14 EEG channels.
Figure 12. Topographic maps of temporal evolution of an ERP.
Figure 13. Representation of the 14 EEG channels and topographic maps of an ERP.
Figure 14. Scatter plots of extracted features.
Figure 15. Correlation matrix of EEG time and frequency domain features.
Figure 16. Distribution of features transformed with the Fourier Neural Network.
Figure 17. Quantum rotations: (a) emotions before applying quantum rotations, (b) emotions after applying quantum rotations.
Figure 18. Quantum rotated features for different emotional states: (a) happy, (b) sad, (c) neutral.
Figure 19. Performance of dense network with Fourier features: (a) confusion matrix, (b) precision curves for training and validation, (c) loss curves for training and validation.
Figure 20. Performance of dense network with quantum-rotated features: (a) confusion matrix, (b) precision curves for training and validation, (c) loss curves for training and validation.
Figure 21. Performance of convolutional neural network (CNN) with Fourier features: (a) confusion matrix, (b) precision curves for training and validation, (c) loss curves for training and validation.
Figure 22. Performance of convolutional neural network (CNN) with quantum-rotated features: (a) confusion matrix, (b) precision curves for training and validation, (c) loss curves for training and validation.
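The abstract above mentions "quantum rotations" as a feature transformation but does not specify the operation, so the snippet below is only a generic stand-in: it applies an ordinary 2-D rotation to pairs of standardized EEG features, the simplest rotation-style re-embedding. The rotation angle and the pairing of features are arbitrary assumptions.

```python
# Illustrative stand-in only (the paper's exact transform is not given here):
# apply a 2-D rotation to pairs of standardized EEG features.
import numpy as np

def rotate_feature_pairs(X, theta=np.pi / 4):
    """X: (n_samples, n_features) with an even number of features."""
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    X = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-8)   # standardize first
    pairs = X.reshape(X.shape[0], -1, 2)                # group features in pairs
    return (pairs @ R.T).reshape(X.shape[0], -1)

X = np.random.randn(100, 8)           # e.g., band-power features from 8 channels
print(rotate_feature_pairs(X).shape)  # (100, 8)
```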
25 pages, 18134 KiB  
Article
Advancing Emotion Recognition: EEG Analysis and Machine Learning for Biomedical Human–Machine Interaction
by Sara Reis, Luís Pinto-Coelho, Maria Sousa, Mariana Neto and Marta Silva
BioMedInformatics 2025, 5(1), 5; https://doi.org/10.3390/biomedinformatics5010005 - 10 Jan 2025
Viewed by 648
Abstract
Background: Human emotions are subjective psychophysiological processes that play an important role in the daily interactions of human life. Emotions often do not manifest themselves in isolation; people can experience a mixture of them and may not express them in a visible or perceptible way; Methods: This study seeks to uncover EEG patterns linked to emotions, as well as to examine brain activity across emotional states and optimise machine learning techniques for accurate emotion classification. For these purposes, the DEAP dataset was used to comprehensively analyse electroencephalogram (EEG) data and understand how emotional patterns can be observed. Machine learning algorithms, such as SVM, MLP, and RF, were implemented to predict valence and arousal classifications for different combinations of frequency bands and brain regions; Results: The analysis reaffirms the value of EEG as a tool for objective emotion detection, demonstrating its potential in both clinical and technological contexts. By highlighting the benefits of using fewer electrodes, this study emphasises the feasibility of creating more accessible and user-friendly emotion recognition systems; Conclusions: Further improvements in feature extraction and model generalisation are necessary for clinical applications. This study highlights not only the potential of emotion classification to develop biomedical applications, but also to enhance human–machine interaction systems. Full article
Show Figures

Figure 1. Brain anatomy with the emotion-related functional areas labelled (adapted from InjuryMap, CC-BY-SA-4.0, https://commons.wikimedia.org/wiki/File:Brain_anatomy.svg, accessed on 5 January 2025).
Figure 2. Plutchik's wheel of emotions [16].
Figure 3. Human brain structure [18].
Figure 4. Block diagram of the proposed emotion classification system.
Figure 5. Irregularity detection with the HBOS algorithm.
Figure 6. Welch periodogram: (a) Fp1 electrode; (b) AF3 electrode; (c) F3 electrode.
Figure 7. Welch periodogram for electrode Fp1: (a) theta wave; (b) alpha wave; (c) beta wave; (d) gamma wave.
Figure 8. Classification of videos watched in terms of arousal and valence levels.
Figure 9. Statistical analysis of the combination of valence and arousal.
Figure 10. Topographic map for the theta wave.
Figure 11. Topographic map for the alpha wave.
Figure 12. Topographic map for the beta wave.
Figure 13. Topographic map for the gamma wave.
Figure 14. Topographical map for the HAHV emotional state: (a) theta wave; (b) alpha wave; (c) beta wave; (d) gamma wave.
Figure 15. Topographical map for the HALV emotional state: (a) theta wave; (b) alpha wave; (c) beta wave; (d) gamma wave.
Figure 16. Topographical map for the LAHV emotional state: (a) theta wave; (b) alpha wave; (c) beta wave; (d) gamma wave.
Figure 17. Topographical map for the LALV emotional state: (a) theta wave; (b) alpha wave; (c) beta wave; (d) gamma wave.
Figure 18. Statistical analysis of the topographic maps for the different emotional states (HALV, HAHV, LALV, LAHV).
Figure 19. Prediction of valence labels: (a) theta wave in the parietal region; (b) beta wave in the frontal region; (c) gamma wave in the parietal region; (d) alpha wave in the occipital region.
Figure 20. Prediction of arousal labels: (a) theta wave in the parietal region; (b) beta wave in the frontal region; (c) gamma wave in the parietal region; (d) alpha wave in the occipital region.
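The entry above classifies valence and arousal from DEAP EEG using band-power features and classifiers such as SVM, MLP, and RF. The sketch below shows that standard pipeline in a hedged form (not the authors' code): Welch band power per channel as features, then a cross-validated SVM on toy stand-in data. The 128 Hz sampling rate (as in the preprocessed DEAP release) and the band edges are assumptions.

```python
# Hedged sketch of a band-power + SVM pipeline (not the authors' code).
# Assumed: 128 Hz sampling, theta/alpha/beta/gamma band edges, binary labels.
import numpy as np
from scipy.signal import welch
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

BANDS = {"theta": (4, 8), "alpha": (8, 13), "beta": (13, 30), "gamma": (30, 45)}

def band_powers(trial, fs=128):
    """trial: (n_channels, n_samples) -> flat vector of per-channel band powers."""
    freqs, psd = welch(trial, fs=fs, nperseg=fs * 2, axis=-1)
    feats = []
    for lo, hi in BANDS.values():
        mask = (freqs >= lo) & (freqs < hi)
        feats.append(psd[:, mask].mean(axis=-1))
    return np.concatenate(feats)

# Toy stand-in data: 40 trials, 32 channels, 10 s at 128 Hz, random labels.
trials = np.random.randn(40, 32, 1280)
y = np.random.randint(0, 2, size=40)
X = np.array([band_powers(t) for t in trials])
print(cross_val_score(SVC(kernel="rbf"), X, y, cv=5).mean())
```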
47 pages, 6533 KiB  
Review
Affective Computing for Learning in Education: A Systematic Review and Bibliometric Analysis
by Rajamanickam Yuvaraj, Rakshit Mittal, A. Amalin Prince and Jun Song Huang
Educ. Sci. 2025, 15(1), 65; https://doi.org/10.3390/educsci15010065 - 10 Jan 2025
Viewed by 1192
Abstract
Affective computing is an emerging area of education research and has the potential to enhance educational outcomes. Despite the growing number of literature studies, there are still deficiencies and gaps in the domain of affective computing in education. In this study, we systematically review affective computing in the education domain. Methods: We queried four well-known research databases, namely the Web of Science Core Collection, IEEE Xplore, ACM Digital Library, and PubMed, using specific keywords for papers published between January 2010 and July 2023. Various relevant data items are extracted and classified based on a set of 15 extensive research questions. Following the PRISMA 2020 guidelines, a total of 175 studies were selected and reviewed in this work from among 3102 articles screened. The data show an increasing trend in publications within this domain. The most common research purpose involves designing emotion recognition/expression systems. Conventional textual questionnaires remain the most popular channels for affective measurement. Classrooms are identified as the primary research environments; the largest research sample group is university students. Learning domains are mainly associated with science, technology, engineering, and mathematics (STEM) courses. The bibliometric analysis reveals that most publications are affiliated with the USA. The studies are primarily published in journals, with the majority appearing in the Frontiers in Psychology journal. Research gaps, challenges, and potential directions for future research are explored. This review synthesizes current knowledge regarding the application of affective computing in the education sector. This knowledge is useful for future directions to help educational researchers, policymakers, and practitioners deploy affective computing technology to broaden educational practices. Full article
Show Figures

Figure 1. The review protocol (following the PRISMA 2020 guidelines flow diagram) used in this study.
Figure 2. Learning domain distribution.
Figure 3. Distribution of channels for affective measurement utilized.
Figure 4. Learning environment distribution.
Figure 5. Sample size histogram (for size <500).
Figure 6. Sample age range of learners in reviewed articles. Note: Given the extensive number of references involved in data extraction, we have chosen not to list them alongside the figure captions. This convention is applied across the subsequent figures. All relevant sources are properly cited within the main text.
Figure 7. Sample group distribution of reviewed articles.
Figure 8. Accuracy distribution of interventions in reviewed articles.
Figure 9. Emotions and frequency reported in reviewed articles.
Figure 10. Temporal trends of reviewed articles.
Figure 11. Publication venue distribution.
Figure 12. Availability of reviewed articles.
Figure 13. Quality score distribution of reviewed articles.
Figure 14. Research purpose distribution of reviewed articles.
Figure 15. Citation links of reviewed articles. The normalization method used is the association strength normalization method. The resolution is 3.00, and the minimum cluster size is 2.
Figure 16. Citation links of authoring countries of reviewed articles. The normalization method used is the association strength normalization method. The resolution is 3.00, and the minimum cluster size is 2.
Figure 17. Co-citation links of the references in the corpus. The normalization method used is the association strength normalization method, with a resolution of 3.00 and a minimum cluster size of 2.
Figure 18. Bibliometric coupling links of reviewed articles. The normalization method used is the association strength normalization method, with a resolution of 3.00 and a minimum cluster size of 2.
24 pages, 3261 KiB  
Article
A Video-Based Cognitive Emotion Recognition Method Using an Active Learning Algorithm Based on Complexity and Uncertainty
by Hongduo Wu, Dong Zhou, Ziyue Guo, Zicheng Song, Yu Li, Xingzheng Wei and Qidi Zhou
Appl. Sci. 2025, 15(1), 462; https://doi.org/10.3390/app15010462 - 6 Jan 2025
Viewed by 450
Abstract
The cognitive emotions of individuals during tasks largely determine the success or failure of tasks in various fields such as the military, medical, industrial fields, etc. Facial video data can carry more emotional information than static images because emotional expression is a temporal process. Video-based Facial Expression Recognition (FER) has received increasing attention from the relevant scholars in recent years. However, due to the high cost of marking and training video samples, feature extraction is inefficient and ineffective, which leads to a low accuracy and poor real-time performance. In this paper, a cognitive emotion recognition method based on video data is proposed, in which 49 emotion description points were initially defined, and the spatial–temporal features of cognitive emotions were extracted from the video data through a feature extraction method that combines geodesic distances and sample entropy. Then, an active learning algorithm based on complexity and uncertainty was proposed to automatically select the most valuable samples, thereby reducing the cost of sample labeling and model training. Finally, the effectiveness, superiority, and real-time performance of the proposed method were verified utilizing the MMI Facial Expression Database and some real-time-collected data. Through comparisons and testing, the proposed method showed satisfactory real-time performance and a higher accuracy, which can effectively support the development of a real-time monitoring system for cognitive emotions. Full article
(This article belongs to the Special Issue Advanced Technologies and Applications of Emotion Recognition)
Show Figures

Figure 1. PAD model.
Figure 2. Framework of the proposed cognitive emotion recognition method.
Figure 3. Schematic diagram of the proposed emotion description points: (a) original picture, (b) emotion description points on the picture.
Figure 4. Process of positioning emotion description points based on AAM.
Figure 5. The flow diagram of the active learning algorithm based on uncertainty and complexity.
Figure 6. Examples of selected facial expression videos.
Figure 7. The average feature curves of the 4 cognitive emotions based on "geodesic distance + sample entropy": (a) frustration, (b) boredom, (c) doubt, (d) pleasure.
Figure 8. The relationship between the performance of the proposed method and the number of samples selected through active learning.
Figure 9. The confusion matrix with the best average F1-score (91.99%).
Figure 10. Real-time recognition performance of the developed cognitive emotion recognition application.
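The abstract above selects the most valuable unlabeled video samples using complexity and uncertainty scores. The sketch below illustrates that selection pattern only in outline (not the authors' criterion): uncertainty is approximated by the entropy of a classifier's predicted probabilities and complexity by a simple feature-variance proxy, both of which are stand-ins; the weighting `alpha` and the batch size `k` are assumptions.

```python
# Hedged sketch of uncertainty+complexity sample selection (stand-in scores).
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def _normalize(v):
    return (v - v.min()) / (v.max() - v.min() + 1e-12)

def select_samples(model, X_unlabeled, k=10, alpha=0.5):
    probs = model.predict_proba(X_unlabeled)
    uncertainty = -(probs * np.log(probs + 1e-12)).sum(axis=1)   # prediction entropy
    complexity = X_unlabeled.var(axis=1)                          # proxy measure only
    score = alpha * _normalize(uncertainty) + (1 - alpha) * _normalize(complexity)
    return np.argsort(score)[::-1][:k]                            # indices to label next

rng = np.random.default_rng(0)
X_lab, y_lab = rng.normal(size=(50, 20)), rng.integers(0, 4, 50)  # toy labeled set
X_pool = rng.normal(size=(200, 20))                               # toy unlabeled pool
clf = RandomForestClassifier(n_estimators=50).fit(X_lab, y_lab)
print(select_samples(clf, X_pool, k=5))
```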
21 pages, 3818 KiB  
Article
EEG-Based Emotion Recognition with Combined Fuzzy Inference via Integrating Weighted Fuzzy Rule Inference and Interpolation
by Fangyi Li, Fusheng Yu, Liang Shen, Hexi Li, Xiaonan Yang and Qiang Shen
Mathematics 2025, 13(1), 166; https://doi.org/10.3390/math13010166 - 5 Jan 2025
Viewed by 723
Abstract
Emotions play a significant role in shaping psychological activities, behaviour, and interpersonal communication. Reflecting this importance, automated emotion classification has become a vital research area in artificial intelligence. Electroencephalogram (EEG)-based emotion recognition is particularly promising due to its high temporal resolution and resistance to manipulation. This study introduces an advanced fuzzy inference algorithm for EEG data-driven emotion recognition, effectively addressing the ambiguity of emotional states. By combining adaptive fuzzy rule generation, feature evaluation, and weighted fuzzy rule interpolation, the proposed approach achieves accurate emotion classification while handling incomplete knowledge. Experimental results demonstrate that the integrated fuzzy system outperforms state-of-the-art techniques, offering improved recognition accuracy and robustness under uncertainty. Full article
(This article belongs to the Special Issue The Recent Advances in Computational Intelligence)
Show Figures

Figure 1. Framework of fuzzy inference system for EEG-based emotion recognition.
Figure 2. Flowchart of EEG emotion recognition via weighted fuzzy reasoning.
Figure 3. Example of fuzzy partition of emotional dimension.
Figure 4. Emotional rule base generation method via fuzzy space partitioning.
Figure 5. Fuzzy partitions of four subjects for arousal emotion dimension.
Figure 6. Fuzzy partitions of four subjects for valence emotion dimension.
Figure 7. Classification performance over individual subject for arousal.
Figure 8. Classification performance over individual subject for valence.
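The entry above combines weighted fuzzy rule inference with rule interpolation; that full method is well beyond a listing page, so the sketch below shows only the basic weighted-rule step: triangular memberships over two normalized EEG features, rules with weights, and a winner-take-all decision from accumulated firing strengths. The membership parameters, rule set, and labels are all hypothetical.

```python
# Minimal weighted fuzzy-rule sketch (far simpler than the paper's method).
def tri(x, a, b, c):
    """Triangular membership with support [a, c] and peak at b."""
    return max(min((x - a) / (b - a + 1e-12), (c - x) / (c - b + 1e-12)), 0.0)

# Each rule: (membership params for feature 1, for feature 2, class label, weight).
RULES = [
    ((0.0, 0.2, 0.5), (0.0, 0.3, 0.6), "low_arousal", 1.0),
    ((0.4, 0.7, 1.0), (0.5, 0.8, 1.0), "high_arousal", 0.8),
]

def infer(x1, x2):
    scores = {}
    for mf1, mf2, label, w in RULES:
        firing = min(tri(x1, *mf1), tri(x2, *mf2)) * w   # weighted rule firing strength
        scores[label] = scores.get(label, 0.0) + firing
    return max(scores, key=scores.get), scores

print(infer(0.65, 0.75))   # ('high_arousal', {...})
```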
27 pages, 964 KiB  
Article
An Examination of the Leadership and Management Traits and Style in the Forest Fire Incident Command System: The Cyprus Forest Fire Service
by Nicolas-George Homer Eliades, Achilleas Karayiannis, Georgios Tsantopoulos and Spyros Galatsidas
Fire 2025, 8(1), 6; https://doi.org/10.3390/fire8010006 - 26 Dec 2024
Viewed by 606
Abstract
Since the early 21st century, wildlands have witnessed an effusion of wildfires, with climate and social changes resulting in unanticipated wildfire activity and impact. For forest fires to be prevented and suppressed effectively, forest firefighting forces have adopted a specific administrative system for organizing and managing the fighting force. Under the administrative system, a debate on desired “leadership and management qualities” arises, and hence, this study sought to identify the leadership and management traits that should distinguish individuals in the forest fire incident command system (FFICS) applied by the Department of Forests (Cyprus). The research subject was addressed using mixed method research, employing quantitative and qualitative data. Both datasets were used to distinguish the purposes of the applied triangulation, enabling the examination of differentiation between the trends/positions recorded in terms of the object of study. These findings point to ideal forms of transformational leadership and neoclassical management. The outcomes suggest that at the individual level, the leaders of each of the operating structures should develop leadership qualities related to emotional intelligence, empathy, judgment, critical thinking, and especially self-awareness of strengths and weaknesses. At the stage of pre-suppression, a democratic leadership style (or guiding style) is supported, while during the operational progress stage of the FFICS, a “hybrid” leadership style is suggested, borrowing elements from the democratic and authoritarian (or managerial) leadership styles. The administrative skills of FFICS leaders should include the moral and psychological rewards of subordinates, job satisfaction and recognition, and two-way communication. The current study illustrates the need for divergent leadership and management traits and styles among the different hierarchical structures of the FFICS. Full article
Show Figures

Figure 1. Chart flow of the data analysis design for the current study.
16 pages, 1101 KiB  
Article
Enhancing Human–Robot Interaction: Development of Multimodal Robotic Assistant for User Emotion Recognition
by Sergio Garcia, Francisco Gomez-Donoso and Miguel Cazorla
Appl. Sci. 2024, 14(24), 11914; https://doi.org/10.3390/app142411914 - 19 Dec 2024
Viewed by 896
Abstract
This paper presents a study on enhancing human–robot interaction (HRI) through multimodal emotional recognition within social robotics. Using the humanoid robot Pepper as a testbed, we integrate visual, auditory, and textual analysis to improve emotion recognition accuracy and contextual understanding. The proposed framework combines pretrained neural networks with fine-tuning techniques tailored to specific users, demonstrating that high accuracy in emotion recognition can be achieved by adapting the models to the individual emotional expressions of each user. This approach addresses the inherent variability in emotional expression across individuals, making it feasible to deploy personalized emotion recognition systems. Our experiments validate the effectiveness of this methodology, achieving high precision in multimodal emotion recognition through fine-tuning, while maintaining adaptability in real-world scenarios. These enhancements significantly improve Pepper’s interactive and empathetic capabilities, allowing it to engage more naturally with users in assistive, educational, and healthcare settings. This study not only advances the field of HRI but also provides a reproducible framework for integrating multimodal emotion recognition into commercial humanoid robots, bridging the gap between research prototypes and practical applications. Full article
Show Figures

Figure 1. Interaction flow diagram.
Figure 2. Examples of different images to describe. (a) Student in a classroom. (b) Person in an urban street.
Figure 3. Accuracy graph of the intention classifier.
Figure 4. Accuracy graph of the fine-tuned facial emotion recognition model using the personalized dataset.
Figure 5. Accuracy graph of the fine-tuned auditory emotion recognition model with the personalized dataset.
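The entry above fine-tunes pretrained networks on user-specific data. The sketch below shows one common way to do that (not the authors' pipeline): freeze a pretrained ResNet-18 backbone and retrain only the final layer on a small personalized facial-emotion dataset. The torchvision >= 0.13 weights API, the 7-class label count, and the random stand-in batch are assumptions.

```python
# Hedged sketch of per-user fine-tuning: freeze a pretrained backbone, retrain the head.
import torch
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 7                                   # assumed emotion label count
model = models.resnet18(weights="DEFAULT")        # torchvision >= 0.13 API assumed
for p in model.parameters():                      # freeze the pretrained backbone
    p.requires_grad = False
model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)   # new trainable head

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# One toy training step on random stand-in data (replace with the user's images).
images = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, NUM_CLASSES, (8,))
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
print(float(loss))
```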
17 pages, 918 KiB  
Article
Fractal Analysis of Electrodermal Activity for Emotion Recognition: A Novel Approach Using Detrended Fluctuation Analysis and Wavelet Entropy
by Luis R. Mercado-Diaz, Yedukondala Rao Veeranki, Edward W. Large and Hugo F. Posada-Quintero
Sensors 2024, 24(24), 8130; https://doi.org/10.3390/s24248130 - 19 Dec 2024
Viewed by 618
Abstract
The field of emotion recognition from physiological signals is a growing area of research with significant implications for both mental health monitoring and human–computer interaction. This study introduces a novel approach to detecting emotional states based on fractal analysis of electrodermal activity (EDA) signals. We employed detrended fluctuation analysis (DFA), Hurst exponent estimation, and wavelet entropy calculation to extract fractal features from EDA signals obtained from the CASE dataset, which contains physiological recordings and continuous emotion annotations from 30 participants. The analysis revealed significant differences in fractal features across five emotional states (neutral, amused, bored, relaxed, and scared), particularly those derived from wavelet entropy. A cross-correlation analysis showed robust correlations between fractal features and both the arousal and valence dimensions of emotion, challenging the conventional view of EDA as a predominantly arousal-indicating measure. The application of machine learning for emotion classification using fractal features achieved a leave-one-subject-out accuracy of 84.3% and an F1 score of 0.802, surpassing the performance of previous methods on the same dataset. This study demonstrates the potential of fractal analysis in capturing the intricate, multi-scale dynamics of EDA signals for emotion recognition, opening new avenues for advancing emotion-aware systems and affective computing applications. Full article
(This article belongs to the Special Issue Advanced Signal Processing for Affective Computing)
Show Figures

Figure 1. The circumplex model of emotions.
Figure 2. DFA and Hurst exponent moving average following the arousal and valence trends.
Figure 3. Distribution of top wavelet entropy features across emotional states.
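The entry above extracts fractal features (DFA, Hurst exponent, wavelet entropy) from EDA signals. As a compact illustration of one of those features, the sketch below implements standard detrended fluctuation analysis and estimates the scaling exponent on a toy random-walk stand-in for an EDA segment; the window sizes are arbitrary and this is not the authors' implementation.

```python
# Compact DFA sketch: scaling exponent as the slope of log F(n) vs. log n.
import numpy as np

def dfa_alpha(signal, scales=(16, 32, 64, 128, 256)):
    y = np.cumsum(signal - np.mean(signal))            # integrated profile
    ns, flucts = [], []
    for n in scales:
        n_windows = len(y) // n
        if n_windows < 2:
            continue
        segs = y[: n_windows * n].reshape(n_windows, n)
        t = np.arange(n)
        # Root-mean-square residual after a linear detrend in each window.
        rms = [np.sqrt(np.mean((s - np.polyval(np.polyfit(t, s, 1), t)) ** 2))
               for s in segs]
        ns.append(n)
        flucts.append(np.mean(rms))
    slope, _ = np.polyfit(np.log(ns), np.log(flucts), 1)
    return slope

eda = np.cumsum(np.random.randn(4096))   # toy random-walk stand-in for an EDA segment
print(dfa_alpha(eda))                    # close to 1.5 for a random walk
```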
17 pages, 2088 KiB  
Article
Personalized Clustering for Emotion Recognition Improvement
by Laura Gutiérrez-Martín, Celia López-Ongil, Jose M. Lanza-Gutiérrez and Jose A. Miranda Calero
Sensors 2024, 24(24), 8110; https://doi.org/10.3390/s24248110 - 19 Dec 2024
Viewed by 528
Abstract
Emotion recognition through artificial intelligence and smart sensing of physical and physiological signals (affective computing) is achieving very interesting results in terms of accuracy, inference times, and user-independent models. In this sense, there are applications related to the safety and well-being of people (sexual assaults, gender-based violence, children and elderly abuse, mental health, etc.) that require even more improvements. Emotion detection should be done with fast, discrete, and non-luxurious systems working in real time and real life (wearable devices, wireless communications, battery-powered). Furthermore, emotional reactions to violence are not equal in all people. Then, large general models cannot be applied to a multi-user system for people protection, and health and social workers and law enforcement agents would welcome customized and lightweight AI models. These semi-personalized models will be applicable to clusters of subjects sharing similarities in their emotional reactions to external stimuli. This customization requires several steps: creating clusters of subjects with similar behaviors, creating AI models for every cluster, continually updating these models with new data, and enrolling new subjects in clusters when required. An initial approach for clustering labeled data compiled (physiological data, together with emotional labels) is presented in this work, as well as the method to ensure the enrollment of new users with unlabeled data once the AI models are generated. The idea is that this complete methodology can be exportable to any other expert systems where unlabeled data are added during in-field operation and different profiles exist in terms of data. Experimental results demonstrate an improvement of 5% in accuracy and 4% in F1 score with respect to our baseline general model, along with a 32% to 58% reduction in variability, respectively. Full article
Show Figures

Figure 1. User profile clustering based on labeled observations training and testing scheme.
Figure 2. Unlabeled observation clustering assignment scheme.
Figure 3. Scheme of phase 1 and 2 of the experimental procedure for evaluating the impact of the methodologies on fear detection.
Figure 4. Scheme of phase 3 of the experimental procedure for evaluating the impact of the methodologies on fear detection.
Figure 5. Average performance metrics (accuracy and F1 score) per typology cluster with a parameter sweep for the general model contribution threshold. 0: only personalized model; 1: only general model.
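The abstract above clusters subjects with similar emotional reactions, trains a model per cluster, and enrolls new unlabeled users into a cluster. The sketch below illustrates that cluster-then-personalize idea in a hedged form (not the authors' method): k-means over per-subject feature summaries, one classifier per cluster, and nearest-centroid assignment for a newcomer. Feature dimensions, cluster count, and the classifier choice are assumptions.

```python
# Hedged sketch of cluster-then-personalize modeling (not the authors' method).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n_subjects, dim = 30, 16
subject_profiles = rng.normal(size=(n_subjects, dim))       # per-subject summary vectors
labels = [rng.integers(0, 2, 100) for _ in range(n_subjects)]   # toy fear / no-fear labels
samples = [rng.normal(size=(100, dim)) for _ in range(n_subjects)]

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(subject_profiles)

cluster_models = {}
for c in range(3):                                           # one model per cluster
    idx = np.where(kmeans.labels_ == c)[0]
    X = np.vstack([samples[i] for i in idx])
    y = np.concatenate([labels[i] for i in idx])
    cluster_models[c] = LogisticRegression(max_iter=1000).fit(X, y)

new_profile = rng.normal(size=(1, dim))                      # unlabeled newcomer
c = int(kmeans.predict(new_profile)[0])                      # enroll in nearest cluster
print("assigned cluster:", c)
```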
40 pages, 20840 KiB  
Article
Facial Biosignals Time–Series Dataset (FBioT): A Visual–Temporal Facial Expression Recognition (VT-FER) Approach
by João Marcelo Silva Souza, Caroline da Silva Morais Alves, Jés de Jesus Fiais Cerqueira, Wagner Luiz Alves de Oliveira, Orlando Mota Pires, Naiara Silva Bonfim dos Santos, Andre Brasil Vieira Wyzykowski, Oberdan Rocha Pinheiro, Daniel Gomes de Almeida Filho, Marcelo Oliveira da Silva and Josiane Dantas Viana Barbosa
Electronics 2024, 13(24), 4867; https://doi.org/10.3390/electronics13244867 - 10 Dec 2024
Viewed by 646
Abstract
Visual biosignals can be used to analyze human behavioral activities and serve as a primary resource for Facial Expression Recognition (FER). FER computational systems face significant challenges, arising from both spatial and temporal effects. Spatial challenges include deformations or occlusions of facial geometry, while temporal challenges involve discontinuities in motion observation due to high variability in poses and dynamic conditions such as rotation and translation. To enhance the analytical precision and validation reliability of FER systems, several datasets have been proposed. However, most of these datasets focus primarily on spatial characteristics, rely on static images, or consist of short videos captured in highly controlled environments. These constraints significantly reduce the applicability of such systems in real-world scenarios. This paper proposes the Facial Biosignals Time–Series Dataset (FBioT), a novel dataset providing temporal descriptors and features extracted from common videos recorded in uncontrolled environments. To automate dataset construction, we propose Visual–Temporal Facial Expression Recognition (VT-FER), a method that stabilizes temporal effects using normalized measurements based on the principles of the Facial Action Coding System (FACS) and generates signature patterns of expression movements for correlation with real-world temporal events. To demonstrate feasibility, we applied the method to create a pilot version of the FBioT dataset. This pilot resulted in approximately 10,000 s of public videos captured under real-world facial motion conditions, from which we extracted 22 direct and virtual metrics representing facial muscle deformations. During this process, we preliminarily labeled and qualified 3046 temporal events representing two emotion classes. As a proof of concept, these emotion classes were used as input for training neural networks, with results summarized in this paper and available in an open-source online repository. Full article
(This article belongs to the Section Artificial Intelligence)
Show Figures

Figure 1. An illustration of the complete process, from face detection to the semantic level, where each face image is correlated with labeled events. The steps include: (1) face cropping, (2) facial landmark detection, (3) landmark normalization, (4) feature extraction, and (5) analysis and event correlation. Illustration created by the authors; image of the person generated by AI [55].
Figure 2. The proposed pipeline for generating the FBioT dataset consists of the following modules: Flow [A] (Indexer, Feature Extractor L1, Video Adjuster, Measure Maker) and Flow [B] (Manual and Automatic Labelers). Each module produces its own dataset as output. Indexing can be performed using streaming videos or local videos.
Figure 3. The Feature Extractor L1 module extracts image features, including (1)–(2) the region of interest, (3) the main facial features, and (4) facial landmarks. These features are utilized to (5) identify and standardize the biosignals, where each point has X and Y coordinates. Illustration created by the authors; image of the person generated by AI [55].
Figure 4. An example illustrating how changes in image dimensionality occur due to camera movement along the Z axis in (2) and (3), with (1) demonstrating the effect of dimensionality normalization. Illustration created by the authors; image of the person generated by AI [55].
Figure 5. Example of the same open mouth seen from different poses, resulting in distortions in the absolute pixel-by-pixel measurements (red line). With the Video Adjuster module it is possible to estimate these distortions and determine that the measurements belong to the same mouth opening (blue line). Three-dimensional model by [60].
Figure 6. (a) Example of a schematic representation of the theoretical landmarks of the FACS system for action unit detection; image of the person generated by AI [55]. (b) Diagram of the landmarks detected by the Dlib model, where the enumeration corresponds to the following: face contours (1–17); left eyebrow (18–22); right eyebrow (23–27); nose top (28–31); nose base (32–36); left eye (37–42); right eye (43–48); mouth and lips (49–68). Illustration adapted from [61].
Figure 7. Example of measurement acquisition of vertical mouth opening (d_n = distance between points 15–19, see Figure 6) over time, which results in time-series measurements.
Figure 8. The schematic flow of the manual labeling process consists of three steps. In step (1), representative measures are selected to label an expression. In step (2), the start and end frames of the movement related to the expression are identified. In step (3), the class name and the selected measurements are added to the labeling file on the rows corresponding to the selected interval frames. The facial expression images are adapted from [10].
Figure 9. To identify subseries similar to a given pattern, the Euclidean distance was calculated. In step (1), the measures of interest that characterize the expression are selected, and the sequences most similar to the patterns for each measure are identified individually. In step (2), a similarity filter is applied to select intervals where the patterns occurred in both measures of interest. In step (3), the class name and selected measures are added to the labeling file for the frames corresponding to the identified intervals.
Figure 10. Schematic representation of the AU1 measurement process, which involves raising the eyebrows. Motion detection measures the distance between the landmarks on the eyebrows and the nasal bone. The facial expression images are adapted from [10].
Figure 11. Temporal evolution of AU9 movement. The facial expression images are adapted from [10].
Figure 12. Rotation angles of the BIWI ground-truth annotations compared with the results estimated using the developed model (Biosignals) and the OpenFace framework. The values correspond to the processing of video 22 from the BIWI dataset.
Figure 13. Normalization result of the M3 measurement from a video, where the face remains stable even during translation movement. The facial expression images are adapted from [10].
Figure 14. Result of the dynamic effect of rotation around the Z axis throughout video ID = 2 of the dataset. Values before and after normalization: mean = −0.06, STD = 5.80 and mean = −2.12, STD = 1.06, respectively.
Figure 15. Mean and standard deviation for rotation variations across all videos in the dataset.
Figure 16. Percentage of face Z-axis translation normalization over time for video ID = 21. Values before and after normalization: mean = 0.36, STD = 0.20 and mean = 0.46, STD = 0.07, respectively.
Figure 17. Percentage of normalization (translation in the Z axis), in terms of mean and standard deviation, for all videos in the dataset.
Figure 18. A smiling signature of video ID = 21 from the CK dataset with the main measurements over time. The facial expression images are adapted from [10].
Figure 19. Z rotation normalization for video 24 in the CK+ dataset.
Figure 20. Z translation normalization for video 24 in the CK+ dataset.
Figure 21. Similarity matrix obtained from the cross-test between the M1 (left) and M12 (right) measurements of each CK+ happy sample.
Figure 22. Architecture of the network with reference data, consisting of four layers: sequential, LSTM, dropout, and dense.
Figure 23. Training results: accuracy and loss curves for the reference network.
Figure 24. ROC curve and confusion matrix.
Figure 25. Z rotation normalization for video 70 in the AFEW dataset.
Figure 26. Z translation normalization for video 70 in the AFEW dataset.
Figure 27. Training process for the arousal neural network. Left: accuracy; right: loss.
Figure 28. Training process for the valence neural network. Left: accuracy; right: loss.
Figure 29. ROC curves for the AFEW reference neural network. Left: arousal; right: valence.
Figure 30. The Manual Labeler L0 module comprises the following processes: (a) graphical analysis of time-series measures, (b) selection of the start and end frames of the expression through visualization of each frame, and (c) visualization of annotated data [10]. Sample video accessible at [65].
Figure 31. Similarity matrix generated by cross-testing measurements M1 (left) and M12 (right) of each automatically found seed from the FBioT dataset.
Figure 32. Coincident samples found based on the seed search from the Automatic Labeler module.
Figure 33. Summarized results of the seed search, grouped in blocks of five units, versus the number of occurrences.
Figure 34. Neural network architecture for the proposed dataset prototype.
Figure 35. Training results: accuracy and loss curves for the neural network prototype of biosignals.
Figure 36. ROC curve (left) and confusion matrix (right). Happy samples are represented by 0 and neutral samples by 1.
Figure 37. Prototype for visualizing measurements from a local video, with the respective graphs of measurements over time. Legend of the measurements: red (M3), blue (M13), purple (M8), and yellow (E1). The facial expression image is adapted from [10]. To demonstrate the application's functionality, a mirrored video of the expression acquired from the CK+ dataset was used to obtain a complete smile expression (onset-apex-offset).
Figure 38. Prototype for visualizing time-series pattern inference from a local video, for the measurement M12. The red sequence represents the corresponding neural network inference, while the blue sequences represent other temporal measurements. Three-dimensional model by [60].
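The entry above builds time-series measures of facial muscle deformation from landmark coordinates, normalized against pose and scale changes. The sketch below shows the kind of measure described (not the FBioT code): a vertical-mouth-opening series from per-frame landmarks, normalized by inter-ocular distance so scale changes cancel. The 68-point Dlib layout (0-indexed) and the toy landmark track are assumptions.

```python
# Hedged sketch of a normalized facial-measurement time series (not FBioT code).
import numpy as np

def mouth_opening_series(landmarks):
    """landmarks: (n_frames, 68, 2) array of (x, y) points -> (n_frames,) series."""
    upper_lip, lower_lip = landmarks[:, 62], landmarks[:, 66]   # inner-lip midpoints (0-indexed)
    eye_l, eye_r = landmarks[:, 36], landmarks[:, 45]           # outer eye corners
    opening = np.linalg.norm(lower_lip - upper_lip, axis=1)
    inter_ocular = np.linalg.norm(eye_r - eye_l, axis=1) + 1e-8
    return opening / inter_ocular                               # scale-invariant measure

frames = np.random.rand(120, 68, 2) * 200        # toy stand-in landmark track
print(mouth_opening_series(frames).shape)        # (120,)
```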
14 pages, 769 KiB  
Article
Speech Emotion Recognition Using Multi-Scale Global–Local Representation Learning with Feature Pyramid Network
by Yuhua Wang, Jianxing Huang, Zhengdao Zhao, Haiyan Lan and Xinjia Zhang
Appl. Sci. 2024, 14(24), 11494; https://doi.org/10.3390/app142411494 - 10 Dec 2024
Viewed by 650
Abstract
Speech emotion recognition (SER) is important in facilitating natural human–computer interactions. In speech sequence modeling, a vital challenge is to learn context-aware sentence expression and temporal dynamics of paralinguistic features to achieve unambiguous emotional semantic understanding. In previous studies, the SER method based on the single-scale cascade feature extraction module could not effectively preserve the temporal structure of speech signals in the deep layer, downgrading the sequence modeling performance. To address these challenges, this paper proposes a novel multi-scale feature pyramid network. The enhanced multi-scale convolutional neural networks (MSCNNs) significantly improve the ability to extract multi-granular emotional features. Experimental results on the IEMOCAP corpus demonstrate the effectiveness of the proposed approach, achieving a weighted accuracy (WA) of 71.79% and an unweighted accuracy (UA) of 73.39%. Furthermore, on the RAVDESS dataset, the model achieves an unweighted accuracy (UA) of 86.5%. These results validate the system’s performance and highlight its competitive advantage. Full article
Show Figures

Figure 1. Functional diagram of SER system.
Figure 2. The overview of proposed multi-scale feature pyramid network.
Figure 3. Bottom-up pathway, where kw denotes different kernel widths, and CSA denotes convolutional self-attention.
Figure 4. Backward fusion structure, where φ represents the attention score calculation function as shown in Equation (1), and F_i denotes the feature of the i-th layer.
Figure 5. Convolutional self-attention (CSA) framework. (a) vanilla CSA; (b) improved CSA.
Figure 6. The number of audio samples corresponding to each emotional label in IEMOCAP.
Figure 7. The number of audio samples corresponding to each emotional label in RAVDESS.
Figure 8. The t-SNE visualization of the proposed framework. (a) MSFPN; (b) DRN.
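The abstract above extracts multi-granular features with multi-scale CNNs using different kernel widths. The sketch below shows that multi-scale idea in miniature (not the full MSFPN, and without the pyramid fusion or self-attention): parallel Conv1d branches with different kernel widths over a frame-level acoustic feature sequence, concatenated and pooled into utterance-level emotion logits. The input feature size (40, e.g. log-Mel bands) and the 4-class output are assumptions.

```python
# Minimal multi-scale convolution sketch (not the full MSFPN).
import torch
import torch.nn as nn

class MultiScaleSER(nn.Module):
    def __init__(self, n_feats=40, channels=64, num_classes=4, kernel_widths=(3, 5, 9)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(nn.Conv1d(n_feats, channels, kw, padding=kw // 2), nn.ReLU())
            for kw in kernel_widths
        ])
        self.head = nn.Linear(channels * len(kernel_widths), num_classes)

    def forward(self, x):                  # x: (batch, n_feats, frames)
        multi = torch.cat([b(x) for b in self.branches], dim=1)   # concatenate scales
        return self.head(multi.mean(dim=-1))                      # pool over time

print(MultiScaleSER()(torch.randn(2, 40, 300)).shape)  # torch.Size([2, 4])
```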