Interpretable multimodal emotion recognition using hybrid fusion of speech and image data
Kumar et al., 2024
- Document ID
- 1135947120697567032
- Author
- Kumar P
- Malik S
- Raman B
- Publication year
- 2024
- Publication venue
- Multimedia Tools and Applications
Snippet
This paper proposes a multimodal emotion recognition system based on hybrid fusion that classifies the emotions depicted by speech utterances and corresponding images into discrete classes. A new interpretability technique has been developed to identify the …
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6267—Classification techniques
- G06K9/6268—Classification techniques relating to the classification paradigm, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6217—Design or setup of recognition systems and techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N99/00—Subject matter not provided for in other groups of this subclass
- G06N99/005—Learning machines, i.e. computer in which a programme is changed according to experience gained by the machine itself during a complete run
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6267—Classification techniques
- G06K9/6279—Classification techniques relating to the number of classes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computer systems utilising knowledge based models
- G06N5/02—Knowledge representation
- G06N5/022—Knowledge engineering, knowledge acquisition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/27—Automatic analysis, e.g. parsing
- G06F17/2765—Recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computer systems based on biological models
- G06N3/02—Computer systems based on biological models using neural network models
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computer systems utilising knowledge based models
- G06N5/04—Inference methods or devices
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/36—Image preprocessing, i.e. processing the image information without deciding about the identity of the image
- G06K9/46—Extraction of features or characteristics of the image
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/1822—Parsing for meaning understanding
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/18—Digital computers in general; Data processing equipment in general in which a programme is changed according to experience gained by the computer itself during a complete run; Learning machines
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N7/00—Computer systems based on specific mathematical models
- G06N7/005—Probabilistic networks
Similar Documents
Publication | Title |
---|---|
Gheisari et al. | Deep learning: Applications, architectures, models, tools, and frameworks: A comprehensive survey |
Zhu et al. | Multimodal sentiment analysis based on fusion methods: A survey |
Alam et al. | Survey on deep neural networks in speech and vision systems |
Zhang et al. | A quantum-like multimodal network framework for modeling interaction dynamics in multiparty conversational sentiment analysis |
Niu et al. | A review on the attention mechanism of deep learning |
Wadawadagi et al. | Sentiment analysis with deep neural networks: comparative study and performance assessment |
Geetha et al. | Multimodal Emotion Recognition with deep learning: advancements, challenges, and future directions |
Kumar et al. | Interpretable multimodal emotion recognition using hybrid fusion of speech and image data |
Zulqarnain et al. | An efficient two-state GRU based on feature attention mechanism for sentiment analysis |
Halvardsson et al. | Interpretation of Swedish sign language using convolutional neural networks and transfer learning |
Hofmann et al. | Innovating with artificial intelligence: capturing the constructive functional capabilities of deep generative learning |
Kommineni et al. | Attention-based Bayesian inferential imagery captioning maker |
Wankhade et al. | MAPA BiLSTM-BERT: multi-aspects position aware attention for aspect level sentiment analysis |
Sharma et al. | Multilevel attention and relation network based image captioning model |
Lei et al. | A multi-level mesh mutual attention model for visual question answering |
Sreevidya et al. | Elder emotion classification through multimodal fusion of intermediate layers and cross-modal transfer learning |
Paul et al. | A context-sensitive multi-tier deep learning framework for multimodal sentiment analysis |
Wu et al. | Sentimental visual captioning using multimodal transformer |
Wieser et al. | Understanding auditory representations of emotional expressions with neural networks |
Chatterjee et al. | Class-biased sarcasm detection using BiLSTM variational autoencoder-based synthetic oversampling |
Yuan | [Retracted] A Classroom Emotion Recognition Model Based on a Convolutional Neural Network Speech Emotion Algorithm |
Ghorbanali et al. | Capsule network-based deep ensemble transfer learning for multimodal sentiment analysis |
Dixit et al. | Deep CNN with late fusion for real time multimodal emotion recognition |
Yang et al. | SMFNM: Semi-supervised multimodal fusion network with main-modal for real-time emotion recognition in conversations |
Jia et al. | Multimodal emotion distribution learning |