
CN116662924A - Aspect-level multi-mode emotion analysis method based on dual-channel and attention mechanism - Google Patents

Aspect-level multi-mode emotion analysis method based on dual-channel and attention mechanism

Info

Publication number
CN116662924A
Authority
CN
China
Prior art keywords
features
attention mechanism
image
channel
sentence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310273760.9A
Other languages
Chinese (zh)
Inventor
梁燕
侯增辉
尹恩同
陈思旭
徐露
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University of Post and Telecommunications
Original Assignee
Chongqing University of Post and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University of Post and Telecommunications filed Critical Chongqing University of Post and Telecommunications
Priority to CN202310273760.9A priority Critical patent/CN116662924A/en
Publication of CN116662924A publication Critical patent/CN116662924A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses an aspect-level multi-modal emotion analysis method based on a dual-channel and attention mechanism. Built on neural networks, the method extracts the emotion information contained in image features at multiple scales by combining aspect-word features and text features, and introduces a GCN (graph convolutional network) into the aspect-level multi-modal emotion analysis task, which greatly improves the feature extraction and interactive fusion capability of the model. In the feature extraction layer, pre-trained encoders are adopted to extract aspect-word, text and image features; after the aspect-word and sentence features are fused bidirectionally in the attention mechanism layer, the final aspect-word and sentence feature representations are obtained. For the image features, an image feature extraction network is built from a channel attention mechanism and a spatial attention mechanism, and finally the interactive fusion features of all modalities are dynamically extracted by a GCN module. In experiments, the method improves the performance indices of aspect-level multi-modal emotion analysis based on the attention mechanism on the evaluated data sets.

Description

Aspect-level multi-mode emotion analysis method based on dual-channel and attention mechanism
Technical Field
The invention belongs to the fields of natural language processing and emotion analysis, and particularly relates to an aspect-level multi-modal emotion analysis method based on a dual-channel and attention mechanism.
Background
In recent years, the content released by users on various online platforms has grown rapidly. How to use artificial intelligence and other related technologies to mine the emotional tendency toward a certain aspect contained in this content has become a research hotspot in recent years.
Emotion expresses a person's attitude toward an objective thing and is usually conveyed in various ways such as body language, facial expressions, and spoken or written words. Emotion analysis (Sentiment Analysis, SA), also known as opinion mining (Opinion Mining, OM), aims to extract opinions from large amounts of unstructured text and classify them as positive, neutral or negative emotion polarity. In the Internet age, social platforms such as Weibo, Zhihu and WeChat have flourished, and text and images have gradually become the main carriers through which users convey opinions and emotions about target aspects or entities in the online world. The task of aspect-based emotion analysis has received extensive attention in academia and industry over the last decade.
In the early days, text features were usually generated with machine learning methods such as emotion dictionaries, dependency relations and statistical methods, but these traditional methods consume a great deal of manpower for feature selection and extraction, the features lack the association between aspect words and sentence context, and their transferability and robustness are poor. The success of deep learning in various natural language processing tasks has also promoted the application of neural networks in aspect-level emotion analysis. By using various deep neural network models to learn and extract the feature correlations between aspect words and sentence context, model performance has gradually improved. A number of deep network models such as convolutional neural networks (Convolutional Neural Network, CNN), recurrent neural networks (Recurrent Neural Network, RNN), graph neural networks (Graph Neural Network, GNN) and attention mechanisms (Attention Mechanism) have been proposed, and text-based aspect-level emotion analysis has developed further.
As the content of many online platforms becomes more and more multi-modal, predicting the emotional polarity of a target from information in other modalities has also drawn increasing attention from researchers, and the achievements of deep learning in the image processing field provide a theoretical basis for aspect-level multi-modal emotion analysis. Xu et al. first introduced image modality information into aspect-level emotion analysis, extracting image features with a CNN and text features with a long short-term memory (Long Short-Term Memory, LSTM) network, and verified the feasibility of the proposed method through an interactive attention mechanism. Gu et al. then adopted a bidirectional gated recurrent unit (Bidirectional Gate Recurrent Unit, BiGRU) network and a multi-head self-attention mechanism to encode textual semantic information, a ResNet-152 model and a capsule network to extract image features, and a multi-head attention network for multi-modal interactive fusion, maximising the contribution of each modality to emotion transmission and improving network performance. Yu et al. proposed a hierarchical interaction module for modelling pairwise interactions between given aspect words, text information and image information; to compensate for the semantic gap between text and image features, they further proposed an auxiliary reconstruction module based on the auto-encoder idea, which improved model performance. However, existing models still have some shortcomings: 1) channel information and spatial information in the image are not fully extracted during image feature extraction, so the emotion information in the image cannot be effectively combined with the aspect-word information; 2) information fusion between modalities is not carried out effectively, so model performance is not ideal. This work therefore studies the aspect-level multi-modal emotion analysis task and presents a more effective model.
CN114936623A, an aspect emotion analysis method integrating multi-mode data, firstly carries out data preprocessing, and adjusts text and image formats to adapt to the input requirement of a neural network; secondly, extracting text features by using Bi-LSTM after word embedding, and extracting image features by using a Resnet50 network; extracting and aligning multi-modal aspects, extracting aspect terms from the text by using a sequence labeling method, and performing implicit alignment of image areas and aspect words by using a memory network added with attention and Point-wise convolution operation; then based on the text characteristics of the position attention, gaussian modeling context explicit positions, and a memory network extracts text representations sensitive to the terms; then carrying out multi-mode data fusion, and fusing the multi-mode data by a fusion discrimination matrix; and finally, carrying out emotion classification, and carrying out emotion classification by utilizing the fused characteristic information. According to the method, the multi-modal data are used for carrying out aspect-level emotion analysis, multi-modal complementary information is extracted, and the accuracy of emotion analysis tasks is improved.
However, the use of averaged aspect-word vectors in that method easily causes word-sense confusion and is unfavorable for the interaction between the aspect words and the sentence and image features. In addition, during image feature extraction it ignores the auxiliary role of the semantic information of the sentence context in extracting image features. The above method is therefore limited in its ability to fuse multi-modal data.
Disclosure of Invention
The present invention is directed to solving the above problems of the prior art. An aspect-level multi-mode emotion analysis method based on a dual-channel and attention mechanism is provided. The technical scheme of the invention is as follows:
an aspect-level multi-modal emotion analysis method based on a dual-channel and attention mechanism comprises the following steps:
step 1: extracting hidden characteristic representations from sentence characteristics and aspect word characteristics in a data set by using a Bert pre-training encoder, and extracting picture characteristics by using a ResNet-152 pre-training network; an aspect word is a subsequence that belongs to a sentence;
step 2: calculating the feature correlation of sentence features and aspect word features through a multi-head attention mechanism, so that corresponding attention weighting is obtained between the high-similarity features; finally obtaining aspect word features guided by the text and text features guided by the aspect words;
step 3: weighting the original image characteristics by using the aspect word characteristics guided by the text, and obtaining the image channel characteristics through a channel attention mechanism;
step 4: weighting the image channel characteristics by using the text characteristics guided by the aspect words, and generating a spatial attention pattern by using the spatial relation of the characteristics in a spatial attention mechanism to obtain final characteristic representation of the image;
step 5: calculating a dynamic adjacency matrix by text features guided by aspect words and image features generated by channel attention and spatial attention; obtaining a final fusion feature representation using the aggregate capabilities and the messaging capabilities of the graph neural network;
step 6: and classifying the final fusion features, aspect word features and sentence features by using a pooling mechanism through a classification module.
Further, step 1 extracts hidden feature representations from the sentence features and aspect-word features in the dataset by using a Bert pre-training encoder, and extracts picture features by using a ResNet-152 pre-training network, specifically:
outputting text and aspect-word feature information through two Bert-based pre-trained text feature encoders; extracting image features by using a pre-trained ResNet network; adopting pre-trained models provides better initialization parameters for the model, gives the model better generalization after fine-tuning on the target task, and accelerates model convergence. The Bert pre-training model yields the sentence features H_S = Bert(E_S), H_S ∈ R^(t×d), and the aspect-word features H_T = Bert(E_T), H_T ∈ R^(t×d), where t represents the text length and d represents the output feature dimension; the image features are denoted as H_I = ResNet(E_I), H_I ∈ R^(c×w×h), where ResNet represents a ResNet-152 model, c represents the number of channels of the image features, and w and h represent the width and height of the features respectively. E_T, E_S and E_I represent the original aspect words, sentences and images; H_T, H_S and H_I represent the aspect-word, sentence and image features extracted through the pre-training networks.
Further, the step 2 adopts a multi-head attention mechanism to fuse related information between aspect word characteristics and sentence characteristics, and the specific method is as follows:
in order to obtain the interactive features between sentences and aspect words, a multi-head attention mechanism is adopted to calculate the similarity between the sentences and the aspect words, and feature fusion between the sentences and the aspect words can be effectively realized, and the expression is as follows:
MHA(Q, K, V) = softmax(Q K^T / √d_k) V (1)
Ĥ = LayerNorm(H + MHA(Q, K, V)) (2)
Y = LayerNorm(Ĥ + GeLU(Ĥ W_1) W_2) (3)
where MHA represents the multi-head attention mechanism, Q, K, V represent the input features, d_k is a scaling factor, Y represents the output of the corresponding layer in the Transformer, LayerNorm represents layer normalization, GeLU is the activation function, and W_1, W_2 respectively represent trainable parameter matrices.
The aspect-word features and the sentence features are used in turn as the query matrix Q to calculate the aspect-word features Y_T guided by the sentence features and the sentence features Y_S guided by the aspect-word features.
Further, the step 3 weights the original image features by using the aspect word features guided by the text, and obtains the image channel features through a channel attention mechanism, and the specific method is as follows:
in order to introduce aspect word features into an image, the aspect word features and the image features are fused through a multi-head self-attention mechanism, and the specific formula is as follows:
H_ca = MHA(H_I, Y_T, Y_T) (4)
M_CH = σ(MLP(AvgPool(H_ca)) + MLP(MaxPool(H_ca))) (5)
where, in the channel attention mechanism, the input H_ca is the aspect-word-guided image feature obtained through the multi-head attention mechanism; MLP represents a multi-layer perceptron, AvgPool represents average pooling, MaxPool represents maximum pooling, σ represents the ReLU activation function, and M_CH represents the output of the channel attention.
Further, the step 4: the text feature guided by the aspect words is used for weighting the image channel feature, in a spatial attention mechanism, a spatial attention pattern is generated by using the spatial relation of the feature, and the final feature representation of the image is obtained, wherein the method comprises the following specific steps:
through a multi-head attention mechanism, the sentence characteristics guided by the aspect words are used for weighting the image characteristics output by the channel attention mechanism, and important areas related to the emotion of the aspect words in the image characteristics are highlighted in the channel attention mechanism, wherein the specific formula is as follows:
H_sa = MHA(M_CH, Y_S, Y_S) (6)
M_SP = σ(Conv(Concat(AvgPool(H_sa); MaxPool(H_sa)))) (7)
Equation (7) gives the implementation details of the spatial attention mechanism, where Concat represents matrix concatenation, Conv represents a convolution operation, and σ represents the ReLU activation function; H_sa represents the image features guided by the sentence features through the multi-head attention mechanism, M_CH represents the image features output by the channel attention, and M_SP represents the output of the spatial attention.
Further, the step 5 calculates a dynamic adjacency matrix from the text features guided by the aspect words and the image features generated by the channel attention and the spatial attention; the aggregation capability and the message-passing capability of the graph neural network are used to obtain a final fusion feature representation, which specifically comprises the following steps:
the sentence features and the image features are concatenated, and an attention matrix obtained through a self-attention mechanism is used as the adjacency matrix of the GCN; firstly, the attention matrix can capture the correlated features between the sentence and image features, making the adjacency matrix more flexible, and secondly, the importance of similar features between the sentence and the image can be adjusted adaptively; in the GCN, a graph G = {V, A} is given, where V is the set of all nodes in the graph, corresponding to the concatenated sentence and image features, and A is the adjacency matrix between all nodes; the weight A_ij depends on the similarity between the nodes;
H_att = Concat(Y_S, M_SP) (8)
A = MHA(H_att, H_att, H_att) (9)
h_i^l = σ( Σ_{j=1}^{n} A_ij W^l h_j^(l-1) ) (10)
where H_att represents the concatenation of the channel-and-spatial attention output M_SP and the aspect-word-guided sentence features Y_S, h_i^l is the output of node v_i at layer l, W^l is the trainable weight matrix of GCN layer l, and σ is the ReLU activation function; since the GCN completes feature extraction and encoding between associated nodes, the output of all nodes at layer l is expressed as H^l = {h_1^l, h_2^l, …, h_n^l}, where n represents the number of nodes.
Further, the step 6: the final fusion features, aspect-word features and sentence features are classified by a classification module using a pooling mechanism, with the following specific steps:
for the aspect-word features and sentence features, a [CLS] tag is added when the features are extracted with the pre-training model, so the final hidden state of this tag is taken as the aggregate representation of the aspect-word and sentence features, denoted O_T and O_S respectively; for the fusion output of the GCN, its first node feature is already a weighted sum over the features, so it is taken as the classification feature O_G; the total output feature O after pooling and concatenation can be expressed as:
O_T = H_T^[CLS], O_S = H_S^[CLS], O_G = h_1^l (11)
O = Concat(O_T, O_S, O_G) (12)
in the classification phase:
p(y|O) = softmax(W^T O) (13)
where W is a trainable weight matrix; a cross-entropy loss function is used to calculate the loss value Loss, in which D and y^(j) represent the number of training samples and the true label of the j-th sample, respectively.
The invention has the advantages and beneficial effects as follows:
the advantage of the invention is mainly that in step 3 of claim 1, the channels in the image can be regarded as feature extractors, the channel attention being directed to extracting important features in the image channels that are relevant to the aspect words. In order to integrate aspect word information into image channel features, both features are interacted with using a multi-head attention mechanism before channel attention, so that the guiding function of the aspect words in the channel attention is conveniently exerted. Then in step 4, the spatial attention mainly extracts the region features related to the aspect words in the image, and because the sentence features also have the region association related to the aspect words, the sentence features are introduced into the spatial attention mechanism, so that the spatial attention is guided to extract the region features related to the aspect words in the image. And gradually extracting deep features in the image, and enhancing contribution of the image features to emotion classification in subsequent multi-mode fusion.
Drawings
FIG. 1 is a flowchart of an aspect-level multi-modal emotion analysis method based on a dual channel and attention mechanism in accordance with a preferred embodiment of the present invention.
FIG. 2 is a framework diagram of an aspect-level multimodal emotion analysis model.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and specifically described below with reference to the drawings in the embodiments of the present invention. The described embodiments are only a few embodiments of the present invention.
The technical scheme for solving the technical problems is as follows:
fig. 1 is a general flow chart of an aspect-level multi-modal emotion analysis method based on a dual-channel and attention mechanism according to the present invention, and is further described below with reference to fig. 1. The invention mainly comprises the following steps:
step 1: extracting hidden feature representations from sentence features and aspect word features in the dataset using a Bert pre-training encoder. The picture features are extracted using a ResNet-152 pre-training network.
Step 2: and calculating the feature correlation of the sentence features and the aspect word features through a multi-head attention mechanism, so that corresponding attention weighting is obtained between the features with high similarity. Finally, the aspect word features guided by the text and the text features guided by the aspect words are obtained.
Step 3: and weighting the original image characteristics by using the aspect word characteristics guided by the text, and obtaining the image channel characteristics through a channel attention mechanism.
Step 4: and weighting the image channel characteristics by using the text characteristics guided by the aspect words, and generating a spatial attention pattern by using the spatial relationship of the characteristics to obtain the final characteristic representation of the image.
Step 5: the text features guided by the aspect words and the image features generated by the channel attention and the spatial attention calculate a dynamic adjacency matrix. The aggregate and messaging capabilities of the graph neural network are used to derive a final fused feature representation.
Step 6: the final fused feature representation is classified by the classification module using a pooling mechanism.
FIG. 2 is a framework diagram of the aspect-level multi-modal emotion analysis model; the structural principle of the present invention is further described below with reference to FIG. 2. The model of the present method has four layers, and the specific contents of each layer are as follows:
(1) Modal feature extraction layer
A multi-modal sample is given, which contains a sentence S = {w_1, w_2, …, w_n} of n words, an associated image I, and an aspect-word subsequence T of S. The aspect word T is also associated with an emotion label y. The invention feeds the sentence E_S and the aspect word E_T into two Bert encoders respectively to extract features, and the image E_I into the ResNet-152 network to extract features. For the input part of the Bert text encoder, the tag [CLS] is added to the head of the text and the tag [SEP] is added to the end of the text; the Bert pre-training model finally yields the sentence features H_S ∈ R^(t×d) and the aspect-word features H_T ∈ R^(t×d), where t represents the aspect-word or text length and d represents the feature dimension. The image features are expressed as H_I ∈ R^(c×w×h), where c represents the number of channels of the image features and w and h represent the width and height of the features, respectively.
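As an illustration of this layer, the following is a minimal sketch of how the three pre-trained encoders could be invoked in PyTorch, assuming the Hugging Face transformers package and torchvision; the checkpoint names, the helper functions encode_text and encode_image, and the example inputs are illustrative assumptions and not values fixed by the invention.

import torch
from transformers import BertTokenizer, BertModel
from torchvision import models

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")        # assumed checkpoint
bert_sentence = BertModel.from_pretrained("bert-base-uncased")        # encoder for sentences E_S
bert_aspect = BertModel.from_pretrained("bert-base-uncased")          # separate encoder for aspect words E_T

# Drop the average-pooling and classification head of ResNet-152 so the output
# keeps the spatial layout H_I in R^(c x w x h) (2048 x 7 x 7 for 224x224 inputs).
resnet = models.resnet152(weights=models.ResNet152_Weights.DEFAULT)
resnet_backbone = torch.nn.Sequential(*list(resnet.children())[:-2]).eval()

def encode_text(encoder, text):
    # The tokenizer adds [CLS] and [SEP]; returns H in R^(t x d).
    batch = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        return encoder(**batch).last_hidden_state.squeeze(0)

def encode_image(image_tensor):
    # image_tensor: 3 x 224 x 224, already normalised; returns H_I in R^(c x w x h).
    with torch.no_grad():
        return resnet_backbone(image_tensor.unsqueeze(0)).squeeze(0)

H_S = encode_text(bert_sentence, "the pasta was great but the service was slow")
H_T = encode_text(bert_aspect, "service")
H_I = encode_image(torch.randn(3, 224, 224))                           # placeholder image
print(H_S.shape, H_T.shape, H_I.shape)                                 # (t, 768), (t', 768), (2048, 7, 7)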
(2) Attention mechanism layer
In order to further extract the relevance between features and the modal interaction between text words and image features, the model adopts a multi-head attention mechanism to extract the potential relevance features between aspect word features and sentence features. In the image feature extraction, the channel attention mechanism fusing aspect word features and the space attention mechanism fusing text features are used, the image mode features are fused with the text mode features, and features which are in different scales and are related to the aspect words in the image mode are extracted, so that the GCN network can identify the adjacent relation between important nodes. The multi-mode characteristics are extracted more deeply through a message passing and aggregation mechanism of the GCN network.
1) Sentence and aspect word feature interactions
In order to obtain the interactive features between sentences and aspect words, strengthen the associated information among different features and filter redundant information, the invention adopts a multi-head attention mechanism to calculate the similarity between the two kinds of features, which can effectively realize feature fusion between them. The expression of the multi-head attention mechanism is:
MHA(Q, K, V) = softmax(Q K^T / √d_k) V (1)
where T denotes the matrix transpose and MHA denotes the multi-head attention mechanism, which consists of three parts, query (Q), key (K) and value (V); the attention values generated by the dot-product interaction between Q and K are mapped onto V. The scaling factor d_k is the feature dimension of each attention head.
The aspect-word features and sentence features obtain their fused output features through the multi-head attention mechanism, and the output guided by the other modality is then subjected to a linear transformation and residual connection to obtain the final output features, with the specific formulas:
Ĥ = LayerNorm(H + MHA(Q, K, V)) (2)
Y = LayerNorm(Ĥ + GeLU(Ĥ W_1) W_2) (3)
where LayerNorm represents layer normalization, which stabilises the feature distribution and accelerates model training, GeLU represents the activation function, and W_1, W_2 represent trainable weight parameters. In the multi-head attention mechanism, the aspect-word features and the sentence features are used in turn as the query matrix Q, so that the aspect-word features Y_T guided by the sentence features and the sentence features Y_S guided by the aspect-word features can be calculated.
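As a concrete illustration of Eqs. (1)-(3), the following is a minimal PyTorch sketch of the bidirectional aspect-word/sentence interaction; the class name CrossModalFusion, the hidden size of 768 and the use of 8 attention heads are assumptions made for illustration, not values fixed by the invention.

import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossModalFusion(nn.Module):
    def __init__(self, d_model=768, n_heads=8):
        super().__init__()
        self.mha = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.w1 = nn.Linear(d_model, 4 * d_model)     # W_1 of Eq. (3)
        self.w2 = nn.Linear(4 * d_model, d_model)     # W_2 of Eq. (3)

    def forward(self, query, context):
        # Eq. (1): scaled dot-product attention, computed inside nn.MultiheadAttention.
        attn_out, _ = self.mha(query, context, context)
        h = self.norm1(query + attn_out)                               # Eq. (2): residual + LayerNorm
        return self.norm2(h + self.w2(F.gelu(self.w1(h))))             # Eq. (3): GeLU feed-forward

fusion = CrossModalFusion()
H_T = torch.randn(1, 4, 768)      # aspect-word features
H_S = torch.randn(1, 32, 768)     # sentence features
Y_T = fusion(H_T, H_S)            # aspect-word features guided by the sentence
Y_S = fusion(H_S, H_T)            # sentence features guided by the aspect words

Whether the two directions share parameters is not specified by the text; the sketch reuses one module in both directions purely for simplicity.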
2) Channel attention mechanism
Since each channel of image features can be regarded as a feature detector, in the channel attention mechanism, the features after average pooling and maximum pooling can be extracted through a layer of feedforward neural network, and important features related to aspect words in each channel of the image can be extracted. In order to identify emotion distribution related to an aspect word contained in a channel in an image channel, the invention introduces the aspect word feature in a channel attention mechanism, firstly, the aspect word feature and the image feature are fused through a multi-head attention mechanism, and the fused feature is used as input of the channel attention mechanism. The specific formula is as follows:
H_ca = MHA(H_I, Y_T, Y_T) (4)
M_CH = σ(MLP(AvgPool(H_ca)) + MLP(MaxPool(H_ca))) (5)
In the channel attention mechanism, the input H_ca is the aspect-word-guided image feature obtained through the multi-head attention mechanism; its complete output also undergoes the linear transformation and residual connection shown in formulas (2) and (3), and H_ca is written here only to simplify the description. Equation (5) gives the implementation details of the channel attention mechanism, where MLP represents the multi-layer perceptron containing trainable weight parameters, AvgPool represents average pooling, MaxPool represents maximum pooling, and σ represents the ReLU activation function.
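Because the channel attention of Eqs. (4)-(5) and the spatial attention of Eqs. (6)-(7) described in the next subsection follow the same pooling-and-gating pattern, the following is a single hedged PyTorch sketch of both, with the aspect-word (or sentence) features injected through multi-head attention before each gate. The class names, the 2048-channel/7x7 ResNet-152 feature size, the reduction ratio and the 7x7 convolution kernel are illustrative assumptions, and σ is taken as ReLU because that is how the text defines it.

import torch
import torch.nn as nn
import torch.nn.functional as F

class AspectGuidedChannelAttention(nn.Module):
    # Eqs. (4)-(5): H_ca = MHA(H_I, Y_T, Y_T); M_CH = relu(MLP(AvgPool) + MLP(MaxPool)).
    def __init__(self, channels=2048, d_text=768, n_heads=8, reduction=16):
        super().__init__()
        self.proj = nn.Linear(d_text, channels)           # align text width to channel width
        self.mha = nn.MultiheadAttention(channels, n_heads, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(channels, channels // reduction),
                                 nn.ReLU(),
                                 nn.Linear(channels // reduction, channels))

    def forward(self, H_I, Y_T):
        B, C, W, H = H_I.shape
        tokens = H_I.flatten(2).transpose(1, 2)           # (B, W*H, C) image tokens as queries
        Y = self.proj(Y_T)
        H_ca, _ = self.mha(tokens, Y, Y)                  # Eq. (4)
        H_ca = H_ca.transpose(1, 2).reshape(B, C, W, H)
        avg = self.mlp(F.adaptive_avg_pool2d(H_ca, 1).flatten(1))
        mx = self.mlp(F.adaptive_max_pool2d(H_ca, 1).flatten(1))
        gate = torch.relu(avg + mx).view(B, C, 1, 1)      # Eq. (5)
        return gate * H_ca                                # channel-weighted image features M_CH

class SentenceGuidedSpatialAttention(nn.Module):
    # Eqs. (6)-(7): H_sa = MHA(M_CH, Y_S, Y_S); M_SP = relu(Conv(Concat(AvgPool; MaxPool))).
    def __init__(self, channels=2048, d_text=768, n_heads=8, kernel_size=7):
        super().__init__()
        self.proj = nn.Linear(d_text, channels)
        self.mha = nn.MultiheadAttention(channels, n_heads, batch_first=True)
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, M_CH, Y_S):
        B, C, W, H = M_CH.shape
        tokens = M_CH.flatten(2).transpose(1, 2)
        Y = self.proj(Y_S)
        H_sa, _ = self.mha(tokens, Y, Y)                  # Eq. (6)
        H_sa = H_sa.transpose(1, 2).reshape(B, C, W, H)
        pooled = torch.cat([H_sa.mean(dim=1, keepdim=True),             # channel-wise AvgPool
                            H_sa.max(dim=1, keepdim=True).values],      # channel-wise MaxPool
                           dim=1)                                       # (B, 2, W, H)
        gate = torch.relu(self.conv(pooled))                            # Eq. (7): (B, 1, W, H)
        return gate * H_sa                                              # spatially weighted features M_SP

Returning the gated feature maps rather than the bare gates is a design choice made here for readability; the text leaves open whether M_CH and M_SP denote the gates themselves or the weighted features.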
3) Spatial attention mechanism
In the spatial attention mechanism, the spatial relationships of the features can be used to learn the distribution regions of the important emotion features related to the aspect words. In the sentence and aspect-word feature interaction stage, the aspect-word-guided text features already contain the positional regions of the important emotion features for the aspect words in the sentence. Therefore, to enhance the ability to extract image feature regions, the invention fuses the aspect-word-guided sentence features with the image features output by the channel attention mechanism, and then learns the distribution regions of the important emotion features in the spatial attention mechanism. The specific formulas are as follows:
H_sa = MHA(M_CH, Y_S, Y_S) (6)
M_SP = σ(Conv(Concat(AvgPool(H_sa); MaxPool(H_sa)))) (7)
equation (7) represents implementation details of the spatial attention mechanism, where Concat represents matrix concatenation, conv represents convolution operation, and σ represents the Relu activation function.
(3) GCN feature fusion layer
Sentence features are concatenated with image features, and an attention matrix obtained through a self-attention mechanism is used as the adjacency matrix of the GCN. Firstly, the attention matrix can learn the correlated features between the sentence and image features, making the adjacency matrix more flexible; secondly, the importance of similar features between the sentence features and the image features can be adjusted adaptively. In the GCN, a graph G = {V, A} is given, where V is the set of all nodes in the graph, corresponding to the concatenated sentence and image features, and A is the adjacency matrix between all nodes; the weight A_ij depends on the similarity between nodes i and j.
H_att = Concat(Y_S, M_SP) (8)
A = MHA(H_att, H_att, H_att) (9)
h_i^l = σ( Σ_{j=1}^{n} A_ij W^l h_j^(l-1) ) (10)
where h_i^l is the output of node v_i at layer l, W^l is the trainable weight matrix of GCN layer l, and σ is the ReLU activation function. Since the GCN completes feature extraction and encoding between associated nodes, the output of all nodes at layer l is expressed as H^l = {h_1^l, h_2^l, …, h_n^l}, where n is the number of nodes.
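A hedged sketch of this fusion layer follows: the aspect-word-guided sentence tokens and the attended image tokens are concatenated (Eq. (8)), a self-attention matrix over the concatenated sequence serves as the dynamic adjacency matrix A (Eq. (9), simplified here to single-head scaled dot-product attention), and the GCN update of Eq. (10) aggregates neighbour information layer by layer. The module assumes the image tokens have already been flattened and projected to the text width d; all names and sizes are illustrative.

import torch
import torch.nn as nn

class DynamicGCNFusion(nn.Module):
    def __init__(self, d_model=768, n_layers=2):
        super().__init__()
        self.scale = d_model ** 0.5
        self.gcn_weights = nn.ModuleList([nn.Linear(d_model, d_model) for _ in range(n_layers)])

    def forward(self, Y_S, M_SP_tokens):
        # Y_S: (B, t, d) aspect-word-guided sentence features;
        # M_SP_tokens: (B, m, d) image features from the channel and spatial attention,
        #              flattened to m = w*h tokens and projected to width d beforehand.
        H = torch.cat([Y_S, M_SP_tokens], dim=1)                        # Eq. (8): H_att
        A = torch.softmax(H @ H.transpose(1, 2) / self.scale, dim=-1)   # Eq. (9): dynamic adjacency
        for W_l in self.gcn_weights:                                    # Eq. (10): message passing
            H = torch.relu(A @ W_l(H))
        return H                                                        # H^l, one row per graph node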
(4) Output layer
For the aspect-word features and sentence features, since a [CLS] tag is added when the features are first extracted with the pre-training model, the final hidden state of this tag is taken as the aggregate representation of the aspect-word and sentence features, denoted O_T and O_S respectively. For the fusion output of the GCN, its first node feature is already a weighted sum over the features, so it is taken as the classification feature O_G. The output features can be expressed as:
O_T = H_T^[CLS], O_S = H_S^[CLS], O_G = h_1^l (11)
O = Concat(O_T, O_S, O_G) (12)
the GCN output characteristics pass through a layer of feedforward neural network to finish classification tasks, and the specific formula is as follows:
p(y|O) = softmax(W^T O) (13)
where W is a trainable weight matrix, and a cross-entropy loss function is used to calculate the loss value Loss.
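To make the classification step concrete, the following is a minimal sketch of the output layer (Eqs. (11)-(13)); the assumption of three polarity classes and the names used here are illustrative.

import torch
import torch.nn as nn

class SentimentHead(nn.Module):
    def __init__(self, d_model=768, n_classes=3):
        super().__init__()
        self.classifier = nn.Linear(3 * d_model, n_classes)             # W of Eq. (13)
        self.loss_fn = nn.CrossEntropyLoss()                            # cross-entropy Loss

    def forward(self, H_T, H_S, H_gcn, labels=None):
        O_T = H_T[:, 0]                     # final hidden state of the aspect [CLS] tag
        O_S = H_S[:, 0]                     # final hidden state of the sentence [CLS] tag
        O_G = H_gcn[:, 0]                   # first node of the GCN output (weighted fusion feature)
        O = torch.cat([O_T, O_S, O_G], dim=-1)                          # Eqs. (11)-(12)
        logits = self.classifier(O)
        probs = torch.softmax(logits, dim=-1)                           # Eq. (13)
        loss = self.loss_fn(logits, labels) if labels is not None else None
        return probs, loss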
Experimental simulation
Table 1. Model performance comparison
As can be seen from Table 1, compared with the other models, the proposed model obtains the best experimental results in classification accuracy and macro-averaged F1. Compared with the second-best TomBERT network model, it improves classification accuracy and macro-averaged F1 score by 1.35% and 1.25% respectively on TWITTER-2015, and by 1.38% and 1.53% respectively on TWITTER-2017. This is because the proposed method fuses the aspect-word features with the text features to extract the deep semantic associations related to the aspect words in the images, and further improves classification performance through the feature-fusion capability of the graph neural network. The TomBERT model uses BERT to extract visual representations that are sensitive to aspect terms, but does not employ an effective feature-fusion method. The MIMN model, which uses a multi-hop memory network to extract features, achieves multi-hop fusion of the bimodal features but does not deeply extract the interactive features between the text content and the visual information.
The system, apparatus, module or unit set forth in the above embodiments may be implemented in particular by a computer chip or entity, or by a product having a certain function.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article or apparatus that comprises the element.
The above examples should be understood as illustrative only and not limiting the scope of the invention. Various changes and modifications to the present invention may be made by one skilled in the art after reading the teachings herein, and such equivalent changes and modifications are intended to fall within the scope of the invention as defined in the appended claims.

Claims (7)

1. An aspect-level multi-mode emotion analysis method based on a dual-channel and attention mechanism is characterized by comprising the following steps of:
step 1: extracting hidden characteristic representations of sentences and aspect words in the data set by using a Bert pre-training encoder, and extracting picture characteristics by using a ResNet-152 pre-training network; an aspect word is a subsequence that belongs to a sentence;
step 2: calculating the feature correlation of sentence features and aspect word features through a multi-head attention mechanism, so that corresponding attention weighting is obtained between the high-similarity features; finally obtaining aspect word features guided by the text and text features guided by the aspect words;
step 3: weighting the original image characteristics by using the aspect word characteristics guided by the text, and obtaining the image channel characteristics through a channel attention mechanism;
step 4: weighting the image channel characteristics by using the text characteristics guided by the aspect words, and generating a spatial attention pattern by using the spatial relation of the characteristics in a spatial attention mechanism to obtain final characteristic representation of the image;
step 5: calculating a dynamic adjacency matrix by text features guided by aspect words and image features generated by channel attention and spatial attention; obtaining a final fusion feature representation using the aggregate capabilities and the messaging capabilities of the graph neural network;
step 6: and classifying the final fusion features, aspect word features and sentence features by using a pooling mechanism through a classification module.
2. The method for aspect-level multi-modal emotion analysis based on the dual-channel and attention mechanism according to claim 1, wherein the step 1 extracts hidden feature representations from the sentence features and aspect-word features in the data set by using a Bert pre-training encoder, and extracts picture features by using a ResNet-152 pre-training network, specifically:
outputting text and aspect-word feature information through two Bert-based pre-trained text feature encoders; extracting image features by using a pre-trained ResNet network; adopting pre-trained models provides better initialization parameters for the model, gives the model better generalization after fine-tuning on the target task, and accelerates model convergence; the Bert pre-training model yields the sentence features H_S = Bert(E_S), H_S ∈ R^(t×d), and the aspect-word features H_T = Bert(E_T), H_T ∈ R^(t×d), where t represents the text length and d represents the output feature dimension; the image features are denoted as H_I = ResNet(E_I), H_I ∈ R^(c×w×h), where ResNet represents a ResNet-152 model, c represents the number of channels of the image features, and w and h represent the width and height of the features respectively; E_T, E_S and E_I represent the original aspect words, sentences and images; and H_T, H_S and H_I represent the aspect-word, sentence and image features extracted through the pre-training networks.
3. The method for aspect-level multi-modal emotion analysis based on the dual-channel and attention mechanism according to claim 1, wherein the step 2 adopts a multi-head attention mechanism and fuses the related information between the aspect-word features and the sentence features, and the specific method is as follows:
in order to obtain the interactive features between sentences and aspect words, a multi-head attention mechanism is adopted to calculate the similarity between the sentences and the aspect words, and feature fusion between the sentences and the aspect words can be effectively realized, and the expression is as follows:
MHA(Q, K, V) = softmax(Q K^T / √d_k) V (1)
Ĥ = LayerNorm(H + MHA(Q, K, V)) (2)
Y = LayerNorm(Ĥ + GeLU(Ĥ W_1) W_2) (3)
where MHA represents the multi-head attention mechanism, Q, K, V represent the input features, d_k is the scaling factor, Y represents the output of the corresponding layer in the Transformer, LayerNorm represents layer normalization, GeLU is the activation function, and W_1, W_2 respectively represent trainable parameter matrices;
wherein the aspect-word features and the sentence features are used in turn as the query matrix Q to calculate the aspect-word features Y_T guided by the sentence features and the sentence features Y_S guided by the aspect-word features.
4. The method for aspect-level multi-modal emotion analysis based on the dual-channel and attention mechanism according to claim 1, wherein the step 3 weights the original image features by using the aspect-word features guided by the text, and obtains the image channel features through a channel attention mechanism, specifically comprising:
in order to introduce aspect word features into an image, the aspect word features and the image features are fused through a multi-head self-attention mechanism, and the specific formula is as follows:
H_ca = MHA(H_I, Y_T, Y_T) (4)
M_CH = σ(MLP(AvgPool(H_ca)) + MLP(MaxPool(H_ca))) (5)
wherein, in the channel attention mechanism, the input H_ca is the aspect-word-guided image feature obtained through the multi-head attention mechanism, MLP represents a multi-layer perceptron, AvgPool represents average pooling, MaxPool represents maximum pooling, σ represents the ReLU activation function, and M_CH represents the output of the channel attention.
5. The method for aspect-level multi-modal emotion analysis based on the dual-channel and attention mechanism according to claim 1, wherein the step 4 is as follows: the text features guided by the aspect words are used for weighting the image channel features; in a spatial attention mechanism, a spatial attention pattern is generated by using the spatial relation of the features, and the final feature representation of the image is obtained, wherein the method comprises the following specific steps:
through a multi-head attention mechanism, the sentence characteristics guided by the aspect words are used for weighting the image characteristics output by the channel attention mechanism, and important areas related to the emotion of the aspect words in the image characteristics are highlighted in the channel attention mechanism, wherein the specific formula is as follows:
H_sa = MHA(M_CH, Y_S, Y_S) (6)
M_SP = σ(Conv(Concat(AvgPool(H_sa); MaxPool(H_sa)))) (7)
equation (7) gives the implementation details of the spatial attention mechanism, wherein Concat represents matrix concatenation, Conv represents a convolution operation, σ represents the ReLU activation function, H_sa represents the image features guided by the sentence features through the multi-head attention mechanism, M_CH represents the image features output by the channel attention, and M_SP represents the output of the spatial attention.
6. The method for aspect-level multi-modal emotion analysis based on the dual-channel and attention mechanism according to claim 5, wherein the step 5 calculates a dynamic adjacency matrix from the text features guided by the aspect words and the image features generated by the channel attention and the spatial attention; the aggregation capability and the message-passing capability of the graph neural network are used to obtain a final fusion feature representation, which specifically comprises the following steps:
the sentence features and the image features are concatenated, and an attention matrix obtained through a self-attention mechanism is used as the adjacency matrix of the GCN; firstly, the attention matrix can capture the correlated features between the sentence and image features, making the adjacency matrix more flexible, and secondly, the importance of similar features between the sentence and the image can be adjusted adaptively; in the GCN, a graph G = {V, A} is given, where V is the set of all nodes in the graph, corresponding to the concatenated sentence and image features, and A is the adjacency matrix between all nodes; the weight A_ij depends on the similarity between the nodes;
H_att = Concat(Y_S, M_SP) (8)
A = MHA(H_att, H_att, H_att) (9)
h_i^l = σ( Σ_{j=1}^{n} A_ij W^l h_j^(l-1) ) (10)
wherein H_att represents the concatenation of the channel-and-spatial attention output M_SP and the aspect-word-guided sentence features Y_S, h_i^l is the output of node v_i at layer l, W^l is the trainable weight matrix of GCN layer l, and σ is the ReLU activation function; since the GCN completes feature extraction and encoding between associated nodes, the output of all nodes at layer l is expressed as H^l = {h_1^l, h_2^l, …, h_n^l}, where n represents the number of nodes.
7. The method for aspect-level multi-modal emotion analysis based on the dual-channel and attention mechanism according to claim 6, wherein the step 6 is as follows: the final fusion features, aspect-word features and sentence features are classified by a classification module using a pooling mechanism, with the following specific steps:
for the aspect-word features and sentence features, a [CLS] tag is added when the features are extracted with the pre-training model, so the final hidden state of this tag is taken as the aggregate representation of the aspect-word and sentence features, denoted O_T and O_S respectively; for the fusion output of the GCN, its first node feature is already a weighted sum over the features, so it is taken as the classification feature O_G; the total output feature O after pooling and concatenation can be expressed as:
O_T = H_T^[CLS], O_S = H_S^[CLS], O_G = h_1^l (11)
O = Concat(O_T, O_S, O_G) (12)
in the classification phase:
p(y|O) = softmax(W^T O) (13)
wherein W is a trainable weight matrix, the loss value Loss is calculated using a cross-entropy loss function, and D and y^(j) represent the number of training samples and the true labels of the samples, respectively.
CN202310273760.9A 2023-03-20 2023-03-20 Aspect-level multi-mode emotion analysis method based on dual-channel and attention mechanism Pending CN116662924A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310273760.9A CN116662924A (en) 2023-03-20 2023-03-20 Aspect-level multi-mode emotion analysis method based on dual-channel and attention mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310273760.9A CN116662924A (en) 2023-03-20 2023-03-20 Aspect-level multi-mode emotion analysis method based on dual-channel and attention mechanism

Publications (1)

Publication Number Publication Date
CN116662924A true CN116662924A (en) 2023-08-29

Family

ID=87708608

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310273760.9A Pending CN116662924A (en) 2023-03-20 2023-03-20 Aspect-level multi-mode emotion analysis method based on dual-channel and attention mechanism

Country Status (1)

Country Link
CN (1) CN116662924A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117395164A (en) * 2023-12-12 2024-01-12 烟台大学 Network attribute prediction method and system for industrial Internet of things
CN117395164B (en) * 2023-12-12 2024-03-26 烟台大学 Network attribute prediction method and system for industrial Internet of things

Similar Documents

Publication Publication Date Title
CN110298037B (en) Convolutional neural network matching text recognition method based on enhanced attention mechanism
CN108733792B (en) Entity relation extraction method
CN109918671A (en) Electronic health record entity relation extraction method based on convolution loop neural network
Sharma et al. A survey of methods, datasets and evaluation metrics for visual question answering
CN110287323B (en) Target-oriented emotion classification method
CN111985205A (en) Aspect level emotion classification model
CN113705238A (en) Method and model for analyzing aspect level emotion based on BERT and aspect feature positioning model
CN114896434B (en) Hash code generation method and device based on center similarity learning
CN116975350A (en) Image-text retrieval method, device, equipment and storage medium
CN114004220A (en) Text emotion reason identification method based on CPC-ANN
Ishmam et al. From image to language: A critical analysis of visual question answering (vqa) approaches, challenges, and opportunities
CN116662500A (en) Method for constructing question-answering system based on BERT model and external knowledge graph
CN116187349A (en) Visual question-answering method based on scene graph relation information enhancement
CN116701996A (en) Multi-modal emotion analysis method, system, equipment and medium based on multiple loss functions
CN116522945A (en) Model and method for identifying named entities in food safety field
CN112733764A (en) Method for recognizing video emotion information based on multiple modes
CN111930981A (en) Data processing method for sketch retrieval
CN116258147A (en) Multimode comment emotion analysis method and system based on heterogram convolution
CN115730232A (en) Topic-correlation-based heterogeneous graph neural network cross-language text classification method
CN116662924A (en) Aspect-level multi-mode emotion analysis method based on dual-channel and attention mechanism
CN114048314A (en) Natural language steganalysis method
CN113642630A (en) Image description method and system based on dual-path characteristic encoder
CN118364111A (en) Personality detection method based on text enhancement of large language model
Meng et al. Regional bullying text recognition based on two-branch parallel neural networks
CN117093692A (en) Multi-granularity image-text matching method and system based on depth fusion

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination