
CN117540023A - Image joint text emotion analysis method based on modal fusion graph convolution network - Google Patents


Info

Publication number
CN117540023A
Authority
CN
China
Prior art keywords
text
image
emotion
features
fusion
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410021947.4A
Other languages
Chinese (zh)
Inventor
孙玉宝
谈钱辉
沈心旸
李军侠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Information Science and Technology
Original Assignee
Nanjing University of Information Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Information Science and Technology filed Critical Nanjing University of Information Science and Technology
Priority to CN202410021947.4A
Publication of CN117540023A
Legal status: Pending


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 - Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 - Clustering; Classification
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G06F40/30 - Semantic analysis
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/0464 - Convolutional networks [CNN, ConvNet]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Databases & Information Systems (AREA)
  • Image Analysis (AREA)

Abstract


The invention discloses an image joint text emotion analysis method based on a modal fusion graph convolution network, which includes: acquiring images and text data containing user emotion information; constructing an image joint text emotion analysis model based on a modal fusion graph convolution network, comprising an image-text feature extraction module, a semantic enhancement graph convolution module and a global fusion module; designing a loss function and using the Adam optimizer to iteratively optimize and update the model parameters; the trained network model achieves end-to-end classification of user emotional tendencies. The invention uses deep learning to accurately analyze users' emotional tendencies from the images and text they post on social platforms, which helps enterprises analyze customers' attitudes towards related products and helps social media platforms judge users' preferences from the image-text content they browse.

Description

Image joint text emotion analysis method based on modal fusion graph convolution network
Technical Field
The invention relates to an image joint text emotion analysis method based on a modal fusion graph convolution network, and belongs to the technical field of image and text processing.
Background
Emotion analysis tasks have diverse application scenarios and potential value. Many emotion analysis technologies based on text data already exist and can be divided into traditional methods based on an emotion dictionary and methods based on deep learning. Traditional approaches often perform poorly because they cannot handle the relationships between contexts well. Deep networks have strong learning ability, but the quality of the data plays a decisive role in the final result, and it is often difficult to achieve a satisfactory prediction effect relying only on single-modality data. Because images in social media and their corresponding text descriptions are correlated to a certain extent, jointly using the information of the two modalities can fully exploit the complementary advantages between image and text, realizing more accurate emotion analysis than a single-modality text or image emotion analysis task.
The key to the image joint text emotion analysis task is how to efficiently capture the emotion association between the two modal emotion representations and fully fuse the features of the two modalities. Existing image joint text emotion analysis methods mainly use an attention mechanism to simply merge the two modalities. Although such methods obtain better results than single-modality emotion analysis methods, multi-modal emotion data exhibits quite large intra-class differences and inter-class similarities; a simple attention mechanism cannot capture the complex emotion interactions between different modal features well, nor sufficiently build the emotion associations between different modal data, so the model struggles to learn deeper information.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: to provide an image joint text emotion analysis method based on a modal fusion graph convolution network that is high in accuracy, rapid in its analysis process, and capable of better image-text emotion analysis.
The invention adopts the following technical scheme for solving the technical problems:
the image joint text emotion analysis method based on the modal fusion graph convolution network comprises the following steps:
step 1, acquiring images and text data containing emotion information of a user as a dataset, wherein the acquired image data and the text data correspond to each other one by one, and after labeling paired images and text data, the dataset is divided into a training set and a testing set;
step 2, constructing an image joint text emotion analysis model based on a modal fusion graph convolution network, wherein the model comprises an image-text feature extraction module, a semantic enhancement graph convolution module and a global fusion module; the image-text feature extraction module comprises an image feature extraction unit and a text feature extraction unit, which are respectively used for extracting image features from image data and text features from text data; the semantic enhancement graph convolution module comprises an image semantic enhancement unit, a text semantic enhancement unit and a fusion information semantic enhancement unit, which are respectively used for carrying out semantic enhancement on the image features, the text features and the image-text fusion features; the global fusion module comprises a combination layer, an attention mechanism layer and a full connection layer, wherein the combination layer is used for combining the semantically enhanced image features, text features and image-text fusion features to obtain initial global emotion features, the attention mechanism layer is used for capturing graph-oriented attention weights from the initial global emotion features, and the full connection layer is used for obtaining a final emotion analysis result based on the attention weights;
step 3, designing a loss function for optimizing the model constructed in the step 2, and presetting a training super-parameter of the model;
step 4, training the model constructed in the step 2 by using a training set, and optimizing and updating model parameters by using an Adam optimizer according to a loss function to obtain a trained model;
and step 5, testing the test set by using the trained model to obtain emotion analysis results, namely emotion tendencies of the user.
In a preferred embodiment of the present invention, in the step 1, the paired image and text data are labeled as one of the following three categories according to the emotional tendency of the user: negative, neutral and positive; the labeled dataset is then divided into a training set and a testing set, wherein each category accounts for the same proportion of the total quantity of the training set.
In the step 2, as a preferable scheme of the present invention, the expression of the image joint text emotion analysis model based on the modal fusion graph convolution network is:
X_v = F_v(I), X_t = F_t(T)
H_v = G_v(X_v), H_t = G_t(X_t), H_f = G_f(concat(X_v, X_t))
ŷ = FC(Φ(H_v, H_t, H_f))
wherein X_v and X_t are respectively the image features and text features, I and T are respectively the image data and text data, F_v is the image feature extraction unit, F_t is the text feature extraction unit, H_v, H_t and H_f are respectively the semantically enhanced image features, text features and image-text fusion features, G_v is the image semantic enhancement unit, G_t is the text semantic enhancement unit, G_f is the fusion information semantic enhancement unit, Φ is the global fusion module, concat(·,·) is the splicing operation, FC is the full connection layer, and ŷ is the final emotion analysis result.
In the step 2, the image semantic enhancement unit, the text semantic enhancement unit and the fusion information semantic enhancement unit have the same structure and comprise an edge generation unit and a graph convolution operation unit;
For the features X^m, m ∈ {v, t, f}, the features are first embedded into a new feature space by the linear transformations W_1 and W_2, and then sent into the edge generation unit to calculate the similarity between node features so as to capture the connections between them, the expression being:
S_ij = φ(W_1 x_i^m, W_2 x_j^m)
wherein W_1 and W_2 are learnable parameters, x_i^m and x_j^m respectively denote the features of the i-th and j-th nodes under modality m, and φ(·,·) is the inter-node similarity function. An emotion association graph is constructed according to the obtained similarity coefficients between nodes, and its adjacency matrix is calculated as:
A_ij = exp(S_ij) / Σ_{k=1}^{N_m} exp(S_ik),  Ã = A + E
wherein N_m is the total number of nodes under modality m, x_k^m denotes the features of the k-th node under modality m, E is a diagonal identity matrix, S is the similarity matrix of the graph nodes, and S_ij is its element. Finally, the node features with strong emotion expression in the single-modality data are aggregated through the graph convolution operation unit, the graph convolution expression being:
H^(l+1) = σ(Ã H^(l) W^(l))
wherein H^(l+1) and H^(l) are respectively the output and input of the l-th layer graph convolution, W^(l) is the learnable parameter matrix of the l-th layer graph convolution, σ is the ReLU activation function, and H^(0) = X^m is the input of the layer-1 graph convolution.
As a preferred embodiment of the present invention, in the step 3, the loss function L includes an emotion classification loss function L_cls and a tag-based contrast learning loss function L_lbcl, the expression being:
L(θ) = L_cls + L_lbcl
L_cls = -(1/S) Σ_{i=1}^{S} y_i log ŷ_i
L_lbcl = -(1/S) Σ_{i=1}^{S} (1/|P(i)|) Σ_{j∈P(i)} log [ exp(sim(z_i, z_j)/τ) / Σ_{k≠i} exp(sim(z_i, z_k)/τ) ]
wherein θ denotes the parameters of the model to be optimized, y_i is the true value of the i-th sample, ŷ_i is the predicted value output by the model for the i-th sample, z_i and z_j respectively denote the global emotion fusion features of the i-th and j-th samples in the same batch, τ is the contrast learning coefficient, S is the batch size, and P(i) is the set of all sample numbers with the same label as the i-th sample; the hyper-parameters of the model include the learning rate, the number of iterations epoch, the batch size S, and the depth and number of layers of the model.
In a preferred embodiment of the present invention, in the step 4, the model parameters are updated by a back propagation algorithm.
A computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the image joint text emotion analysis method based on a modal fusion graph convolution network when executing the computer program.
A computer readable storage medium storing a computer program which, when executed by a processor, implements the steps of the image joint text emotion analysis method based on a modal fusion graph convolution network.
Compared with the prior art, the technical scheme provided by the invention has the following technical effects:
1. According to the invention, a graph convolution network, an attention mechanism and a multi-modal emotion analysis model are combined: the image features and text features obtained by the image-text feature extraction module are semantically enhanced through the graph convolution network, the global fusion module performs global information fusion on the semantically enhanced image-text features through the attention mechanism, and the obtained global features are then classified through a full connection layer.
2. The invention greatly improves the accuracy of emotion tendency analysis: emotion tendency can be accurately analyzed from images and text data, which helps enterprises analyze customers' attitudes towards related products and helps social media platforms judge users' preferences from the image-text content they browse.
Drawings
FIG. 1 is a schematic flow chart of the image joint text emotion analysis method based on the modal fusion graph convolution network;
FIG. 2 is a schematic diagram of the emotion analysis network model;
FIG. 3 is a schematic diagram of the structure of a semantic enhancement unit in the semantic enhancement graph convolution module;
FIG. 4 is a schematic illustration of an example of visual analysis results on paired image text datasets.
Detailed Description
Embodiments of the present invention are described in detail below, examples of which are illustrated in the accompanying drawings. The embodiments described below by referring to the drawings are exemplary only for explaining the present invention and are not to be construed as limiting the present invention.
As shown in fig. 1, the image joint text emotion analysis method based on the modal fusion graph convolution network comprises the following steps:
s101, acquiring an image containing user emotion information and text data from an Internet social media platform.
In step S101, image and text data published by users on a social media platform are first acquired through the internet, and the paired image and text data are labeled as one of three cases according to their emotional tendency: negative, neutral or positive; the data of each case are then divided into a training set and a testing set according to a preset proportion.
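For illustration only, the split in step S101 can be sketched in Python as a per-class shuffle and cut, so that each emotion category keeps the same proportion in the training set; the record layout and function name are assumptions, not part of the patent:

import random
from collections import defaultdict

def stratified_split(samples, train_ratio=0.8, seed=42):
    """Split labeled records so each emotion class keeps the same
    proportion inside the training set."""
    by_label = defaultdict(list)
    for s in samples:                  # s = {"image": ..., "text": ..., "label": ...}
        by_label[s["label"]].append(s) # label in {"negative", "neutral", "positive"}
    rng = random.Random(seed)
    train, test = [], []
    for group in by_label.values():
        rng.shuffle(group)
        cut = int(len(group) * train_ratio)
        train.extend(group[:cut])
        test.extend(group[cut:])
    rng.shuffle(train)
    rng.shuffle(test)
    return train, test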
S201, constructing an image joint text emotion analysis model based on a modal fusion graph convolution network, comprising an image-text feature extraction module, a semantic enhancement graph convolution module and a global fusion module. The image-text feature extraction module expands the initial image features and initial text features into node vector representations; the semantic enhancement graph convolution module constructs intra-modality graphs and a multi-modal interaction graph over the node features by calculating the emotion similarity between nodes; and the global fusion module calculates attention coefficients for the node vectors and performs weighted fusion, obtaining a global representation of image-text emotion by dynamically reasoning over and aggregating emotion context features within each modality and across modalities on the graphs.
In step S201, the image joint text emotion analysis model based on the modal fusion graph convolution network is shown in fig. 2; the model is composed of three modules, namely the image-text feature extraction module, the semantic enhancement graph convolution module and the global fusion module. In the image-text feature extraction module, image features are extracted by a pre-trained ResNet50 and text features are extracted by a pre-trained BERT. After the image and text features are obtained and concatenated, the image features, text features and fusion features are respectively sent to the corresponding graph convolution sub-modules for semantic enhancement; the semantic enhancement graph convolution module is shown in fig. 3. Finally, the semantically enhanced image, text and fusion features are sent to the global fusion module to obtain global features, which are classified through a full connection layer to obtain the final prediction result.
As shown in FIG. 2, the text data is processed by the BERT model. Given a text T = {w_1, w_2, ..., w_n}, where n is the length of the text T and w_i denotes the i-th word in T, each word is mapped by BERT to a text emotion feature vector of dimension d, yielding the text emotion features X_t ∈ R^{n×d}; each text emotion feature vector serves as a node in the subsequent semantic enhancement graph convolution module.
As shown in FIG. 2, the image data is processed by the ResNet50 model. For the image I corresponding to a given text T, after inputting I into ResNet50, the feature map V ∈ R^{h×w×c} output by the last convolution layer is taken, where h×w is the spatial dimension of the feature map and c is the number of channels. Since the image features will subsequently be fused with the text features, the dimension of the image feature vectors must match that of the text feature vectors, so a full connection layer is used to change the number of channels of the feature map to d. Finally, the spatial dimensions of the feature map are flattened into one dimension to obtain the image emotion features X_v ∈ R^{hw×d}; each image emotion feature vector serves as a node in the subsequent semantic enhancement graph convolution module.
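For illustration, a minimal PyTorch sketch of the image-text feature extraction module described above follows; it assumes torchvision's pre-trained ResNet50 and the Hugging Face bert-base-uncased checkpoint as stand-ins for the pre-trained extractors, with d = 768 chosen to match BERT's hidden size:

import torch
import torch.nn as nn
from torchvision.models import resnet50, ResNet50_Weights
from transformers import BertModel

class ImageTextFeatureExtractor(nn.Module):
    """Pre-trained ResNet50 for image nodes, pre-trained BERT for text nodes."""
    def __init__(self, d=768):
        super().__init__()
        backbone = resnet50(weights=ResNet50_Weights.DEFAULT)
        # keep everything up to the last conv stage; drop avgpool and fc
        self.cnn = nn.Sequential(*list(backbone.children())[:-2])
        self.proj = nn.Linear(2048, d)  # full connection layer changing channel count c=2048 to d
        self.bert = BertModel.from_pretrained("bert-base-uncased")

    def forward(self, images, input_ids, attention_mask):
        fmap = self.cnn(images)                           # (B, 2048, h, w)
        x_v = self.proj(fmap.flatten(2).transpose(1, 2))  # (B, h*w, d): one node per region
        x_t = self.bert(input_ids=input_ids,
                        attention_mask=attention_mask).last_hidden_state  # (B, n, d): one node per token
        return x_v, x_t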
As shown in FIG. 3, the semantic enhancement graph convolution module includes three sub-modules, which respectively process the text features X_t, the image features X_v and the image-text fusion features X_f, where X_f = concat(X_v, X_t). The three sub-modules are identical in structure. For the features X^m, m ∈ {v, t, f}, the features are first embedded into a new feature space by the linear transformations W_1 and W_2, and then fed into the edge generation unit, which calculates the similarity between node features so as to capture the links between them. The expression is:
S_ij = φ(W_1 x_i^m, W_2 x_j^m)
wherein W_1 and W_2 are learnable parameters, x_i^m and x_j^m denote the features of the i-th and j-th nodes under modality m, and φ(·,·) is the inter-node similarity calculation function. An emotion association graph is then constructed according to the obtained similarity coefficients between nodes, and the adjacency matrix A is calculated as:
A_ij = exp(S_ij) / Σ_{k=1}^{N_m} exp(S_ik),  Ã = A + E
wherein N_m is the total number of nodes in modality m and E is a diagonal identity matrix; the purpose of adding E is to alleviate the gradient vanishing and degradation problems. The adjacency matrix Ã enhances the information interaction between two nodes with higher emotion semantic similarity and suppresses the mutual influence between irrelevant nodes. Finally, the node features with strong emotion expression in the single-modality data are aggregated through graph convolution:
H^(l+1) = σ(Ã H^(l) W^(l))
wherein W^(l) is the learnable parameter matrix of the l-th layer graph convolution and σ is the ReLU activation function; in one embodiment of the invention the number of graph convolution layers is set to 2 to prevent the over-smoothing caused by stacking too many layers.
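A minimal sketch of one semantic enhancement graph convolution unit under the reconstruction above; the dot-product form of the similarity function φ and the row-softmax adjacency with the identity E added are assumptions consistent with the quantities named in the text, and all identifiers are illustrative:

import torch
import torch.nn as nn
import torch.nn.functional as F

class SemanticEnhanceGCN(nn.Module):
    """Edge generation + 2-layer graph convolution over node features (B, N, d)."""
    def __init__(self, d, num_layers=2):
        super().__init__()
        self.w1 = nn.Linear(d, d, bias=False)  # linear transforms embedding the nodes
        self.w2 = nn.Linear(d, d, bias=False)  # into the similarity space
        self.gcn = nn.ModuleList(nn.Linear(d, d) for _ in range(num_layers))

    def forward(self, x):
        # edge generation: S_ij = (W1 x_i) . (W2 x_j)
        s = torch.bmm(self.w1(x), self.w2(x).transpose(1, 2))          # (B, N, N)
        # adjacency: row-softmax of S, plus identity E to keep self-information
        a = F.softmax(s, dim=-1) + torch.eye(x.size(1), device=x.device)
        h = x
        for layer in self.gcn:
            h = F.relu(layer(torch.bmm(a, h)))   # H^(l+1) = ReLU(A H^(l) W^(l))
        return h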
As shown in FIG. 2, in the global fusion module, the text emotion representation H_t, the image emotion representation H_v and the image-text fusion representation H_f output by the semantic enhancement module are first combined to obtain the initial global emotion features:
H = concat(H_v, H_t, H_f)
An attention mechanism is then used to capture graph-oriented attention information from the fused node features; the attention weights α are calculated as:
α = softmax(W_2 tanh(W_1 H + b_1) + b_2)
wherein W_1 and b_1 are respectively the weight and bias of the first full connection layer, and W_2 and b_2 are respectively the weight and bias of the second full connection layer. Finally, the attention weights are multiplied with the corresponding emotion feature vectors and summed to obtain the global emotion feature z, and the final prediction result ŷ is obtained through the full connection layer:
z = Σ_i α_i h_i,  ŷ = softmax(W z + b)
wherein W and b are respectively the weight and bias of the full connection layer.
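A short sketch of the global fusion module as reconstructed above; the two-layer tanh attention scorer is an assumption consistent with the two sets of weights and biases named in the text, and the three output classes correspond to negative, neutral and positive:

import torch
import torch.nn as nn
import torch.nn.functional as F

class GlobalFusion(nn.Module):
    """Attention-weighted fusion of the enhanced image, text, and fused nodes."""
    def __init__(self, d, num_classes=3, hidden=256):
        super().__init__()
        self.fc1 = nn.Linear(d, hidden)       # first attention layer (W_1, b_1)
        self.fc2 = nn.Linear(hidden, 1)       # second attention layer (W_2, b_2)
        self.cls = nn.Linear(d, num_classes)  # final full connection layer (W, b)

    def forward(self, h_v, h_t, h_f):
        h = torch.cat([h_v, h_t, h_f], dim=1)                        # initial global features H
        alpha = F.softmax(self.fc2(torch.tanh(self.fc1(h))), dim=1)  # attention weights (B, N, 1)
        z = (alpha * h).sum(dim=1)                                   # global emotion feature z
        return self.cls(z)                    # logits over negative / neutral / positive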
S301, designing a loss function for optimizing a network model, and presetting a training super-parameter of the network model.
In step S301, the loss function L includes the emotion classification loss function L_cls and the tag-based contrast learning loss function L_lbcl, the expression being:
L(θ) = L_cls + L_lbcl
L_cls = -(1/S) Σ_{i=1}^{S} y_i log ŷ_i
L_lbcl = -(1/S) Σ_{i=1}^{S} (1/|P(i)|) Σ_{j∈P(i)} log [ exp(sim(z_i, z_j)/τ) / Σ_{k≠i} exp(sim(z_i, z_k)/τ) ]
wherein θ denotes the parameters that need to be optimized, y_i is the true value of the i-th sample, ŷ_i is the predicted value output by the model for the i-th sample, z_i and z_j respectively denote the global emotion fusion features of the i-th and j-th samples in the same batch, and τ is the contrast learning coefficient. The hyper-parameters of the model include the learning rate, the number of iterations epoch, the batch size S, and the depth and number of layers of the network model.
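A compact sketch of the combined loss under the supervised-contrastive reading above; the cosine similarity and the temperature value are assumptions, since the extracted text names only the quantities involved:

import torch
import torch.nn.functional as F

def total_loss(logits, labels, z, tau=0.1):
    """Emotion classification loss plus label-based contrastive loss over a batch.
    z holds the global emotion fusion features, shape (S, d)."""
    l_cls = F.cross_entropy(logits, labels)           # L_cls
    z = F.normalize(z, dim=1)
    sim = z @ z.t() / tau                             # pairwise similarities / temperature
    self_mask = torch.eye(len(z), dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(self_mask, float("-inf"))   # exclude k == i from the denominator
    log_prob = sim - sim.logsumexp(dim=1, keepdim=True)
    pos = (labels[:, None] == labels[None, :]) & ~self_mask  # P(i): same-label samples
    denom = pos.sum(dim=1).clamp(min=1)               # guard samples with no positives
    l_lbcl = -(log_prob.masked_fill(~pos, 0.0).sum(dim=1) / denom).mean()  # L_lbcl
    return l_cls + l_lbcl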
S401, training data are sent into an emotion analysis model, and an Adam optimizer is adopted to realize iterative optimization and updating of model parameters according to a loss function.
Step S4011, initializing the image feature extraction module and the text feature extraction module with pre-trained parameters and randomly initializing the remaining network parameters; selecting S image-text pairs (I_i, T_i) with labels y_i from the training dataset, sending them into the network model, and obtaining the corresponding output prediction results ŷ_i;
Step S4012, updating the remaining network parameters through the back propagation algorithm, with the first-order momentum of the gradients maintained by the Adam optimizer, which is one of the gradient descent algorithms;
step S4013, sequentially performing operations S4011 and S4012 on the data in the whole training set, and performing epoch=100 iterations in total.
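Steps S4011 to S4013 amount to a standard training loop; the following sketch assumes the modules and total_loss sketched earlier, a model returning (logits, z), and an illustrative learning rate, with only epoch = 100 taken from the text:

import torch

def train(model, loader, epochs=100, lr=1e-4, device="cuda"):
    model.to(device).train()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)  # Adam optimizer (step S4012)
    for epoch in range(epochs):                              # epoch = 100 (step S4013)
        for images, input_ids, attention_mask, labels in loader:
            images, labels = images.to(device), labels.to(device)
            input_ids = input_ids.to(device)
            attention_mask = attention_mask.to(device)
            logits, z = model(images, input_ids, attention_mask)  # forward pass (step S4011)
            loss = total_loss(logits, labels, z)             # classification + contrastive loss
            optimizer.zero_grad()
            loss.backward()                                  # back propagation
            optimizer.step()                                 # parameter update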
S501, if the emotion analysis model has converged, the trained model can directly realize end-to-end emotion tendency analysis of the user, the output of the model being the emotion tendency of the user; otherwise, the process returns to S401.
Step S5011, judging whether the classification network model converges: in the iterative process of network training, if the objective function value decreases and gradually stabilizes around a certain value, the network is judged to have converged;
s5012, inputting paired image text data into a converged network model, wherein the output of the model is the corresponding emotion tendency;
step S5013, if the iterative training does not converge, the routine returns to step S401.
Examples
In order to demonstrate the effectiveness of the present invention, comparative experiments and ablation experiments were performed. The datasets and training details are introduced first, then the comparative results of different algorithms on the datasets are provided, and finally the ablation experiments are explained, proving the effectiveness of the semantic enhancement graph convolution and the label-based contrast learning loss function.
The datasets used in the experiments were the MVSA-Single and MVSA-Multiple datasets, whose samples were collected from the social media website Twitter; each image-text pair carries one corresponding emotion label. The samples in MVSA-Single and MVSA-Multiple have three emotion labels: positive, negative and neutral. Each sample in MVSA-Single is annotated by one annotator, and the dataset contains 4511 samples in total; each sample in MVSA-Multiple is annotated by three annotators, and the dataset contains 17024 image-text pairs in total. In the experiments, each dataset was divided into a training set, a validation set and a test set in the ratio 8:1:1.
By conducting experiments in the test set, final classification accuracies of 74.36% and 72.87% were obtained on the MVSA-Single and MVSA-Multiple datasets, respectively.
The method is compared with multiple existing deep-learning-based image-text fusion emotion analysis methods on the MVSA-Single and MVSA-Multiple datasets. The methods compared with the present invention include MultiSentiNet, HSAN, CoMN, MVAN and MGNN. MultiSentiNet performs image-text fusion emotion analysis by fusing target feature vectors, scene feature vectors and text feature vectors. HSAN uses a cascaded semantic attention network to perform image-text emotion prediction based on image descriptions. CoMN iteratively performs image-text feature interactions using a co-memory network. MVAN introduces a multi-view attention mechanism into the memory network for emotion classification. MGNN uses a graph neural network to find co-occurrence characteristics among the dataset samples. The results of the comparative experiments are shown in Table 1.
TABLE 1
In order to verify the improvement effect of the semantic enhancement graph convolution module and the label contrast learning loss function on the final classification accuracy of the network, the network without the semantic enhancement graph convolution module and the contrast learning loss function is taken as a baseline, and related experiments are carried out on MVSA-Single and MVSA-Multiple data sets by sequentially adding the semantic enhancement graph convolution module for images, texts and multi-mode data and the contrast learning loss function. The experimental results are shown in table 2, wherein IG, TG and FG respectively represent an image emotion semantic enhancement map convolution module, a text emotion semantic enhancement map convolution module and a fusion feature semantic enhancement map convolution module, and LBCL represents a tag-based contrast learning loss function.
TABLE 2
As can be seen from Table 1, compared with existing image-text fusion emotion analysis methods, the method provided by the invention greatly improves the accuracy of emotion tendency analysis on data from a real social media platform, reflecting the innovativeness of the method.
As can be seen from table 2, compared with Baseline which only retains the image-text feature extraction and feature fusion and only uses cross entropy loss, the classification accuracy can be effectively improved by adding the semantic enhancement graph convolution modules (IG, TG, FG) and the contrast learning loss (LBCL).
FIG. 4 shows examples of the visual analysis results of the present invention on paired image-text data; it can be seen that the network model of the invention accurately analyzes image-text data in different scenes.
Based on the same inventive concept, an embodiment of the application provides a computer device comprising a memory, a processor, and a computer program stored in the memory and capable of running on the processor, wherein the processor, when executing the computer program, realizes the steps of the image joint text emotion analysis method based on the modal fusion graph convolution network.
Based on the same inventive concept, an embodiment of the present application provides a computer readable storage medium storing a computer program which, when executed by a processor, implements the steps of the image joint text emotion analysis method based on the modal fusion graph convolution network.
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow in the flowchart, and combinations of flows in the flowchart, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows.
The above embodiments are only for illustrating the technical idea of the present invention, and the protection scope of the present invention is not limited thereto, and any modification made on the basis of the technical scheme according to the technical idea of the present invention falls within the protection scope of the present invention.

Claims (8)

1. The image joint text emotion analysis method based on the modal fusion graph convolution network is characterized by comprising the following steps of:
step 1, acquiring images and text data containing emotion information of a user as a dataset, wherein the acquired image data and the text data correspond to each other one by one, and after labeling paired images and text data, the dataset is divided into a training set and a testing set;
step 2, constructing an image joint text emotion analysis model based on a modal fusion graph convolution network, wherein the model comprises an image-text feature extraction module, a semantic enhancement graph convolution module and a global fusion module; the image-text feature extraction module comprises an image feature extraction unit and a text feature extraction unit, which are respectively used for extracting image features from image data and text features from text data; the semantic enhancement graph convolution module comprises an image semantic enhancement unit, a text semantic enhancement unit and a fusion information semantic enhancement unit, which are respectively used for carrying out semantic enhancement on the image features, the text features and the image-text fusion features; the global fusion module comprises a combination layer, an attention mechanism layer and a full connection layer, wherein the combination layer is used for combining the semantically enhanced image features, text features and image-text fusion features to obtain initial global emotion features, the attention mechanism layer is used for capturing graph-oriented attention weights from the initial global emotion features, and the full connection layer is used for obtaining a final emotion analysis result based on the attention weights;
step 3, designing a loss function for optimizing the model constructed in the step 2, and presetting a training super-parameter of the model;
step 4, training the model constructed in the step 2 by using a training set, and optimizing and updating model parameters by using an Adam optimizer according to a loss function to obtain a trained model;
and step 5, testing the test set by using the trained model to obtain emotion analysis results, namely emotion tendencies of the user.
2. The image joint text emotion analysis method based on the modal fusion graph convolution network according to claim 1, wherein in the step 1, the paired images and text data are labeled as one of the following three categories according to the emotional tendency of the user: negative, neutral and positive; the labeled dataset is then divided into a training set and a testing set, wherein each category accounts for the same proportion of the total quantity of the training set.
3. The image joint text emotion analysis method based on the modal fusion graph convolution network according to claim 1, wherein in the step 2, the expression of the image joint text emotion analysis model based on the modal fusion graph convolution network is:
X_v = F_v(I), X_t = F_t(T)
H_v = G_v(X_v), H_t = G_t(X_t), H_f = G_f(concat(X_v, X_t))
ŷ = FC(Φ(H_v, H_t, H_f))
wherein X_v and X_t are respectively the image features and text features, I and T are respectively the image data and text data, F_v is the image feature extraction unit, F_t is the text feature extraction unit, H_v, H_t and H_f are respectively the semantically enhanced image features, text features and image-text fusion features, G_v is the image semantic enhancement unit, G_t is the text semantic enhancement unit, G_f is the fusion information semantic enhancement unit, Φ is the global fusion module, concat(·,·) is the splicing operation, FC is the full connection layer, and ŷ is the final emotion analysis result.
4. The image joint text emotion analysis method based on the modal fusion graph convolution network according to claim 3, wherein in the step 2, the image semantic enhancement unit, the text semantic enhancement unit and the fusion information semantic enhancement unit have the same structure, each comprising an edge generation unit and a graph convolution operation unit;
for the features X^m, m ∈ {v, t, f}, the features are first embedded into a new feature space by the linear transformations W_1 and W_2, and then sent into the edge generation unit to calculate the similarity between node features so as to capture the connections between them, the expression being:
S_ij = φ(W_1 x_i^m, W_2 x_j^m)
wherein W_1 and W_2 are learnable parameters, x_i^m and x_j^m respectively denote the features of the i-th and j-th nodes under modality m, and φ(·,·) is the inter-node similarity function; an emotion association graph is constructed according to the obtained similarity coefficients between nodes, and its adjacency matrix is calculated as:
A_ij = exp(S_ij) / Σ_{k=1}^{N_m} exp(S_ik),  Ã = A + E
wherein N_m is the total number of nodes under modality m, x_k^m denotes the features of the k-th node under modality m, E is a diagonal identity matrix, S is the similarity matrix of the graph nodes, and S_ij is its element; finally, the node features with strong emotion expression in the single-modality data are aggregated through the graph convolution operation unit, the graph convolution expression being:
H^(l+1) = σ(Ã H^(l) W^(l))
wherein H^(l+1) and H^(l) are respectively the output and input of the l-th layer graph convolution, W^(l) is the learnable parameter matrix of the l-th layer graph convolution, σ is the ReLU activation function, and H^(0) = X^m is the input of the layer-1 graph convolution.
5. The image joint text emotion analysis method based on the modal fusion graph convolution network according to claim 1, wherein in the step 3, the loss function L includes an emotion classification loss function L_cls and a tag-based contrast learning loss function L_lbcl, the expression being:
L(θ) = L_cls + L_lbcl
L_cls = -(1/S) Σ_{i=1}^{S} y_i log ŷ_i
L_lbcl = -(1/S) Σ_{i=1}^{S} (1/|P(i)|) Σ_{j∈P(i)} log [ exp(sim(z_i, z_j)/τ) / Σ_{k≠i} exp(sim(z_i, z_k)/τ) ]
wherein θ denotes the parameters of the model to be optimized, y_i is the true value of the i-th sample, ŷ_i is the predicted value output by the model for the i-th sample, z_i and z_j respectively denote the global emotion fusion features of the i-th and j-th samples in the same batch, τ is the contrast learning coefficient, S is the batch size, and P(i) is the set of all sample numbers with the same label as the i-th sample; the hyper-parameters of the model include the learning rate, the number of iterations epoch, the batch size S, and the depth and number of layers of the model.
6. The image joint text emotion analysis method based on the modal fusion graph convolution network according to claim 1, wherein in the step 4, the model parameters are updated through a back propagation algorithm.
7. A computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the image joint text emotion analysis method based on a modal fusion graph convolution network as claimed in any one of claims 1 to 6.
8. A computer readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the steps of the image joint text emotion analysis method based on a modal fusion graph convolution network as claimed in any one of claims 1 to 6.
CN202410021947.4A 2024-01-08 2024-01-08 Image joint text emotion analysis method based on modal fusion graph convolution network Pending CN117540023A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410021947.4A CN117540023A (en) 2024-01-08 2024-01-08 Image joint text emotion analysis method based on modal fusion graph convolution network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410021947.4A CN117540023A (en) 2024-01-08 2024-01-08 Image joint text emotion analysis method based on modal fusion graph convolution network

Publications (1)

Publication Number Publication Date
CN117540023A true CN117540023A (en) 2024-02-09

Family

ID=89796204

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410021947.4A Pending CN117540023A (en) 2024-01-08 2024-01-08 Image joint text emotion analysis method based on modal fusion graph convolution network

Country Status (1)

Country Link
CN (1) CN117540023A (en)


Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112784092A (en) * 2021-01-28 2021-05-11 电子科技大学 Cross-modal image text retrieval method of hybrid fusion model
CN112559835A (en) * 2021-02-23 2021-03-26 中国科学院自动化研究所 Multi-mode emotion recognition method
CN115169440A (en) * 2022-06-16 2022-10-11 大连理工大学 Method for irony identification in social media multi-modal information
CN115239937A (en) * 2022-09-23 2022-10-25 西南交通大学 A cross-modal sentiment prediction method
CN116484872A (en) * 2023-03-10 2023-07-25 西安交通大学 Multi-modal aspect emotion judging method and system based on pre-training and attention
CN116304042A (en) * 2023-03-13 2023-06-23 河北工业大学 A Fake News Detection Method Based on Multimodal Feature Adaptive Fusion
CN116089619A (en) * 2023-04-06 2023-05-09 华南师范大学 Emotion classification method, apparatus, device and storage medium
CN116864103A (en) * 2023-06-04 2023-10-10 西北工业大学 A diagnostic method for sarcopenia based on multimodal contrastive learning
CN116844179A (en) * 2023-07-11 2023-10-03 郑州轻工业大学 Sentiment analysis method based on multi-modal cross-attention mechanism image and text fusion

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
QIANHUI TAN等: "Cross-Modality Fused Graph Convolutional Network for Image-Text Sentiment Analysis", 《IMAGE AND GRAPHICS . ICIG 2023. LECTURE NOTES IN COMPUTER SCIENCE》, vol. 14358, 29 October 2023 (2023-10-29), pages 397 - 411, XP047673690, DOI: 10.1007/978-3-031-46314-3_32 *
谈钱辉 等 (TAN Qianhui et al.): "Hierarchical graph convolution network model for image sentiment analysis" (图像情感分析的层次图卷积网络模型), 《计算机科学》 (Computer Science), vol. 50, no. 12, 13 September 2023 (2023-09-13), pages 203 - 211 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117953108A (en) * 2024-03-20 2024-04-30 腾讯科技(深圳)有限公司 Image generation method, device, electronic equipment and storage medium
CN118070103A (en) * 2024-03-26 2024-05-24 广东金湾信息科技有限公司 Public opinion emotion classification method and system based on machine learning
CN118427776A (en) * 2024-05-22 2024-08-02 海南大学 Intelligent medical multi-mode fusion method and system
CN119149674A (en) * 2024-11-18 2024-12-17 之江实验室 Multi-level analysis method and device for text emotion classification

Similar Documents

Publication Publication Date Title
Yang et al. Image-text multimodal emotion classification via multi-view attentional network
Liu et al. EDMF: Efficient deep matrix factorization with review feature learning for industrial recommender system
CN117540023A (en) Image joint text emotion analysis method based on modal fusion graph convolution network
Gui et al. Embedding learning with events in heterogeneous information networks
CN112966091B (en) A knowledge graph recommendation system integrating entity information and popularity
Li et al. Image sentiment prediction based on textual descriptions with adjective noun pairs
CN112805715B (en) Identifying entity-attribute relationships
Yan et al. " Shall I Be Your Chat Companion?" Towards an Online Human-Computer Conversation System
US10198497B2 (en) Search term clustering
CN113822039A (en) Method and related equipment for mining similar meaning words
CN112800225B (en) Microblog comment emotion classification method and system
CN112364743A (en) Video classification method based on semi-supervised learning and bullet screen analysis
Kaur et al. Sentiment analysis based on deep learning approaches
Maree et al. Optimizing machine learning-based sentiment analysis accuracy in bilingual sentences via preprocessing techniques
Maree et al. Semantic graph based term expansion for sentence-level sentiment analysis
CN113536015A (en) Cross-modal retrieval method based on depth identification migration
Sun et al. Rumour detection technology based on the BiGRU_capsule network
CN113869037B (en) Learning method of topic label representation based on content-enhanced network embedding
Habbat et al. LSTM-CNN deep learning model for French online product reviews classification
CN114297390A (en) Aspect category recognition method and system in long-tail distribution scenarios
CN113486143A (en) User portrait generation method based on multi-level text representation and model fusion
Zhang et al. An effective convolutional neural network model for Chinese sentiment analysis
Mudigonda et al. IDEAL: an inventive optimized deep ensemble augmented learning framework for opinion mining and sentiment analysis
Date et al. A systematic survey on text-based dimensional sentiment analysis: advancements, challenges, and future directions
Xie et al. Knowledge graph construction for intelligent analysis of social networking user opinion

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20240209