CN118428467A - Knowledge graph intelligent construction method based on deep learning
- Publication number: CN118428467A
- Application number: CN202410664630.2A
- Authority: CN (China)
- Legal status: Pending
Classifications
- G06N5/022—Knowledge engineering; Knowledge acquisition
- G06F18/217—Validation; Performance evaluation; Active pattern learning techniques
- G06F18/23—Clustering techniques
- G06F18/256—Fusion techniques of classification results, e.g. of results related to same input data of results relating to different input data, e.g. multimodal recognition
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
- G06F40/30—Semantic analysis
- G06V10/40—Extraction of image or video features
Abstract
The application relates to the technical field of knowledge graphs and provides an intelligent knowledge graph construction method based on deep learning, which comprises the following steps: acquiring text data and image data; determining the common semantic base similarity of each element based on the differences between the projection results of the elements of each cluster on the projection matrix corresponding to that cluster; determining the modal context fusion based on the differences between the projection results of the word vectors and the single-mode data descriptors on the consistency matrices under different modes, together with the dual-mode semantic expandable overlapping degree; determining a consistent coding feature vector based on the modal context fusion; determining an updated dynamic programming adjustment factor based on the result of contrastive learning and the consistent coding feature vector; determining the data fusion result corresponding to each entity node based on the updated dynamic programming adjustment factors; and obtaining a multi-mode knowledge graph based on the data fusion results corresponding to the entity nodes. By improving the interactivity of data of different modes, the application adaptively determines the attention weight of each mode and improves the semantic credibility of the multi-mode knowledge graph.
Description
Technical Field
The application relates to the technical field of knowledge graphs, in particular to an intelligent knowledge graph construction method based on deep learning.
Background
Multi-modal data refers to information data of multiple modes, including text data, image data, audio data and the like. Data of different modes can complement each other with information of different dimensions, and fusing multi-modal data can further improve the accuracy of data analysis.
At present, there are several ways to construct a multi-mode knowledge graph. The graph can be constructed directly by performing named entity recognition, relation extraction and the like on the multi-modal data. Alternatively, a simple knowledge graph can be constructed from single-mode data and then completed by technologies such as entity linking, data fusion and entity embedding matching to obtain a multi-mode knowledge graph. Knowledge graph construction comprises several links, such as data collection, data processing, entity recognition, relation extraction and knowledge fusion. Knowledge fusion generally refers to fusing the recognized entities, relations and attributes, resolving ambiguity in entity recognition and ensuring the consistency of the knowledge graph; the more robust the recognized entity information, the fewer the ambiguity problems. Therefore, how to eliminate contradiction and ambiguity between data obtained from different data sources, by performing data fusion and data analysis on the data of multiple data sources, is one of the main problems in constructing a knowledge graph.
Disclosure of Invention
The application provides a knowledge graph intelligent construction method based on deep learning, which solves the problem that contradiction and ambiguity among data obtained by different data sources affect the construction of a knowledge graph oriented to multi-mode data, and adopts the following technical scheme:
The application relates to a knowledge graph intelligent construction method based on deep learning, which comprises the following steps:
Respectively acquiring text data and image data from different data sources;
Determining the common semantic base similarity of each element based on the difference between projection results of each element in each cluster in the clustering results of each modal data in the corresponding projection matrix of each cluster;
Determining the modal context fusion between each word and each target area based on the difference of projection results of the word vector and the unimodal data descriptors on the consistency matrix under different modes and the dual-mode semantic expandable overlapping degree between the word vector and the unimodal data descriptors;
Determining a consistent coding feature vector of each cluster based on the modal context fusibility between the elements in the cluster and the other modal cluster; determining an updated dynamic programming adjustment factor of each mode based on the result of contrast learning among the clusters under each mode and the consistent coding feature vector of the clusters;
Determining a data fusion result corresponding to each entity node in the initial knowledge graph based on updated dynamic programming adjustment factors of all modes by adopting a multi-mode fusion model; and complementing the data fusion results corresponding to all the entity nodes in the initial knowledge graph to obtain the multi-mode knowledge graph.
Preferably, the method for determining the common semantic base similarity of each element based on the difference between the projection results of each element in each cluster in the cluster results of each modal data in the projection matrix corresponding to each cluster comprises the following steps:
Acquiring the word vector of each word in each text data sequence by using an ELMo model, and acquiring the single-mode data descriptor of each target area in the clean image data by using a Word2vec model;
A clustering algorithm is adopted to acquire clustering results of the word vector and the target area based on an undirected graph formed by the word vector and the target area of the word;
For any one word vector cluster, taking each word vector in each cluster as a row vector in a matrix, and taking the matrix formed by all word vectors in each cluster as a semantic non-negative matrix of each cluster;
For any cluster of target areas, taking a single-mode data descriptor of each target area in each cluster as a row vector in a matrix, and taking a matrix formed by single-mode data descriptors of all target areas in each cluster as a semantic non-negative matrix of each cluster;
For any cluster, taking a semantic non-negative matrix of each cluster as input, and decomposing the semantic non-negative matrix into a result of multiplying a consistency matrix and a projection matrix by adopting an NMF algorithm;
Taking the Pearson correlation coefficient between each element of any cluster and the projection result of that element on the projection matrix in the semantic non-negative matrix decomposition result of the cluster as the connotation semantic similarity corresponding to the element;
taking the difference between the connotation semantic similarity of each element and the minimum connotation semantic similarity over all elements in the cluster where the element is located as the numerator;
taking, as the denominator, the sum of 0.01 and the accumulated bit variance between the projection result of each element and the projection results of the other elements in its cluster on the projection matrix in the semantic non-negative matrix decomposition result of the cluster;
taking the ratio of the numerator to the denominator as the common semantic base similarity of each element.
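The ratio described in the steps above can be sketched as follows. This is only one possible reading, not the applicant's implementation: it assumes the projection result of each element is already available as a vector of the same length as the element (for example, its reconstruction from the NMF factors), and it stands in for the "bit variance" with a mean squared element-wise difference. The function names are hypothetical.

```python
import math

EPS = 0.01  # smoothing constant stated in the text


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    std_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    std_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    return cov / (std_a * std_b) if std_a > 0 and std_b > 0 else 0.0


def mean_squared_gap(p, q):
    """ASSUMPTION: stand-in for the 'bit variance' between two projections."""
    return sum((x - y) ** 2 for x, y in zip(p, q)) / len(p)


def common_semantic_base_similarity(elements, projections):
    """Return one score per element of a single cluster.

    elements[i] is the i-th element vector; projections[i] is its projection
    result (assumed here to have the same length as the element).
    """
    # Connotation semantic similarity: Pearson(element, its projection).
    sims = [pearson(e, p) for e, p in zip(elements, projections)]
    lowest = min(sims)
    scores = []
    for i, p_i in enumerate(projections):
        # Accumulate the spread against every other element of the cluster.
        spread = sum(
            mean_squared_gap(p_i, p_j)
            for j, p_j in enumerate(projections)
            if j != i
        )
        scores.append((sims[i] - lowest) / (spread + EPS))
    return scores
```

Under this reading, an element scores high when it correlates well with its own projection and its projection lies close to those of the other elements in the cluster.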
Preferably, the method for obtaining the word vector of each word in each text data sequence and the single-mode data descriptor of each target area in the clean image data by using an ELMo model and a Word2vec model respectively comprises the following steps:
Sequentially performing word segmentation and stop-word removal on each piece of original text data to obtain a sequence of words as a text data sequence; using all text data sequences as input of the ELMo model, and obtaining the word vector of each word in each text data sequence by using the ELMo model;
Taking the denoising result of each piece of image data as clean image data, and acquiring each target area in each piece of clean image data, together with a preset number of category labels for each target area, by using a CNN (convolutional neural network) recognition model;
Taking the category description data corresponding to the preset number of category labels of each target area as input of the Word2vec model, obtaining the word vector of each category label by using the Word2vec model, and taking the vector formed by the word vectors of the preset number of category labels of each target area, arranged in descending order of label confidence, as the single-mode data descriptor of each target area.
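A minimal sketch of the descriptor construction in the last step, assuming the label word vectors have already been produced by a Word2vec model and that "the vector formed by" means plain concatenation (the text does not say how the vectors are combined). The function name and the toy inputs are hypothetical.

```python
def build_descriptor(labels, label_vectors):
    """Build a single-mode data descriptor for one target area.

    labels: list of (label, confidence) pairs for the target area.
    label_vectors: dict mapping label -> word vector (list of floats),
                   assumed to come from a Word2vec model.
    ASSUMPTION: the descriptor is the concatenation of the label vectors,
    ordered by descending confidence.
    """
    ranked = sorted(labels, key=lambda item: item[1], reverse=True)
    descriptor = []
    for label, _confidence in ranked:
        descriptor.extend(label_vectors[label])
    return descriptor
```

For example, with labels `[("tire", 0.6), ("wheel", 0.9), ("rim", 0.3)]` the vector for "wheel" comes first, then "tire", then "rim".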
Preferably, the method for obtaining the clustering results of the word vector and the target area by adopting the clustering algorithm based on the word vector of the word and the undirected graph formed by the target area respectively comprises the following steps:
for text data, taking word vectors of words in all text data sequences as one node in a graph, taking cosine similarity between two word vectors as a similarity measurement result between two corresponding nodes, taking the graph formed by the word vectors of all words as input, and acquiring a clustering result of the word vectors by adopting an AP clustering algorithm;
For image data, each target area is taken as one node in the graph, the structural similarity between two target areas is taken as a similarity measurement result between the two corresponding nodes, the graph formed by all the target areas is taken as input, and an AP clustering algorithm is adopted to obtain a clustering result of the target areas.
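The similarity graph for the text mode can be sketched as a dense cosine similarity matrix, as below. This covers only the graph construction, not AP clustering itself; a matrix of this form is the kind of input that an off-the-shelf implementation such as scikit-learn's `AffinityPropagation(affinity="precomputed")` accepts, although the source does not say which implementation is used. The function names are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u > 0 and norm_v > 0 else 0.0


def similarity_matrix(vectors):
    """Pairwise cosine similarities; entry [i][j] is the edge weight between nodes i and j."""
    n = len(vectors)
    return [
        [cosine_similarity(vectors[i], vectors[j]) for j in range(n)]
        for i in range(n)
    ]
```

For the image mode the same matrix layout applies, with the structural similarity between two target areas replacing the cosine similarity.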
Preferably, the method for determining the modal context fusion between each word and each target area based on the difference of projection results of word vectors and single-mode data descriptors on consistency matrixes under different modes and the dual-mode semantic expandable overlapping degree between the word vectors and the single-mode data descriptors comprises the following steps:
Wherein T_{i,j} is the dual-mode semantic expandable overlapping degree between the i-th word and the j-th target area; c_i is the word vector of the i-th word and c_j is the single-mode data descriptor of the j-th target area; X_i and X_j are the consistency matrices obtained by decomposing the semantic non-negative matrices in which c_i and c_j are located; J(X_i, X_j) is the Jaccard coefficient between the matrices X_i and X_j; Y() is the cosine similarity function, so Y(c_i, c_j) is the cosine similarity between c_i and c_j; h_i and h_j are the common semantic base similarities of the i-th word and the j-th target area; and μ is a parameter adjustment factor;
R_{i,j} is the modal context fusion between the i-th word and the j-th target area; N is the number of word vectors in the cluster where the word vector of the i-th word is located, and n indexes the n-th word vector other than the word vector of the i-th word; M is the number of single-mode data descriptors in the cluster where the single-mode data descriptor of the j-th target area is located, and m indexes the m-th single-mode data descriptor other than that of the j-th target area; T_{n,m} is the dual-mode semantic expandable overlapping degree between the word corresponding to the n-th word vector and the target area corresponding to the m-th single-mode data descriptor; ct_i(j) is the projection result of the word vector of the i-th word on the consistency matrix of the semantic non-negative matrix where the single-mode data descriptor of the j-th target area is located; ct_j(i) is the projection result of the single-mode data descriptor of the j-th target area on the consistency matrix of the semantic non-negative matrix where the word vector of the i-th word is located; and DTW(ct_i(j), ct_j(i)) is the DTW (dynamic time warping) distance between ct_i(j) and ct_j(i).
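The DTW distance named in the symbol definitions can be illustrated with the textbook dynamic-programming recurrence below. It is a generic sketch, not taken from the application: it assumes the two projection results are one-dimensional numeric sequences and uses the absolute difference as the local cost. The function name is hypothetical.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two numeric sequences.

    ASSUMPTION: local cost is |a[i] - b[j]| and no warping window is applied.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best alignment cost of a[:i] against b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]
```

Because DTW allows elements to repeat during alignment, the two projections may have different lengths, which fits the case where the two clusters yield consistency matrices of different sizes.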
Preferably, the method for determining the consistent coding feature vector of each cluster based on the modal context fusion between each cluster and the elements in the cluster under another mode comprises the following steps:
taking a matrix formed by the fusion of the modal context between elements in two clusters under two modes as a consistent coding matrix between the two clusters;
The average value of all elements in each row and each column of each consistent coding matrix is used as a consistent coding row characteristic value and a consistent coding column characteristic value, the vector formed by all consistent coding row characteristic values and consistent coding column characteristic values in each consistent coding matrix is used as a consistent coding characteristic row vector and a consistent coding characteristic column vector of the consistent coding matrix, and the consistent coding characteristic row vector and the consistent coding characteristic column vector are used as a consistent coding characteristic vector.
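The row and column averaging above is simple enough to show directly. The sketch assumes the consistent coding matrix is given as a rectangular list of lists, with rows indexed by the elements of one cluster and columns by the elements of the cluster in the other mode; the function name is hypothetical.

```python
def consistent_coding_feature_vectors(matrix):
    """Return (row_vector, column_vector) of a consistent coding matrix.

    row_vector[r] is the mean of row r; column_vector[c] is the mean of column c.
    """
    row_vector = [sum(row) / len(row) for row in matrix]
    column_vector = [sum(col) / len(col) for col in zip(*matrix)]
    return row_vector, column_vector
```

For the matrix `[[1, 3], [5, 7]]` the row vector is `[2.0, 6.0]` and the column vector is `[3.0, 5.0]`.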
Preferably, the method for determining the updated dynamic programming adjustment factor of each mode based on the result of contrast learning among the clusters and the consistent coding feature vector of the clusters under each mode comprises the following steps:
determining a semantic expandable cluster of each cluster in each mode based on the result of element contrast learning in all clusters in each mode;
taking the average value of all elements in any one consistent coding feature vector of the consistent coding matrix between each cluster in each mode and each cluster in the other mode as the numerator;
Taking the sum of the difference value and 0.01 of the distribution variance of all elements in a consistent coding feature vector of a consistent coding matrix between each cluster, any semantic expandable cluster of each cluster and the same cluster of the other mode as a denominator;
Accumulating results of the ratio of the numerator to the denominator on the semantically expandable clusters of each cluster are used as first stable values, and accumulating results of the first stable values on all the clusters under another mode are used as unilateral consistent coding stable coefficients of each cluster;
Calculating a logarithmic function calculation result taking a natural constant as a base, wherein the absolute value of a difference value between the average value of the modal context fusions between all elements in any two clusters in each modal and all data in the other modal is a power; taking the accumulated result of the product between the DTW distance between the expansion distance sequences consisting of the preset number of minimum expansion distances of any two clusters in each mode and the calculation result on each mode as the semantic expansibility of each mode;
taking the sum of semantic expansibility and 0.01 of each mode as a denominator, and taking the ratio of the sum of all unilateral uniform coding stability coefficients of all cluster clusters under each mode to the denominator as an updated dynamic programming adjustment factor of each mode.
Preferably, the method for determining the semantic expandable cluster of each cluster in each mode based on the result of element comparison learning in all clusters in each mode comprises the following steps:
Taking all elements in each cluster in each mode as positive samples, taking all elements in the rest clusters as negative samples, taking all positive samples and negative samples as inputs, and acquiring an optimized distance between each pair of positive and negative samples in a mapping space by utilizing a CPC model;
And taking the sum of the optimized distances corresponding to all elements in each cluster and the rest of each cluster as the expansion distance between each cluster and the rest of each cluster, and taking the preset number of clusters with the minimum expansion distance between each cluster as the semantic expandable clusters of each cluster.
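The selection of semantic expandable clusters can be sketched as below, assuming the optimized distances learned by the CPC model are already available as a lookup table keyed by unordered element pairs. The contrastive training itself is not shown, and the data layout and function names are hypothetical.

```python
def expansion_distance(optimized, cluster_a, cluster_b):
    """Sum of optimized distances over all element pairs of two clusters.

    optimized: dict mapping frozenset({x, y}) -> learned distance
               (ASSUMED to be produced beforehand by the contrastive model).
    """
    return sum(optimized[frozenset((x, y))] for x in cluster_a for y in cluster_b)


def semantic_expandable_clusters(clusters, optimized, k):
    """For each cluster, return the ids of the k clusters with the smallest expansion distance.

    clusters: dict mapping cluster id -> list of element ids.
    """
    result = {}
    for cid, members in clusters.items():
        distances = {
            other: expansion_distance(optimized, members, other_members)
            for other, other_members in clusters.items()
            if other != cid
        }
        result[cid] = sorted(distances, key=distances.get)[:k]
    return result
```

Here `k` plays the role of the preset number of clusters mentioned in the text.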
Preferably, the method for determining the data fusion result corresponding to each entity node in the initial knowledge graph by adopting the multi-mode fusion model based on the updated dynamic programming adjustment factors of all modes includes:
Constructing a knowledge graph based on data of any one mode as an initial knowledge graph;
taking the exponential with the natural constant e as the base and the updated dynamic programming adjustment factor of each mode as the exponent as the numerator, taking the accumulation of the numerator over all modes as the denominator, and taking the ratio of the numerator to the denominator as the attention weight of each mode;
Taking all words and all target area images in a text data sequence as input, and respectively utilizing a BERT-based text feature extraction network and a VGG image feature extraction network in a multi-mode fusion model to obtain text feature vectors and image feature vectors;
updating the text feature vector and the image feature vector based on the attention weight of each mode by using a self-attention mechanism, and carrying out maximum fusion on the updated text feature vector and the updated image feature vector to obtain a multi-mode fusion update vector;
and taking the output of the multi-mode fusion model as a data fusion result corresponding to each entity node in the initial knowledge graph.
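The attention weight step above (exponential of each mode's factor divided by the sum of exponentials over all modes) is a softmax. A minimal sketch follows; the function name and the example factor values are hypothetical, and subtracting the maximum before exponentiating is a standard numerical-stability measure that does not change the result.

```python
import math


def attention_weights(factors):
    """Softmax over the updated dynamic programming adjustment factors.

    factors: dict mapping mode name -> adjustment factor.
    Returns a dict mapping mode name -> attention weight; weights sum to 1.
    """
    peak = max(factors.values())
    # exp(v - peak) / sum(exp(u - peak)) equals exp(v) / sum(exp(u)).
    exps = {mode: math.exp(value - peak) for mode, value in factors.items()}
    total = sum(exps.values())
    return {mode: e / total for mode, e in exps.items()}


weights = attention_weights({"text": 1.2, "image": 0.7})
```

The mode with the larger adjustment factor receives the larger weight, which is how the fusion model comes to rely more on the mode judged to be more semantically stable.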
Preferably, the method for obtaining the multi-mode knowledge graph based on the completion of the data fusion results corresponding to all the entity nodes in the initial knowledge graph comprises the following steps:
and carrying out equivalent link completion updating on the data fusion result corresponding to each entity node in the initial knowledge graph and each entity node in the initial knowledge graph by using an equivalent symbol, and updating all entity nodes in the initial knowledge graph to obtain the knowledge graph as a multi-mode knowledge graph.
The beneficial effects of the application are as follows. First, the clustering results of the target areas of the image data and of the text data sequences are used to analyze how the semantic ranges of clusters in different modes overlap, and the consistency of the latent representations of the data in different modes is evaluated to construct the modal context fusion. Second, the modal context fusion between data of different modes is used to construct a consistent coding matrix at the cluster level between the two modes, from which the consistent coding feature vectors between clusters of different modes are obtained. Third, a contrastive learning model learns the optimized distance between positive and negative sample pairs from different clusters of the same mode; contrastive learning between the clusters improves the interactivity of data of different modes and maximizes mutual information, and the optimized distances make it possible to screen the semantic expandable clusters, which collect the possible semantic expansion results of the elements in each cluster. Finally, the consistent coding feature vectors and the semantic expandable clusters are combined to evaluate the semantic expansibility of the elements in each cluster under a single mode, which yields the updated dynamic programming adjustment factor. When the knowledge graph is completed, this factor evaluates the degree of influence of each mode's data on the whole knowledge graph, so that the attention weight of each mode in the multi-mode fusion model is determined adaptively, the semantic credibility of the multi-mode data fusion result is improved, and the constructed multi-mode knowledge graph performs better in vehicle purchase recommendation.
Drawings
In order to illustrate the embodiments of the application or the technical solutions of the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly described below. The drawings described below show only some embodiments of the application, and a person skilled in the art can obtain other drawings from them without inventive effort.
Fig. 1 is a schematic flow chart of a knowledge graph intelligent construction method based on deep learning according to an embodiment of the present application;
FIG. 2 is a flowchart of an implementation of a knowledge-graph intelligent construction method based on deep learning according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a multi-modal fusion model structure according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
Referring to fig. 1, a flowchart of a knowledge graph intelligent construction method based on deep learning according to an embodiment of the application is shown, and the method includes the following steps:
Step S001, acquiring text data and image data from different data sources, respectively.
In the application, taking the construction of the multi-mode knowledge graph in the vehicle purchasing recommendation system as an example, in the process of constructing the multi-mode knowledge graph, firstly, an initial knowledge graph is constructed by utilizing single-mode data, secondly, data fusion is carried out by collecting data of different modes from a plurality of data sources, the accuracy of entity attribute analysis is improved, and the initial knowledge graph is complemented based on the multi-mode data fusion result, so that the semantic information of each entity node is more accurate, the attribute is more comprehensive, and the construction of the multi-mode knowledge graph is completed.
In the application, text data and image data are used as the multi-mode data for data analysis, and original text data and image data related to vehicle purchase recommendation are obtained respectively. Specifically, automobile-related text data is obtained from a large number of web pages on automobile knowledge and automobile popular science; the text data includes but is not limited to performance parameters, manufacturer specifications, configuration files, purchase rights and interests and the like. Next, vehicle-related image data, including but not limited to the vehicle as a whole, vehicle parts and mechanical structures, is obtained from a large number of vehicle-related web pages.
Further, each piece of original text data is input into the jieba word segmentation tool for word segmentation; next, the word segmentation result of each piece of original text data is taken as input, stop words are removed by using an existing stop-word list, and the sequence formed by the remaining words is taken as a text data sequence. Stop-word removal and word segmentation are known techniques, and the specific process is not repeated. For the obtained image data, in order to eliminate the influence of image noise on subsequent data analysis, each piece of image data is denoised by using a bilateral filtering denoising algorithm, and the denoised result is taken as clean image data. The bilateral filtering denoising algorithm is a known technique, and the specific process is not repeated.
And obtaining a preprocessed text data sequence and clean image data for subsequent analysis of semantic fusibility among different modal data.
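The stop-word removal in the text preprocessing can be sketched as follows, assuming the tokens have already been produced by a segmenter (the text names jieba; for instance `jieba.lcut(text)` returns a list of tokens). The stop-word set and the sample sentence are illustrative only, and the function name is hypothetical.

```python
def to_text_data_sequence(tokens, stop_words):
    """Drop stop words and blank tokens, keeping the original word order."""
    return [t for t in tokens if t.strip() and t not in stop_words]


# Illustrative input: a pre-segmented sentence meaning
# "the wheel hub of this car is 20 inches".
tokens = ["这", "款", "车", "的", "轮毂", "是", "20", "英寸"]
stop_words = {"这", "款", "的", "是"}
sequence = to_text_data_sequence(tokens, stop_words)
```

The resulting sequence keeps only the content-bearing words, which are the ones later embedded by the ELMo model.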
Step S002, determining the common semantic base similarity of the elements based on the difference between the projection results of the elements in the clusters in the projection matrix corresponding to the clusters; determining dual-mode semantic expandable overlap based on context semantic information similarity and common semantic base similarity among elements in different modes; and determining the modal context fusion between each word and each target area based on the difference of projection results of elements on the consistency matrix under different modes and the expandable overlapping degree of dual-mode semantics.
The data of each mode contains a large number of entities and entity attributes, and multi-mode data fusion gives better data analysis results when the fused data are at the same level. Each piece of text data contains relatively clear semantic entities, or keywords, for example that the wheel hub is 20 inches or that the 0-100 km/h acceleration time ranks first; each piece of image data contains many targets, large background areas and other semantically abstract areas. If whole pieces of text data and whole pieces of image data are fused directly, the result is of low quality and has poor semantic interpretability.
Further, when multi-modal data is integrated into a structural representation of an entity, directly projecting all modes into a common subspace to capture the representation shared between different modes may lose information specific to each mode, which may result in poor completion of the knowledge graph. Therefore, the application applies contrastive learning between the clusters under different modes, which improves the interactivity of the data of different modes and maximizes mutual information, so that the constructed multi-mode knowledge graph performs better in vehicle purchase recommendation. The implementation flow of the whole scheme is shown in FIG. 2.
Specifically, within any mode, each word or each target has similar counterparts, such as an automobile hub and a blade hub, or an image of a white tire and an image of a golden tire, and the context information of such similar data is also similar to some degree. Therefore, the application clusters the data of each mode, which reduces the influence of modal heterogeneity on data fusion efficiency while obtaining semantic information with more comprehensive coverage.
Text data carries explicit context information, whereas the context information of image data is comparatively vague. Therefore, each piece of clean image data is used as input to a target recognition model, whose output is a recognition result marking all target areas, and the b class labels with the highest confidence values are stored for each target area, where b takes an empirical value of 3. The target recognition model is a CNN (Convolutional Neural Network), the optimization algorithm is Adam (Adaptive Moment Estimation), and the loss function is the cross-entropy function. Training a neural network is a known technology, and the specific process is not repeated.
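The label-selection step above can be sketched as follows. This is a minimal illustration only: the function name `top_b_labels` and the example confidence scores are hypothetical, and the recognition model that would produce the scores is not shown.

```python
def top_b_labels(confidences, b=3):
    """Return the b class labels with the highest confidence, best first.

    `confidences` maps a class label to the confidence value the
    recognition model assigned to it for one target area.
    """
    ranked = sorted(confidences.items(), key=lambda item: item[1], reverse=True)
    return [label for label, _ in ranked[:b]]


# Hypothetical recognition output for a single target area.
scores = {"wheel hub": 0.81, "tire": 0.64, "headlight": 0.07, "door": 0.22}
kept_labels = top_b_labels(scores)
```

With b = 3, the three most confident labels are kept in descending order of confidence, which matches the ordering later used to build the single-mode data descriptor.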
For text data, all text data sequences are used as input to an ELMo (Embeddings from Language Models) model, which yields the word vector of each word in each text data sequence; the ELMo model is a known technology, and the specific process is not repeated. Further, the word vector of each word in all text data sequences is taken as a node in a graph, and the cosine similarity between two word vectors is taken as the similarity measure between the two corresponding nodes. The graph formed by the word vectors of all words is used as input to the AP (Affinity Propagation) clustering algorithm to obtain the clustering result of the word vectors, and the cluster containing the word vector of the i-th word is denoted B_i. For image data, each target area in each piece of clean image data is taken as a node in a graph, the structural similarity between two target areas is taken as the similarity measure between the two corresponding nodes, and the graph formed by all target areas is used as input to the AP clustering algorithm to obtain the clustering result of the target areas; the AP algorithm is a known technology, and the specific process is not repeated. AP clustering is chosen because each modality contains a large amount of data, so the number of clusters cannot be preset.
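As an illustration of the graph that feeds the clustering step, the sketch below builds the pairwise cosine-similarity matrix for a handful of word vectors. The helper names and the toy vectors are assumptions made for this example; the clustering itself is left to an existing AP implementation.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_graph(vectors):
    """Dense similarity matrix: entry [i][j] is the edge weight between nodes i and j."""
    n = len(vectors)
    return [[cosine_similarity(vectors[i], vectors[j]) for j in range(n)]
            for i in range(n)]


# Toy word vectors (hypothetical values, not ELMo output).
word_vectors = [[1.0, 0.0], [2.0, 0.0], [0.0, 3.0]]
graph = similarity_graph(word_vectors)
```

A matrix of this form could be passed to any affinity propagation implementation that accepts precomputed similarities, for example scikit-learn's `AffinityPropagation(affinity="precomputed")`. For image data the same structure applies with structural similarity in place of cosine similarity.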
Ideally, when multi-modal data is fused, the same object has similar semantics under different modalities, and the data features of the modalities are as similar as possible in the common subspace built from the multi-modal data, that is, the latent representations of the modalities are approximately consistent. In practice, each object is subject to different limiting conditions under different modalities, or can be combined with different constraints, and so yields different semantic information. The consistency of the latent data representations across modalities is therefore evaluated from the overlap of the semantic ranges between clusters of different modalities.
Further, for any cluster, the vector corresponding to each element of the cluster is taken as a row vector of a matrix: for a cluster of text data the vector is a word vector, and for a cluster of image data it is a single-mode data descriptor. The matrix formed by the vectors of all elements in the cluster is taken as the semantic non-negative matrix of that cluster. Then, with the semantic non-negative matrix of each cluster as input, the NMF (Non-negative Matrix Factorization) algorithm decomposes it into the product of a consistency matrix and a projection matrix, where each column of the projection matrix is the projection result of the corresponding row vector of the semantic non-negative matrix on the consistency matrix. The NMF algorithm is a known technology, and the specific process is not repeated.
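The decomposition described above can be sketched with the classic multiplicative-update form of NMF. This is one common NMF variant chosen for illustration, not necessarily the one the application uses; the rank, iteration count, seed and toy matrix are all assumptions. The cluster matrix is transposed before factorization so that each column of the second factor corresponds to one cluster element, as the text describes.

```python
import random


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def frobenius_error(a, x, h):
    """Squared Frobenius norm of a - x*h."""
    approx = matmul(x, h)
    return sum((a[i][j] - approx[i][j]) ** 2
               for i in range(len(a)) for j in range(len(a[0])))


def nmf(a, rank, iterations=200, eps=1e-9, seed=0):
    """Factor non-negative a (d x n) into x (d x rank) and h (rank x n)."""
    rng = random.Random(seed)
    rows, cols = len(a), len(a[0])
    x = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(rows)]
    h = [[rng.random() + 0.1 for _ in range(cols)] for _ in range(rank)]
    for _ in range(iterations):
        xt = transpose(x)
        num, den = matmul(xt, a), matmul(matmul(xt, x), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(cols)]
             for i in range(rank)]
        ht = transpose(h)
        num, den = matmul(a, ht), matmul(x, matmul(h, ht))
        x = [[x[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(rows)]
    return x, h


# Toy cluster: three elements (rows), each a 4-dimensional non-negative vector.
cluster_vectors = [[1.0, 0.0, 2.0, 1.0], [2.0, 0.0, 4.0, 2.0], [0.0, 3.0, 0.0, 1.0]]
a = transpose(cluster_vectors)           # 4 x 3, one column per element
consistency, projection = nmf(a, rank=2)
```

Here `consistency` plays the role of the consistency matrix and column i of `projection` is the projection result of the i-th element.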
Based on the above analysis, modality context fusions are constructed here for characterizing the degree of fusibility of context relationships of different entities in different modalities. Calculating the modal context fusion between the ith word and the jth target area:
Where h_i is the common semantic base similarity of the i-th word; c_i is the word vector of the i-th word; ct_i is the projection result of c_i in the projection matrix of the semantic non-negative matrix containing c_i; P(c_i, ct_i) is the connotation semantic similarity of the i-th word, equal to the Pearson correlation coefficient between c_i and ct_i; P_{i,min} is the minimum connotation semantic similarity over all words in cluster B_i; N is the number of word vectors in cluster B_i; n indexes the n-th word vector in cluster B_i other than c_i; ct_n is the projection result of the n-th word vector in the projection matrix; lsd(ct_i, ct_n) is the bit variance between the vectors ct_i and ct_n; and μ is a parameter adjustment factor that prevents the denominator from being 0, taking an empirical value of 0.01. The Pearson correlation coefficient and the bit variance are known techniques, and the specific processes are not repeated;
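Following the wording of claim 2, the common semantic base similarity is a ratio whose numerator is P(c_i, ct_i) minus the cluster minimum and whose denominator is the accumulated lsd between ct_i and the other projections plus μ. The sketch below follows that description under two loudly stated assumptions: each vector and its projection are taken to have the same length so that the Pearson coefficient is defined, and `mean_squared_difference` is only a hypothetical stand-in for lsd, which the text does not define.

```python
import math


def pearson(u, v):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    std_u = math.sqrt(sum((a - mean_u) ** 2 for a in u))
    std_v = math.sqrt(sum((b - mean_v) ** 2 for b in v))
    return cov / (std_u * std_v) if std_u and std_v else 0.0


def mean_squared_difference(u, v):
    """ASSUMPTION: placeholder for lsd(); not the measure named in the text."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) / len(u)


def common_semantic_base_similarity(vectors, projections, distance, mu=0.01):
    """h_i = (P_i - P_min) / (sum over n != i of distance(ct_i, ct_n) + mu)."""
    p = [pearson(c, ct) for c, ct in zip(vectors, projections)]
    p_min = min(p)
    result = []
    for i, ct_i in enumerate(projections):
        spread = sum(distance(ct_i, ct_n)
                     for n, ct_n in enumerate(projections) if n != i)
        result.append((p[i] - p_min) / (spread + mu))
    return result


# Toy cluster of three elements (hypothetical values).
vectors = [[1.0, 2.0, 3.0], [1.0, 2.0, 3.0], [3.0, 1.0, 2.0]]
projections = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.5], [1.0, 2.0, 3.0]]
h = common_semantic_base_similarity(vectors, projections, mean_squared_difference)
```

The element with the lowest Pearson value always receives h = 0, and elements whose projections lie close to the rest of the cluster receive larger values.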
T_{i,j} is the dual-mode semantic expandable overlap between the i-th word and the j-th target area; c_j is the single-mode data descriptor of the j-th target area; X_i and X_j are the consistency matrices of the semantic non-negative matrices containing c_i and c_j respectively; J(X_i, X_j) is the Jaccard coefficient between the matrices X_i and X_j; Y() is the cosine similarity function, and Y(c_i, c_j) is the cosine similarity between c_i and c_j; h_j is the common semantic base similarity of the j-th target area, which is calculated in the same way as that of a word and is not repeated. The Jaccard coefficient is a known technology, and the specific process is not repeated;
R_{ij} is the modality context fusion between the i-th word and the j-th target area; M is the number of single-mode data descriptors in the cluster containing c_j; m indexes the m-th single-mode data descriptor other than c_j; T_{n,m} is the dual-mode semantic expandable overlap between the word corresponding to the n-th word vector and the target area corresponding to the m-th single-mode data descriptor; ct_i(j) is the projection result of c_i on the consistency matrix of the semantic non-negative matrix containing c_j; ct_j(i) is the projection result of c_j on the consistency matrix of the semantic non-negative matrix containing c_i; DTW() is the DTW (Dynamic Time Warping) distance function, and DTW(ct_i(j), ct_j(i)) is the DTW distance between ct_i(j) and ct_j(i). The DTW distance is a known technology, and the specific process is not repeated.
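For readers unfamiliar with the DTW distance used here, a minimal sketch of the textbook dynamic-programming formulation follows. It assumes one-dimensional sequences and an absolute-difference step cost; the text does not state which step cost or constraints are used, so both are assumptions.

```python
def dtw_distance(a, b):
    """Dynamic Time Warping distance between two numeric sequences.

    cost[i][j] holds the cheapest alignment cost of a[:i] with b[:j].
    """
    inf = float("inf")
    cost = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j],      # advance in a only
                                    cost[i][j - 1],      # advance in b only
                                    cost[i - 1][j - 1])  # advance in both
    return cost[len(a)][len(b)]
```

Because DTW allows one sequence to be stretched against the other, two projection results of different lengths can still be compared, and a smaller value indicates more similar projections.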
The better the semantic information of the i-th word represents the semantic information of all word vectors in the cluster containing c_i, the more likely the i-th word is a word with stable semantics, the smaller the change of its vector before and after decomposition, the larger P(c_i, ct_i), and the larger P(c_i, ct_i) - P_{i,min}. The closer the word vector c_i is to the semantic information of the other word vectors in cluster B_i, the greater the similarity between the corresponding projection results, the smaller lsd(ct_i, ct_n), and the larger h_i. The more likely c_i and c_j are to represent the semantic information of the same object in different modalities, the smaller the difference between the consistency matrices of their clusters, the larger J(X_i, X_j), and the larger Y(c_i, c_j). Meanwhile, the more stable the semantic information of c_i and c_j, the smaller the mutual influence between the semantics of the i-th word and the j-th target area and the adjacent data, the closer their semantic stability, the smaller |h_i - h_j|, and the larger T_{i,j}. The more likely c_i and c_j are to be data characterizations of the same object under different modalities, the higher their consistency in the multi-modal decomposition subspace, the more similar their projection results on the different consistency matrices, and the smaller DTW(ct_i(j), ct_j(i)). That is, the larger R_{ij}, the more similar the context information of the i-th word and the j-th target area, and the greater their fusibility.
The beneficial effect of the modality context fusion is that the influence of heterogeneous factors of the multi-modality data can be reduced by extracting the local context information of the data in different modalities, and the influence of useless components such as noise and the like on the real semantics is reduced.
The modality context fusion between each word and each target area is thus obtained, and is later used to determine the fusion result of the data in each modality.
Step S003, determining updated dynamic programming adjustment factors of each mode based on the optimized distance between positive and negative sample pairs in the contrast learning mapping space of the mode data and the mode context fusion between different mode data.
Further, the initial knowledge graph is constructed from a single modality as follows. Taking text data as an example, all text data sequences are used as input, and named entity recognition and rule matching are applied in turn to identify the entities and entity attributes in each text data sequence; both are known technologies, and the specific process is not repeated. Triples are then constructed from the entities, the attributes, and the relations between them in the text data sequences, and the initial knowledge graph of the vehicle purchase recommendation system is built from all triples. Constructing a knowledge graph is a known technology, and the specific process is not repeated.
When multi-modal data is acquired from multiple data sources, the data in each modality is often updated dynamically. For example, as automobile paint colors multiply, the number of automobile images grows in real time; text descriptions of automobiles grow as well, and the context of the words in each text data sequence correspondingly acquires new semantic information. A real-time update of multi-modal data can generally be regarded as a change applied to the original data: for example, a white automobile image is updated by a pink automobile image, or text data describing a 20-inch hub is updated by text data stating that a 19-inch hub gives a more attractive appearance. Within each cluster of each modality, entity groups made up of different numbers of entities have a certain information expansibility, and the expansion result is often highly similar to the result of a data update.
Specifically, for any two clusters under different modalities, take the A-th text-data cluster l_A and the B-th image-data cluster l_B as an example. The modality context fusion is calculated between every element of l_A and every element of l_B, and the matrix formed by these values is taken as the consistent coding matrix between l_A and l_B, where the element in row g and column o is the modality context fusion between the word corresponding to the g-th word vector in l_A and the target area corresponding to the o-th single-mode data descriptor in l_B. Then the mean of all elements in each row and in each column of the consistent coding matrix is taken as a consistent coding row characteristic value and a consistent coding column characteristic value respectively; the vectors formed by all consistent coding row characteristic values and by all consistent coding column characteristic values are taken as the consistent coding characteristic row vector and the consistent coding characteristic column vector of the matrix; and both are referred to as consistent coding feature vectors.
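The row and column averaging described above can be sketched in a few lines. The function name and the toy matrix are hypothetical; the matrix entries stand in for modality context fusion values that would be computed beforehand.

```python
def consistent_coding_features(matrix):
    """Return (row_vector, column_vector) of per-row and per-column means."""
    row_features = [sum(row) / len(row) for row in matrix]
    column_features = [sum(col) / len(col) for col in zip(*matrix)]
    return row_features, column_features


# Toy consistent coding matrix: 3 words in l_A (rows) by 2 target areas in l_B (columns).
coding_matrix = [[1.0, 3.0],
                 [2.0, 4.0],
                 [6.0, 2.0]]
row_vector, column_vector = consistent_coding_features(coding_matrix)
```

The row vector has one entry per word of the text cluster and the column vector one entry per target area of the image cluster, so the two vectors generally have different lengths.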
Further, all word vectors in cluster l_A are used as positive samples and all word vectors in the remaining text-data clusters as negative samples. All positive and negative samples are input to a CPC (Contrastive Predictive Coding) model whose optimization algorithm is Adam and whose loss function is InfoNCE (Information Noise Contrastive Estimation); the model outputs the optimized distance between each positive-negative sample pair in the mapping space. Training a neural network is a known technology, and the detailed process is not repeated. For each text-data cluster other than l_A, the optimized distances corresponding to its word vectors are summed, and the sum is taken as the expansion distance between l_A and that cluster. The M_2 text-data clusters with the smallest expansion distance to l_A are taken as the semantic expandable clusters of l_A, where M_2 takes an empirical value of 5. For clusters of target areas, the target areas in the cluster are used as samples and the above steps are repeated to obtain the semantic expandable clusters of each target-area cluster. The purpose is to assess the expansibility of information semantics between data items while keeping the intra-class variation of each modality's data to a minimum.
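The selection of semantic expandable clusters can be sketched as a sum followed by a sort. The sketch assumes the optimized distances have already been produced by the contrastive model; the function name, cluster identifiers and distance values are hypothetical.

```python
def semantic_expandable_clusters(optimized_distances, m2=5):
    """Pick the m2 clusters with the smallest expansion distance.

    `optimized_distances` maps each other cluster's id to the list of
    optimized distances between the reference cluster's positive samples
    and that cluster's negative samples.
    """
    expansion = {cid: sum(values) for cid, values in optimized_distances.items()}
    ranked = sorted(expansion, key=expansion.get)
    return ranked[:m2], expansion


# Hypothetical optimized distances from l_A to four other text clusters.
distances = {"c1": [1.0, 2.0], "c2": [0.5, 0.25], "c3": [4.0], "c4": [1.0]}
chosen, expansion_distance = semantic_expandable_clusters(distances, m2=2)
```

The text uses M_2 = 5; the example uses 2 only to keep the toy data small.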
Based on the analysis, an updated dynamic programming adjustment factor is constructed herein to characterize the impact of each modality data update on the multimodal data fusion results. Calculating an updated dynamic programming adjustment factor for the a-th modality:
Wherein r_{a,A} is the single-side consistent coding stability coefficient of the A-th cluster in the a-th modality; N_2 is the number of clusters corresponding to the image data; B indexes the B-th target-area cluster; M_2 is the number of semantic expandable clusters of the A-th cluster; W indexes the W-th semantic expandable cluster of the A-th cluster; C_AB is any consistent coding feature vector of the consistent coding matrix between the A-th cluster and cluster l_B; σ(C_AB) and the mean of C_AB are respectively the distribution variance and the mean of the elements in C_AB; σ(C_WB) is the distribution variance of all elements in the consistent coding characteristic row vector of the consistent coding matrix between the W-th semantic expandable cluster and cluster l_B; and μ is a parameter adjustment factor that prevents the denominator from being 0, taking an empirical value of 0.01;
u_a is the semantic expansibility of the a-th modality; N_3 is the number of clusters in the a-th modality; α indexes the α-th cluster in the a-th modality; the two mean terms are respectively the averages of the modality context fusion between all elements of the A-th cluster, and of the α-th cluster, and all data in the other modality; ln() is the logarithmic function with the natural constant as its base; D_A and D_α are the expansion distance sequences formed by the M_2 smallest expansion distances of the A-th and α-th clusters respectively; DTW() is the DTW (Dynamic Time Warping) distance function, and DTW(D_A, D_α) is the DTW distance between D_A and D_α. The DTW distance is a known technology, and the specific process is not repeated;
V_a is the updated dynamic programming adjustment factor of the a-th modality, and Δr_{a,A} is the sum of the two single-side consistent coding stability coefficients of the A-th cluster in the a-th modality.
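Based on the wording of claim 7, the stability coefficient and the adjustment factor can be tentatively written as below. This is a reconstruction from the verbal description, not the original typeset formula: the overline notation for the mean of C_AB is introduced here for illustration, the claim speaks only of a "difference" of variances (an absolute value may be intended), and the expression for u_a is left symbolic because its verbal description is not clear enough to reconstruct.

```latex
r_{a,A} = \sum_{B=1}^{N_2} \sum_{W=1}^{M_2}
          \frac{\overline{C_{AB}}}{\bigl(\sigma(C_{AB}) - \sigma(C_{WB})\bigr) + \mu},
\qquad
V_a = \frac{\sum_{A=1}^{N_3} \Delta r_{a,A}}{u_a + \mu},
\qquad \mu = 0.01
```

Read this way, V_a grows when the clusters of the a-th modality fuse stably with the other modality (large r) and when the modality has little room for semantic expansion (small u_a).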
The higher the fusibility between the elements of the A-th cluster in the a-th modality and the elements of the clusters in the other modality, the larger the elements of each consistent coding feature vector of the corresponding consistent coding matrix, and the larger the mean of C_AB. The smaller the optimized distance, in the feature space of the contrastive model, between the positive samples in the A-th cluster and the negative samples in a semantic expandable cluster, the greater the similarity of their data representations, the smaller the difference between σ(C_AB) and σ(C_WB), the larger the first stable value, and the larger r_{a,A}. The fewer the possible semantic interpretations of the data in the a-th modality, the weaker its expansibility: the semantic coverage of the existing data in the a-th modality is smaller, the semantic variation between different data is smaller, and the elements of different clusters are closer in their fusibility with the data of the other modality, so the difference between the mean modality context fusions of the A-th and α-th clusters is smaller. Likewise, the more likely different clusters are to share the same semantic expandable clusters, the smaller the difference between their expansion distance sequences, the smaller DTW(D_A, D_α), and the smaller u_a. Hence the larger V_a is, the weaker the semantic expansibility of the data in the a-th modality, and the greater the influence of that data on the knowledge graph completion result.
Thus, updated dynamic programming adjustment factors of each mode are obtained and used for later determining the attention weight of each mode in the attention mechanism when the deep learning model is used for multi-mode data fusion.
Step S004, self-adaptively determining the attention weight of each mode based on the updated dynamic programming adjustment factors of all modes, and completing the construction of the multi-mode knowledge graph based on the attention weights by adopting a multi-mode fusion model.
Further, updated dynamic programming adjustment factors under all modes are calculated, a sequence formed by the updated dynamic programming adjustment factors under all modes is used as an input sequence of a self-attention mechanism in the fusion model, and attention weight of each mode data is determined based on the self-attention mechanism. Calculating the attention weight of the a-th modality:
where w a is the attention weight of the a-th modality, V a is the updated dynamic programming adjustment factor of the a-th modality, exp () is an exponential function based on a natural constant, and K is the number of modality types.
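The symbols defined above (an exponential function and a count K of modality types) are consistent with a softmax normalization over the adjustment factors, that is, w_a = exp(V_a) divided by the sum of exp(V_k) over the K modalities. That reading is an assumption, since the typeset formula is not reproduced in this text. A minimal sketch under that assumption:

```python
import math


def attention_weights(adjustment_factors):
    """Softmax over the per-modality adjustment factors V_1..V_K.

    Subtracting the maximum first leaves the result unchanged and
    avoids overflow for large factors.
    """
    peak = max(adjustment_factors)
    exps = [math.exp(v - peak) for v in adjustment_factors]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical adjustment factors for the text and image modalities (K = 2).
weights = attention_weights([2.0, 1.0])
```

Under this reading the weights sum to 1, and the modality with the larger adjustment factor receives the larger attention weight.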
Further, all words in a text data sequence and all target area images are used as input, and the BERT (Bidirectional Encoder Representations from Transformers) based text feature extraction network and the VGG (Visual Geometry Group) image feature extraction network of the multi-modal fusion model produce the text feature vectors and the image feature vectors respectively. The BERT network uses Adam as its optimization algorithm and the cross-entropy loss as its loss function; the VGG network uses Adam as its optimization algorithm and the MSE function as its loss function. The self-attention mechanism updates the text feature vector and the image feature vector according to the attention weight of each modality, and the updated vectors are combined by maximum fusion into a multi-modal fusion update vector. The output of a softmax classifier applied to this vector is taken as the data fusion result for each entity node in the initial knowledge graph. The structure of the multi-modal fusion model is shown in fig. 3. Training a neural network is a known technology, and the specific process is not repeated.
Further, the data fusion result of each entity node is linked to that entity node in the initial knowledge graph by an equivalence link to complete the update, and the knowledge graph obtained after all entity nodes in the initial knowledge graph have been updated is taken as the multi-modal knowledge graph.
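One plausible reading of the weighting and maximum-fusion step is sketched below. Two assumptions are made and should be treated as such: the attention update is modelled as a simple scaling of each feature vector by its modality weight, and "maximum fusion" is modelled as an element-wise maximum of two equal-length vectors. Neither detail is spelled out in the text, and the feature values are hypothetical.

```python
def max_fusion(text_features, image_features, w_text, w_image):
    """Scale each modality by its attention weight, then take the element-wise maximum."""
    if len(text_features) != len(image_features):
        raise ValueError("feature vectors must have the same length")
    return [max(w_text * t, w_image * v)
            for t, v in zip(text_features, image_features)]


# Hypothetical feature vectors and attention weights.
fused = max_fusion([1.0, 4.0], [2.0, 2.0], w_text=0.5, w_image=0.5)
```

The fused vector would then be passed to the softmax classifier described above.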
In this specification, each embodiment is described in a progressive manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments. The foregoing description of the preferred embodiments of the present application is not intended to be limiting, but rather, any modifications, equivalents, improvements, etc. that fall within the principles of the present application are intended to be included within the scope of the present application.
Claims (10)
1. The knowledge graph intelligent construction method based on deep learning is characterized by comprising the following steps of:
Respectively acquiring text data and image data from different data sources;
Determining the common semantic base similarity of each element based on the difference between projection results of each element in each cluster in the clustering results of each modal data in the corresponding projection matrix of each cluster;
Determining the modal context fusion between each word and each target area based on the difference of projection results of the word vector and the unimodal data descriptors on the consistency matrix under different modes and the dual-mode semantic expandable overlapping degree between the word vector and the unimodal data descriptors;
Determining a consistent coding feature vector of each cluster based on the modal context fusibility between the elements in the cluster and the other modal cluster; determining an updated dynamic programming adjustment factor of each mode based on the result of contrast learning among the clusters under each mode and the consistent coding feature vector of the clusters;
Determining a data fusion result corresponding to each entity node in the initial knowledge graph based on updated dynamic programming adjustment factors of all modes by adopting a multi-mode fusion model; and complementing the data fusion results corresponding to all the entity nodes in the initial knowledge graph to obtain the multi-mode knowledge graph.
2. The knowledge graph intelligent construction method based on deep learning according to claim 1, wherein the method for determining the common semantic base similarity of each element based on the difference between projection results of each element in each cluster in the projection matrix corresponding to each cluster in the clustering results of each modal data is as follows:
Acquiring a Word vector of words in each text data sequence and clean image data and a single-mode data descriptor of a target area by using ELMo models and Word2vec models respectively;
A clustering algorithm is adopted to acquire clustering results of the word vector and the target area based on an undirected graph formed by the word vector and the target area of the word;
For any one word vector cluster, taking each word vector in each cluster as a row vector in a matrix, and taking the matrix formed by all word vectors in each cluster as a semantic non-negative matrix of each cluster;
For any cluster of target areas, taking a single-mode data descriptor of each target area in each cluster as a row vector in a matrix, and taking a matrix formed by single-mode data descriptors of all target areas in each cluster as a semantic non-negative matrix of each cluster;
For any cluster, taking a semantic non-negative matrix of each cluster as input, and decomposing the semantic non-negative matrix into a result of multiplying a consistency matrix and a projection matrix by adopting an NMF algorithm;
Taking the pearson correlation coefficient between each element in any cluster and the projection result of each element on the projection matrix in the semantic nonnegative matrix decomposition result of the cluster as the connotation semantic similarity corresponding to each element;
taking the difference value between the connotation semantic similarity of each element and the minimum value of the connotation semantic similarity corresponding to all elements in the cluster where each element is located as a numerator;
taking, as a denominator, the sum of 0.01 and the accumulated result of the bit variance between the projection result of each element and the projection results of the other elements in the cluster where the element is located on the projection matrix in the semantic non-negative matrix decomposition result of the cluster;
The ratio of the numerator to the denominator is taken as the common semantic base similarity of each element.
3. The knowledge graph intelligent construction method based on deep learning according to claim 2, wherein the method for obtaining the Word vector of the Word in each text data sequence and the clean image data and the single-mode data descriptor of the target area by using ELMo model and Word2vec model respectively comprises the following steps:
Sequentially performing word segmentation and stop-word removal on each piece of original text data to obtain a sequence consisting of words as a text data sequence; using all text data sequences as input of an ELMo model, and obtaining word vectors of each word in each text data sequence by using the ELMo model;
Taking the denoising result of each image data as clean image data, and acquiring each target area and a preset number of category labels of each target area in each clean image data by using a CNN (Convolutional Neural Network) recognition model;
The method comprises the steps of taking category description data corresponding to a preset number of category labels of each target area as input of a Word2vec model, obtaining Word vectors of each category label by using the Word2vec model, and taking vectors formed by the Word vectors of the preset number of category labels of each target area according to a descending order of confidence degrees of the category labels as single-mode data descriptors of each target area.
4. The knowledge graph intelligent construction method based on deep learning according to claim 2, wherein the method for obtaining the clustering results of the word vector and the target area based on the word vector of the word and the undirected graph formed by the target area by adopting the clustering algorithm comprises the following steps:
for text data, taking word vectors of words in all text data sequences as one node in a graph, taking cosine similarity between two word vectors as a similarity measurement result between two corresponding nodes, taking the graph formed by the word vectors of all words as input, and acquiring a clustering result of the word vectors by adopting an AP clustering algorithm;
For image data, each target area is taken as one node in the graph, the structural similarity between two target areas is taken as a similarity measurement result between the two corresponding nodes, the graph formed by all the target areas is taken as input, and an AP clustering algorithm is adopted to obtain a clustering result of the target areas.
5. The knowledge graph intelligent construction method based on deep learning according to claim 1, wherein the method for determining the modality context fusion between each word and each target area based on the difference of projection results of word vectors and single-modality data descriptors on consistency matrixes under different modalities and the dual-mode semantic expandable overlapping degree between the word vectors and the single-modality data descriptors is as follows:
Wherein T_{i,j} is the dual-mode semantic expandable overlapping degree between the i-th word and the j-th target area; c_j is the single-mode data descriptor of the j-th target area; X_i and X_j are the consistency matrices obtained by decomposing the semantic non-negative matrices where c_i and c_j are located respectively; J(X_i, X_j) is the Jaccard coefficient between the matrices X_i and X_j; Y() is a cosine similarity function, and Y(c_i, c_j) is the cosine similarity between c_i and c_j; h_i and h_j are the common semantic base similarities of the i-th word and the j-th target area respectively; and μ is a parameter adjustment factor;
R_{ij} is the modality context fusion between the i-th word and the j-th target area; N is the number of word vectors in the cluster where the word vector of the i-th word is located; n indexes the n-th word vector other than the word vector of the i-th word; M is the number of single-mode data descriptors in the cluster where the single-mode data descriptor of the j-th target area is located; m indexes the m-th single-mode data descriptor other than the single-mode data descriptor of the j-th target area; T_{n,m} is the dual-mode semantic expandable overlapping degree between the word corresponding to the n-th word vector and the target area corresponding to the m-th single-mode data descriptor; ct_i(j) is the projection result of the word vector of the i-th word on the consistency matrix of the semantic non-negative matrix where the single-mode data descriptor of the j-th target area is located; ct_j(i) is the projection result of the single-mode data descriptor of the j-th target area on the consistency matrix of the semantic non-negative matrix where the word vector of the i-th word is located; DTW() is a DTW distance function, and DTW(ct_i(j), ct_j(i)) is the DTW distance between ct_i(j) and ct_j(i).
6. The knowledge-graph intelligent construction method based on deep learning according to claim 1, wherein the method for determining the consistent coding feature vector of each cluster based on the modal context fusion between each cluster and the elements in the cluster under another modality is as follows:
taking a matrix formed by the fusion of the modal context between elements in two clusters under two modes as a consistent coding matrix between the two clusters;
The average value of all elements in each row and each column of each consistent coding matrix is used as a consistent coding row characteristic value and a consistent coding column characteristic value, the vector formed by all consistent coding row characteristic values and consistent coding column characteristic values in each consistent coding matrix is used as a consistent coding characteristic row vector and a consistent coding characteristic column vector of the consistent coding matrix, and the consistent coding characteristic row vector and the consistent coding characteristic column vector are used as a consistent coding characteristic vector.
7. The knowledge graph intelligent construction method based on deep learning according to claim 1, wherein the method for determining the updated dynamic programming adjustment factor of each mode based on the result of contrast learning among clusters and the consistent coding feature vector of the clusters in each mode comprises the following steps:
determining a semantic expandable cluster of each cluster in each mode based on the result of element contrast learning in all clusters in each mode;
taking the average value of all elements in any one consistent coding feature vector of a consistent coding matrix between each cluster in each mode and each cluster in another mode as a numerator;
Taking, as a denominator, the sum of 0.01 and the difference value between the distribution variances of all elements in the consistent coding feature vectors of the consistent coding matrices that each cluster, and any semantic expandable cluster of that cluster, respectively form with the same cluster of the other mode;
Accumulating results of the ratio of the numerator to the denominator on the semantically expandable clusters of each cluster are used as first stable values, and accumulating results of the first stable values on all the clusters under another mode are used as unilateral consistent coding stable coefficients of each cluster;
Calculating a logarithmic function calculation result taking a natural constant as a base, wherein the absolute value of a difference value between the average value of the modal context fusions between all elements in any two clusters in each modal and all data in the other modal is a power; taking the accumulated result of the product between the DTW distance between the expansion distance sequences consisting of the preset number of minimum expansion distances of any two clusters in each mode and the calculation result on each mode as the semantic expansibility of each mode;
taking the sum of semantic expansibility and 0.01 of each mode as a denominator, and taking the ratio of the sum of all unilateral uniform coding stability coefficients of all cluster clusters under each mode to the denominator as an updated dynamic programming adjustment factor of each mode.
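Two pieces of the steps above lend themselves to a short sketch: the DTW distance between two expansion distance sequences and the final ratio that yields the adjustment factor. The sketch below assumes the textbook dynamic time warping recurrence with absolute difference as the local cost (the claim does not state which local cost is used) and assumes the stability coefficients and the semantic expansibility have already been computed. The function names are hypothetical.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance between two numeric sequences.

    Assumption: the local cost is the absolute difference |a[i] - b[j]|.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # step in a only
                                 cost[i][j - 1],      # step in b only
                                 cost[i - 1][j - 1])  # step in both
    return cost[n][m]

def adjustment_factor(stability_coefficients, semantic_expansibility):
    """Sum of stability coefficients divided by (expansibility + 0.01)."""
    return sum(stability_coefficients) / (semantic_expansibility + 0.01)
```

The constant 0.01 in the denominator keeps the ratio finite when the semantic expansibility of a mode is zero.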
8. The knowledge graph intelligent construction method based on deep learning according to claim 1, wherein the method for determining the semantically expandable clusters of each cluster in each mode, based on the result of contrastive learning on the elements of all clusters in each mode, is as follows:
taking all elements in each cluster in each mode as positive samples and all elements in the remaining clusters as negative samples, taking all positive and negative samples as input, and obtaining the optimized distance between each pair of positive and negative samples in the mapping space by using a CPC model;
taking the sum of the optimized distances between all elements of each cluster and all elements of each remaining cluster as the expansion distance between that cluster and that remaining cluster, and taking the preset number of clusters with the smallest expansion distances to each cluster as the semantically expandable clusters of that cluster.
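The selection step above can be sketched as follows. The sketch does not implement the CPC model; it assumes the optimized distances between all pairs of elements are already available as a square matrix, together with one cluster label per element. The function name and the sample data are hypothetical.

```python
import numpy as np

def semantically_expandable_clusters(distances, labels, k):
    """For each cluster, return the k other clusters with the smallest
    expansion distance, where the expansion distance is the sum of the
    pairwise distances between the elements of the two clusters.
    """
    distances = np.asarray(distances, dtype=float)
    labels = np.asarray(labels)
    cluster_ids = sorted(set(labels.tolist()))
    result = {}
    for c in cluster_ids:
        in_c = labels == c
        expansion = {}
        for other in cluster_ids:
            if other == c:
                continue
            in_other = labels == other
            # Sum over every (element of c, element of other) pair.
            expansion[other] = distances[np.ix_(in_c, in_other)].sum()
        result[c] = sorted(expansion, key=expansion.get)[:k]
    return result

# Hypothetical distances for 4 elements: two in cluster 0, one each in 1 and 2.
d = [[0, 1, 2, 9],
     [1, 0, 3, 8],
     [2, 3, 0, 4],
     [9, 8, 4, 0]]
neighbours = semantically_expandable_clusters(d, [0, 0, 1, 2], k=1)
```

Here `k` plays the role of the preset number in the claim. Because the expansion distance is a sum rather than a mean, larger clusters tend to accumulate larger distances.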
9. The knowledge graph intelligent construction method based on deep learning according to claim 1, wherein the method for determining, by means of the multi-mode fusion model, the data fusion result corresponding to each entity node in the initial knowledge graph based on the updated dynamic programming adjustment factors of all modes is as follows:
constructing a knowledge graph based on the data of any one mode as the initial knowledge graph;
taking, as the numerator, the result of an exponential with the natural constant as the base and the updated dynamic programming adjustment factor of each mode as the exponent; taking the accumulation of the numerator over all modes as the denominator; and taking the ratio of the numerator to the denominator as the attention weight of each mode;
taking all words in the text data sequence and all target area images as input, and using the BERT-based text feature extraction network and the VGG image feature extraction network in the multi-mode fusion model, respectively, to obtain text feature vectors and image feature vectors;
updating the text feature vector and the image feature vector based on the attention weight of each mode by using a self-attention mechanism, and performing maximum fusion on the updated text feature vector and the updated image feature vector to obtain a multi-mode fusion update vector;
and taking the output of the multi-mode fusion model as a data fusion result corresponding to each entity node in the initial knowledge graph.
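The attention weight described in this claim has the form of a softmax over the per-mode adjustment factors. The sketch below illustrates that computation and one simplified reading of the fusion step. It assumes that "maximum fusion" means an element-wise maximum and replaces the self-attention update with a plain scaling of each feature vector by its mode weight; both are assumptions, and the feature vectors are taken as given rather than produced by BERT or VGG. The function names are hypothetical.

```python
import numpy as np

def modality_attention_weights(adjustment_factors):
    """Softmax over the adjustment factors: exp(f_m) / sum_m' exp(f_m')."""
    factors = np.asarray(adjustment_factors, dtype=float)
    # Subtracting the maximum avoids overflow and leaves the ratio unchanged.
    exp = np.exp(factors - factors.max())
    return exp / exp.sum()

def max_fusion(text_vec, image_vec, weights):
    """Scale each feature vector by its mode weight, then take the
    element-wise maximum (assumed meaning of 'maximum fusion')."""
    t = weights[0] * np.asarray(text_vec, dtype=float)
    v = weights[1] * np.asarray(image_vec, dtype=float)
    return np.maximum(t, v)

weights = modality_attention_weights([1.0, 1.0])  # text mode, image mode
fused = max_fusion([2.0, 0.0], [0.0, 4.0], weights)
```

The weights always sum to 1, and a mode with a larger adjustment factor receives a larger share. The element-wise maximum requires the two feature vectors to have the same length, so in practice they would first need to be projected to a common dimension.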
10. The knowledge graph intelligent construction method based on deep learning according to claim 1, wherein the method for obtaining the multi-mode knowledge graph by completing the initial knowledge graph with the data fusion results corresponding to all of its entity nodes is as follows:
linking the data fusion result corresponding to each entity node in the initial knowledge graph to that entity node by an equivalence symbol so as to complete and update the node, and taking the knowledge graph obtained after all entity nodes in the initial knowledge graph have been updated as the multi-mode knowledge graph.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410664630.2A CN118428467A (en) | 2024-05-27 | 2024-05-27 | Knowledge graph intelligent construction method based on deep learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410664630.2A CN118428467A (en) | 2024-05-27 | 2024-05-27 | Knowledge graph intelligent construction method based on deep learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN118428467A (en) | 2024-08-02 |
Family
ID=92331507
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202410664630.2A Pending CN118428467A (en) | 2024-05-27 | 2024-05-27 | Knowledge graph intelligent construction method based on deep learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN118428467A (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106022351A (en) * | 2016-04-27 | 2016-10-12 | 天津中科智能识别产业技术研究院有限公司 | Learning robustness multi-view clustering method based on nonnegative dictionaries |
US20230186600A1 (en) * | 2021-12-09 | 2023-06-15 | Vinai Artificial Intelligence Application And Research Joint Stock Company | Method of clustering using encoder-decoder model based on attention mechanism and storage medium for image recognition |
CN116935277A (en) * | 2023-07-21 | 2023-10-24 | 中国工商银行股份有限公司 | Multi-mode emotion recognition method and device |
CN117540750A (en) * | 2023-12-25 | 2024-02-09 | 卓世科技(海南)有限公司 | Intelligent customer service semantic analysis method based on knowledge graph |
CN117935339A (en) * | 2024-03-19 | 2024-04-26 | 北京长河数智科技有限责任公司 | Micro-expression recognition method based on multi-modal fusion |
-
2024
- 2024-05-27 CN CN202410664630.2A patent/CN118428467A/en active Pending
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111126396B (en) | Image recognition method, device, computer equipment and storage medium | |
CN110069709B (en) | Intention recognition method, device, computer readable medium and electronic equipment | |
CN113887643B (en) | New dialogue intention recognition method based on pseudo tag self-training and source domain retraining | |
CN109948735B (en) | Multi-label classification method, system, device and storage medium | |
US20220269718A1 (en) | Method And Apparatus For Tracking Object | |
CN113537304A (en) | Cross-modal semantic clustering method based on bidirectional CNN | |
CN112529638B (en) | Service demand dynamic prediction method and system based on user classification and deep learning | |
CN114662652A (en) | Expert recommendation method based on multi-mode information learning | |
CN116976505A (en) | Click rate prediction method of decoupling attention network based on information sharing | |
CN112749737A (en) | Image classification method and device, electronic equipment and storage medium | |
CN111061923A (en) | Graph data entity identification method and system based on graph dependence rule and supervised learning | |
CN113535928A (en) | Service discovery method and system of long-term and short-term memory network based on attention mechanism | |
CN118428467A (en) | Knowledge graph intelligent construction method based on deep learning | |
CN113762005A (en) | Method, device, equipment and medium for training feature selection model and classifying objects | |
CN117972359B (en) | Intelligent data analysis method based on multi-mode data | |
CN117390187A (en) | Event type induction method and system based on contrast learning and iterative optimization | |
CN115953584A (en) | End-to-end target detection method and system with learnable sparsity | |
CN112507137B (en) | Small sample relation extraction method based on granularity perception in open environment and application | |
CN115017260A (en) | Keyword generation method based on subtopic modeling | |
CN114595324A (en) | Method, device, terminal and non-transitory storage medium for power grid service data domain division | |
CN114117251B (en) | Intelligent context-Bo-down fusion multi-factor matrix decomposition personalized recommendation method | |
CN112685623A (en) | Data processing method and device, electronic equipment and storage medium | |
CN106203517A (en) | The data classification method of a kind of nuclear norm driving and system | |
CN114898339B (en) | Training method, device, equipment and storage medium of driving behavior prediction model | |
CN111274216B (en) | Identification method and identification device of wireless local area network, storage medium and electronic equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||