CN111026865A - Relation alignment method, device and equipment of knowledge graph and storage medium
- Publication number
- CN111026865A (application CN201910992324.0A)
- Authority
- CN
- China
- Prior art keywords
- relationship
- vector
- relation
- sub
- initial
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G06F16/35 — Information retrieval of unstructured textual data; Clustering; Classification
- G06F16/367 — Creation of semantic tools, e.g. ontology or thesauri; Ontology
Abstract
The invention relates to the field of artificial intelligence, and discloses a relation alignment method, device, equipment and storage medium for a knowledge graph. The method comprises the following steps: acquiring triple data in the knowledge graph; converting the triple data through a preset vector conversion model to obtain a relationship initial vector of each relationship; clustering all the relationship initial vectors through a preset clustering algorithm to obtain a relationship clustering vector of each relationship; clustering the initial vectors of the same relationship in the triple data through the preset clustering algorithm to obtain a relationship sub-vector of each relationship; calculating the relationship similarity between the relationships in any two triples based on the relationship initial vector, the relationship clustering vector and the relationship sub-vector; and classifying the relationships according to the relationship similarity.
Description
Technical Field
The invention relates to the field of artificial intelligence, and in particular to a relation alignment method, device, equipment and storage medium for a knowledge graph.
Background
Knowledge graph technology is increasingly becoming a foundation of artificial intelligence and is an important means for machines to understand natural language and to construct knowledge networks. In recent years, the fusion of different knowledge graphs has quietly emerged; it helps workers in different fields retrieve related content, improving both the quality and the efficiency of their work.
The legal knowledge system is very complex and combines many kinds of logic. To construct a complete knowledge system, knowledge from various fields needs to be fused and the system must be continuously updated; new entities and relationships are added during updating, and how to map the new relationships onto the existing knowledge graph is a problem and a challenge. Relationships with the same meaning may have different expressions, such as "country" and "nationality", while the same relationship may also carry multiple meanings, such as the regional relationship in <New York, belong to, USA> and the part-whole relationship in <display, belong to, TV>. However, there is currently no relevant research on relation classification, and the efficiency of relation classification in knowledge graphs is low.
Disclosure of Invention
The invention provides a relation alignment method, device, equipment and storage medium for a knowledge graph. By vectorizing relationships based on their upper and lower hierarchical structure information and calculating the similarity between the relationship vectors, it can be determined whether relationships express the same meaning, and the relationships can be classified accordingly, thereby improving the efficiency of relation alignment in the knowledge graph.
A first aspect of an embodiment of the present invention provides a relation alignment method for a knowledge graph, including: acquiring triple data in a knowledge graph, wherein each triple comprises a subject, a relationship and an attribute; converting the triple data through a preset vector conversion model to obtain a relationship initial vector of each relationship; clustering all the relationship initial vectors through a preset clustering algorithm to obtain a relationship clustering vector of each relationship; clustering the initial vectors of the same relationship in the triple data through the preset clustering algorithm to obtain a relationship sub-vector of each relationship; calculating the relationship similarity between the relationships in any two triples based on the relationship initial vector, the relationship clustering vector and the relationship sub-vector; and classifying the relationships according to the relationship similarity.
Optionally, in a first implementation manner of the first aspect of the embodiment of the present invention, the converting the triple data through a preset vector conversion model to obtain a relationship initial vector of each relationship includes: mapping the subject and the object of a target triple into low-dimensional vectors h_i and t_i, wherein the target triple is any triple in the triple data and i is the total number of triples in the triple data; adjusting h_i and t_i through a preset loss function until L is minimal, wherein L = Σ_i ||h_i - r_i - t_i||; and, when L is minimal, setting the target triple (h_i, r_i, t_i) to vector type, wherein r_i is the relationship initial vector.
Optionally, in a second implementation manner of the first aspect of the embodiment of the present invention, the clustering all the relationship initial vectors by using a preset clustering algorithm to obtain a relationship clustering vector of each relationship includes: selecting k relation initial vectors from all the relation initial vectors as a first centroid vector, wherein k is an integer greater than 1; calculating Euclidean distances between each relationship initial vector and k first centroid vectors, and determining a first clustering vector of each relationship initial vector, wherein the first clustering vector is the first centroid vector with the smallest Euclidean distance with one relationship initial vector; classifying each relationship initial vector and the corresponding first clustering vector into a cluster class to obtain k first cluster classes; calculating a second centroid vector of each first cluster class based on a mean principle; calculating Euclidean distances between each relationship initial vector and k second centroid vectors, and determining a second clustering vector of each relationship initial vector, wherein the second clustering vector is the second centroid vector with the smallest Euclidean distance with one relationship initial vector; classifying each relationship initial vector and the corresponding second cluster vector into a cluster to obtain k second clusters; calculating a third centroid vector of each second cluster class based on a mean principle; and repeatedly calculating the (N + 1) th centroid vector of each N-th cluster until the N-th centroid vector is the same as the (N + 1) th centroid vector, taking the N-th cluster as k clusters, taking the (N + 1) th centroid vector as a relational clustering vector, and taking N as an integer greater than 1.
Optionally, in a third implementation manner of the first aspect of the embodiment of the present invention, the clustering the initial relationship vectors of the same relationship in the triple data by using the preset clustering algorithm to obtain a relationship sub-vector of each relationship includes: selecting n initial relation vectors from all the initial relation vectors of a target relation as n first sub-centroid vectors, wherein the target relation is the same at least one relation, and n is an integer greater than 1; calculating Euclidean distances between each relation initial vector of the target relation and n first sub-centroid vectors, and determining a first sub-vector of each relation initial vector of the target relation, wherein the first sub-vector is the first sub-centroid vector with the smallest Euclidean distance with one relation initial vector; taking each relationship initial vector of the target relationship and the corresponding first sub-vector as a sub-cluster class to obtain n first sub-cluster classes; calculating a second sub-centroid vector of each first sub-cluster class based on a mean principle; calculating Euclidean distances between each relationship initial vector of the target relationship and n second sub-centroid vectors, and determining a second sub-vector of each relationship initial vector of the target relationship, wherein the second sub-vector is a second sub-centroid vector with the minimum Euclidean distance from one relationship initial vector; classifying each relationship initial vector of the target relationship and the corresponding second sub-vector into a cluster class to obtain n second sub-cluster classes; calculating a third sub-centroid vector of each second sub-cluster class based on a mean principle; repeatedly calculating the M +1 sub-centroid vector of each Mth sub-cluster until the Mth sub-centroid vector is the same as the M +1 sub-centroid vector, taking n Mth sub-clusters as the subclasses of the target relationship, taking the M +1 sub-centroid vector as the relationship sub-vector of the target relationship, and taking M as an integer greater than 1.
Optionally, in a fourth implementation manner of the first aspect of the embodiment of the present invention, the calculating a relationship similarity between any two relationships based on the relationship initial vector, the relationship clustering vector and the relationship sub-vector includes: generating a relationship vector of the relationship in each triple through a preset formula r = r_c + r_0 + r_s, wherein r represents the relationship vector, r_c represents the relationship clustering vector, r_0 represents the relationship initial vector and r_s represents the relationship sub-vector; and calculating the relationship similarity through the cosine distance: sim(r_1, r_2) = (r_1 · r_2) / (||r_1|| · ||r_2||), wherein r_1 and r_2 represent the relationship vectors of any two relationships.
Optionally, in a fifth implementation manner of the first aspect of the embodiment of the present invention, the classifying the relationships according to the relationship similarity includes: determining relationship similarity results among all relationships according to the relationship similarity, wherein the relationship similarity results comprise similar and dissimilar; if the relationship similarity result of a relationship pair to be classified is similar, the relationship pair to be classified being any two relationships, dividing the relationship pair to be classified into the same cluster; and if the relationship similarity result of the relationship pair to be classified is dissimilar, keeping each relationship in the relationship pair to be classified in its original cluster.
Optionally, the determining a relationship similarity result between all relationships according to the relationship similarity includes: judging whether the relationship similarity of the relationship pairs to be classified exceeds a preset threshold value; if the relationship similarity of the relationship pair to be classified exceeds a preset threshold, determining that the relationship pair to be classified is similar; and if the relationship similarity of the relationship pair to be classified does not exceed the preset threshold, determining that the relationship pair to be classified is not similar.
A second aspect of an embodiment of the present invention provides a relation alignment apparatus for a knowledge graph, including: an acquisition unit, used for acquiring triple data in the knowledge graph, wherein each triple comprises a subject, a relation and an attribute; a conversion unit, used for converting the triple data through a preset vector conversion model to obtain a relationship initial vector of each relationship; a first clustering unit, used for clustering all the relation initial vectors through a preset clustering algorithm to obtain a relation clustering vector of each relation; a second clustering unit, used for clustering the initial relationship vectors of the same relationship in the triple data through the preset clustering algorithm to obtain a relationship sub-vector of each relationship; a calculating unit, used for calculating the relationship similarity between the relationships in any two triples based on the relationship initial vector, the relationship clustering vector and the relationship sub-vector; and a classification unit, used for classifying the relationship according to the relationship similarity.
Optionally, in a first implementation manner of the second aspect of the embodiment of the present invention, the conversion unit is specifically configured to: map the subject and the object of a target triple into low-dimensional vectors h_i and t_i, wherein the target triple is any triple in the triple data and i is the total number of triples in the triple data; adjust h_i and t_i through a preset loss function until L is minimal, wherein L = Σ_i ||h_i - r_i - t_i||; and, when L is minimal, set the target triple (h_i, r_i, t_i) to vector type, wherein r_i is the relationship initial vector.
Optionally, in a second implementation manner of the second aspect of the embodiment of the present invention, the first clustering unit is specifically configured to: selecting k relation initial vectors from all the relation initial vectors as a first centroid vector, wherein k is an integer greater than 1; calculating Euclidean distances between each relationship initial vector and k first centroid vectors, and determining a first clustering vector of each relationship initial vector, wherein the first clustering vector is the first centroid vector with the smallest Euclidean distance with one relationship initial vector; classifying each relationship initial vector and the corresponding first clustering vector into a cluster class to obtain k first cluster classes; calculating a second centroid vector of each first cluster class based on a mean principle; calculating Euclidean distances between each relationship initial vector and k second centroid vectors, and determining a second clustering vector of each relationship initial vector, wherein the second clustering vector is the second centroid vector with the smallest Euclidean distance with one relationship initial vector; classifying each relationship initial vector and the corresponding second cluster vector into a cluster to obtain k second clusters; calculating a third centroid vector of each second cluster class based on a mean principle; and repeatedly calculating the (N + 1) th centroid vector of each N-th cluster until the N-th centroid vector is the same as the (N + 1) th centroid vector, taking the N-th cluster as k clusters, taking the (N + 1) th centroid vector as a relational clustering vector, and taking N as an integer greater than 1.
Optionally, in a third implementation manner of the second aspect of the embodiment of the present invention, the second clustering unit is specifically configured to: selecting n initial relation vectors from all the initial relation vectors of a target relation as n first sub-centroid vectors, wherein the target relation is the same at least one relation, and n is an integer greater than 1; calculating Euclidean distances between each relation initial vector of the target relation and n first sub-centroid vectors, and determining a first sub-vector of each relation initial vector of the target relation, wherein the first sub-vector is the first sub-centroid vector with the smallest Euclidean distance with one relation initial vector; taking each relationship initial vector of the target relationship and the corresponding first sub-vector as a sub-cluster class to obtain n first sub-cluster classes; calculating a second sub-centroid vector of each first sub-cluster class based on a mean principle; calculating Euclidean distances between each relationship initial vector of the target relationship and n second sub-centroid vectors, and determining a second sub-vector of each relationship initial vector of the target relationship, wherein the second sub-vector is a second sub-centroid vector with the minimum Euclidean distance from one relationship initial vector; classifying each relationship initial vector of the target relationship and the corresponding second sub-vector into a cluster class to obtain n second sub-cluster classes; calculating a third sub-centroid vector of each second sub-cluster class based on a mean principle; repeatedly calculating the M +1 sub-centroid vector of each Mth sub-cluster until the Mth sub-centroid vector is the same as the M +1 sub-centroid vector, taking n Mth sub-clusters as the subclasses of the target relationship, taking the M +1 sub-centroid vector as the relationship sub-vector of the target relationship, and taking M as an integer greater than 1.
Optionally, in a fourth implementation manner of the second aspect of the embodiment of the present invention, the calculation unit includes: a generating module, configured to generate a relationship vector of the relationship in each triple according to a preset formula r = r_c + r_0 + r_s, wherein r represents the relationship vector, r_c represents the relationship clustering vector, r_0 represents the relationship initial vector and r_s represents the relationship sub-vector; and a calculating module, configured to calculate the relationship similarity through the cosine distance: sim(r_1, r_2) = (r_1 · r_2) / (||r_1|| · ||r_2||), wherein r_1 and r_2 represent the relationship vectors of any two relationships.
Optionally, in a fifth implementation manner of the second aspect of the embodiment of the present invention, the classifying unit includes: a determining module, configured to determine relationship similarity results among all relationships according to the relationship similarity, wherein the relationship similarity results comprise similar and dissimilar; a classification module, configured to divide a relationship pair to be classified into the same cluster if the relationship similarity result of the relationship pair to be classified is similar, the relationship pair to be classified being any two relationships; and the classification module is further configured to keep each relationship in the relationship pair to be classified in its original cluster if the relationship similarity result of the relationship pair to be classified is dissimilar.
Optionally, in a sixth implementation manner of the second aspect of the embodiment of the present invention, the determining module is specifically configured to: judging whether the relationship similarity of the relationship pairs to be classified exceeds a preset threshold value; if the relationship similarity of the relationship pair to be classified exceeds a preset threshold, determining that the relationship pair to be classified is similar; and if the relationship similarity of the relationship pair to be classified does not exceed the preset threshold, determining that the relationship pair to be classified is not similar.
A third aspect of the embodiments of the present invention provides a relation alignment apparatus of a knowledge graph, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor implements the relation alignment method of the knowledge graph according to any one of the above embodiments when executing the computer program.
A fourth aspect of an embodiment of the present invention provides a computer-readable storage medium, including instructions, which, when executed on a computer, cause the computer to perform the steps of the relation alignment method of a knowledge-graph according to any one of the above embodiments.
In the technical scheme provided by the embodiment of the invention, triple data in a knowledge graph is obtained, wherein the triple comprises a main body, a relation and an attribute; converting the triple data through a preset vector conversion model to obtain a relationship initial vector of each relationship; clustering all the relation initial vectors through a preset clustering algorithm to obtain a relation clustering vector of each relation; clustering the initial relation vectors of the same relation in the triple data through the preset clustering algorithm to obtain a relation sub-vector of each relation; calculating the relationship similarity between the relationships in any two triples based on the relationship initial vector, the relationship cluster vector and the relationship sub-vector; and classifying the relation according to the relation similarity. According to the embodiment of the invention, the upper and lower hierarchical structures of the relation of the knowledge graph are constructed, the relation is vectorized based on the upper and lower hierarchical structure information, whether each relation represents the same meaning or not is determined by calculating the similarity between the relation vectors, and the relation is classified, so that the efficiency of relation alignment in the knowledge graph is improved.
Drawings
FIG. 1 is a diagram of an embodiment of a relation alignment method of a knowledge graph in an embodiment of the present invention;
FIG. 2 is a diagram of another embodiment of the relation alignment method of the knowledge-graph in the embodiment of the invention;
FIG. 3 is a diagram of an embodiment of a relation alignment apparatus of a knowledge-graph in an embodiment of the present invention;
FIG. 4 is a schematic diagram of another embodiment of a relation alignment apparatus of a knowledge-graph in an embodiment of the invention;
FIG. 5 is a diagram of an embodiment of a relation alignment apparatus of a knowledge-graph in an embodiment of the present invention.
Detailed Description
The embodiment of the invention provides a relation alignment method, a device, equipment and a storage medium of a knowledge graph, which can determine whether each relation represents the same meaning and classify the relation by vectorizing the relation based on the upper and lower hierarchical structure information of the relation and calculating the similarity between relation vectors, thereby improving the efficiency of relation alignment in the knowledge graph.
To enable those skilled in the art to better understand the solution of the invention, the embodiments of the invention are described below with reference to the accompanying drawings.
The terms "first," "second," "third," "fourth," and the like in the description and in the claims, as well as in the drawings, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It will be appreciated that the data so used may be interchanged under appropriate circumstances such that the embodiments described herein may be practiced otherwise than as specifically illustrated or described herein. Furthermore, the terms "comprises," "comprising," or "having," and any variations thereof, are intended to cover non-exclusive inclusions, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Referring to fig. 1, a flowchart of a relation alignment method of a knowledge graph according to an embodiment of the present invention specifically includes:
101. Acquiring triple data in the knowledge graph, wherein each triple comprises a subject, a relation and an attribute.
The server acquires triple data in the legal knowledge graph, wherein each triple comprises a subject, a relation and an attribute. The data structure of the knowledge graph is the triple: in a triple (h, r, t), h is the subject, t is the object and r is the relationship; for example, in the triple (New York, belong to, USA), New York is the subject, USA is the object and "belong to" is the relationship. The triple is an intuitive data structure in which the subject and the object are collectively referred to as entities; the relationship is irreversible, so within a triple the subject and the object cannot be interchanged.
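As a concrete illustration of the data structure described above, the following minimal Python sketch represents triples as (subject, relationship, object) records; the class and field names are illustrative assumptions, not part of the patent.

```python
from typing import NamedTuple

class Triple(NamedTuple):
    """One knowledge-graph triple (h, r, t): subject, relationship, object."""
    subject: str    # h, e.g. "New York"
    relation: str   # r, e.g. "belong to"
    obj: str        # t; subject and object are not interchangeable

# Illustrative triples drawn from the examples in this description.
triples = [
    Triple("New York", "belong to", "USA"),
    Triple("display", "belong to", "TV"),
    Triple("Beijing", "belong to", "China"),
]
```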
102. Converting the triple data through a preset vector conversion model to obtain a relationship initial vector of each relationship.
The server converts the triple data through a preset vector conversion model to obtain a relationship initial vector of each relationship. The preset vector conversion model converts character-type triple data into vector-type triple data: character-type relationship data are converted into relationship initial vectors, character-type subject data into subject vectors and character-type object data into object vectors. Each triple corresponds to a set of vectors.
Specifically, the server maps the subject and the object of the target triple into low-dimensional vectors h_i and t_i, where the target triple is any triple in the triple data and i is the total number of triples in the triple data; the server adjusts h_i and t_i through a preset loss function until L = Σ_i ||h_i - r_i - t_i|| is minimal; when L is minimal, the target triple (h_i, r_i, t_i) is set to vector type and r_i is the relationship initial vector. For example, for the triple (Taiwan, belong to, China), the server maps the subject "Taiwan" and the object "China" into low-dimensional vectors and adjusts them to (1, 0) and (0, 1), so the relationship initial vector of the relationship "belong to" is (1, -1).
It should be noted that, in a knowledge graph, the vector form of a subject, object or relationship in any triplet is unique.
The preset vector conversion model is a conversion embedding (TransE) model, the TransE model is trained based on samples in the legal knowledge graph, and triple data in the legal knowledge graph are converted into a vector form.
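The following toy numpy sketch illustrates the kind of TransE-style conversion described above: each triple gets its own relationship vector r_i, and the vectors are adjusted by gradient steps until ||h_i - r_i - t_i|| is approximately minimal. The dimensionality, learning rate, epoch count, and the omission of negative sampling and norm constraints are simplifying assumptions, not the patent's trained model.

```python
import numpy as np

# Illustrative triples (subject, relationship, object) from the examples above.
triples = [
    ("New York", "belong to", "USA"),
    ("display", "belong to", "TV"),
    ("Beijing", "belong to", "China"),
]

rng = np.random.default_rng(0)
dim = 8  # assumed embedding size

entities = {h for h, _, _ in triples} | {t for _, _, t in triples}
E = {e: rng.normal(scale=0.1, size=dim) for e in entities}  # entity vectors
R = [rng.normal(scale=0.1, size=dim) for _ in triples]      # one r_i per triple

lr = 0.05
for _ in range(200):                        # crude gradient descent on sum ||h - r - t||^2
    for i, (h_name, _, t_name) in enumerate(triples):
        diff = E[h_name] - R[i] - E[t_name]
        E[h_name] -= lr * diff              # step on the subject vector
        R[i] += lr * diff                   # step on the relationship vector
        E[t_name] += lr * diff              # step on the object vector

# R[i] is the relationship initial vector of the relationship in triple i.
print([np.round(r, 2) for r in R])
```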
103. Clustering all the relation initial vectors through a preset clustering algorithm to obtain a relation clustering vector of each relation.
The server clusters all the relation initial vectors through a preset clustering algorithm to obtain a relation clustering vector of each relation. In the knowledge graph, different relations may have the same meaning; for example, "country" and "nationality" have the same meaning. The server classifies different relations expressing the same meaning into one class and sets a relation clustering vector for that class of relations. Within the same class of relations, the Euclidean distance between the relation initial vector of each relation and the relation clustering vector of the class is the smallest.
Specifically, the server selects k relation initial vectors from all relation initial vectors as a first centroid vector, wherein k is an integer greater than 1; the server calculates the Euclidean distance between each relation initial vector and k first centroid vectors, and determines a first clustering vector of each relation initial vector, wherein the first clustering vector is the first centroid vector with the minimum Euclidean distance to one relation initial vector; the server classifies each relationship initial vector and the corresponding first clustering vector into a cluster class to obtain k first cluster classes; the server calculates a second centroid vector of each first cluster class based on a mean principle; the server calculates Euclidean distances between each relation initial vector and k second centroid vectors, and determines a second clustering vector of each relation initial vector, wherein the second clustering vector is the second centroid vector with the minimum Euclidean distance to one relation initial vector; the server classifies each relationship initial vector and the corresponding second cluster vector into a cluster class to obtain k second cluster classes; the server calculates a third centroid vector of each second cluster based on a mean principle; and the server repeatedly calculates the (N + 1) th centroid vector of each N-th cluster until the N-th centroid vector is the same as the (N + 1) th centroid vector, the N-th cluster is used as k clusters, the (N + 1) th centroid vector is used as a relational clustering vector, and N is an integer greater than 1.
For example, when the relation initial vectors include (1, 0), (1, 2), (2, 1) and (0, 3), the server selects the 2 relation initial vectors (1, 0) and (2, 1) as the first centroid vectors. The Euclidean distance between the relation initial vector (1, 2) and the first centroid vector (1, 0) is 2, and the Euclidean distance between (1, 2) and the first centroid vector (2, 1) is √2, so the first clustering vector of the relation initial vector (1, 2) is (2, 1); likewise, the first clustering vector of the relation initial vector (0, 3) is (2, 1). Then (1, 0) forms one first cluster class, and (1, 2), (0, 3) and (2, 1) form another. The server calculates the second centroid vectors of the two first cluster classes as (1/1, 0/1) and ((1 + 2 + 0)/3, (2 + 1 + 3)/3), i.e. (1, 0) and (1, 2). Once the centroid vectors no longer change between iterations, the server takes them as the relation clustering vectors, and the cluster classes obtained at that point are the 2 final cluster classes.
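A minimal numpy sketch of this k-means style procedure on the worked example is given below. The initial centroids are fixed to (1, 0) and (2, 1) to match the text (in practice they may be chosen at random), and tie-breaking in the assignment step means the converged centroids can differ slightly from the truncated example above.

```python
import numpy as np

# Relationship initial vectors and first centroid vectors from the worked example.
X = np.array([[1, 0], [1, 2], [2, 1], [0, 3]], dtype=float)
centroids = X[[0, 2]].copy()  # first centroid vectors (1, 0) and (2, 1)

for _ in range(100):
    # Assign each relationship initial vector to the nearest centroid (Euclidean distance).
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    # Recompute each centroid as the mean of its cluster ("mean principle").
    new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(len(centroids))])
    if np.allclose(new_centroids, centroids):  # centroids no longer change: stop
        break
    centroids = new_centroids

print(labels)     # cluster class of each relationship initial vector
print(centroids)  # the relationship clustering vectors
```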
104. Clustering the initial relation vectors of the same relation in the triple data through a preset clustering algorithm to obtain a relation sub-vector of each relation.
And the server clusters all the initial relation vectors of each relation through a preset clustering algorithm to obtain a relation sub-vector of each relation. The server converts the multiple triples of the relation through a preset vector conversion model to obtain multiple initial relation vectors of the relation. In a triplet, the subject, the object and the relationship are all converted into vectors through a preset vector conversion model, and the vectors obtained by converting different triples are different.
It should be noted that a relationship exists in different triples, a relationship has the same or different meaning in different triples, and an initial relationship vector converted from a relationship in different triples is different, for example, a "belonging" relationship exists in a triplet (new york, belong to, usa), and in a triplet (new york, belong to, usa), the "belonging" relationship represents a regional dependency relationship, and the initial relationship vector of the "belonging" relationship is (1,0,1, 1); the "belonging" relationship also exists in the triplet (display, belonging, tv), where the "belonging" relationship represents partial and global dependencies, and the "belonging" relationship's initial relationship vector is (1,0,0, 1); the "belonging" relationship also exists in the triplet (beijing, belonged to, china), in which the initial relationship vector of "belonging" is (1,0, 1), and the meaning of "belonging" relationship in the triplet (beijing, belonged to, china) is the same as that in the triplet (new york, belonged to, usa).
Specifically, the server selects n relation initial vectors from all initial relation vectors of a target relation as n first sub-centroid vectors, the target relation is the same at least one relation, and n is an integer greater than 1; the server calculates Euclidean distances between each relation initial vector of the target relation and n first sub-centroid vectors, and determines a first sub-vector of each relation initial vector of the target relation, wherein the first sub-vector is a first sub-centroid vector with the smallest Euclidean distance with one relation initial vector; the server takes each relationship initial vector of the target relationship and the corresponding first sub-vector as a sub-cluster class to obtain n first sub-cluster classes; the server calculates a second sub-centroid vector of each first sub-cluster class based on a mean principle;
the server calculates Euclidean distances between each relation initial vector of the target relation and n second sub-centroid vectors, and determines a second sub-vector of each relation initial vector of the target relation, wherein the second sub-vector is a second sub-centroid vector with the minimum Euclidean distance from one relation initial vector; the server classifies each relationship initial vector of the target relationship and the corresponding second sub-vector into a cluster class to obtain n second sub-cluster classes; the server calculates a third sub-centroid vector of each second sub-cluster class based on a mean principle; and the server repeatedly calculates the M +1 sub-centroid vector of each M sub-cluster until the M sub-centroid vector is the same as the M +1 sub-centroid vector, the n M sub-clusters are taken as the subclasses of the target relationship, the M +1 sub-centroid vector is taken as the relationship sub-vector of the target relationship, and M is an integer greater than 1.
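The sub-clustering step reuses the same k-means procedure, but runs it separately over the initial vectors of each individual relationship. The sketch below assumes hypothetical per-triple initial vectors for the single relationship "belong to" (values loosely follow the examples in the text) and n = 2 sub-centroids.

```python
import numpy as np

def kmeans(X, init_idx, iters=100):
    """Plain k-means: assign to the nearest centroid, then update each centroid by the mean."""
    centroids = X[init_idx].astype(float).copy()
    for _ in range(iters):
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        new = np.array([X[labels == j].mean(axis=0) for j in range(len(centroids))])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids

# Hypothetical relationship initial vectors of "belong to" in three different
# triples (illustrative values, not the patent's data).
belong_to_vectors = np.array([
    [1.0, 0.0, 1.0, 1.0],  # (New York, belong to, USA)  - regional dependency
    [1.0, 0.0, 0.0, 1.0],  # (display, belong to, TV)    - part-whole dependency
    [0.9, 0.0, 1.0, 1.0],  # (Beijing, belong to, China) - assumed value
])

# n = 2 first sub-centroid vectors, iterated until the sub-centroids stop changing.
sub_labels, sub_vectors = kmeans(belong_to_vectors, init_idx=[0, 1])
print(sub_labels)   # sub-cluster class of each occurrence of "belong to"
print(sub_vectors)  # the relationship sub-vectors of "belong to"
```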
105. Calculating the relationship similarity between the relationships in any two triples based on the relationship initial vector, the relationship cluster vector and the relationship sub-vector.
And the server calculates the relationship similarity between any two relationships based on the relationship initial vector, the relationship clustering vector and the relationship sub-vector. The relationship similarity is calculated by combining the relationship initial vector, the relationship clustering vector and the relationship sub-vector, and whether the meaning of the relationship in each triple is the same or not can be more accurately determined by utilizing the hierarchical structure information of the relationship, so that the relationship can be classified, and the accuracy of identifying the relationship is improved.
106. Classifying the relation according to the relation similarity.
The server classifies the relations according to the relation similarity: similar relations are divided into the same cluster, and if two similar relations already belong to the same cluster, they are further divided into the same subclass.
It can be understood that, when a new relationship is added to the knowledge graph, the server first determines the cluster of the relationship by calculating the Euclidean distance between the initial vector of the relationship and each relationship clustering vector; then, depending on whether the same relationship already exists in the knowledge graph, the server determines the subclass of the relationship: if the same relationship exists in the knowledge graph, the subclass is determined by calculating the Euclidean distance between the initial vector of the relationship and each relationship sub-vector of that same relationship; if the same relationship does not exist in the knowledge graph, the initial vector of the relationship forms a subclass of its own.
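The following sketch illustrates this placement of a newly added relationship; the existing clustering vectors, sub-vectors and the new vector are illustrative assumptions.

```python
import numpy as np

def nearest(vec, candidates):
    """Index of the candidate vector at the smallest Euclidean distance from vec."""
    return int(np.argmin([np.linalg.norm(vec - c) for c in candidates]))

# Hypothetical existing state of the knowledge graph (illustrative values only).
cluster_vectors = [np.array([1.0, 0.0]), np.array([1.0, 2.0])]              # relationship clustering vectors
sub_vectors = {"belong to": [np.array([1.0, -1.0]), np.array([0.0, 1.0])]}  # sub-vectors per relationship name

def place_new_relation(name, init_vec):
    cluster = nearest(init_vec, cluster_vectors)         # step 1: nearest relationship clustering vector
    if name in sub_vectors:                              # step 2: same relationship already in the graph?
        subclass = nearest(init_vec, sub_vectors[name])  # nearest relationship sub-vector
    else:
        sub_vectors[name] = [init_vec]                   # the new initial vector becomes its own subclass
        subclass = 0
    return cluster, subclass

print(place_new_relation("belong to", np.array([1.0, -0.8])))  # e.g. (0, 0)
```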
According to the embodiment of the invention, the relations are vectorized based on the upper and lower hierarchical structure information, whether the relations represent the same meaning or not is determined by calculating the similarity between the relation vectors, and the relations are classified, so that the efficiency of relation alignment in the knowledge graph is improved.
Referring to fig. 2, another embodiment of the method for aligning relationships of a knowledge graph according to an embodiment of the present invention includes:
201. Acquiring triple data in the knowledge graph, wherein each triple comprises a subject, a relation and an attribute.
The server acquires triple data in the legal knowledge graph, wherein each triple comprises a subject, a relation and an attribute. The data structure of the knowledge graph is the triple: in a triple (h, r, t), h is the subject, t is the object and r is the relationship; for example, in the triple (New York, belong to, USA), New York is the subject, USA is the object and "belong to" is the relationship. The triple is an intuitive data structure in which the subject and the object are collectively referred to as entities; the relationship is irreversible, so within a triple the subject and the object cannot be interchanged.
202. Converting the triple data through a preset vector conversion model to obtain a relationship initial vector of each relationship.
The server converts the triple data through a preset vector conversion model to obtain a relationship initial vector of each relationship. The preset vector conversion model converts character-type triple data into vector-type triple data: character-type relationship data are converted into relationship initial vectors, character-type subject data into subject vectors and character-type object data into object vectors. Each triple corresponds to a set of vectors.
Specifically, the server maps the subject and the object of the target triple into low-dimensional vectors h_i and t_i, where the target triple is any triple in the triple data and i is the total number of triples in the triple data; the server adjusts h_i and t_i through a preset loss function until L = Σ_i ||h_i - r_i - t_i|| is minimal; when L is minimal, the target triple (h_i, r_i, t_i) is set to vector type and r_i is the relationship initial vector. For example, for the triple (Taiwan, belong to, China), the server maps the subject "Taiwan" and the object "China" into low-dimensional vectors and adjusts them to (1, 0) and (0, 1), so the relationship initial vector of the relationship "belong to" is (1, -1).
It should be noted that, in a knowledge graph, the vector form of a subject, object or relationship in any triplet is unique.
The preset vector conversion model is a TransE model, the TransE model is trained based on samples in the legal knowledge graph, and the triple data in the legal knowledge graph are converted into a vector form.
203. Clustering all the relation initial vectors through a preset clustering algorithm to obtain a relation clustering vector of each relation.
The server clusters all the relation initial vectors through a preset clustering algorithm to obtain a relation clustering vector of each relation. In the knowledge graph, different relations may have the same meaning; for example, "country" and "nationality" have the same meaning. The server classifies different relations expressing the same meaning into one class and sets a relation clustering vector for that class of relations. Within the same class of relations, the Euclidean distance between the relation initial vector of each relation and the relation clustering vector of the class is the smallest.
Specifically, the server selects k relation initial vectors from all relation initial vectors as a first centroid vector, wherein k is an integer greater than 1; the server calculates the Euclidean distance between each relation initial vector and k first centroid vectors, and determines a first clustering vector of each relation initial vector, wherein the first clustering vector is the first centroid vector with the minimum Euclidean distance to one relation initial vector; the server classifies each relationship initial vector and the corresponding first clustering vector into a cluster class to obtain k first cluster classes; the server calculates a second centroid vector of each first cluster class based on a mean principle; the server calculates Euclidean distances between each relation initial vector and k second centroid vectors, and determines a second clustering vector of each relation initial vector, wherein the second clustering vector is the second centroid vector with the minimum Euclidean distance to one relation initial vector; the server classifies each relationship initial vector and the corresponding second cluster vector into a cluster class to obtain k second cluster classes; the server calculates a third centroid vector of each second cluster based on a mean principle; and the server repeatedly calculates the (N + 1) th centroid vector of each N-th cluster until the N-th centroid vector is the same as the (N + 1) th centroid vector, the N-th cluster is used as k clusters, the (N + 1) th centroid vector is used as a relational clustering vector, and N is an integer greater than 1.
For example, when the relation initial vectors include (1, 0), (1, 2), (2, 1) and (0, 3), the server selects the 2 relation initial vectors (1, 0) and (2, 1) as the first centroid vectors. The Euclidean distance between the relation initial vector (1, 2) and the first centroid vector (1, 0) is 2, and the Euclidean distance between (1, 2) and the first centroid vector (2, 1) is √2, so the first clustering vector of the relation initial vector (1, 2) is (2, 1); likewise, the first clustering vector of the relation initial vector (0, 3) is (2, 1). Then (1, 0) forms one first cluster class, and (1, 2), (0, 3) and (2, 1) form another. The server calculates the second centroid vectors of the two first cluster classes as (1/1, 0/1) and ((1 + 2 + 0)/3, (2 + 1 + 3)/3), i.e. (1, 0) and (1, 2). Once the centroid vectors no longer change between iterations, the server takes them as the relation clustering vectors, and the cluster classes obtained at that point are the 2 final cluster classes.
204. Clustering the initial relation vectors of the same relation in the triple data through a preset clustering algorithm to obtain a relation sub-vector of each relation.
And the server clusters all the initial relation vectors of each relation through a preset clustering algorithm to obtain a relation sub-vector of each relation. The server converts the multiple triples of the relation through a preset vector conversion model to obtain multiple initial relation vectors of the relation. In a triplet, the subject, the object and the relationship are all converted into vectors through a preset vector conversion model, and the vectors obtained by converting different triples are different.
It should be noted that a relationship exists in different triples, a relationship has the same or different meaning in different triples, and an initial relationship vector converted from a relationship in different triples is different, for example, a "belonging" relationship exists in a triplet (new york, belong to, usa), and in a triplet (new york, belong to, usa), the "belonging" relationship represents a regional dependency relationship, and the initial relationship vector of the "belonging" relationship is (1,0,1, 1); the "belonging" relationship also exists in the triplet (display, belonging, tv), where the "belonging" relationship represents partial and global dependencies, and the "belonging" relationship's initial relationship vector is (1,0,0, 1); the "belonging" relationship also exists in the triplet (beijing, belonged to, china), in which the initial relationship vector of "belonging" is (1,0, 1), and the meaning of "belonging" relationship in the triplet (beijing, belonged to, china) is the same as that in the triplet (new york, belonged to, usa).
Specifically, the server selects n relation initial vectors from all initial relation vectors of a target relation as n first sub-centroid vectors, the target relation is the same at least one relation, and n is an integer greater than 1; the server calculates Euclidean distances between each relation initial vector of the target relation and n first sub-centroid vectors, and determines a first sub-vector of each relation initial vector of the target relation, wherein the first sub-vector is a first sub-centroid vector with the smallest Euclidean distance with one relation initial vector; the server takes each relationship initial vector of the target relationship and the corresponding first sub-vector as a sub-cluster class to obtain n first sub-cluster classes; the server calculates a second sub-centroid vector of each first sub-cluster class based on a mean principle; the server calculates Euclidean distances between each relation initial vector of the target relation and n second sub-centroid vectors, and determines a second sub-vector of each relation initial vector of the target relation, wherein the second sub-vector is a second sub-centroid vector with the minimum Euclidean distance from one relation initial vector; the server classifies each relationship initial vector of the target relationship and the corresponding second sub-vector into a cluster class to obtain n second sub-cluster classes; the server calculates a third sub-centroid vector of each second sub-cluster class based on a mean principle; and the server repeatedly calculates the M +1 sub-centroid vector of each M sub-cluster until the M sub-centroid vector is the same as the M +1 sub-centroid vector, the n M sub-clusters are taken as the subclasses of the target relationship, the M +1 sub-centroid vector is taken as the relationship sub-vector of the target relationship, and M is an integer greater than 1.
205. Generating a relationship vector of the relationship in each triple through a preset formula r = r_c + r_0 + r_s, wherein r represents the relationship vector, r_c represents the relationship clustering vector, r_0 represents the relationship initial vector and r_s represents the relationship sub-vector.
The server generates the relationship vector of the relationship in each triple through the preset formula r = r_c + r_0 + r_s. The preset formula combines the relationship clustering vector, the relationship initial vector and the relationship sub-vector into the relationship vector, which improves the accuracy of identifying the relationship.
For example, in the triple (Taiwan, belong to, China), the relationship "belong to" has the relationship initial vector (1, -1), the relationship clustering vector (1, 0) and the relationship sub-vector (0, 1), so in this triple the relationship vector of "belong to" is (2, 0).
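The worked example can be checked with a one-line vector addition (a sketch; the array values are those given in the text):

```python
import numpy as np

r_init = np.array([1, -1])     # relationship initial vector of "belong to"
r_cluster = np.array([1, 0])   # relationship clustering vector
r_sub = np.array([0, 1])       # relationship sub-vector

r_vec = r_cluster + r_init + r_sub  # preset formula: sum of the three levels
print(r_vec)                        # [2 0]
```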
206. Calculating the relationship similarity through the cosine distance: sim(r_1, r_2) = (r_1 · r_2) / (||r_1|| · ||r_2||), wherein r_1 and r_2 represent the relationship vectors of any two relationships.
The server calculates the relationship similarity between any two relationships based on the relationship initial vector, the relationship clustering vector and the relationship sub-vector. Because the relationship similarity is calculated by combining the relationship initial vector, the relationship clustering vector and the relationship sub-vector, the hierarchical structure information of the relationships can be used to determine more accurately whether the relationships in each triple have the same meaning, so that the relationships can be classified and the accuracy of identifying the relationships is improved. The cosine similarity formula for the relationship similarity is: sim(r_1, r_2) = (r_1 · r_2) / (||r_1|| · ||r_2||), wherein r_1 and r_2 represent the relationship vectors of any two relationships.
It is understood that, in addition to calculating the relationship similarity through the cosine distance, the relationship similarity may also be calculated through other distance formulas, and is not limited herein.
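A minimal cosine-similarity helper matching the formula above (the example vectors are illustrative):

```python
import numpy as np

def relation_similarity(r1, r2):
    """Cosine similarity between two relationship vectors."""
    return float(np.dot(r1, r2) / (np.linalg.norm(r1) * np.linalg.norm(r2)))

print(relation_similarity(np.array([2.0, 0.0]), np.array([1.9, 0.3])))  # ~0.99: likely the same meaning
```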
207. Determining a relationship similarity result between all relationships according to the relationship similarity, wherein the relationship similarity result comprises similar and dissimilar.
And the server determines a relationship similarity result among all the relationships according to the relationship similarity. The server obtains a preset threshold value based on the relation similarity rule of the relation vectors in the database, judges that the relations are similar when the relation similarity exceeds the preset threshold value, and judges that the relations are dissimilar when the relation similarity does not exceed the preset threshold value. Specifically, the server judges whether the relationship similarity of the relationship pair to be classified exceeds a preset threshold value; if the relationship similarity of the relationship pair to be classified exceeds a preset threshold, the server determines that the relationship pair to be classified is similar; and if the relation similarity of the relation pair to be classified does not exceed the preset threshold, the server determines that the relation pair to be classified is not similar.
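A sketch of the threshold decision in step 207 follows; the threshold value 0.9 is an illustrative assumption, since the patent derives the threshold from the relationship similarity rule of the vectors in the database rather than fixing a number.

```python
SIMILARITY_THRESHOLD = 0.9  # assumed value; the patent derives the threshold from the database

def similarity_result(similarity, threshold=SIMILARITY_THRESHOLD):
    """Return 'similar' when the relationship similarity exceeds the threshold, else 'dissimilar'."""
    return "similar" if similarity > threshold else "dissimilar"

print(similarity_result(0.99))  # similar
print(similarity_result(0.42))  # dissimilar
```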
208. If the relationship similarity result of a relationship pair to be classified is similar, the relationship pair to be classified being any two relationships, dividing the relationship pair to be classified into the same cluster.
If the relationship similarity result of the relationship pair to be classified is similar, and the relationship pair to be classified is any two relationships, the server divides the relationship pair to be classified into the same cluster. Within the same cluster, the relation clustering vector of each relation is the same. If the relationship similarity result of the relationship pair to be classified is dissimilar, the server keeps each relation in the relationship pair to be classified in its original cluster.
It should be noted that, after determining the cluster of the relationship, the server determines the subclass of the relationship according to whether the same relationship exists in the cluster and the original subclass of the same relationship. If the same relationship of a relationship does not exist in the knowledge graph, the server divides the relationship into a subclass separately.
It can be understood that when a new relationship is added to the knowledge graph, if the server determines that there is no similarity relationship of the relationship in the knowledge graph, the server treats the relationship alone as a cluster.
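Step 208 can be sketched with a small union-find over relationship pairs judged similar; the helper and the example pair are illustrative, not the patent's data structures.

```python
parent = {}

def find(x):
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path halving
        x = parent[x]
    return x

def union(a, b):
    """Merge the clusters of two relationships judged similar."""
    ra, rb = find(a), find(b)
    if ra != rb:
        parent[ra] = rb

similar_pairs = [("country", "nationality")]   # pairs whose similarity exceeded the threshold
for a, b in similar_pairs:
    union(a, b)

print(find("country") == find("nationality"))  # True: merged into one cluster
```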
According to the embodiment of the invention, the relations are vectorized according to the upper and lower hierarchical structure information of the relations, whether the relations represent the same meaning or not is determined by calculating the similarity between the relation vectors, and the relations are classified, so that the efficiency of relation alignment in the knowledge graph is improved.
In the above description of the relation alignment method of the knowledge graph in the embodiment of the present invention, referring to fig. 3, the relation alignment apparatus of the knowledge graph in the embodiment of the present invention is described below, and an embodiment of the relation alignment apparatus of the knowledge graph in the embodiment of the present invention includes:
an obtaining unit 301, configured to obtain triple data in a knowledge graph, where the triple includes a subject, a relationship, and an attribute;
a conversion unit 302, configured to convert the triple data through a preset vector conversion model to obtain a relationship initial vector of each relationship;
a first clustering unit 303, configured to cluster all the relationship initial vectors through a preset clustering algorithm to obtain a relationship clustering vector of each relationship;
a second clustering unit 304, configured to cluster the initial relationship vectors of the same relationship in the triple data through the preset clustering algorithm to obtain a relationship sub-vector of each relationship;
a calculating unit 305, configured to calculate a relationship similarity between relationships in any two triples based on the relationship initial vector, the relationship cluster vector, and the relationship sub-vector;
a classifying unit 306, configured to classify the relationship according to the relationship similarity.
According to the embodiment of the invention, the relations are vectorized based on their upper and lower hierarchical structure information, the similarity among the relation vectors is calculated to determine whether the relations represent the same meaning, and the relations are classified accordingly, so that the relation alignment efficiency in the knowledge graph is improved.
Referring to fig. 4, an embodiment of a relationship alignment apparatus for a knowledge graph according to an embodiment of the present invention includes:
an obtaining unit 301, configured to obtain triple data in a knowledge graph, where the triple includes a subject, a relationship, and an attribute;
a conversion unit 302, configured to convert the triple data through a preset vector conversion model to obtain a relationship initial vector of each relationship;
a first clustering unit 303, configured to cluster all the relationship initial vectors through a preset clustering algorithm to obtain a relationship clustering vector of each relationship;
a second clustering unit 304, configured to cluster the initial relationship vectors of the same relationship in the triple data through the preset clustering algorithm to obtain a relationship sub-vector of each relationship;
a calculating unit 305, configured to calculate a relationship similarity between relationships in any two triples based on the relationship initial vector, the relationship cluster vector, and the relationship sub-vector;
a classifying unit 306, configured to classify the relationship according to the relationship similarity.
Optionally, the conversion unit 302 is specifically configured to:
mapping the subject and the object of a target triple into low-dimensional vectors h_i and t_i, wherein the target triple is any triple in the triple data and i is the total number of triples in the triple data; adjusting h_i and t_i through a preset loss function until L is minimal, wherein L = Σ_i ||h_i - r_i - t_i||; and, when L is minimal, setting the target triple (h_i, r_i, t_i) to vector type, wherein r_i is the relationship initial vector.
Optionally, the first clustering unit 303 is specifically configured to:
selecting k relation initial vectors from all the relation initial vectors as a first centroid vector, wherein k is an integer greater than 1; calculating Euclidean distances between each relationship initial vector and k first centroid vectors, and determining a first clustering vector of each relationship initial vector, wherein the first clustering vector is the first centroid vector with the smallest Euclidean distance with one relationship initial vector; classifying each relationship initial vector and the corresponding first clustering vector into a cluster class to obtain k first cluster classes; calculating a second centroid vector of each first cluster class based on a mean principle; calculating Euclidean distances between each relationship initial vector and k second centroid vectors, and determining a second clustering vector of each relationship initial vector, wherein the second clustering vector is the second centroid vector with the smallest Euclidean distance with one relationship initial vector; classifying each relationship initial vector and the corresponding second cluster vector into a cluster to obtain k second clusters; calculating a third centroid vector of each second cluster class based on a mean principle; and repeatedly calculating the (N + 1) th centroid vector of each N-th cluster until the N-th centroid vector is the same as the (N + 1) th centroid vector, taking the N-th cluster as k clusters, taking the (N + 1) th centroid vector as a relational clustering vector, and taking N as an integer greater than 1.
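The clustering described above is, in effect, a standard k-means procedure that uses Euclidean distances and mean-based centroid updates and stops once the centroids no longer change. The following is a minimal, non-limiting sketch; the NumPy implementation and the example value of k are assumptions for illustration only.

```python
import numpy as np

def kmeans(vectors, k, seed=0, max_iter=100):
    """Plain k-means: assign each vector to its nearest centroid by Euclidean
    distance, recompute centroids as cluster means, and stop once the
    centroids no longer change."""
    rng = np.random.default_rng(seed)
    X = np.asarray(vectors, dtype=float)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # Euclidean distance from every vector to every centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # mean-principle update of each centroid
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):  # centroids unchanged: stop
            break
        centroids = new_centroids
    return labels, centroids

# The relationship clustering vector of a relationship is the centroid of the
# cluster class that its initial vector falls into.
labels, cluster_centroids = kmeans(relation_initial_vectors, k=2)
relation_cluster_vectors = [cluster_centroids[j] for j in labels]
```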
Optionally, the second clustering unit 304 is specifically configured to:
selecting n initial relation vectors from all the initial relation vectors of a target relation as n first sub-centroid vectors, wherein the target relation is at least one identical relation, and n is an integer greater than 1; calculating Euclidean distances between each relation initial vector of the target relation and n first sub-centroid vectors, and determining a first sub-vector of each relation initial vector of the target relation, wherein the first sub-vector is the first sub-centroid vector with the smallest Euclidean distance with one relation initial vector; taking each relationship initial vector of the target relationship and the corresponding first sub-vector as a sub-cluster class to obtain n first sub-cluster classes; calculating a second sub-centroid vector of each first sub-cluster class based on a mean principle; calculating Euclidean distances between each relationship initial vector of the target relationship and n second sub-centroid vectors, and determining a second sub-vector of each relationship initial vector of the target relationship, wherein the second sub-vector is a second sub-centroid vector with the minimum Euclidean distance from one relationship initial vector; classifying each relationship initial vector of the target relationship and the corresponding second sub-vector into a cluster class to obtain n second sub-cluster classes; calculating a third sub-centroid vector of each second sub-cluster class based on a mean principle; repeatedly calculating the (M+1)th sub-centroid vector of each Mth sub-cluster until the Mth sub-centroid vector is the same as the (M+1)th sub-centroid vector, taking n Mth sub-clusters as the subclasses of the target relationship, taking the (M+1)th sub-centroid vector as the relationship sub-vector of the target relationship, and taking M as an integer greater than 1.
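The second clustering applies the same procedure, but separately within the group of initial vectors belonging to identically named relationships, so that the resulting sub-centroids act as the relationship sub-vectors. The sketch below reuses the kmeans helper above; grouping by relationship name and the example value of n are illustrative assumptions.

```python
from collections import defaultdict

def relation_sub_vectors(triples, initial_vectors, n=2):
    """Cluster the initial vectors of each identically named relationship;
    the sub-centroids of a relationship serve as its relationship sub-vectors."""
    groups = defaultdict(list)
    for i, (_, rel, _) in enumerate(triples):
        groups[rel].append(initial_vectors[i])
    sub_vectors = {}
    for rel, vecs in groups.items():
        k = min(n, len(vecs))               # never more sub-centroids than vectors
        _, sub_centroids = kmeans(vecs, k)  # same k-means routine as sketched above
        sub_vectors[rel] = sub_centroids
    return sub_vectors

relation_subs = relation_sub_vectors(triples, relation_initial_vectors)
```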
Optionally, the calculating unit 305 includes:
a generating module 3051, configured to generate a relationship vector of the relationship in each triplet according to a preset formula that combines the relationship clustering vector, the relationship initial vector, and the relationship sub-vector; and a calculating module 3052, configured to calculate the relationship similarity between any two relationship vectors A and B through the cosine distance cos(A, B) = (A · B) / (‖A‖ ‖B‖).
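The preset combination formula is not reproduced in the published text, so the sketch below simply concatenates the relationship clustering vector, the relationship initial vector and the relationship sub-vector and then takes a cosine similarity; both choices are illustrative assumptions rather than the formula of the embodiment.

```python
import numpy as np

def relation_vector(cluster_vec, initial_vec, sub_vec):
    """One possible combination of the three hierarchy levels: concatenation."""
    return np.concatenate([cluster_vec, initial_vec, sub_vec])

def relation_similarity(vec_a, vec_b):
    """Cosine similarity between two relationship vectors."""
    return float(vec_a @ vec_b / (np.linalg.norm(vec_a) * np.linalg.norm(vec_b)))

r1 = relation_vector(relation_cluster_vectors[0], relation_initial_vectors[0],
                     relation_subs["spouse"][0])
r2 = relation_vector(relation_cluster_vectors[1], relation_initial_vectors[1],
                     relation_subs["wife"][0])
print(relation_similarity(r1, r2))
```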
Optionally, the classifying unit 306 includes:
a determining module 3061, configured to determine relationship similarity results among all relationships according to the relationship similarity, where the relationship similarity results include similarity and dissimilarity; a classifying module 3062, configured to classify the two relationships of a relationship pair to be classified into the same cluster if the relationship similarity result of the relationship pair to be classified is similar, where the relationship pair to be classified is any two relationships; the classifying module 3062 is further configured to keep the original cluster of each relationship in the relationship pair to be classified if the relationship similarity result of the relationship pair to be classified is dissimilar.
Optionally, the determining module 3061 is specifically configured to:
judging whether the relationship similarity of the relationship pairs to be classified exceeds a preset threshold value; if the relationship similarity of the relationship pair to be classified exceeds a preset threshold, determining that the relationship pair to be classified is similar; and if the relationship similarity of the relationship pair to be classified does not exceed the preset threshold, determining that the relationship pair to be classified is not similar.
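Such a threshold test, followed by merging similar relationships into the same cluster while leaving dissimilar ones in their original clusters, can be sketched as follows; the threshold value and the union-find bookkeeping are illustrative assumptions.

```python
def classify_relations(relation_names, similarity_fn, threshold=0.8):
    """Compare every pair of relationships; pairs whose similarity exceeds the
    threshold are merged into the same cluster (union-find), otherwise each
    relationship keeps its original cluster."""
    parent = {r: r for r in relation_names}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for i, a in enumerate(relation_names):
        for b in relation_names[i + 1:]:
            if similarity_fn(a, b) > threshold:   # similar: same cluster
                parent[find(a)] = find(b)

    clusters = {}
    for r in relation_names:
        clusters.setdefault(find(r), []).append(r)
    return list(clusters.values())

# toy check: a similarity function that always reports the pair as similar
print(classify_relations(["spouse", "wife"], lambda a, b: 0.9))
```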
According to the embodiment of the invention, the relations are vectorized based on the upper and lower hierarchical structure information, whether the relations represent the same meaning or not is determined by calculating the similarity between the relation vectors, and the relations are classified, so that the efficiency of relation alignment in the knowledge graph is improved.
Fig. 3 to 4 describe the relation alignment apparatus of the knowledge graph in the embodiment of the present invention in detail from the perspective of the modular functional entity; the following describes the relation alignment apparatus of the knowledge graph in the embodiment of the present invention in detail from the perspective of hardware processing.
Fig. 5 is a schematic structural diagram of a relation alignment apparatus 500 of a knowledge graph according to an embodiment of the present invention. The apparatus may vary considerably in configuration or performance, and may include one or more processors (CPUs) 501, a memory 509, and one or more storage media 508 (e.g., one or more mass storage devices) storing applications 507 or data 506. The memory 509 and the storage medium 508 may be transient storage or persistent storage. The program stored on the storage medium 508 may include one or more modules (not shown), and each module may include a series of instruction operations for the relation alignment apparatus of the knowledge graph. Still further, the processor 501 may be configured to communicate with the storage medium 508 to execute, on the relation alignment apparatus 500 of the knowledge graph, the series of instruction operations stored in the storage medium 508.
The relation alignment apparatus 500 of the knowledge graph may also include one or more power supplies 502, one or more wired or wireless network interfaces 503, one or more input-output interfaces 504, and/or one or more operating systems 505, such as Windows Server, Mac OS X, Unix, Linux, FreeBSD, and the like. Those skilled in the art will appreciate that the knowledge-graph relationship alignment apparatus configuration shown in fig. 5 does not constitute a limitation of knowledge-graph relationship alignment apparatus and may include more or fewer components than shown, or some components in combination, or a different arrangement of components. The processor 501 may perform the functions of the acquisition unit 301, the conversion unit 302, the first clustering unit 303, the second clustering unit 304, the calculation unit 305, and the classification unit 306 in the above-described embodiments.
The following describes the components of the relation alignment apparatus of the knowledge-graph in detail with reference to fig. 5:
the processor 501 is the control center of the relation alignment apparatus of the knowledge graph, and may perform processing according to the set relation alignment method. The processor 501 connects various parts of the overall relation alignment apparatus of the knowledge graph using various interfaces and lines, and performs relation alignment of the knowledge graph by running or executing the software programs and/or modules stored in the memory 509 and invoking the data stored in the memory 509, so as to execute the various functions of the apparatus and process the data. The storage medium 508 and the memory 509 are both carriers for storing data; the storage medium 508 may be an internal memory with a small storage capacity but a high storage speed, and the memory 509 may be an external memory with a large storage capacity but a low storage speed.
The memory 509 may be used to store software programs and modules, and the processor 501 executes the various functional applications and data processing of the relation alignment apparatus 500 of the knowledge graph by running the software programs and modules stored in the memory 509. The memory 509 may mainly include a program storage area and a data storage area, where the program storage area may store an operating system, an application program required for at least one function (such as classifying relationships), and the like, and the data storage area may store data created from use of the relation alignment apparatus of the knowledge graph (such as relationship similarities) and the like. Further, the memory 509 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device. The relation alignment method program of the knowledge graph provided in the embodiment of the present invention and the received data stream are stored in the memory 509, and the processor 501 calls them from the memory 509 when they are needed.
When the computer program instructions are loaded and executed on a computer, the processes or functions described in accordance with the embodiments of the present invention are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire (e.g., coaxial cable, optical fiber, twisted pair) or wirelessly (e.g., infrared, radio, microwave, etc.). The computer-readable storage medium may be any available medium that a computer can access, or a data storage device such as a server or a data center integrating one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, a hard disk, a magnetic tape), an optical medium (e.g., a compact disc), or a semiconductor medium (e.g., a solid state disk (SSD)), among others.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the embodiments provided in the present invention, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a read-only memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.
Claims (10)
1. A relation alignment method of a knowledge graph is characterized by comprising the following steps:
acquiring triple data in a knowledge graph, wherein the triple comprises a main body, a relation and an attribute;
converting the triple data through a preset vector conversion model to obtain a relationship initial vector of each relationship;
clustering all the relation initial vectors through a preset clustering algorithm to obtain a relation clustering vector of each relation;
clustering the initial relation vectors of the same relation in the triple data through the preset clustering algorithm to obtain a relation sub-vector of each relation;
calculating the relationship similarity between the relationships in any two triples based on the relationship initial vector, the relationship cluster vector and the relationship sub-vector;
and classifying the relation according to the relation similarity.
2. The relation alignment method of the knowledge-graph according to claim 1, wherein the converting the triple data by a preset vector conversion model to obtain a relation initial vector of each relation comprises:
mapping the subject and the object of a target triplet into low-dimensional vectors h_i and t_i respectively, wherein the target triplet is any triplet in the triplet data, and i is the total number of triplets in the triplet data;
adjusting h_i and t_i through a preset loss function until ||h_i + r_i - t_i|| reaches a minimum;
and when ||h_i + r_i - t_i|| reaches the minimum, representing the target triplet in vector form, wherein r_i is the relationship initial vector.
3. The method for aligning relationships of knowledge-graphs according to claim 1, wherein said clustering all of said relationship initial vectors by a preset clustering algorithm to obtain a relationship clustering vector of each relationship comprises:
selecting k relation initial vectors from all the relation initial vectors as a first centroid vector, wherein k is an integer greater than 1;
calculating Euclidean distances between each relationship initial vector and k first centroid vectors, and determining a first clustering vector of each relationship initial vector, wherein the first clustering vector is the first centroid vector with the smallest Euclidean distance with one relationship initial vector;
classifying each relationship initial vector and the corresponding first clustering vector into a cluster class to obtain k first cluster classes;
calculating a second centroid vector of each first cluster class based on a mean principle;
calculating Euclidean distances between each relationship initial vector and k second centroid vectors, and determining a second clustering vector of each relationship initial vector, wherein the second clustering vector is the second centroid vector with the smallest Euclidean distance with one relationship initial vector;
classifying each relationship initial vector and the corresponding second cluster vector into a cluster to obtain k second clusters;
calculating a third centroid vector of each second cluster class based on a mean principle;
and repeatedly calculating the (N + 1) th centroid vector of each N-th cluster until the N-th centroid vector is the same as the (N + 1) th centroid vector, taking the N-th cluster as k clusters, taking the (N + 1) th centroid vector as a relational clustering vector, and taking N as an integer greater than 1.
4. The method of aligning relationships of a knowledge-graph according to claim 1, wherein the clustering the initial relationship vectors of the same relationship in the triplet data by the preset clustering algorithm to obtain a relationship sub-vector of each relationship comprises:
selecting n initial relation vectors from all the initial relation vectors of a target relation as n first sub-centroid vectors, wherein the target relation is at least one identical relation, and n is an integer greater than 1;
calculating Euclidean distances between each relation initial vector of the target relation and n first sub-centroid vectors, and determining a first sub-vector of each relation initial vector of the target relation, wherein the first sub-vector is the first sub-centroid vector with the smallest Euclidean distance with one relation initial vector;
taking each relationship initial vector of the target relationship and the corresponding first sub-vector as a sub-cluster class to obtain n first sub-cluster classes;
calculating a second sub-centroid vector of each first sub-cluster class based on a mean principle;
calculating Euclidean distances between each relationship initial vector of the target relationship and n second sub-centroid vectors, and determining a second sub-vector of each relationship initial vector of the target relationship, wherein the second sub-vector is a second sub-centroid vector with the minimum Euclidean distance from one relationship initial vector;
classifying each relationship initial vector of the target relationship and the corresponding second sub-vector into a cluster class to obtain n second sub-cluster classes;
calculating a third sub-centroid vector of each second sub-cluster class based on a mean principle;
repeatedly calculating the (M+1)th sub-centroid vector of each Mth sub-cluster until the Mth sub-centroid vector is the same as the (M+1)th sub-centroid vector, taking n Mth sub-clusters as the subclasses of the target relationship, taking the (M+1)th sub-centroid vector as the relationship sub-vector of the target relationship, and taking M as an integer greater than 1.
5. The method of knowledge-graph relationship alignment of claim 1, wherein said calculating a relationship similarity between any two relationships based on said relationship initial vector, said relationship cluster vector and said relationship sub-vector comprises:
generating a relationship vector of the relationship in each triplet according to a preset formula that combines the relationship clustering vector, the relationship initial vector and the relationship sub-vector;
and calculating the relationship similarity between any two relationship vectors through the cosine distance.
6. The relation alignment method of the knowledge-graph according to any one of claims 1-5, wherein the classifying the relation according to the relation similarity comprises:
determining relationship similarity results among all relationships according to the relationship similarity, wherein the relationship similarity results comprise similarity and dissimilarity;
if the relationship similarity result of a relationship pair to be classified is similar, classifying the two relationships of the relationship pair to be classified into the same cluster, wherein the relationship pair to be classified is any two relationships;
and if the relationship similarity result of the relationship pair to be classified is dissimilar, keeping the original cluster of each relationship in the relationship pair to be classified.
7. The relation alignment method of knowledge-graph according to claim 6, wherein said determining relation similarity results between all relations according to said relation similarity comprises:
judging whether the relationship similarity of the relationship pairs to be classified exceeds a preset threshold value or not;
if the relationship similarity of the relationship pair to be classified exceeds a preset threshold, determining that the relationship pair to be classified is similar;
and if the relationship similarity of the relationship pair to be classified does not exceed the preset threshold, determining that the relationship pair to be classified is not similar.
8. A relationship alignment apparatus for a knowledge graph, comprising:
the acquisition unit is used for acquiring triple data in the knowledge graph, wherein the triple comprises a main body, a relation and an attribute;
the conversion unit is used for converting the triple data through a preset vector conversion model to obtain a relationship initial vector of each relationship;
the first clustering unit is used for clustering all the relation initial vectors through a preset clustering algorithm to obtain a relation clustering vector of each relation;
the second clustering unit is used for clustering the initial relationship vectors of the same relationship in the triple data through the preset clustering algorithm to obtain a relationship sub-vector of each relationship;
the calculating unit is used for calculating the relationship similarity between the relationships in any two triples based on the relationship initial vector, the relationship clustering vector and the relationship sub-vector;
and the classification unit is used for classifying the relationship according to the relationship similarity.
9. A relation alignment apparatus of a knowledge graph, comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the relation alignment method of a knowledge graph according to any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium comprising instructions that when executed on a computer cause the computer to perform a method of relationship alignment of a knowledge-graph as claimed in any one of claims 1 to 7.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910992324.0A CN111026865B (en) | 2019-10-18 | 2019-10-18 | Knowledge graph relationship alignment method, device, equipment and storage medium |
PCT/CN2019/119305 WO2021072891A1 (en) | 2019-10-18 | 2019-11-19 | Knowledge graph relationship alignment method, apparatus and device, and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910992324.0A CN111026865B (en) | 2019-10-18 | 2019-10-18 | Knowledge graph relationship alignment method, device, equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111026865A true CN111026865A (en) | 2020-04-17 |
CN111026865B CN111026865B (en) | 2023-07-21 |
Family
ID=70200915
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910992324.0A Active CN111026865B (en) | 2019-10-18 | 2019-10-18 | Knowledge graph relationship alignment method, device, equipment and storage medium |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN111026865B (en) |
WO (1) | WO2021072891A1 (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113282764B (en) * | 2021-06-29 | 2023-05-23 | 南方电网科学研究院有限责任公司 | Method and device for constructing network security data knowledge graph |
CN117370583B (en) * | 2023-12-08 | 2024-03-19 | 湘江实验室 | Knowledge-graph entity alignment method and system based on generation of countermeasure network |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107665252A (en) * | 2017-09-27 | 2018-02-06 | 深圳证券信息有限公司 | A kind of method and device of creation of knowledge collection of illustrative plates |
CN109783582A (en) * | 2018-12-04 | 2019-05-21 | 平安科技(深圳)有限公司 | A kind of knowledge base alignment schemes, device, computer equipment and storage medium |
CN109885692A (en) * | 2019-01-11 | 2019-06-14 | 平安科技(深圳)有限公司 | Knowledge data storage method, device, computer equipment and storage medium |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR101983455B1 (en) * | 2017-09-21 | 2019-05-28 | 숭실대학교산학협력단 | Knowledge Base completion method and server |
CN107943874B (en) * | 2017-11-13 | 2019-08-23 | 平安科技(深圳)有限公司 | Knowledge mapping processing method, device, computer equipment and storage medium |
- 2019-10-18 CN CN201910992324.0A patent/CN111026865B/en active Active
- 2019-11-19 WO PCT/CN2019/119305 patent/WO2021072891A1/en active Application Filing
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112149400A (en) * | 2020-09-23 | 2020-12-29 | 腾讯科技(深圳)有限公司 | Data processing method, device, equipment and storage medium |
CN112149400B (en) * | 2020-09-23 | 2021-07-27 | 腾讯科技(深圳)有限公司 | Data processing method, device, equipment and storage medium |
WO2022222226A1 (en) * | 2021-04-19 | 2022-10-27 | 平安科技(深圳)有限公司 | Structured-information-based relation alignment method and apparatus, and device and medium |
WO2024167029A1 (en) * | 2023-02-06 | 2024-08-15 | 엘지전자 주식회사 | Background knowledge structure alignment method and device |
Also Published As
Publication number | Publication date |
---|---|
WO2021072891A1 (en) | 2021-04-22 |
CN111026865B (en) | 2023-07-21 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111026865A (en) | Relation alignment method, device and equipment of knowledge graph and storage medium | |
JP5503046B2 (en) | Image search based on shape | |
US8781255B2 (en) | Methods and apparatus for visual search | |
US10579661B2 (en) | System and method for machine learning and classifying data | |
CN108804641A (en) | A kind of computational methods of text similarity, device, equipment and storage medium | |
US8849030B2 (en) | Image retrieval using spatial bag-of-features | |
CN109697451B (en) | Similar image clustering method and device, storage medium and electronic equipment | |
Hahsler et al. | Introduction to stream: An extensible framework for data stream clustering research with R | |
CN111046186A (en) | Entity alignment method, device and equipment of knowledge graph and storage medium | |
JP6905603B2 (en) | Image retrieval methods, devices, equipment and readable storage media | |
TW202029079A (en) | Method and device for identifying irregular group | |
WO2018134964A1 (en) | Image search system, image search method, and program | |
WO2023108995A1 (en) | Vector similarity calculation method and apparatus, device and storage medium | |
JP6863926B2 (en) | Data analysis system and data analysis method | |
JP2005011042A (en) | Data search method, device and program and computer readable recoring medium | |
CN110825894A (en) | Data index establishing method, data index retrieving method, data index establishing device, data index retrieving device, data index establishing equipment and storage medium | |
WO2023221790A1 (en) | Image encoder training method and apparatus, device, and medium | |
WO2015001416A1 (en) | Multi-dimensional data clustering | |
JP2012079187A (en) | Feature vector generating device, feature vector generating method and program therefor | |
CN107315984B (en) | Pedestrian retrieval method and device | |
Sewisy et al. | Fast efficient clustering algorithm for balanced data | |
CN115730116A (en) | Data retrieval method and related equipment | |
CN110209895B (en) | Vector retrieval method, device and equipment | |
WO2021000674A1 (en) | Cell image recognition method and system, computer device, and readable storage medium | |
EP4163803A1 (en) | Sample data annotation system, method, and related device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||