CN111881290A - Distribution network multi-source grid entity fusion method based on weighted semantic similarity - Google Patents
- Publication number
- CN111881290A (application CN202010555531.2A)
- Authority
- CN
- China
- Prior art keywords
- ontology
- distribution network
- method based
- heterogeneous
- mapping
- Legal status
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/367—Ontology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
Abstract
The invention discloses a distribution network multi-source grid entity fusion method based on weighted semantic similarity, comprising the following steps. Step one: extract knowledge from the multi-source grid data to obtain a plurality of heterogeneous ontologies. Step two: find the relations among the heterogeneous ontologies, establish the corresponding mappings, and fuse the heterogeneous ontologies to form a plurality of knowledge graph ontology models. Step three: fuse the knowledge graph ontology models using a weighting algorithm. Step four: obtain the fused result. These steps finally generate the distribution network grid.
Description
Technical Field
The invention relates to a distribution network multi-source grid entity fusion method based on weighted semantic similarity.
Background
Because the power business systems lack overall planning and a mechanism for horizontal communication, the functional processes of the business systems are isolated from one another, basic data are entered repeatedly at multiple points, and data standards are not unified. As a result, power supply enterprises are weak at managing horizontal business processes that cut across functions and departments. The distribution network multi-source grid entity fusion technique based on weighted semantic similarity builds a semantic data fusion model on top of the original data storage model and shields data barriers at the application layer. This forms an integrated data resource system spanning departments, specialties and domains, which promotes data collection, fusion and sharing and strengthens the enterprise's data service capability. It thereby raises the level of data analysis applications and the value of big data, improves management and services, and provides strong support for developing value-added services.
Disclosure of Invention
To solve the problems of repeated multi-point entry of basic data, non-uniform data standards and the like, as well as the weak cross-functional, cross-department horizontal business process management of power supply enterprises, the invention adopts the following data processing method. The specific implementation steps are as follows:
Step one: knowledge extraction.
Knowledge extraction comprises three main parts:
1. Entity extraction
Entity extraction is to identify and extract entities from information sources, and is the most basic and critical part in information extraction.
Methods of entity extraction are generally divided into three types:
1.1 Rule- and dictionary-based methods: when the text domain and the types of semantic units are defined, rule- and dictionary-based methods are mainly used; for example, predefined rules extract distribution network entities, place names, organization names, specific times, faults and other entities from text.
1.2 Statistical machine learning methods: supervised learning algorithms are used to extract named entities. The performance of a purely supervised algorithm is limited by its training set, and its precision and recall are not ideal. Because of this limitation, supervised learning is combined with rules.
1.3 Open-domain extraction methods: these use unsupervised open-domain clustering, whose basic idea is to identify named entities in search logs based on the semantic features of known entities and then cluster them.
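The rule- and dictionary-based method in 1.1 can be sketched as follows. This is a minimal illustration, not the patented implementation: the dictionary entries, the regular expressions and the function name `extract_entities` are all hypothetical, and dictionary hits are simply given priority over overlapping rule hits.

```python
import re

# Hypothetical domain dictionary (surface form -> entity type); a real system
# would load this from the enterprise's equipment ledgers.
ENTITY_DICT = {
    "10kV Huayuan Line": "Feeder",
    "No.1 Main Transformer": "Transformer",
}

# Hypothetical hand-written rules (regular expression -> entity type).
RULES = [
    (re.compile(r"\d+kV\s+\w+\s+Line"), "Feeder"),
    (re.compile(r"\d{4}-\d{2}-\d{2}"), "Time"),
]


def extract_entities(text):
    """Return (start, end, surface, type) spans sorted by position.

    Dictionary hits are collected first; a rule hit is kept only if it does
    not overlap a span that has already been found.
    """
    spans = []
    for surface, etype in ENTITY_DICT.items():
        for m in re.finditer(re.escape(surface), text):
            spans.append((m.start(), m.end(), surface, etype))
    for pattern, etype in RULES:
        for m in pattern.finditer(text):
            overlaps = any(s < m.end() and m.start() < e for s, e, _, _ in spans)
            if not overlaps:
                spans.append((m.start(), m.end(), m.group(), etype))
    return sorted(spans)
```

For example, `extract_entities("Fault on 10kV Huayuan Line at 2020-06-17, No.1 Main Transformer tripped.")` yields a Feeder, a Time and a Transformer span in that order.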
2. Relationship extraction
Relation extraction extracts the relations between entities from an information source, addressing the semantic links between entities. It is generally divided into supervised and semi-supervised extraction.
Supervised learning: in supervised relation extraction the relation set is usually fixed, so extraction can be treated as a simple classification problem. With high-quality labeled data, supervised models achieve high accuracy, but labeling text data costs a great deal of labor and time, new relation categories are hard to add, the models are brittle, and their generalization ability is limited.
Semi-supervised learning: a small amount of labeled information is used as a seed template to extract a large number of new instances from unstructured data, forming new training data. The main method is the Bootstrap algorithm, whose core idea and basic steps are as follows:
(1) Using resampling, draw a number of samples (the number can be set freely) from the original sample; the draw is made with replacement.
(2) Compute the statistic T from the drawn samples.
(3) Repeat N times (typically more than 1000) to obtain N values of the statistic T.
(4) Compute the sample variance of the N values of T to obtain the variance of the statistic.
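Steps (1) to (4) describe bootstrap resampling as used in statistics. A minimal sketch of exactly those steps follows; the function name and its parameters (`n_resamples`, `resample_size`, `seed`) are illustrative assumptions, and the statistic T is passed in as any function of a sample.

```python
import random
import statistics


def bootstrap_variance(sample, statistic, n_resamples=1000, resample_size=None, seed=0):
    """Estimate the variance of `statistic` by bootstrap resampling.

    Follows steps (1)-(4): draw with replacement, compute T, repeat N times,
    then take the sample variance of the N values of T.
    """
    rng = random.Random(seed)
    k = resample_size or len(sample)  # step (1): the resample size can be set freely
    values = []
    for _ in range(n_resamples):  # step (3): repeat N times
        resample = rng.choices(sample, k=k)  # step (1): sampling with replacement
        values.append(statistic(resample))  # step (2): compute T
    return statistics.variance(values)  # step (4): sample variance of the N values of T
```

With T taken as the mean of the sample 1, 2, ..., 10, the estimate comes out close to the analytical value 8.25 / 10 = 0.825.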
3. Attribute extraction
Attribute extraction extracts the characteristics and properties of the entities in the information source. An entity's attributes can be regarded as a nominal relation between the entity and its attribute values, so attribute extraction can also be treated as a relation extraction problem.
In the invention, the processed data mainly come from the structured data of the full-service unified data center and are extracted with templates. Since the mappings between entities, attributes, relations and source systems were already set when the ontology model was defined, the extraction scripts for the structured data can be written at the same time, and the relational data can be stored in structured form.
Step two: ontology fusion.
The invention combines ontology mapping and ontology integration:
1. Global ontology-local ontology integration
Knowledge that is consistent and recognized across the different systems is first extracted; this is called the global ontology. The knowledge unique to each system is retained; this is called a local ontology. Mappings between the global ontology and the local ontologies are then established. The process is: (1) import the ontologies to be mapped; (2) find the mappings: using natural language processing, compare the similarity between the objects to be mapped, find structural similarities, and search for mappings between the ontologies using machine learning and related techniques. In this way every business across the system is covered.
2. Mapping between local ontologies
The relations among the local ontologies are found using concept-similarity algorithms, string-based methods and language-based methods, and mapping rules between the ontologies are established from these relations.
3. Appropriate representation of mappings
Ontology mapping means that, given two ontologies A and B, for each concept in ontology A one tries to find a semantically identical or similar concept in ontology B, and likewise for each concept or node in ontology B. The most important part of mapping is therefore the discovery of semantic associations.
Step three: instance fusion.
Two kinds of algorithms, pairwise entity alignment and collaborative entity alignment, are used together. Pairwise entity alignment judges whether two entities correspond to the same physical object, determining their degree of alignment from their attributes. Collaborative entity alignment considers that the alignments between different entities influence one another, and obtains a globally optimal result by coordinating the matches between different objects, that is, by finding what different entities have in common.
1. Paired entity alignment
Pairwise entity alignment is based on a knowledge base, which is a six-tuple consisting of the sets of instances, literals, relations and attributes, together with relation facts and attribute facts. Entity alignment computes a value by a specific formula; the value describes the degree of similarity, and the larger it is, the closer the two entities are. The alignment procedure can be summarized as follows: given two knowledge bases and a set of previously aligned data, entity matching is computed under the joint control of optional tuning parameters and a set of related external resources, finally yielding the alignment result.
2. Entity similarity and relationship similarity
An intuitive alignment classification method assigns each matched attribute its own weight to reflect its importance to the alignment result, assigns separate weights to the attributes of the entity and to the attributes of the entities related to it, and computes the overall similarity by weighted summation. A similarity threshold is then set, and the result is decided by comparing the total entity similarity score with the threshold.
3. Feature matching based on similarity functions
A tokenization function converts the strings to be matched into sets of substrings, and the similarity is then computed as a weighted similarity.
3.1 Token-based similarity functions
A function converts the text strings to be matched into sets of substrings, which are called tokens. Commonly used token-based similarity functions are the Jaccard similarity function and the cosine similarity function.
A feature of the Jaccard-coefficient-based similarity function is that set intersection is order-independent, so the order of the tokens has no influence on the result.
Cosine similarity shares the order independence of token-based similarity functions and, because it adds weights, better reflects how similar the tokens are.
3.2 Edit-distance-based similarity functions
An edit-distance-based similarity function treats the text strings to be matched as wholes and uses the minimum cost of the edit operations needed to convert one string into the other as the measure of their similarity. Common edit-distance-based similarity functions are those based on the Levenshtein, Smith-Waterman, Jaro and Jaro-Winkler distances.
Given two strings s1 and s2, the Levenshtein distance between them equals the minimum number of insertions, deletions and substitutions required to convert s1 into s2. Levenshtein-based similarity functions can reduce the error sensitivity of similarity matching.
The invention uses weighted similarity calculation to compute the true similarity, that is, the weighted similarity, of the names and attributes of two entities in the Neo4j graph database.
Drawings
FIG. 1 is a similarity technique flow diagram of the present invention;
FIG. 2 is a multi-source grid entity fusion diagram of the present invention;
FIG. 3 is a diagram of the ontology model components of the present invention.
Detailed Description
As shown in FIG. 3, the technical problem the invention addresses is fusing the feeder, transformer and grid-relationship data of three systems and constructing a knowledge graph of the grid.
Example 1:
Extract knowledge from the multi-source grid data to obtain a plurality of heterogeneous ontologies. The specific steps are entity extraction, relation extraction and attribute extraction, wherein:
The specific steps of knowledge extraction are as follows:
1. Knowledge extraction
Knowledge extraction is the first step of knowledge graph construction. Its key problem is how to automatically extract information from heterogeneous data sources to obtain candidate knowledge units. Knowledge extraction is a technique for automatically extracting structured knowledge, such as entities, relations and entity attributes, from semi-structured and unstructured data.
The purpose of knowledge extraction is to extract knowledge from data of different sources and structures and store it in a knowledge graph; it is an important technique for building large-scale knowledge graphs automatically. Entity extraction refers to automatically extracting entities from a data set. Its quality strongly affects the efficiency and quality of subsequent knowledge acquisition, so it is the most basic and critical part of knowledge extraction.
The knowledge extraction is divided into three steps:
(1) Entity extraction. Entity extraction extracts the specified units (entities) from semi-structured and unstructured data.
(2) Relation extraction. After the entities are extracted, the associations between them must be extracted from the related data to obtain semantic information; linking the entities (concepts) through these associations forms a networked knowledge structure.
(3) Attribute extraction. The purpose of attribute extraction is to collect the attribute information of a specific entity from different power grid information sources. For example, for a given transformer, information such as its identifier, the city it belongs to and the name of its power supply unit can be obtained from different sources. Attribute extraction collects this information from various data sources to give a complete description of the entity's attributes.
In this technique, the processed data mainly come from the structured data of the full-service unified data center and are extracted with templates. Because the mappings between entities, attributes, relations and source systems are set when the ontology model is defined, structured data extraction scripts can be written from these mappings, and the relational data are stored as a graph structure.
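Template-based extraction of structured data can be sketched as follows, assuming a hypothetical template per source table (the table name `pms_transformer`, its columns and the relation name `BELONGS_TO` are illustrative, not taken from the source systems). Each relational row becomes attribute facts and relation facts in subject-predicate-object form, ready to be written to a graph store.

```python
# Hypothetical extraction template for one source table. In the method
# described above, such mappings are fixed when the ontology model is defined.
TEMPLATES = {
    "pms_transformer": {
        "type": "Transformer",
        "id": "obj_id",
        "attributes": ["tran_name", "city"],
        "relations": {"feeder_id": ("BELONGS_TO", "Feeder")},
    }
}


def rows_to_triples(table, rows, templates=TEMPLATES):
    """Turn relational rows into (subject, predicate, object) triples for a graph store."""
    spec = templates[table]
    triples = []
    for row in rows:
        subject = f"{spec['type']}:{row[spec['id']]}"
        triples.append((subject, "type", spec["type"]))
        for column in spec["attributes"]:
            if row.get(column) is not None:
                triples.append((subject, column, row[column]))  # attribute fact
        for column, (relation, target_type) in spec["relations"].items():
            if row.get(column) is not None:
                # relation fact: the object is another entity node, not a literal
                triples.append((subject, relation, f"{target_type}:{row[column]}"))
    return triples
```

Keeping the template as data means a new source table only needs a new entry, which mirrors the point that the scripts follow directly from the ontology mappings.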
Example 2
Ontology fusion means first obtaining a global ontology and then finding the mapping between each local ontology and the global ontology; it merges the heterogeneous ontologies obtained in Example 1. Finding the mappings comprises three steps: first, import the ontologies to be mapped; second, discover the mappings; third, represent the mappings.
The specific steps and methods of ontology fusion are as follows:
Common ways to achieve ontology fusion are ontology integration and ontology mapping. Ontology integration merges several ontologies directly into one large ontology, while ontology mapping finds mapping rules between ontologies; both eliminate the heterogeneity among ontologies.
This technique combines ontology mapping and ontology integration, integrating the ontologies built for the three systems into a unified distribution network grid knowledge graph ontology model that serves as the specification for knowledge storage.
1. Global ontology-local ontology based integration
This method first extracts the knowledge common to the heterogeneous ontologies and builds a global ontology from it. The global ontology describes the knowledge consistently recognized across the systems. Meanwhile, each system's ontology retains its own unique knowledge, called a local ontology. Finally, mappings are established from the global ontology to the knowledge of each local ontology, so that all knowledge in every business system's ontology is covered.
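The global/local split can be sketched under a simplifying assumption: each system's ontology is given as a mapping from its local terms to already-normalized concepts (the normalization itself would come from the mapping step). Concepts shared by all systems form the global ontology; everything else stays local. The function name and the example terms `meter_no`, `rated_capacity` and `geometry` are hypothetical.

```python
def build_global_ontology(systems):
    """Split system ontologies into a global ontology and local ontologies.

    `systems` maps a system name to {local term: normalized concept}; the
    normalization is assumed to have been done already. Concepts shared by
    every system form the global ontology.
    """
    concept_sets = [set(terms.values()) for terms in systems.values()]
    global_concepts = set.intersection(*concept_sets)
    # Mapping from each global concept to the term each system uses for it
    # (if a system has several terms for one concept, the last one wins).
    mapping = {
        concept: {
            name: term
            for name, terms in systems.items()
            for term, c in terms.items()
            if c == concept
        }
        for concept in sorted(global_concepts)
    }
    # Knowledge unique to each system stays in its local ontology.
    local = {
        name: sorted(term for term, c in terms.items() if c not in global_concepts)
        for name, terms in systems.items()
    }
    return mapping, local
```

For the three systems in this document, the transformer ID and name fields (for example `cms_equip_id`, `pms_obj_id`, `gis_oid`) would map to one global concept each.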
2. Ontology mapping
The process of ontology mapping can be mainly divided into three steps:
Step 1: import the ontologies to be mapped, ensuring that the ontology components that need to be mapped can be accessed easily.
Step 2: discover the mappings. Concept-similarity algorithms are used to find the relations between heterogeneous ontologies, and mapping rules between the ontologies are then established from these relations. To improve the accuracy of the mapping result, this step often requires manual intervention.
Step 3: represent the mappings. Once the mappings between ontologies are found, they must be represented appropriately.
The focus of ontology mapping is thus on discovering the mappings. Based on the actual situation, this technique adopts term- and structure-based ontology mapping. Starting from the terms of each system's ontology, it compares the names, labels or comments associated with ontology components to find similarities among the heterogeneous ontologies, mainly using string-based and language-based methods.
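A string-based variant of the term comparison can be sketched with Python's standard `difflib.SequenceMatcher`. This covers only the string-based side (language-based methods are not shown); the normalization rules, the 0.8 threshold and the function names are assumptions. The output is a list of proposals that would still go to the manual review mentioned in Step 2.

```python
import re
from difflib import SequenceMatcher


def normalize(label):
    """Lower-case a label and split camelCase and snake_case into words."""
    label = re.sub(r"([a-z])([A-Z])", r"\1 \2", label)
    return " ".join(w for w in re.split(r"[_\s]+", label.lower()) if w)


def label_similarity(a, b):
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def find_mappings(source_labels, target_labels, threshold=0.8):
    """Propose (source, target, score) pairs whose label similarity reaches the threshold."""
    proposals = []
    for s in source_labels:
        best = max(target_labels, key=lambda t: label_similarity(s, t))
        score = label_similarity(s, best)
        if score >= threshold:
            proposals.append((s, best, round(score, 3)))
    return proposals
```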
Example 3:
Instance fusion means fusing the knowledge graph ontology models using a weighting algorithm. The essence of the entity fusion algorithm is deciding whether instance data from different knowledge graphs describe the same objective physical object; entity fusion is also called entity alignment. This technique mainly studies the alignment of entities across systems based on information such as entity attributes and entity relations in the single-system grid knowledge graphs of the distribution network.
The instance fusion process is similar to ontology fusion, but instance fusion is usually a large-scale data processing problem, so time and space complexity must be considered during fusion. This technique combines two different algorithms: pairwise entity alignment and collaborative entity alignment. Pairwise entity alignment judges independently whether two entities correspond to the same physical object, determining their degree of alignment by matching features such as entity attributes. Collaborative entity alignment considers that the alignments between different entities influence one another, and achieves a globally optimal alignment result by coordinating the matches between different objects.
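The difference between pairwise and collaborative alignment can be illustrated with a small exhaustive search that picks the one-to-one alignment with the highest total similarity. This is only a sketch for a handful of entities (its cost grows factorially); the function name, the 0.5 threshold and the similarity scores are hypothetical, and large-scale systems would need a proper assignment or collective inference algorithm instead.

```python
from itertools import permutations


def collective_align(left, right, sim, threshold=0.5):
    """Choose a one-to-one alignment that maximizes total similarity.

    Exhaustive search, so only usable for a handful of entities; pairs below
    `threshold` are left unaligned.
    """
    best_total, best_pairs = -1.0, []
    # Pad with None so that an entity on the left may stay unaligned.
    candidates = list(right) + [None] * len(left)
    for choice in permutations(candidates, len(left)):
        pairs = [
            (l, r) for l, r in zip(left, choice)
            if r is not None and sim(l, r) >= threshold
        ]
        total = sum(sim(l, r) for l, r in pairs)
        if total > best_total:
            best_total, best_pairs = total, pairs
    return best_pairs
```

With scores A1-B1 = 0.9, A1-B2 = 0.85, A2-B1 = 0.8 and A2-B2 = 0.6, judging each entity independently sends both A1 and A2 to B1, whereas the coordinated result is A1-B2 and A2-B1 (total 1.65).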
1. Principle of pairwise entity alignment algorithm
Before describing the specific principles of the algorithm, the definition of the knowledge base is explained first.
A knowledge base is a six-tuple:
$$KB = (I, L, R, P, F_R, F_P)$$
where I, L, R and P are the sets of instances, literals, relations and attributes, respectively; $F_R \subseteq I \times R \times I$ is the set of relation facts, i.e. SPO triples whose object is an instance; and $F_P \subseteq I \times P \times L$ is the set of attribute facts, i.e. SPO triples whose object is a literal.
Entity alignment is formally defined as:
$$\mathrm{Align}_{entity}(KB_1, KB_2) = \{(e_1, e_2, con) \mid e_1 \in KB_1,\ e_2 \in KB_2,\ con \in [0, 1]\}$$
where con is a value describing the similarity of the entities; the larger con is, the more similar the two entities are.
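The six-tuple and the alignment set translate directly into a small data structure. In this sketch the class and method names are hypothetical, and the confidence function `con` is passed in by the caller, since the text leaves its exact form to the similarity functions described later.

```python
from dataclasses import dataclass, field


@dataclass
class KnowledgeBase:
    """KB = (I, L, R, P, F_R, F_P) as defined in the text."""
    instances: set = field(default_factory=set)        # I
    literals: set = field(default_factory=set)         # L
    relations: set = field(default_factory=set)        # R
    attributes: set = field(default_factory=set)       # P
    relation_facts: set = field(default_factory=set)   # F_R, subset of I x R x I
    attribute_facts: set = field(default_factory=set)  # F_P, subset of I x P x L

    def add_relation_fact(self, s, r, o):
        self.instances |= {s, o}
        self.relations.add(r)
        self.relation_facts.add((s, r, o))

    def add_attribute_fact(self, s, p, lit):
        self.instances.add(s)
        self.attributes.add(p)
        self.literals.add(lit)
        self.attribute_facts.add((s, p, lit))

    def attrs(self, e):
        """Attribute values of instance e (one value per attribute in this sketch)."""
        return {p: lit for s, p, lit in self.attribute_facts if s == e}


def align_entities(kb1, kb2, con):
    """Return {(e1, e2, c)} over all instance pairs, with c = con(kb1, e1, kb2, e2) in [0, 1]."""
    result = set()
    for e1 in kb1.instances:
        for e2 in kb2.instances:
            c = con(kb1, e1, kb2, e2)
            if not 0.0 <= c <= 1.0:
                raise ValueError("con must lie in [0, 1]")
            result.add((e1, e2, c))
    return result
```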
The process of aligning two knowledge base entities can be described simply as: given two knowledge bases and a group of priori aligned data, entity matching calculation is carried out under the common control of optional adjusting parameters and a series of related external resources, and finally an alignment result is obtained.
2. Entity similarity and relationship similarity
Alignment based on a probabilistic model is a pairwise comparison method based on attribute similarity; it does not consider the relations between the entities being matched. Entity matching based on attribute similarity scores can be turned into a classification problem. An intuitive entity alignment classification method sums the similarity scores of all matched attributes, sets a similarity threshold, and decides the result by comparing the total entity similarity score with the threshold. Formally:
$$y(e_1, e_2) = \begin{cases} \text{match}, & S(e_1, e_2) \ge t \\ \text{non-match}, & S(e_1, e_2) < t \end{cases}$$
where e1 and e2 are the entity pair to be matched, S(e1, e2) is the sum of the similarity scores of all their matched attributes, and t is the similarity threshold.
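The unweighted threshold rule can be sketched in a few lines. The per-attribute similarity `attr_sim` is a caller-supplied assumption (exact equality in the example); every attribute counts equally, which is precisely the weakness the weighted variants address.

```python
def threshold_decision(e1, e2, attr_sim, t):
    """Unweighted rule: sum per-attribute similarities and compare with t."""
    shared = sorted(e1.keys() & e2.keys())  # attributes present in both entities
    total = sum(attr_sim(e1[a], e2[a]) for a in shared)
    return ("match" if total >= t else "non-match"), total
```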
A main problem with this method is that it does not reflect the different influence of different attributes on the final similarity. An important remedy is to assign each matched attribute pair a weight that reflects its importance to the alignment result. Let A and B be the two knowledge bases to be matched, and let $e_i$ and $e_j$ be entities in A and B, respectively. Define two disjoint sets M and U:
$$M = \{(e_i, e_j) \mid e_i = e_j,\ e_i \in A,\ e_j \in B\}$$
$$U = \{(e_i, e_j) \mid e_i \ne e_j,\ e_i \in A,\ e_j \in B\}$$
Define the comparison vector $x^*$ as the vector formed by all matched attributes of the entities to be matched, and the comparison space X as the space of all possible $x^*$. Define the ratio of two conditional probabilities:
$$R = \frac{P(x^* \in X \mid M)}{P(x^* \in X \mid U)}$$
The matching result is then decided by the value of R: the larger R is, the more likely it is that the pair belongs to M.
Assuming that the attributes in the comparison vector $x^*$ are mutually independent, the attribute probabilities are:
$$m_i = P(a_i = b_i \mid M), \qquad u_i = P(a_i = b_i \mid U)$$
where $a_i$ and $b_i$ are the i-th attributes of the entity pair to be matched, $m_i$ is the probability that the i-th attribute values are equal given that the two entities are the same, and $u_i$ is the probability that the i-th attribute values are equal given that the two entities are not the same. From these two probabilities, the weight $\omega_i$ of the i-th attribute is:
$$\omega_i = \begin{cases} \log \dfrac{m_i}{u_i}, & a_i = b_i \\ \log \dfrac{1 - m_i}{1 - u_i}, & a_i \ne b_i \end{cases}$$
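Under the independence assumption the attribute weights add up to log R, so a pair can be scored by summing them. A minimal sketch, assuming the (m_i, u_i) pairs have already been estimated (the values in the example are made up) and using the natural logarithm:

```python
import math


def attribute_weight(agree, m, u):
    """Weight of one attribute: log(m/u) if values agree, else log((1-m)/(1-u))."""
    return math.log(m / u) if agree else math.log((1 - m) / (1 - u))


def match_score(e1, e2, params):
    """Sum of attribute weights, i.e. log R under the independence assumption.

    `params` maps attribute name -> (m_i, u_i); both must lie strictly between 0 and 1.
    """
    return sum(
        attribute_weight(e1.get(attr) == e2.get(attr), m, u)
        for attr, (m, u) in params.items()
    )
```

An agreeing, highly discriminative attribute such as a name (m = 0.9, u = 0.1) contributes log 9, while a disagreeing city (m = 0.8, u = 0.3) subtracts log 3.5, so the score still comes out positive.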
The relations between entities in the knowledge base are important for entity alignment and can effectively improve matching precision and recall. The local entity alignment method based on simple relations assigns different weights to the attributes of the entity and to the attributes of the entities related to it, and computes the overall similarity by weighted summation. Formally:
$$\mathrm{sim}(e_1, e_2) = \alpha\, \mathrm{sim}_{attr}(e_1, e_2) + (1 - \alpha)\, \mathrm{sim}_{NB}(e_1, e_2)$$
where $\mathrm{sim}_{attr}$ is the similarity of the entities' own attributes, $\mathrm{sim}_{NB}$ is the similarity computed from their related (neighbouring) entities, and α is the weight.
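The combination formula can be sketched as follows. The text does not fix how sim_NB is computed, so this sketch assumes one simple choice: map e1's neighbours through already-known alignments and take the Jaccard overlap with e2's neighbours. The default α = 0.7 is likewise an arbitrary illustration.

```python
def neighbor_similarity(n1, n2, aligned):
    """Jaccard overlap of neighbor sets after mapping e1's neighbors through known alignments."""
    mapped = {aligned.get(x, x) for x in n1}
    union = mapped | n2
    return len(mapped & n2) / len(union) if union else 0.0


def combined_similarity(sim_attr, n1, n2, aligned, alpha=0.7):
    """sim = alpha * sim_attr + (1 - alpha) * sim_NB."""
    return alpha * sim_attr + (1 - alpha) * neighbor_similarity(n1, n2, aligned)
```

For example, two transformers whose attribute similarity is only 0.6 but which hang off feeders already known to be the same (`F01` aligned with `L-01`) score 0.7 × 0.6 + 0.3 × 1.0 = 0.72.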
3. feature matching based on similarity functions
(1) Token-based similarity function
A token-based similarity function uses a function, called the tokenization function Token(), to convert the text strings to be matched into sets of substrings, which are called tokens. Commonly used token-based similarity functions are the Jaccard similarity function and the cosine similarity function.
The Jaccard coefficient equals the ratio of the size of the intersection of two sets to the size of their union and measures how closely the two sets are related. It is computed as:
$$\mathrm{Jaccard}(s_1, s_2) = \frac{|\mathrm{Token}(s_1) \cap \mathrm{Token}(s_2)|}{|\mathrm{Token}(s_1) \cup \mathrm{Token}(s_2)|}$$
A feature of the Jaccard-coefficient-based similarity function is that set intersection is order-independent, so the order of the tokens has no influence on the result.
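A minimal Jaccard sketch follows. The choice of Token() here (lower-cased whitespace-separated words) is an assumption; character n-grams are a common alternative for names without spaces.

```python
def token(s):
    """Hypothetical Token() function: lower-cased whitespace-separated words."""
    return set(s.lower().split())


def jaccard(s1, s2):
    """|Token(s1) & Token(s2)| / |Token(s1) | Token(s2)|; two empty strings count as identical."""
    a, b = token(s1), token(s2)
    union = a | b
    return len(a & b) / len(union) if union else 1.0
```

Reordering words does not change the result: "10kV Huayuan Line" and "Line Huayuan 10kV" score 1.0, while "No.1 Main Transformer" and "No.1 Transformer" score 2/3.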
Cosine similarity treats the token sets of two text strings as two n-dimensional vectors and evaluates the similarity of the strings by the cosine of the angle between the vectors. The weight w of each token in a vector is usually computed with the tf-idf model. If the vectors of strings $s_1$ and $s_2$ are $\langle w_{11}, w_{12}, \dots, w_{1n} \rangle$ and $\langle w_{21}, w_{22}, \dots, w_{2n} \rangle$, the cosine similarity of $s_1$ and $s_2$ is:
$$\cos(s_1, s_2) = \frac{\sum_{i=1}^{n} w_{1i}\, w_{2i}}{\sqrt{\sum_{i=1}^{n} w_{1i}^2}\,\sqrt{\sum_{i=1}^{n} w_{2i}^2}}$$
where $w_{ki}$ is the tf-idf weight of the i-th token in string $s_k$.
Cosine similarity shares the order independence of token-based similarity functions and, because it adds weights, better reflects how similar the tokens are.
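A sketch of tf-idf weighting plus cosine similarity over tokenized strings. The smoothed idf used here, ln((1 + n) / (1 + df)) + 1, is one common tf-idf variant chosen so that tokens occurring in every string keep a non-zero weight; the text does not specify which variant is used.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """docs: list of non-empty token lists -> list of {token: tf-idf weight}.

    Uses smoothed idf = ln((1 + n) / (1 + df)) + 1 (an assumed variant).
    """
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({
            t: (count / len(doc)) * (math.log((1 + n) / (1 + df[t])) + 1)
            for t, count in tf.items()
        })
    return vectors


def cosine(v1, v2):
    """Cosine of the angle between two sparse weight vectors."""
    dot = sum(w * v2.get(t, 0.0) for t, w in v1.items())
    norm1 = math.sqrt(sum(w * w for w in v1.values()))
    norm2 = math.sqrt(sum(w * w for w in v2.values()))
    return dot / (norm1 * norm2) if norm1 and norm2 else 0.0
```

Typical use: `v = tfidf_vectors([name.lower().split() for name in names])`, then compare `cosine(v[i], v[j])` for candidate pairs.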
(2) Edit distance based similarity function
Unlike token-based similarity functions, the edit distance-based similarity function treats the text strings to be matched as a whole, and takes the minimum cost of an editing operation required for converting one string into another as a measure for measuring the similarity of the two strings. Basic editing operations include insert, delete, replace, swap locations, and the like. The similarity function based on the editing distance can effectively process error sensitivity problems such as entry errors and the like. Common editing distance-based similarity functions are Levenshtein distance-based, Smith-Waterman distance-based, Jaro-and Jaro-Winkler distance-based similarity functions.
Given two strings $s_1$ and $s_2$, the Levenshtein distance between them equals the minimum number of insertion, deletion, and substitution operations required to convert $s_1$ into $s_2$. A similarity function based on the Levenshtein distance makes similarity matching less sensitive to such input errors.
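The Levenshtein distance can be computed with the standard dynamic-programming recurrence sketched below. The conversion to a similarity in [0, 1] (1 minus the distance divided by the length of the longer string) is one common normalization and is an assumption here; the source does not specify how the distance is turned into a similarity.

```python
def levenshtein(s1: str, s2: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions turning s1 into s2."""
    prev = list(range(len(s2) + 1))  # distances from "" to each prefix of s2
    for i, c1 in enumerate(s1, start=1):
        curr = [i]  # distance from s1[:i] to ""
        for j, c2 in enumerate(s2, start=1):
            curr.append(min(
                prev[j] + 1,               # delete c1
                curr[j - 1] + 1,           # insert c2
                prev[j - 1] + (c1 != c2),  # substitute c1 -> c2 (free if equal)
            ))
        prev = curr
    return prev[-1]


def levenshtein_similarity(s1: str, s2: str) -> float:
    """Map the distance to [0, 1] as 1 - distance / max length (an assumed normalization)."""
    longest = max(len(s1), len(s2))
    return 1.0 if longest == 0 else 1.0 - levenshtein(s1, s2) / longest


print(levenshtein("kitten", "sitting"))                     # 3
print(levenshtein("10kV Maple Rd #1", "10kV Maple Rd #l"))  # 1: a single mistyped character
```

Keeping only the previous row of the table keeps memory proportional to the length of `s2` rather than to the product of both lengths.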
The methods above yield the similarity (or distance) between two strings. A weighted similarity calculation then combines these values to obtain the true similarity of the names and attributes of two entities in the Neo4j graph database, namely the weighted similarity. Fig. 1 briefly summarizes the algorithm.
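The source does not give the weighting formula, so the following is only a sketch under assumptions: the weighted similarity is taken as a weighted mean of per-attribute string similarities over the attributes both entities have; the attribute names and weights are hypothetical; and Python's standard `difflib.SequenceMatcher.ratio()` stands in for whichever string similarity function is applied to each attribute.

```python
from difflib import SequenceMatcher


def attribute_similarity(a: str, b: str) -> float:
    """Stand-in string similarity in [0, 1] from Python's standard library."""
    return SequenceMatcher(None, a, b).ratio()


def weighted_similarity(e1: dict[str, str], e2: dict[str, str],
                        weights: dict[str, float]) -> float:
    """Weighted mean of per-attribute similarities over attributes present in both entities."""
    shared = [k for k in weights if k in e1 and k in e2]
    total = sum(weights[k] for k in shared)
    if total == 0:
        return 0.0  # no comparable attributes
    return sum(weights[k] * attribute_similarity(e1[k], e2[k]) for k in shared) / total


# Hypothetical attribute weights and entities (e.g. two transformer nodes read from the graph).
weights = {"name": 0.5, "capacity": 0.3, "feeder": 0.2}
a = {"name": "Maple Rd No.1", "capacity": "400kVA", "feeder": "F12"}
b = {"name": "Maple Road No.1", "capacity": "400kVA", "feeder": "F12"}
print(weighted_similarity(a, b, weights))
```

Normalizing by the weights of the shared attributes keeps the score in [0, 1] even when one system lacks an attribute the other has.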
Example 4:
This technique provides a corresponding knowledge fusion algorithm. The algorithm takes the knowledge graphs of three systems as input, performs fusion calculation on entities of the same type across the three knowledge graphs by means of the distribution network entity semantic fusion algorithm, and builds a unified distribution network grid knowledge graph.
The fusion results for the different systems are presented in a table whose fields are defined as follows:
cms_equip_id: transformer ID in the marketing service application system;
cms_tran_name: transformer name in the marketing service application system;
pms_obj_id: transformer ID in the equipment (asset) operation and maintenance lean management system;
pms_tran_name: transformer name in the equipment (asset) operation and maintenance lean management system;
gis_oid: transformer ID in the geographic information system;
gis_tran_name: transformer name in the geographic information system.
The table shows the result of fusing the marketing service application system, the equipment (asset) operation and maintenance lean management system, and the geographic information system with the fusion method described above.
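As an illustration of how rows with these fields could be produced, the sketch below links each marketing-system transformer to the most similar PMS and GIS transformer by name. The records, IDs, and the 0.8 threshold are hypothetical, `difflib.SequenceMatcher` is a stand-in for the weighted similarity, and unlike a production matcher it does not enforce one-to-one matches.

```python
from difflib import SequenceMatcher
from typing import Optional


def name_similarity(a: str, b: str) -> float:
    """Stand-in name similarity in [0, 1], case-insensitive."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_match(name: str, records: list[dict], id_key: str, name_key: str,
               threshold: float) -> Optional[str]:
    """Id of the record whose name is most similar to `name`, or None below the threshold."""
    best_id, best_score = None, 0.0
    for rec in records:
        score = name_similarity(name, rec[name_key])
        if score > best_score:
            best_id, best_score = rec[id_key], score
    return best_id if best_score >= threshold else None


def fuse(cms: list[dict], pms: list[dict], gis: list[dict], threshold: float = 0.8) -> list[dict]:
    """Link each CMS transformer to its most similar PMS and GIS transformer."""
    rows = []
    for rec in cms:
        name = rec["cms_tran_name"]
        rows.append({
            "cms_equip_id": rec["cms_equip_id"],
            "pms_obj_id": best_match(name, pms, "pms_obj_id", "pms_tran_name", threshold),
            "gis_oid": best_match(name, gis, "gis_oid", "gis_tran_name", threshold),
        })
    return rows


# Hypothetical records from the three systems.
cms = [{"cms_equip_id": "C001", "cms_tran_name": "Maple Road No.1 Transformer"}]
pms = [{"pms_obj_id": "P9", "pms_tran_name": "Maple Rd No.1 Transformer"},
       {"pms_obj_id": "P10", "pms_tran_name": "Oak Street No.2 Transformer"}]
gis = [{"gis_oid": "G42", "gis_tran_name": "maple road no.1 transformer"}]
for row in fuse(cms, pms, gis):
    print(row)  # {'cms_equip_id': 'C001', 'pms_obj_id': 'P9', 'gis_oid': 'G42'}
```

A `None` in the output marks a transformer for which no counterpart reached the threshold, which is a natural place for manual review.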
In this technique, fusion refers to the semantic fusion of the ontology models of the marketing service application system, the equipment (asset) operation and maintenance lean management system, and the geographic information system. FIG. 2 shows the specific fusion steps. Because the ontology models of the three systems are constructed separately, the same attribute may be described differently in each, so an ontology model fusion function was developed. This function can fuse the ontology models of the three systems automatically to a certain extent and lets the user modify the fusion result to improve its accuracy. The fused ontology model serves as the storage template for the final knowledge graph instance data, so its quality directly affects how well the graph performs in applications.
The invention provides a distribution network multi-source grid entity fusion method based on weighted semantic similarity, which improves the efficiency of marketing-distribution integration work and ensures the reliability of entities and relationships. Compared with traditional methods, matching is performed automatically instead of being checked manually. The method also introduces similarity calculation algorithms together with natural language processing (NLP) and machine learning algorithms, which improves matching accuracy. In addition, the technique promotes data collection, fusion, and sharing and strengthens enterprise data service capability, thereby raising the level of data analysis applications and the value of big data. It also makes the data easier to work with and improves the fusion of multi-source grid entities in the distribution network.
Claims (8)
1. A distribution network multi-source grid entity fusion method based on weighted semantic similarity, characterized by comprising the following steps:
Step one: performing knowledge extraction on grids from a plurality of different sources to obtain a plurality of heterogeneous ontologies;
Step two: finding the relationships among the plurality of heterogeneous ontologies, establishing the corresponding mappings, and fusing the heterogeneous ontologies to form a plurality of knowledge graph ontology models;
Step three: fusing the plurality of knowledge graph ontology models by using a weighting algorithm;
Step four: obtaining the fused result.
2. The distribution network multi-source grid entity fusion method based on weighted semantic similarity according to claim 1, characterized in that: the knowledge extraction in step one comprises entity extraction, relation extraction, and attribute extraction, which are performed on the multi-source grid data in that order.
3. The distribution network multi-source grid entity fusion method based on weighted semantic similarity according to claim 1, characterized in that: the ontology fusion in step two first obtains a global ontology by an ontology integration method and then obtains the mapping relationship between each single heterogeneous ontology and the global ontology.
4. The distribution network multi-source grid entity fusion method based on weighted semantic similarity according to claim 3, characterized in that: the ontology integration eliminates the heterogeneity among the plurality of heterogeneous ontologies and merges them directly into a global ontology.
5. The distribution network multi-source grid entity fusion method based on weighted semantic similarity according to claim 3, characterized in that: the mapping relationship is obtained in three steps: importing the ontologies to be mapped, discovering the mappings, and representing the mappings.
6. The distribution network multi-source grid entity fusion method based on weighted semantic similarity according to claim 5, characterized in that: discovering the mappings means discovering the relationships between the global ontology and a single heterogeneous ontology: based on the attributes of the single heterogeneous ontology, its names, labels, and comments are compared with those of the global ontology using string-based and language-based methods to find their similarity, from which the mapping relationship between the single heterogeneous ontology and the global ontology, and in turn the knowledge graph ontology model, is obtained.
7. The distribution network multi-source grid entity fusion method based on weighted semantic similarity according to claim 1, characterized in that: the weighting algorithm in step three is an alignment method based on a probability model, which assigns different weights to the attributes obtained during attribute extraction in the knowledge extraction.
8. The distribution network multi-source grid entity fusion method based on weighted semantic similarity according to claim 7, characterized in that: the weight of each attribute is calculated based on the tf-idf model.
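The mapping discovery of claim 6 can be sketched as follows, under stated assumptions: each concept is a dict with hypothetical `name`, `label`, and `comment` keys; only a string-based comparison (token Jaccard after splitting camelCase and snake_case identifiers) is implemented, not the language-based one; the three field scores are simply averaged; and the 0.5 threshold and all concept names are hypothetical. Concepts that stay below the threshold are left unmapped, which is where the user correction of fusion results mentioned earlier would apply.

```python
import re

FIELDS = ("name", "label", "comment")


def tokens(text: str) -> set[str]:
    """Split camelCase, snake_case and free text into lower-case word tokens."""
    text = re.sub(r"([a-z])([A-Z])", r"\1 \2", text)
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: str, b: str) -> float:
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def concept_similarity(local: dict, target: dict) -> float:
    """Average string similarity of the name, label and comment fields (equal weights assumed)."""
    return sum(jaccard(local[f], target[f]) for f in FIELDS) / len(FIELDS)


def discover_mappings(local_concepts: list[dict], global_concepts: list[dict],
                      threshold: float = 0.5) -> dict[str, str]:
    """Map each local concept to its most similar global concept if the score reaches the threshold."""
    mappings = {}
    for lc in local_concepts:
        best = max(global_concepts, key=lambda gc: concept_similarity(lc, gc))
        if concept_similarity(lc, best) >= threshold:
            mappings[lc["name"]] = best["name"]
    return mappings


global_concepts = [
    {"name": "transformerName", "label": "transformer name", "comment": "name of a distribution transformer"},
    {"name": "feederId", "label": "feeder identifier", "comment": "identifier of a feeder"},
]
local_concepts = [
    {"name": "tran_name", "label": "transformer name", "comment": "name of the distribution transformer"},
    {"name": "line_code", "label": "line code", "comment": "code of the feeder line"},
]
print(discover_mappings(local_concepts, global_concepts))  # {'tran_name': 'transformerName'}
```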
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010555531.2A CN111881290A (en) | 2020-06-17 | 2020-06-17 | Distribution network multi-source grid entity fusion method based on weighted semantic similarity |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010555531.2A CN111881290A (en) | 2020-06-17 | 2020-06-17 | Distribution network multi-source grid entity fusion method based on weighted semantic similarity |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111881290A true CN111881290A (en) | 2020-11-03 |
Family
ID=73157632
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010555531.2A Pending CN111881290A (en) | 2020-06-17 | 2020-06-17 | Distribution network multi-source grid entity fusion method based on weighted semantic similarity |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111881290A (en) |
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112528043A (en) * | 2020-12-18 | 2021-03-19 | 中国南方电网有限责任公司 | Power grid maintenance single structured storage method and system based on knowledge graph |
CN112635078A (en) * | 2020-11-06 | 2021-04-09 | 辽宁工程技术大学 | Traditional Chinese medicine knowledge graph construction and visualization method |
CN112966027A (en) * | 2021-03-22 | 2021-06-15 | 青岛科技大学 | Entity association mining method based on dynamic probe |
CN113010688A (en) * | 2021-03-05 | 2021-06-22 | 北京信息科技大学 | Knowledge graph construction method, device and equipment and computer readable storage medium |
CN113159320A (en) * | 2021-03-08 | 2021-07-23 | 北京航空航天大学 | Scientific and technological resource data integration method and device based on knowledge graph |
CN113360668A (en) * | 2021-06-03 | 2021-09-07 | 中国电力科学研究院有限公司 | Unified data model construction method, system, terminal device and readable storage medium |
CN113609086A (en) * | 2021-07-31 | 2021-11-05 | 云南电网有限责任公司信息中心 | Method for constructing unified power grid network frame data sharing pool based on weight dynamic adjustment |
CN113705236A (en) * | 2021-04-02 | 2021-11-26 | 腾讯科技(深圳)有限公司 | Entity comparison method, device, equipment and computer readable storage medium |
CN115329158A (en) * | 2022-10-17 | 2022-11-11 | 湖南能源大数据中心有限责任公司 | Data association method based on multi-source heterogeneous power data |
CN115544276A (en) * | 2022-12-01 | 2022-12-30 | 南方电网数字电网研究院有限公司 | Metering device knowledge graph construction method and metering device archive checking method |
CN116304115A (en) * | 2023-05-19 | 2023-06-23 | 中央军委后勤保障部信息中心 | Knowledge-graph-based material matching and replacing method and device |
CN116541472A (en) * | 2023-03-22 | 2023-08-04 | 麦博(上海)健康科技有限公司 | Knowledge graph construction method in medical field |
CN117110798A (en) * | 2023-10-25 | 2023-11-24 | 国网江苏省电力有限公司苏州供电分公司 | Fault detection method and system for intelligent power distribution network |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106250412A (en) * | 2016-07-22 | 2016-12-21 | 浙江大学 | The knowledge mapping construction method merged based on many source entities |
CN109033129A (en) * | 2018-06-04 | 2018-12-18 | 桂林电子科技大学 | Multi-source Information Fusion knowledge mapping based on adaptive weighting indicates learning method |
CN110674311A (en) * | 2019-09-05 | 2020-01-10 | 国家电网有限公司 | Knowledge graph-based power asset heterogeneous data fusion method |
2020-06-17: application CN202010555531.2A filed in CN (CN111881290A, active, pending)
Non-Patent Citations (2)
Title |
---|
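HAPPYGRIL3: "Knowledge graph fusion: fusion methods and techniques at the ontology concept layer", pages 1 - 2, Retrieved from the Internet <URL:https://www.cnblogs.com/hapyygril/p/11983228.html> * |
Zhuang Yan, Li Guoliang, Feng Jianhua: "A survey of entity alignment techniques for knowledge bases", Journal of Computer Research and Development, no. 53, pages 65 - 192 * |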
Cited By (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112635078A (en) * | 2020-11-06 | 2021-04-09 | 辽宁工程技术大学 | Traditional Chinese medicine knowledge graph construction and visualization method |
CN112528043A (en) * | 2020-12-18 | 2021-03-19 | 中国南方电网有限责任公司 | Power grid maintenance single structured storage method and system based on knowledge graph |
CN113010688A (en) * | 2021-03-05 | 2021-06-22 | 北京信息科技大学 | Knowledge graph construction method, device and equipment and computer readable storage medium |
CN113159320A (en) * | 2021-03-08 | 2021-07-23 | 北京航空航天大学 | Scientific and technological resource data integration method and device based on knowledge graph |
CN112966027B (en) * | 2021-03-22 | 2022-10-21 | 青岛科技大学 | Entity association mining method based on dynamic probe |
CN112966027A (en) * | 2021-03-22 | 2021-06-15 | 青岛科技大学 | Entity association mining method based on dynamic probe |
CN113705236B (en) * | 2021-04-02 | 2024-06-11 | 腾讯科技(深圳)有限公司 | Entity comparison method, device, equipment and computer readable storage medium |
CN113705236A (en) * | 2021-04-02 | 2021-11-26 | 腾讯科技(深圳)有限公司 | Entity comparison method, device, equipment and computer readable storage medium |
CN113360668A (en) * | 2021-06-03 | 2021-09-07 | 中国电力科学研究院有限公司 | Unified data model construction method, system, terminal device and readable storage medium |
CN113609086A (en) * | 2021-07-31 | 2021-11-05 | 云南电网有限责任公司信息中心 | Method for constructing unified power grid network frame data sharing pool based on weight dynamic adjustment |
CN115329158A (en) * | 2022-10-17 | 2022-11-11 | 湖南能源大数据中心有限责任公司 | Data association method based on multi-source heterogeneous power data |
CN115544276A (en) * | 2022-12-01 | 2022-12-30 | 南方电网数字电网研究院有限公司 | Metering device knowledge graph construction method and metering device archive checking method |
CN116541472A (en) * | 2023-03-22 | 2023-08-04 | 麦博(上海)健康科技有限公司 | Knowledge graph construction method in medical field |
CN116541472B (en) * | 2023-03-22 | 2024-07-30 | 麦博(上海)健康科技有限公司 | Knowledge graph construction method in medical field |
CN116304115A (en) * | 2023-05-19 | 2023-06-23 | 中央军委后勤保障部信息中心 | Knowledge-graph-based material matching and replacing method and device |
CN116304115B (en) * | 2023-05-19 | 2023-08-11 | 中央军委后勤保障部信息中心 | Knowledge-graph-based material matching and replacing method and device |
CN117110798B (en) * | 2023-10-25 | 2024-02-13 | 国网江苏省电力有限公司苏州供电分公司 | Fault detection method and system for intelligent power distribution network |
CN117110798A (en) * | 2023-10-25 | 2023-11-24 | 国网江苏省电力有限公司苏州供电分公司 | Fault detection method and system for intelligent power distribution network |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111881290A (en) | Distribution network multi-source grid entity fusion method based on weighted semantic similarity | |
CN116628172B (en) | Dialogue method for multi-strategy fusion in government service field based on knowledge graph | |
CN108804521B (en) | Knowledge graph-based question-answering method and agricultural encyclopedia question-answering system | |
US10725836B2 (en) | Intent-based organisation of APIs | |
CN110609902B (en) | Text processing method and device based on fusion knowledge graph | |
CN104933164B (en) | In internet mass data name entity between relationship extracting method and its system | |
CN112131449B (en) | Method for realizing cultural resource cascade query interface based on ElasticSearch | |
CN110619051B (en) | Question sentence classification method, device, electronic equipment and storage medium | |
WO2018151856A1 (en) | Intelligent matching system with ontology-aided relation extraction | |
CN105893611B (en) | Method for constructing interest topic semantic network facing social network | |
CN110633365A (en) | Word vector-based hierarchical multi-label text classification method and system | |
CN110097278B (en) | Intelligent sharing and fusion training system and application system for scientific and technological resources | |
KR20060045783A (en) | Mining service requests for product support | |
CN110633366A (en) | Short text classification method, device and storage medium | |
CN116628173B (en) | Intelligent customer service information generation system and method based on keyword extraction | |
CN113254630B (en) | Domain knowledge map recommendation method for global comprehensive observation results | |
CN116127090B (en) | Aviation system knowledge graph construction method based on fusion and semi-supervision information extraction | |
CN113761208A (en) | Scientific and technological innovation information classification method and storage device based on knowledge graph | |
CN113051914A (en) | Enterprise hidden label extraction method and device based on multi-feature dynamic portrait | |
CN117744784B (en) | Medical scientific research knowledge graph construction and intelligent retrieval method and system | |
CN117973519A (en) | Knowledge graph-based data processing method | |
CN115905705A (en) | Industrial algorithm model recommendation method based on industrial big data | |
CN113032353A (en) | Data sharing method, system, electronic device and medium | |
CN118445406A (en) | Integration system based on massive polymorphic circuit heritage information | |
CN117010373A (en) | Recommendation method for category and group to which asset management data of power equipment belong |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination |