CN111881290A - Distribution network multi-source grid entity fusion method based on weighted semantic similarity - Google Patents

Distribution network multi-source grid entity fusion method based on weighted semantic similarity

Info

Publication number
CN111881290A
CN111881290A CN202010555531.2A CN202010555531A CN111881290A CN 111881290 A CN111881290 A CN 111881290A CN 202010555531 A CN202010555531 A CN 202010555531A CN 111881290 A CN111881290 A CN 111881290A
Authority
CN
China
Prior art keywords
ontology
distribution network
method based
heterogeneous
mapping
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010555531.2A
Other languages
Chinese (zh)
Inventor
秦丹丹
郑高峰
刘丽
李龙跃
王鑫
张淑娟
赵龙
汪玉
高博
徐斌
李金中
王潇
孙伟
李博
卞真旭
仇茹嘉
钱光超
邵珺伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Corp of China SGCC
Electric Power Research Institute of State Grid Anhui Electric Power Co Ltd
State Grid Anhui Electric Power Co Ltd
Original Assignee
State Grid Corp of China SGCC
Electric Power Research Institute of State Grid Anhui Electric Power Co Ltd
State Grid Anhui Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid Corp of China SGCC, Electric Power Research Institute of State Grid Anhui Electric Power Co Ltd, and State Grid Anhui Electric Power Co Ltd
Priority to CN202010555531.2A
Publication of CN111881290A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a distribution network multi-source grid entity fusion method based on weighted semantic similarity, comprising the following steps. Step one: extract knowledge from the multi-source grid to obtain a plurality of heterogeneous ontologies. Step two: find the relations among the heterogeneous ontologies, establish the corresponding mappings, and fuse the heterogeneous ontologies into a plurality of knowledge graph ontology models. Step three: fuse the knowledge graph ontology models using a weighting algorithm. Step four: obtain the fused result. These steps produce the final distribution network grid.

Description

Distribution network multi-source grid entity fusion method based on weighted semantic similarity
Technical Field
The invention relates to a distribution network multi-source grid entity fusion method based on weighted semantic similarity.
Background
Because power business systems lack an overall planning design and a horizontal communication mechanism, the functional processes of the business systems are isolated from one another, basic data is entered through multiple channels, and data standards are not unified. These problems expose the weakness of cross-functional, cross-department horizontal business process management in power supply enterprises. The distribution network multi-source grid entity fusion technology based on weighted semantic similarity builds a semantic data fusion model on top of the original data storage model and hides data barriers at the application layer, forming an integrated cross-department, cross-discipline, and cross-domain data resource system. This promotes data collection, fusion, and sharing and strengthens the enterprise's data service capability, which in turn raises the level of data analysis applications and the value of big data, improves management and service, and provides strong support for value-added services.
Disclosure of Invention
To solve the problems of multi-channel entry of basic data and non-unified data standards, and the weak cross-functional, cross-department horizontal business process management of power supply enterprises, the invention adopts a data processing method. The specific implementation steps are as follows:
Step one: knowledge extraction.
Knowledge extraction consists of three parts:
1. Entity extraction
Entity extraction identifies and extracts entities from information sources and is the most basic and critical part of information extraction.
Methods of entity extraction are generally divided into three types:
1.1 Rule- and dictionary-based methods: when the text domain and the semantic unit types are defined in advance, rule- and dictionary-based methods are mainly used; for example, predefined rules extract distribution network entities, place names, organization names, specific times, faults, and other entities from text.
1.2 Statistical machine learning methods: supervised learning algorithms are used to extract named entities. The performance of a purely supervised algorithm is limited by its training set, and its precision and recall are not ideal. Because of this limitation, the supervised learning algorithm is combined with rules.
1.3 Open-domain extraction methods: an unsupervised open-domain clustering algorithm whose basic idea is to identify named entities in search logs based on the semantic features of known entities and then cluster them.
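As an illustration of the rule- and dictionary-based approach in 1.1, the following is a minimal Python sketch. The dictionary entry, the two regular-expression rules, the labels, and the sample sentence are all hypothetical and are not taken from the invention; a real rule set would be written for the actual distribution network texts.

```python
import re

# Hypothetical dictionary of known organization names (illustrative only).
ORG_DICT = {"State Grid Anhui Electric Power Co Ltd"}

# Hypothetical rules: a timestamp pattern and a feeder-name pattern.
RULES = {
    "TIME": re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}"),
    "FEEDER": re.compile(r"\b\d+kV [A-Z][a-z]+ (?:Line|Feeder)\b"),
}


def extract_entities(text):
    """Return a sorted list of (label, surface form) pairs found in text."""
    found = set()
    # Rule-based pass: every regex match becomes an entity mention.
    for label, pattern in RULES.items():
        for match in pattern.finditer(text):
            found.add((label, match.group(0)))
    # Dictionary pass: exact lookup of known names.
    for name in ORG_DICT:
        if name in text:
            found.add(("ORG", name))
    return sorted(found)


# Made-up sample sentence used only to exercise the rules.
sample = ("State Grid Anhui Electric Power Co Ltd reported a fault on the "
          "10kV Binhu Line at 2020-06-17 08:30.")
entities = extract_entities(sample)
```

Rules of this kind are precise inside a fixed domain but have to be extended by hand whenever a new entity type or naming convention appears.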
2. Relationship extraction
Relationship extraction extracts the relations between entities from an information source in order to establish the semantic links between them. It is generally divided into supervised and semi-supervised extraction.
Supervised learning: the relation set in supervised relation extraction is usually fixed, so extraction can be treated as a simple classification problem. A supervised model is accurate when high-quality labeled data is available, but labeling text data costs a great deal of labor and time, new relation categories are hard to add, and the model is fragile with limited generalization ability. Semi-supervised learning: a small amount of labeled information is used as seed templates to extract a large number of new instances from unstructured data, which form new training data. The main method is the Bootstrap algorithm, whose core idea and basic steps are as follows:
(1) Use resampling to draw a certain number (freely set) of samples from the original sample; the same observation may be drawn more than once.
(2) Calculate the statistic T from the drawn samples.
(3) Repeat the above N times (N is typically greater than 1000) to obtain N values of the statistic T.
(4) Calculate the sample variance of the N values of T to estimate the variance of the statistic.
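The four Bootstrap steps above can be sketched in Python as follows. This is only an illustration of the resampling procedure as described; the function name, the default of 1000 rounds, the fixed seed, and the sample data are assumptions made for the example.

```python
import random
import statistics


def bootstrap_variance(samples, statistic, n_rounds=1000, sample_size=None, seed=0):
    """Estimate the variance of `statistic` by resampling with replacement."""
    rng = random.Random(seed)            # fixed seed so the sketch is repeatable
    size = sample_size or len(samples)   # step (1): number of draws is freely set
    values = []
    for _ in range(n_rounds):            # step (3): repeat N times
        resample = [rng.choice(samples) for _ in range(size)]  # step (1)
        values.append(statistic(resample))                     # step (2)
    return statistics.variance(values)   # step (4): sample variance of the N values


# Made-up measurements used only to exercise the function.
data = [2.1, 2.5, 1.9, 3.2, 2.8, 2.4, 2.2, 3.0]
var_of_mean = bootstrap_variance(data, statistics.mean)
```

Any statistic that accepts a list can be passed in place of `statistics.mean`, for example `statistics.median`.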
3. Attribute extraction
Attribute extraction extracts the characteristics and properties of entities from the information source. An attribute can be regarded as a nominal relationship between an entity and its attribute value, so attribute extraction can also be treated as a relationship extraction problem.
In the invention, the processed data mainly comes from the structured data of the full-service unified data center and is extracted with templates. Because the mapping of entities, attributes, and relationships to the source systems is already set when the ontology model is defined, the extraction scripts for the structured data can be written from that mapping, and the relational data is stored in a graph structure.
Step two: ontology fusion.
The invention combines ontology mapping and ontology integration.
1. Global ontology-local ontology integration
The knowledge that is consistent and agreed upon across the different systems is extracted first; this is called the global ontology. The knowledge unique to each system is retained and is called a local ontology. A mapping between the global ontology and each local ontology is then established, so that the knowledge of every business system is covered. The process is: (1) import the ontologies to be mapped; (2) discover the mappings: using natural language processing, compare the similarity between the objects to be mapped, find structural similarities, and search for mappings between the ontologies with techniques such as machine learning.
2. Mapping between local ontologies
The relations among the local ontologies are found using concept similarity algorithms, string-based methods, and language-based methods, and mapping rules between the ontologies are established from these relations.
3. Reasonable representation of mappings
Ontology mapping means that, given two ontologies A and B, for each concept in ontology A an attempt is made to find a semantically identical or similar concept in ontology B, and likewise for each concept or node in ontology B. The most important part of mapping is therefore the discovery of semantic associations.
Step three: instance fusion.
Two kinds of algorithms, pairwise entity alignment and collaborative entity alignment, are used together. Pairwise entity alignment judges whether two entities correspond to the same physical object, determining their degree of alignment by comparing their attributes. Collaborative entity alignment assumes that the alignments of different entities influence one another and reaches a globally optimal result by coordinating the matches among different objects, that is, by finding what the different entities have in common.
1. Pairwise entity alignment
Pairwise entity alignment operates on knowledge bases. A knowledge base is a six-tuple consisting of a set of instances, a set of literals, a set of relationships, a set of attributes, relationship facts, and attribute facts. Alignment computes, by a specific formula, a numerical value that describes the degree of similarity; the larger the value, the more similar the two entities. The computation can be summarized as follows: given two knowledge bases and a set of prior alignment data, entity matching is carried out under the joint control of optional tuning parameters and a set of related external resources, and an alignment result is obtained.
2. Entity similarity and relationship similarity
An intuitive alignment classification method is to assign each matched attribute a weight that reflects its importance to the alignment result, to weight the attributes of the entity itself and the attributes of its related entities separately, and to compute the overall similarity as a weighted sum. A similarity threshold is then set, and the match is decided by comparing the total entity similarity score with the threshold.
3. Feature matching based on similarity functions
A function, called the tokenization function, converts the strings to be matched into sets of substrings, and the weighted similarity is then calculated from these sets.
3.1 Token-based similarity function
A function converts the text strings to be matched into sets of substrings, which are called tokens. Commonly used token-based similarity functions are the Jaccard similarity function and the cosine similarity function.
The similarity function based on the Jaccard coefficient relies on set intersection, which is order-independent, so the order of the tokens has no influence on the result.
Cosine similarity shares the order independence of token-based similarity functions and, because it adds weights, better reflects how similar the tokens are.
3.2 Similarity function based on edit distance
The edit-distance similarity function treats each text string to be matched as a whole and measures the similarity of two strings by the minimum cost of the edit operations needed to convert one into the other. Common edit-distance similarity functions are those based on the Levenshtein, Smith-Waterman, Jaro, and Jaro-Winkler distances.
Given two strings $s_1$ and $s_2$, the Levenshtein distance between them equals the minimum number of insertion, deletion, and replacement operations required to convert $s_1$ into $s_2$. A similarity function based on the Levenshtein distance can make similarity matching less sensitive to errors.
The invention uses weighted similarity calculation to obtain the true similarity, that is, the weighted similarity, of the names and attributes of two entities in the Neo4j graph database.
Drawings
FIG. 1 is a flow diagram of the similarity technique of the present invention;
FIG. 2 is a multi-source grid entity fusion diagram of the present invention;
FIG. 3 is a diagram of the ontology model composition of the present invention.
Detailed Description
As shown in FIG. 3, the technical problem the invention needs to solve is to fuse the feeders, transformers, and grid relationships of three systems and to construct a knowledge grid.
Example 1:
Knowledge is extracted from the multi-source grid to obtain a plurality of heterogeneous ontologies. This comprises entity extraction, relationship extraction, and attribute extraction.
The specific steps of knowledge extraction are as follows:
1. Knowledge extraction
Knowledge extraction is the first step of knowledge graph construction. The key question is how to automatically extract knowledge from heterogeneous data sources to obtain candidate knowledge units. Knowledge extraction is a technique for automatically extracting structured knowledge, such as entities, relationships, and entity attributes, from semi-structured and unstructured data.
The purpose of knowledge extraction is to extract knowledge from data of different sources and structures and store it in a knowledge graph; it is an important technique for the automatic construction of large-scale knowledge graphs. Entity extraction refers to automatically extracting entities from a data set. Its quality strongly affects the efficiency and quality of subsequent knowledge acquisition, so it is the most basic and critical part of knowledge extraction.
Knowledge extraction is divided into three steps:
Entity extraction. Entity extraction extracts the entities contained in semi-structured and unstructured data.
Relationship extraction. After the entities are extracted, the associations between them must be extracted from the related data in order to obtain semantic information. Linking the entities (concepts) through these associations forms a mesh-like knowledge structure.
Attribute extraction. The purpose of attribute extraction is to collect the attribute information of a specific entity from different power grid information sources. For example, for a given transformer, information such as the transformer identifier, the city it belongs to, and the name of its power supply unit can be obtained from different information sources. Attribute extraction gathers this information from the various data sources to give a complete description of the entity's attributes.
In this technology, the processed data mainly comes from the structured data of the full-service unified data center and is extracted with templates. Because the mapping of entities, attributes, and relationships to the source systems is set when the ontology model is defined, structured data extraction scripts can be written from this mapping, and the relational data is stored in a graph structure.
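A template-driven extraction of this kind might look like the following Python sketch, which turns relational rows into graph nodes and edges. The template layout, the `city` and `feeder_id` columns, the `FED_BY` relation name, and the sample row are hypothetical; only the `pms_obj_id` and `pms_tran_name` field names appear elsewhere in this document.

```python
# Hypothetical template defined alongside the ontology model: which column
# identifies the entity, which columns become attributes, and which columns
# point to related entities. All names other than the pms_* fields are assumed.
TRANSFORMER_TEMPLATE = {
    "entity_type": "Transformer",
    "id_column": "pms_obj_id",
    "attributes": {"pms_tran_name": "name", "city": "city"},
    "relations": {"feeder_id": ("FED_BY", "Feeder")},
}


def rows_to_graph(rows, template):
    """Convert relational rows into (nodes, edges) according to the template."""
    nodes, edges = [], []
    for row in rows:
        node_id = (template["entity_type"], row[template["id_column"]])
        # Copy mapped attribute columns, skipping missing values.
        props = {prop: row[col]
                 for col, prop in template["attributes"].items()
                 if row.get(col) is not None}
        nodes.append((node_id, props))
        # Each foreign-key style column becomes a typed edge to another node.
        for col, (relation, target_type) in template["relations"].items():
            if row.get(col) is not None:
                edges.append((node_id, relation, (target_type, row[col])))
    return nodes, edges


# Made-up row used only to exercise the template.
rows = [{"pms_obj_id": "T001", "pms_tran_name": "No. 1 transformer",
         "city": "Hefei", "feeder_id": "F010"}]
nodes, edges = rows_to_graph(rows, TRANSFORMER_TEMPLATE)
```

Keeping the mapping in data rather than in code means that one extraction routine can serve every source system, with one template per entity type.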
Example 2
Ontology fusion merges the heterogeneous ontologies obtained in Example 1: a global ontology is obtained first, and the mapping between each local ontology and the global ontology is then found. Finding the mappings comprises three steps: first, import the ontologies to be mapped; second, discover the mappings; third, represent the mappings.
The specific steps and methods of ontology fusion are as follows:
Common methods of ontology fusion are ontology integration and ontology mapping. Ontology integration directly merges several ontologies into one large ontology, while ontology mapping finds mapping rules between ontologies; both can eliminate the heterogeneity among ontologies.
This technology combines ontology mapping and ontology integration to merge the established ontologies of the three systems into a unified distribution network grid knowledge graph ontology model, which serves as the specification for knowledge storage.
1. Global ontology-local ontology based integration
The method first extracts the knowledge common to the heterogeneous ontologies and builds a global ontology from it. The global ontology describes the knowledge that the systems consistently agree on. At the same time, the ontology of each system retains its own unique knowledge and is called a local ontology. Finally, mappings from the global ontology to the knowledge of each local ontology are established, so that all knowledge in the ontologies of the business systems is covered.
2. Ontology mapping
The process of ontology mapping can be divided into three main steps:
The first step: import the ontologies to be mapped. This ensures that the ontology components that need to be mapped can be accessed easily.
The second step: discover the mappings. The relations between the heterogeneous ontologies are found using concept similarity algorithms, and mapping rules between the ontologies are then established from these relations. To improve the accuracy of the mapping result, this step often requires manual intervention.
The third step: represent the mappings. Once the mappings between the ontologies are found, they need to be represented in a reasonable form.
The focus of ontology mapping is therefore discovering the mappings. In light of practical conditions, this technology adopts ontology mapping based on terminology and structure. The method starts from the terms of each system's ontology, compares the names, labels, or comments attached to the ontology components, and finds the similarities among the heterogeneous ontologies, mainly using string-based and language-based methods.
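As a rough illustration of term-based mapping discovery, the sketch below compares the names of ontology components with a string-based similarity. It uses `difflib.SequenceMatcher` from the Python standard library as a stand-in for whichever string measure is actually used; the term lists and the 0.8 threshold are hypothetical.

```python
from difflib import SequenceMatcher


def normalize(term):
    """Lower-case a term and treat underscores and repeated spaces as one space."""
    return " ".join(term.lower().replace("_", " ").split())


def term_similarity(a, b):
    """String-based similarity of two normalized terms, in [0, 1]."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def discover_mappings(ontology_a, ontology_b, threshold=0.8):
    """For each term in A, propose its best-scoring counterpart in B."""
    mappings = []
    for concept_a in ontology_a:
        best = max(ontology_b,
                   key=lambda concept_b: term_similarity(concept_a, concept_b))
        score = term_similarity(concept_a, best)
        if score >= threshold:           # weaker candidates are left unmapped
            mappings.append((concept_a, best, round(score, 3)))
    return mappings


# Made-up component names from two hypothetical system ontologies.
pms_terms = ["tran_name", "obj_id", "feeder"]
gis_terms = ["Tran Name", "oid", "Feeder Line"]
candidates = discover_mappings(pms_terms, gis_terms)
```

Terms that fall below the threshold, such as `obj_id` and `oid` here, are the cases where the manual intervention mentioned in the second step would be needed.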
Example 3:
Instance fusion means fusing the knowledge graph ontology models with a weighting algorithm. In essence, an entity fusion algorithm judges whether instance data from different knowledge graphs describe the same real-world physical object. Entity fusion is also called entity alignment. This technology mainly studies the alignment of cross-system entities based on information such as entity attributes and entity relationships in the single-system distribution network grid knowledge graphs.
The instance fusion process is similar to ontology fusion, but instance fusion is usually a large-scale data processing problem, so time and space complexity must be considered. This technology combines two different algorithms, pairwise entity alignment and collaborative entity alignment. Pairwise entity alignment judges independently whether two entities correspond to the same physical object, determining their degree of alignment by matching features such as entity attributes. Collaborative entity alignment assumes that the alignments of different entities influence one another and reaches a globally optimal alignment by coordinating the matches among different objects.
1. Principle of pairwise entity alignment algorithm
Before describing the specific principles of the algorithm, the definition of the knowledge base is explained first.
A knowledge base is a six-tuple: KB = (I, L, R, P, F_R, F_P), where I, L, R, and P are the sets of instances, literals, relationships, and attributes, respectively;
$$F_R \subseteq I \times R \times I$$
is the set of relationship facts, that is, SPO triples whose object is an instance;
$$F_P \subseteq I \times P \times L$$
is the set of attribute facts, that is, SPO triples whose object is a literal.
Entity alignment is formally defined as:
$$\mathrm{Alignentity}(KB_1, KB_2) = \{(e_1, e_2, con) \mid e_1 \in KB_1,\ e_2 \in KB_2,\ con \in [0, 1]\}$$
where con is a numerical value describing the similarity of the entities; the larger con is, the more similar the two entities are.
The process of aligning two knowledge base entities can be described simply as: given two knowledge bases and a group of priori aligned data, entity matching calculation is carried out under the common control of optional adjusting parameters and a series of related external resources, and finally an alignment result is obtained.
2. Entity similarity and relationship similarity
The probabilistic-model-based alignment method compares entities pairwise on attribute similarity and does not consider the relationships between the entities being matched. Entity matching based on attribute similarity scores can be turned into a classification problem. An intuitive classification method is to add up the similarity scores of all matching attributes, set a similarity threshold, and decide the match by comparing the total entity similarity score with the threshold, which can be expressed formally as:
$$(e_1, e_2) \rightarrow \begin{cases} \text{match}, & \mathrm{sim}_{attr}(e_1, e_2) \ge t \\ \text{non-match}, & \text{otherwise} \end{cases}$$
where $e_1, e_2$ is the entity pair to be matched, $\mathrm{sim}_{attr}(e_1, e_2)$ is the sum of the similarity scores of their matching attributes, and $t$ is the similarity threshold.
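The threshold-based classification can be sketched as follows. This is a minimal Python illustration that assumes entities are plain dictionaries of attribute values and that an exact-match function serves as the per-attribute similarity; both choices, and the sample records, are assumptions for the example only.

```python
def attribute_similarity_sum(e1, e2, sim):
    """Add up the similarity scores of the attributes both entities share."""
    shared = e1.keys() & e2.keys()
    return sum(sim(e1[attr], e2[attr]) for attr in shared)


def classify_pair(e1, e2, sim, t):
    """Label the pair 'match' when the summed score reaches the threshold t."""
    total = attribute_similarity_sum(e1, e2, sim)
    return "match" if total >= t else "non-match"


def exact(x, y):
    """Simplest per-attribute similarity: 1.0 for equal values, else 0.0."""
    return 1.0 if x == y else 0.0


# Made-up transformer records from two hypothetical systems.
e1 = {"name": "No. 1 transformer", "city": "Hefei", "feeder": "F010"}
e2 = {"name": "No. 1 transformer", "city": "Hefei", "feeder": "F022"}
label_loose = classify_pair(e1, e2, exact, t=2.0)
label_strict = classify_pair(e1, e2, exact, t=2.5)
```

Because every attribute contributes equally to the sum, the outcome depends only on how many attributes agree, not on which ones.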
One of the main problems of this method is that it does not reflect the different influence of different attributes on the final similarity. An important remedy is to assign each matching attribute pair a weight that reflects its importance to the alignment result. Given two knowledge bases A and B to be matched, with $e_i$ and $e_j$ denoting entities in A and B respectively, two disjoint sets M and U are defined:
$$M = \{(e_i, e_j) \mid e_i = e_j,\ e_i \in A,\ e_j \in B\}$$
$$U = \{(e_i, e_j) \mid e_i \ne e_j,\ e_i \in A,\ e_j \in B\}$$
Define the comparison vector $x^*$ as the vector formed by all matched attributes of the entities to be matched, and the comparison space $X$ as the space of all possible $x^*$. With the ratio of two conditional probabilities $R = P(x^* \in X \mid M) / P(x^* \in X \mid U)$, the matching decision can be expressed as:
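$$(e_i, e_j) \rightarrow \begin{cases} \text{match}, & R \ge t_{\mu} \\ \text{possible match}, & t_{\lambda} < R < t_{\mu} \\ \text{non-match}, & R \le t_{\lambda} \end{cases}$$
where $t_{\mu}$ and $t_{\lambda}$ are the upper and lower decision thresholds.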
Assuming that the attributes in the comparison vector $x^*$ are mutually independent, define for each attribute:
$$m_i = P(a_i = b_i \mid (e_1, e_2) \in M), \qquad u_i = P(a_i = b_i \mid (e_1, e_2) \in U)$$
where $a_i$ and $b_i$ are the i-th attribute values of the entity pair to be matched, $m_i$ is the probability that the i-th attribute values are equal given that the two entities are the same, and $u_i$ is the probability that the i-th attribute values are equal given that the two entities are not the same. From these two probabilities, the weight $\omega_i$ of the i-th attribute can be calculated as:
$$\omega_i = \begin{cases} \log(m_i / u_i), & a_i = b_i \\ \log\big((1 - m_i) / (1 - u_i)\big), & a_i \ne b_i \end{cases}$$
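The attribute weights can be illustrated with a short Python sketch. It assumes the standard Fellegi-Sunter form, in which the weight is log(m_i / u_i) when the i-th attribute values agree and log((1 - m_i) / (1 - u_i)) when they disagree. The natural logarithm, the function names, and the probability values below are assumptions for the example; in practice m_i and u_i would be estimated from prior alignment data.

```python
import math


def attribute_weight(m_i, u_i, agrees):
    """Agreement or disagreement weight of one attribute (natural log assumed)."""
    if agrees:
        return math.log(m_i / u_i)
    return math.log((1.0 - m_i) / (1.0 - u_i))


def match_score(a, b, m, u):
    """Sum the weights over all attributes, assuming they are independent."""
    return sum(attribute_weight(m[k], u[k], a[k] == b[k]) for k in m)


# Hypothetical probabilities for two attributes.
m = {"name": 0.9, "city": 0.8}   # P(values equal | same entity)
u = {"name": 0.1, "city": 0.5}   # P(values equal | different entities)

a = {"name": "No. 1 transformer", "city": "Hefei"}
b = {"name": "No. 1 transformer", "city": "Wuhu"}
score = match_score(a, b, m, u)
```

An attribute that rarely agrees by chance (small u_i) earns a large positive weight when it does agree, which is how the model expresses that some attributes matter more than others.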
The relations between entities in the knowledge base are important for entity alignment and can effectively improve matching precision and recall. The local entity alignment method based on simple relations assigns separate weights to the attributes of the entity itself and to the attributes of its related entities, and computes the overall similarity as a weighted sum, which can be expressed formally as:
$$\mathrm{sim}(e_1, e_2) = \alpha\, \mathrm{sim}_{attr}(e_1, e_2) + (1 - \alpha)\, \mathrm{sim}_{NB}(e_1, e_2)$$
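The weighted combination can be sketched in Python as below. The source does not say how the neighbor similarity is computed, so the sketch assumes, purely for illustration, that it is the overlap of the sets of already-aligned neighbor identifiers; the value of alpha and the sample data are also hypothetical.

```python
def neighbor_similarity(neighbors1, neighbors2):
    """Assumed neighbor similarity: overlap of the two neighbor-id sets."""
    if not neighbors1 and not neighbors2:
        return 0.0
    return len(neighbors1 & neighbors2) / len(neighbors1 | neighbors2)


def combined_similarity(sim_attr, sim_nb, alpha=0.7):
    """sim = alpha * sim_attr + (1 - alpha) * sim_NB, with alpha in [0, 1]."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * sim_attr + (1.0 - alpha) * sim_nb


# Made-up values: attribute similarity 0.9, two of four neighbors shared.
sim_nb = neighbor_similarity({"F010", "S01", "S02"}, {"F010", "S01", "S03"})
overall = combined_similarity(0.9, sim_nb, alpha=0.7)
```

Setting alpha to 1 reduces the measure to attribute similarity alone, while smaller values let the surrounding graph structure carry more of the decision.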
3. Feature matching based on similarity functions
(1) Token-based similarity function
A token-based similarity function uses a function to convert the text strings to be matched into sets of substrings. The substrings are called tokens, and the function is called the tokenization function, written token(). Commonly used token-based similarity functions are the Jaccard similarity function and the cosine similarity function.
The Jaccard coefficient equals the ratio of the size of the intersection of two sets to the size of their union and measures how related the two sets are. It is calculated as:
$$\mathrm{sim}_{Jaccard}(s_1, s_2) = \frac{|token(s_1) \cap token(s_2)|}{|token(s_1) \cup token(s_2)|}$$
The similarity function based on the Jaccard coefficient relies on set intersection, which is order-independent, so the order of the tokens has no influence on the result.
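A minimal Python sketch of the Jaccard similarity follows. The whitespace tokenizer standing in for token() is an assumption; Chinese device names would need a different tokenizer, such as character n-grams or a word segmenter. The sample strings are made up.

```python
def token(s):
    """Assumed tokenization function: lower-cased whitespace-separated words."""
    return set(s.lower().split())


def jaccard(s1, s2):
    """Size of the token intersection divided by the size of the token union."""
    t1, t2 = token(s1), token(s2)
    if not t1 and not t2:
        return 1.0                      # two empty strings are treated as identical
    return len(t1 & t2) / len(t1 | t2)


# The same tokens in a different order give the maximum score.
same_tokens = jaccard("Binhu Road No. 1 transformer", "No. 1 transformer Binhu Road")
partial = jaccard("a b c", "b c d")
```

The first call returns 1.0 even though the word order differs, which is the order independence described above.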
Cosine similarity treats the token sets of two text strings as two n-dimensional vectors and evaluates the similarity of the strings by the cosine of the angle between the vectors. The weight w of each token in a vector is typically calculated with the tf-idf model. If the vectors corresponding to two strings $s_1$ and $s_2$ are $\langle w_{11}, w_{12}, \ldots, w_{1n} \rangle$ and $\langle w_{21}, w_{22}, \ldots, w_{2n} \rangle$, then the cosine similarity of $s_1$ and $s_2$ can be expressed as:
$$\mathrm{sim}_{cosine}(s_1, s_2) = \frac{\sum_{j=1}^{n} w_{1j} w_{2j}}{\|s_1\| \, \|s_2\|}$$
where
$$\|s_1\| = \sqrt{\sum_{j=1}^{n} w_{1j}^2}$$
$$\|s_2\| = \sqrt{\sum_{j=1}^{n} w_{2j}^2}$$
Cosine similarity shares the order independence of token-based similarity functions and, because it adds weights, better reflects how similar the tokens are.
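The tf-idf weighted cosine similarity can be sketched in plain Python as follows. The smoothed idf formula, the whitespace tokenizer, and the sample names are assumptions; the source only states that tf-idf weights are typically used.

```python
import math
from collections import Counter


def tfidf_vectors(strings):
    """Build one tf-idf vector per string over the shared token vocabulary."""
    docs = [Counter(s.lower().split()) for s in strings]
    n = len(docs)
    df = Counter(tok for doc in docs for tok in doc)   # document frequency
    vocab = sorted(df)
    # Smoothed idf (an assumed variant) keeps weights positive for common tokens.
    idf = {tok: math.log((1 + n) / (1 + df[tok])) + 1.0 for tok in vocab}
    return [[doc[tok] * idf[tok] for tok in vocab] for doc in docs]


def cosine(v1, v2):
    """Dot product of the vectors divided by the product of their norms."""
    dot = sum(a * b for a, b in zip(v1, v2))
    norm = math.sqrt(sum(a * a for a in v1)) * math.sqrt(sum(b * b for b in v2))
    return dot / norm if norm else 0.0


# Made-up names used only to exercise the functions.
names = ["Binhu Road transformer", "transformer Binhu Road", "Lakeside substation"]
vecs = tfidf_vectors(names)
reordered = cosine(vecs[0], vecs[1])
unrelated = cosine(vecs[0], vecs[2])
```

Tokens that occur in many names receive a lower idf weight, so a shared generic word such as "transformer" counts for less than a shared distinctive word.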
(2) Edit distance based similarity function
Unlike token-based similarity functions, an edit-distance similarity function treats each text string to be matched as a whole and measures the similarity of two strings by the minimum cost of the edit operations needed to convert one into the other. Basic edit operations include insertion, deletion, replacement, and transposition. Edit-distance similarity functions handle error-sensitivity problems such as data entry errors effectively. Common edit-distance similarity functions are those based on the Levenshtein, Smith-Waterman, Jaro, and Jaro-Winkler distances.
Given two strings $s_1$ and $s_2$, the Levenshtein distance between them equals the minimum number of insertion, deletion, and replacement operations required to convert $s_1$ into $s_2$. A similarity function based on the Levenshtein distance can make similarity matching less sensitive to errors.
This method yields the similarity distance between two strings. Weighted similarity calculation is then used to obtain the true similarity, that is, the weighted similarity, of the names and attributes of two entities in the Neo4j graph database. FIG. 1 gives a brief summary of the algorithm.
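The Levenshtein distance can be computed with the usual dynamic programming recurrence, sketched here in Python. Normalizing the distance by the length of the longer string to obtain a similarity in [0, 1] is one common convention and is an assumption of this sketch, not something the source specifies.

```python
def levenshtein(s1, s2):
    """Minimum number of insertions, deletions, and replacements turning s1 into s2."""
    prev = list(range(len(s2) + 1))          # distances from "" to prefixes of s2
    for i, c1 in enumerate(s1, start=1):
        curr = [i]                           # distance from s1[:i] to ""
        for j, c2 in enumerate(s2, start=1):
            cost = 0 if c1 == c2 else 1
            curr.append(min(prev[j] + 1,         # delete c1
                            curr[j - 1] + 1,     # insert c2
                            prev[j - 1] + cost)) # replace c1 with c2, or keep it
        prev = curr
    return prev[-1]


def levenshtein_similarity(s1, s2):
    """Assumed normalization: 1 - distance / length of the longer string."""
    longest = max(len(s1), len(s2))
    return 1.0 if longest == 0 else 1.0 - levenshtein(s1, s2) / longest


distance = levenshtein("kitten", "sitting")
```

A single mistyped character changes the distance by at most one, so long names with a small entry error still score close to 1.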
Example 4:
This technology provides a corresponding knowledge fusion algorithm. The algorithm takes the knowledge graphs of the three systems as input and applies the distribution network entity semantic fusion algorithm to entities of the same type across the three graphs, constructing a unified distribution network grid knowledge graph.
The fusion results for the different systems are shown below, where:
cms_equip_id: transformer id in the marketing business application system;
cms_tran_name: transformer name in the marketing business application system;
pms_obj_id: transformer id in the equipment (asset) operation and maintenance lean management system;
pms_tran_name: transformer name in the equipment (asset) operation and maintenance lean management system;
gis_oid: transformer id in the geographic information system;
gis_tran_name: transformer name in the geographic information system;
[Table provided as images in the original publication (BDA0002544087810000151, BDA0002544087810000161): fusion results for the fields listed above.]
The table above is the fusion result obtained by fusing the marketing service application system, the equipment (asset) operation and maintenance lean management system and the geographic information system according to the fusion method.
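How a row of such a fusion table might be assembled can be sketched as follows. This is a simplified illustration and not the patented algorithm: it assumes the marketing system records are used as the anchor, matches on transformer name only, uses `difflib.SequenceMatcher` as a stand-in similarity, and applies a hypothetical threshold of 0.8. The sample records are invented.

```python
from difflib import SequenceMatcher


def name_similarity(a: str, b: str) -> float:
    # Stand-in string similarity in [0, 1].
    return SequenceMatcher(None, a, b).ratio()


def best_match(name, candidates, name_key, threshold):
    """Return the candidate whose name is most similar, or None if below threshold."""
    best, best_score = None, 0.0
    for record in candidates:
        score = name_similarity(name, record[name_key])
        if score > best_score:
            best, best_score = record, score
    return best if best_score >= threshold else None


def fuse_transformers(cms, pms, gis, threshold=0.8):
    """Link each CMS transformer to its best PMS and GIS counterparts by name."""
    rows = []
    for c in cms:
        p = best_match(c["cms_tran_name"], pms, "pms_tran_name", threshold)
        g = best_match(c["cms_tran_name"], gis, "gis_tran_name", threshold)
        rows.append({
            "cms_equip_id": c["cms_equip_id"],
            "pms_obj_id": p["pms_obj_id"] if p else None,
            "gis_oid": g["gis_oid"] if g else None,
        })
    return rows


# Invented sample data for illustration only.
cms = [{"cms_equip_id": "C1", "cms_tran_name": "Transformer A"}]
pms = [
    {"pms_obj_id": "P1", "pms_tran_name": "Transformer A"},
    {"pms_obj_id": "P2", "pms_tran_name": "Substation Z"},
]
gis = [{"gis_oid": "G9", "gis_tran_name": "Feeder 12 unit"}]
rows = fuse_transformers(cms, pms, gis)
```

Records with no counterpart above the threshold are left unlinked (`None`) so that they can be reviewed manually.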
In this technology, fusion refers to the semantic fusion of the ontology models of the marketing service application system, the equipment (asset) operation and maintenance lean management system and the geographic information system. Fig. 2 shows the specific fusion steps. Because the ontology models of the three systems are constructed separately, the same attribute may be described in different ways, so an ontology model fusion function is developed. The fusion function automatically completes the fusion of the three systems' ontology models to a certain extent and allows the user to modify the fusion result to improve its accuracy. The fused ontology model is the storage template for the final knowledge graph instance data, and its quality directly influences the application effect of the graph.
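The combination of automatic attribute matching and user correction can be sketched as below. This is a minimal illustration under stated assumptions: attribute labels are compared with `difflib.SequenceMatcher` as a stand-in similarity, the threshold of 0.6 is arbitrary, and the function name `propose_mapping`, the `overrides` parameter and the attribute labels are all hypothetical.

```python
from difflib import SequenceMatcher


def propose_mapping(local_attributes, global_attributes, threshold=0.6, overrides=None):
    """Map each local ontology attribute to the most similar global attribute.

    Attributes with no candidate at or above the threshold map to None.
    User-supplied overrides replace the automatic proposals.
    """
    mapping = {}
    for local in local_attributes:
        scored = [
            (SequenceMatcher(None, local.lower(), candidate.lower()).ratio(), candidate)
            for candidate in global_attributes
        ]
        score, best = max(scored, default=(0.0, None))
        mapping[local] = best if score >= threshold else None
    mapping.update(overrides or {})  # manual corrections take precedence
    return mapping


# Hypothetical attribute labels for illustration only.
global_attrs = ["transformer_name", "rated_capacity"]
local_attrs = ["tran_name", "rated_capacity_kva", "xyz"]
automatic = propose_mapping(local_attrs, global_attrs)
corrected = propose_mapping(local_attrs, global_attrs, overrides={"xyz": "rated_capacity"})
```

Keeping the overrides separate from the automatic proposals means a user's corrections survive when the automatic matching is rerun.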
The invention provides a distribution network multi-source grid entity fusion method based on weighted semantic similarity, which improves the efficiency of marketing-distribution integration work and ensures the reliability of entities and relations. Compared with the traditional method, automatic matching replaces manual checking. The method also introduces similarity calculation algorithms together with natural language processing (NLP) and machine learning algorithms, which improve matching accuracy. In addition, the technology promotes data collection, fusion and sharing and strengthens enterprise data service capability, thereby raising the level of data analysis applications and the value of big data. The data can be operated on conveniently, and the fusion of distribution network multi-source grid entities is improved.

Claims (8)

1. A distribution network multi-source grid entity fusion method based on weighted semantic similarity, characterized by comprising the following steps:
step one: performing knowledge extraction on grids from a plurality of different sources to obtain a plurality of heterogeneous ontologies;
step two: finding the relations among the plurality of heterogeneous ontologies, establishing corresponding mappings, and fusing the heterogeneous ontologies to form a plurality of knowledge graph ontology models;
step three: fusing the plurality of knowledge graph ontology models by using a weighting algorithm;
step four: obtaining the fused result.
2. The distribution network multi-source grid entity fusion method based on weighted semantic similarity according to claim 1, characterized in that: the knowledge extraction in step one comprises entity extraction, relation extraction and attribute extraction, and knowledge extraction is performed on the multi-source grid in that order.
3. The distribution network multi-source grid entity fusion method based on weighted semantic similarity according to claim 1, characterized in that: the ontology fusion in step two obtains a global ontology by an ontology integration method and then obtains the mapping relation between each single heterogeneous ontology and the global ontology.
4. The distribution network multi-source grid entity fusion method based on weighted semantic similarity according to claim 3, characterized in that: the ontology integration eliminates the heterogeneity among the plurality of heterogeneous ontologies and directly combines the heterogeneous ontologies into a global ontology.
5. The distribution network multi-source grid entity fusion method based on weighted semantic similarity according to claim 3, characterized in that: the mapping relation is obtained in three steps: importing the ontologies to be mapped, discovering the mapping, and representing the mapping.
6. The distribution network multi-source grid entity fusion method based on weighted semantic similarity according to claim 5, characterized in that: discovering the mapping means discovering the relation between the global ontology and a single heterogeneous ontology: based on the attributes of the single heterogeneous ontology, the names, labels and comments of the single heterogeneous ontology and the global ontology are compared by string-based and language-based methods to find the similarity between the single heterogeneous ontology and the global ontology, obtain the mapping relation between them, and thereby obtain the knowledge graph ontology model.
7. The distribution network multi-source grid entity fusion method based on weighted semantic similarity according to claim 1, characterized in that: the weighting algorithm in step three is an alignment method based on a probability model, and different weights are assigned to the attributes obtained by attribute extraction during knowledge extraction.
8. The distribution network multi-source grid entity fusion method based on weighted semantic similarity according to claim 7, characterized in that: the weight of each attribute is calculated based on the TF-IDF model.
CN202010555531.2A 2020-06-17 2020-06-17 Distribution network multi-source grid entity fusion method based on weighted semantic similarity Pending CN111881290A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010555531.2A CN111881290A (en) 2020-06-17 2020-06-17 Distribution network multi-source grid entity fusion method based on weighted semantic similarity

Publications (1)

Publication Number Publication Date
CN111881290A true CN111881290A (en) 2020-11-03

Family

ID=73157632

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010555531.2A Pending CN111881290A (en) 2020-06-17 2020-06-17 Distribution network multi-source grid entity fusion method based on weighted semantic similarity

Country Status (1)

Country Link
CN (1) CN111881290A (en)

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112528043A (en) * 2020-12-18 2021-03-19 中国南方电网有限责任公司 Power grid maintenance single structured storage method and system based on knowledge graph
CN112635078A (en) * 2020-11-06 2021-04-09 辽宁工程技术大学 Traditional Chinese medicine knowledge graph construction and visualization method
CN112966027A (en) * 2021-03-22 2021-06-15 青岛科技大学 Entity association mining method based on dynamic probe
CN113010688A (en) * 2021-03-05 2021-06-22 北京信息科技大学 Knowledge graph construction method, device and equipment and computer readable storage medium
CN113159320A (en) * 2021-03-08 2021-07-23 北京航空航天大学 Scientific and technological resource data integration method and device based on knowledge graph
CN113360668A (en) * 2021-06-03 2021-09-07 中国电力科学研究院有限公司 Unified data model construction method, system, terminal device and readable storage medium
CN113609086A (en) * 2021-07-31 2021-11-05 云南电网有限责任公司信息中心 Method for constructing unified power grid network frame data sharing pool based on weight dynamic adjustment
CN113705236A (en) * 2021-04-02 2021-11-26 腾讯科技(深圳)有限公司 Entity comparison method, device, equipment and computer readable storage medium
CN115329158A (en) * 2022-10-17 2022-11-11 湖南能源大数据中心有限责任公司 Data association method based on multi-source heterogeneous power data
CN115544276A (en) * 2022-12-01 2022-12-30 南方电网数字电网研究院有限公司 Metering device knowledge graph construction method and metering device archive checking method
CN116304115A (en) * 2023-05-19 2023-06-23 中央军委后勤保障部信息中心 Knowledge-graph-based material matching and replacing method and device
CN116541472A (en) * 2023-03-22 2023-08-04 麦博(上海)健康科技有限公司 Knowledge graph construction method in medical field
CN117110798A (en) * 2023-10-25 2023-11-24 国网江苏省电力有限公司苏州供电分公司 Fault detection method and system for intelligent power distribution network

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106250412A (en) * 2016-07-22 2016-12-21 浙江大学 The knowledge mapping construction method merged based on many source entities
CN109033129A (en) * 2018-06-04 2018-12-18 桂林电子科技大学 Multi-source Information Fusion knowledge mapping based on adaptive weighting indicates learning method
CN110674311A (en) * 2019-09-05 2020-01-10 国家电网有限公司 Knowledge graph-based power asset heterogeneous data fusion method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
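HAPPYGRIL3: "Knowledge graph fusion: fusion methods and techniques at the ontology concept layer", pages 1 - 2, Retrieved from the Internet <URL:https://www.cnblogs.com/hapyygril/p/11983228.html> *
Zhuang Yan, Li Guoliang, Feng Jianhua: "A Survey on Entity Alignment of Knowledge Base", Journal of Computer Research and Development, no. 53, pages 65 - 192 *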

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112635078A (en) * 2020-11-06 2021-04-09 辽宁工程技术大学 Traditional Chinese medicine knowledge graph construction and visualization method
CN112528043A (en) * 2020-12-18 2021-03-19 中国南方电网有限责任公司 Power grid maintenance single structured storage method and system based on knowledge graph
CN113010688A (en) * 2021-03-05 2021-06-22 北京信息科技大学 Knowledge graph construction method, device and equipment and computer readable storage medium
CN113159320A (en) * 2021-03-08 2021-07-23 北京航空航天大学 Scientific and technological resource data integration method and device based on knowledge graph
CN112966027B (en) * 2021-03-22 2022-10-21 青岛科技大学 Entity association mining method based on dynamic probe
CN112966027A (en) * 2021-03-22 2021-06-15 青岛科技大学 Entity association mining method based on dynamic probe
CN113705236B (en) * 2021-04-02 2024-06-11 腾讯科技(深圳)有限公司 Entity comparison method, device, equipment and computer readable storage medium
CN113705236A (en) * 2021-04-02 2021-11-26 腾讯科技(深圳)有限公司 Entity comparison method, device, equipment and computer readable storage medium
CN113360668A (en) * 2021-06-03 2021-09-07 中国电力科学研究院有限公司 Unified data model construction method, system, terminal device and readable storage medium
CN113609086A (en) * 2021-07-31 2021-11-05 云南电网有限责任公司信息中心 Method for constructing unified power grid network frame data sharing pool based on weight dynamic adjustment
CN115329158A (en) * 2022-10-17 2022-11-11 湖南能源大数据中心有限责任公司 Data association method based on multi-source heterogeneous power data
CN115544276A (en) * 2022-12-01 2022-12-30 南方电网数字电网研究院有限公司 Metering device knowledge graph construction method and metering device archive checking method
CN116541472A (en) * 2023-03-22 2023-08-04 麦博(上海)健康科技有限公司 Knowledge graph construction method in medical field
CN116541472B (en) * 2023-03-22 2024-07-30 麦博(上海)健康科技有限公司 Knowledge graph construction method in medical field
CN116304115A (en) * 2023-05-19 2023-06-23 中央军委后勤保障部信息中心 Knowledge-graph-based material matching and replacing method and device
CN116304115B (en) * 2023-05-19 2023-08-11 中央军委后勤保障部信息中心 Knowledge-graph-based material matching and replacing method and device
CN117110798B (en) * 2023-10-25 2024-02-13 国网江苏省电力有限公司苏州供电分公司 Fault detection method and system for intelligent power distribution network
CN117110798A (en) * 2023-10-25 2023-11-24 国网江苏省电力有限公司苏州供电分公司 Fault detection method and system for intelligent power distribution network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination