
CN109739939A - The data fusion method and device of knowledge mapping - Google Patents

Info

Publication number
CN109739939A
Authority
CN
China
Prior art keywords
entity
data
module
attribute
data platform
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811635696.XA
Other languages
Chinese (zh)
Inventor
刘涛
朱宏明
顾江
姜逸之
王晓文
周游
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yingtuo Information Technology (shanghai) Co Ltd
Original Assignee
Yingtuo Information Technology (shanghai) Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yingtuo Information Technology (shanghai) Co Ltd
Priority to CN201811635696.XA
Publication of CN109739939A
Priority to PCT/CN2019/124552 (WO2020135048A1)
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

This application provides a data fusion method and device for a knowledge graph. The system that executes the method includes a data platform configured with a unified access interface. The method comprises: processing data from different data sources, converting it into triple format, storing it on the data platform through the unified access interface, and receiving the graph data index information returned by the data platform; according to the graph data index information, partitioning the entities stored on the data platform by attribute into one or more sub-partitions; performing similarity calculation on candidate entity pairs that fall into the same sub-partition, and filtering out the matching entity pairs that satisfy a preset similarity condition; and supplementing and/or replacing the entity attribute values of the matching entity pairs to generate a unified entity representation. By these means, the application can effectively solve the problem that existing data fusion techniques cannot flexibly adapt to data fusion across different knowledge bases.

Description

Data fusion method and device for a knowledge graph
Technical field
This application relates to the technical field of knowledge graphs and, in particular, to a data fusion method and device for a knowledge graph.
Background
A knowledge graph is a huge semantic network graph that describes the entities or concepts that exist in the real world and the relationships among them. Nodes represent entities or concepts, and edges are formed by attributes or relationships. The term is now used to refer to various large-scale knowledge bases. Here, an entity is something distinguishable that exists independently, such as a particular country, company or person. An attribute is an intrinsic characteristic of an entity: a country has attributes such as "population" and "area" (as shown in Fig. 4), and a company has attributes such as "name" and "legal representative". A relationship is an association between one entity and another, for example a company being registered in a country, or a person holding a position at a company.
The nodes and edges of a knowledge graph are generally defined as triples (S-P-O, Subject-Property-Object), in forms such as (entity 1 - relationship - entity 2) and (entity - attribute - attribute value). A knowledge graph can therefore be expressed as a set of triples. In terms of data model it takes the form of a graph (as shown in Fig. 4), and a graph database is used to store and manage the data.
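The triple form can be illustrated with a minimal Python sketch. The `Triple` type, the identifiers such as `company:1` and all sample values are hypothetical and chosen only for illustration; the application does not prescribe any particular encoding.

```python
from typing import NamedTuple

class Triple(NamedTuple):
    subject: str   # S: the entity the statement is about
    prop: str      # P: an attribute name or a relationship name
    obj: str       # O: an attribute value or another entity

# Hypothetical sample data: attribute triples and relationship triples share one shape.
graph = {
    Triple("company:1", "name", "Alibaba"),            # entity - attribute - attribute value
    Triple("company:1", "registered_in", "country:1"), # entity 1 - relationship - entity 2
    Triple("country:1", "area", "100"),
}

def attributes_of(entity: str, triples: set[Triple]) -> dict[str, str]:
    """Collect every property -> object pair whose subject is `entity`."""
    return {t.prop: t.obj for t in triples if t.subject == entity}
```

Because both kinds of statement share one shape, the whole graph can be stored and queried as a plain set of triples.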
Knowledge in the real world comes from a wide range of sources and varies greatly in quality. Knowledge from different data sources suffers from problems such as duplication and a missing knowledge base hierarchy. In addition, different data sources may represent the same entity differently. For example, a company entity in Baidupedia has the name attribute 'Alibaba', while a company entity crawled from Google search has the name attribute 'alibaba'. These two entities may refer to the same real-world entity, so they need to be merged through their attributes and extended relationships in order to produce a unique entity node in the knowledge graph, eliminate ambiguity and yield a high-quality knowledge base.
Existing data fusion schemes generally comprise key steps such as partition indexing, similarity calculation and entity fusion. In a concrete implementation, however, the partitioning algorithm, similarity matching algorithm and entity alignment algorithm are chosen according to the characteristics of the data sources and the knowledge base, and are integrated into one complete system. When the range of data sources or the knowledge base changes, the data fusion system has to be rebuilt to meet the new requirements.
Summary of the invention
The application provides a data fusion method and device for a knowledge graph, to solve the problem that existing data fusion techniques cannot flexibly adapt to data fusion across different knowledge bases.
The application discloses a data fusion method for a knowledge graph. The system that executes the method includes a data platform configured with a unified access interface, and the method comprises: processing data from different data sources, converting it into triple format, storing it on the data platform through the unified access interface, and receiving the graph data index information returned by the data platform; according to the graph data index information, partitioning the entities stored on the data platform by attribute into one or more sub-partitions; performing similarity calculation on candidate entity pairs that fall into the same sub-partition, and filtering out the matching entity pairs that satisfy a preset similarity condition; and supplementing and/or replacing the entity attribute values of the matching entity pairs to generate a unified entity representation.
Preferably, before the step of partitioning the entities stored on the data platform by attribute into one or more sub-partitions according to the graph data index information, the method further comprises: aligning the entities from multiple data sources, which have been converted into triple format and stored on the data platform, according to the actual meaning of their attributes.
Preferably, the sub-partitions are formed either by equivalence partitioning on a globally unique partition key generated from entity attributes, or by partitioning based on a preset clustering model.
Preferably, performing similarity calculation on candidate entity pairs that fall into the same sub-partition and filtering out the matching entity pairs that satisfy the preset similarity condition specifically comprises: assigning different weights to the attributes of an entity itself and to the attributes of other entities related to that entity, and computing the overall similarity of a candidate entity pair as a weighted sum; and, if the overall similarity of a candidate entity pair in the same sub-partition exceeds a preset similarity threshold, taking that candidate entity pair as a matching entity pair.
Preferably, missing entity attribute values are supplemented by obtaining them from the network with a crawler or by filling them in manually.
Preferably, the graph data index information is the storage address of the triple-format graph data on the data platform together with its metadata.
The application also discloses a data fusion device for a knowledge graph, comprising a data platform, a data preprocessing module, an entity partitioning module, an entity matching module and an entity fusion module, wherein: the data platform is configured with a unified access interface; the data preprocessing module is used to process data from different data sources, convert it into triple format, store it on the data platform through the unified access interface, and receive the graph data index information returned by the data platform; the entity partitioning module partitions the entities stored on the data platform by attribute into one or more sub-partitions according to the graph data index information output by the data preprocessing module; the entity matching module is used to perform similarity calculation on the candidate entity pairs that the entity partitioning module has placed in the same sub-partition, and to filter out the matching entity pairs that satisfy a preset similarity condition; and the entity fusion module is used to supplement and/or replace the entity attribute values of the matching entity pairs filtered out by the entity matching module, to generate a unified entity representation.
Preferably, the entity partitioning module includes an equivalence partitioning submodule and/or a clustering partitioning submodule; the equivalence partitioning submodule performs equivalence partitioning of the entities stored on the data platform according to a globally unique partition key generated from entity attributes; the clustering partitioning submodule partitions the entities stored on the data platform based on a preset clustering model.
Preferably, the entity matching module specifically includes a similarity calculation submodule and a comparison submodule; the similarity calculation submodule assigns different weights to the attributes of an entity itself and to the attributes of other entities related to that entity, and computes the overall similarity of a candidate entity pair as a weighted sum; the comparison submodule judges whether the overall similarity of a candidate entity pair in the same sub-partition exceeds a preset similarity threshold and, if so, takes that candidate entity pair as a matching entity pair.
Preferably, the device further includes a data processing module and/or an attribute alignment module; the data processing module is used to process node entity data and edge entity data on the data platform through the unified access interface, and to pass the returned processing result to the next module; the attribute alignment module is used to align the entities from multiple data sources, which are stored on the data platform after processing by the data preprocessing module, according to the actual meaning of their attributes.
The application also discloses a storage medium on which a program for executing the above method is recorded.
Compared with the prior art, the application has the following advantages:
In the preferred embodiments of the application, the stages have upstream and downstream dependencies on the pipeline, but different stages are constrained only by data format and are decoupled from one another through the unified interface provided by the data platform, so they can be developed independently. The algorithm of each stage can be replaced flexibly, and by implementing custom stages, new stages can be inserted between existing ones to accommodate customized requirements. In addition, the application places no restriction on the architecture of the data platform; for example, a Hadoop distributed file system or a cloud computing architecture can be used, which makes it easy to scale computing and storage resources as the data volume grows.
Brief description of the drawings
The drawings serve only to illustrate preferred embodiments and are not to be considered a limitation of the application. Throughout the drawings, the same reference numerals denote the same parts. In the drawings:
Fig. 1 is a flow diagram of a first embodiment of the data fusion method for a knowledge graph of the application;
Fig. 2 is a flow diagram of a second embodiment of the data fusion method for a knowledge graph of the application;
Fig. 3 is a structural diagram of an embodiment of the data fusion device for a knowledge graph of the application;
Fig. 4 is a schematic diagram of the graph data model of a knowledge graph.
Detailed description
To make the above objects, features and advantages of the application clearer, the application is described in further detail below with reference to the drawings and specific embodiments.
In the description of the application, it should be understood that the terms "first" and "second" are used for descriptive purposes only and are not to be understood as indicating or implying relative importance or implicitly indicating the number of the technical features referred to. A feature qualified by "first" or "second" may therefore explicitly or implicitly include one or more such features. "A plurality of" means two or more, unless specifically defined otherwise. The terms "include", "comprise" and similar terms are to be understood as open-ended, that is, "including but not limited to". The term "based on" means "based at least in part on". The term "an embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one other embodiment". Definitions of other terms are given in the description below.
Fig. 1 shows the flow of the first embodiment of the data fusion method for a knowledge graph of the application. The system that executes this method embodiment is provided with a data platform that supplies the running environment and computing resources for each stage, and the stages interact through the data platform's unified access interface. In a concrete implementation, the data platform can be built on the Hadoop distributed file system, a cloud computing architecture (such as Amazon AWS EMR) or other architectures; the application places no restriction on this. The method embodiment includes the following stages:
1. Data preprocessing stage (InputStage): data from multiple homogeneous or heterogeneous data sources (such as structured data A and unstructured data B) is processed into a common format of entities and their attributes (SPO format), as the input of the subsequent stages.
By configuring different data source information and data models, data is extracted from the data sources, cleaned and transformed, and then stored on the data platform in a unified data format. For a relational database data source, for example, the required SPO data can be extracted by configuring the connection information, the entity types and entity tables, and the relationship types and relationship tables. For a graph database, nodes (entity - attribute - attribute value) and edges (entity - relationship - entity) are naturally in SPO structure.
Some of the configuration parameters of the data preprocessing stage are shown in the table below.
In a concrete implementation, different data sources can be preprocessed in a custom manner (CustomInputStage), with an interface of the following form:
The configuration defined in the table above is read, and the remote data is read, parsed and stored. For an unstructured data source, for example, a machine learning interface, a network interface or the like can be called to perform knowledge extraction; the result is saved as triple information, and the address of the saved data and its metadata information are returned.
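For the relational case, the extraction step might look like the following minimal sketch. It is not the application's CustomInputStage interface; `rows_to_triples`, its parameters and the sample row are assumptions made for illustration, and rows are modelled as plain dictionaries rather than a live database connection.

```python
def rows_to_triples(entity_type, key_column, rows):
    """Turn relational rows into (subject, property, object) triples.

    Hypothetical helper: `entity_type` and `key_column` stand in for the
    entity type and entity table configuration mentioned in the text.
    """
    triples = []
    for row in rows:
        subject = f"{entity_type}:{row[key_column]}"
        for column, value in row.items():
            if column == key_column or value is None:
                continue  # the key becomes the subject; empty cells carry no fact
            triples.append((subject, column, str(value).strip()))  # basic cleaning
    return triples

# Hypothetical sample row from an entity table.
rows = [{"id": 7, "name": " Alibaba ", "legal_representative": None}]
triples = rows_to_triples("company", "id", rows)
```

Each non-empty cell becomes one (entity - attribute - attribute value) triple; relationship tables could be handled the same way with the foreign key as the object.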
2. Entity partitioning stage (BlockingStage): entities from multiple data sources are divided into different sub-partitions (Blocks) according to their attributes, to reduce the number of candidate matching pairs.
For data sources S and T that need to be matched, suppose S contains m entities and T contains n entities; the number of pairs to be checked is then m*n. In big data scenarios a comparison of this scale is essentially infeasible, so the number of pairs to be matched has to be reduced.
In a concrete implementation, entities in the two data sources that cannot possibly match are placed in different data partitions in advance. This greatly reduces the amount of data in each partition, and the partitions can be processed in parallel.
For example, for company entities in S and T that need to be matched, companies registered in different countries are generally unlikely to be the same company in the real world, so the data can be divided into more than 220 (country) partitions according to the country attribute of the company. Each partition can be further divided into sub-partitions by the same or similar attributes; for example, companies in the 'United States' partition can be further assigned to new partitions by an identical 'state' attribute. The number of pairs that finally need to be matched equals the sum over all data partitions, and in the subsequent calculation all partitions can be processed in parallel, which greatly reduces the overall matching time.
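The reduction from m*n comparisons to the sum over partitions can be sketched as follows. The function names and sample entities are hypothetical, and entities are modelled as dictionaries; this is an illustration of equivalence blocking in general, not the application's implementation.

```python
from collections import defaultdict

def block_by(entities, attribute):
    """Group entities by the value of one attribute (equivalence partitioning)."""
    blocks = defaultdict(list)
    for e in entities:
        blocks[e.get(attribute)].append(e)
    return blocks

def candidate_pairs(source_s, source_t, attribute):
    """Yield (s, t) pairs only when both entities fall into the same block."""
    blocks_t = block_by(source_t, attribute)
    for s in source_s:
        for t in blocks_t.get(s.get(attribute), []):
            yield s, t

# Hypothetical sample data: m = 3 entities in S, n = 3 entities in T.
S = [{"name": "a", "country": "CN"}, {"name": "b", "country": "US"}, {"name": "c", "country": "US"}]
T = [{"name": "A", "country": "CN"}, {"name": "B", "country": "US"}, {"name": "D", "country": "DE"}]
pairs = list(candidate_pairs(S, T, "country"))
```

Here the full comparison would check 3 * 3 = 9 pairs, while blocking by country leaves 3; each block can also be handed to a separate worker.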
Some of the configuration parameters of the entity partitioning stage are shown in the table below.
In addition, the partitioning method of the entity partitioning stage (BlockingStage) can be extended with a custom partitioning algorithm, for example through an interface of the following form:
A globally unique partition key (block key) can be generated from the partition the current entity is in and the attribute used for the next level of partitioning, so that the data is divided into the next partition. When the number of possible matching entity pairs in a partition reaches the minimum, or the total number of partitions reaches the maximum, the partition is not divided further.
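A minimal sketch of this hierarchical key generation with both stop conditions follows. The names `split_blocks`, `min_pairs` and `max_blocks` are assumptions, and the pair count assumes all entities of a partition are pooled in one list (n(n-1)/2 pairs); with two separate sources the count would instead be the product of the two sides.

```python
from collections import defaultdict

def split_blocks(entities, attributes, min_pairs=4, max_blocks=1000):
    """Refine blocks level by level; a block key is the tuple of attribute values used so far."""
    blocks = {(): list(entities)}
    for attribute in attributes:
        refined = defaultdict(list)
        for key, members in blocks.items():
            pairs = len(members) * (len(members) - 1) // 2
            if pairs <= min_pairs:
                refined[key].extend(members)  # few enough pairs: stop dividing this block
                continue
            for e in members:
                # new block key = parent key + value of the next partitioning attribute
                refined[key + (e.get(attribute),)].append(e)
        if len(refined) > max_blocks:
            break  # too many partitions: keep the previous level
        blocks = dict(refined)
    return blocks

# Hypothetical sample data.
companies = [
    {"id": 1, "country": "US", "state": "CA"}, {"id": 2, "country": "US", "state": "CA"},
    {"id": 3, "country": "US", "state": "NY"}, {"id": 4, "country": "US", "state": "NY"},
    {"id": 5, "country": "CN"},
]
blocks = split_blocks(companies, ["country", "state"], min_pairs=1)
```

Because a child key always extends its parent key, keys stay unique across levels even when some blocks stop dividing earlier than others.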
For a clustering-based partitioning algorithm, a trained clustering model can be used, with an interface of the following form:
The clustering model can predict directly on the current entity and assign it to a class; the number of partitions then equals the number of classes of the clustering model. Partitions can of course be divided further on the basis of the clustering.
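As a rough illustration, a nearest-centroid lookup can stand in for the trained clustering model. The centroids, the `featurize` callback and the sample entities are all hypothetical; a real model would be trained beforehand and would supply its own prediction call.

```python
import math

def predict_cluster(features, centroids):
    """Return the index of the nearest centroid (stand-in for a trained model's predict)."""
    return min(range(len(centroids)), key=lambda i: math.dist(features, centroids[i]))

def cluster_partition(entities, featurize, centroids):
    """Assign each entity to the partition of its predicted class."""
    blocks = {i: [] for i in range(len(centroids))}  # one partition per class
    for e in entities:
        blocks[predict_cluster(featurize(e), centroids)].append(e)
    return blocks

# Hypothetical centroids and entities with a two-dimensional feature vector.
centroids = [(0.0, 0.0), (10.0, 10.0)]
entities = [{"id": 1, "xy": (1.0, 0.5)}, {"id": 2, "xy": (9.0, 11.0)}]
blocks = cluster_partition(entities, lambda e: e["xy"], centroids)
```

The number of partitions equals the number of centroids, matching the statement that it equals the number of classes of the clustering model.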
3. Entity matching stage (MatchStage): for the candidate entity pairs in the same partition, different weights can be assigned to the attributes of an entity itself and to the attributes of its related entities, and the overall similarity of the candidate entity pair is computed as a weighted sum; candidate entity pairs exceeding a certain similarity threshold are filtered out as matching entity pairs.
Note that the process is designed so that rules based on strong associations can be inserted to complete a match directly. For company data in two data sources, for example, if both are listed companies with the same listed stock code, they can be matched directly, skipping the similarity calculation and reducing the computational complexity of the matching stage.
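The weighted sum and the direct-match rule can be sketched as below. The string similarity from Python's standard `difflib` module, the normalization by total weight, the `stock_code` field name and the sample weights are assumptions for illustration; attributes of related entities could be included by flattening them into additional keys.

```python
from difflib import SequenceMatcher

def text_similarity(a, b):
    """Case-insensitive string similarity in [0, 1]; 0 when either value is missing."""
    if not a or not b:
        return 0.0
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def overall_similarity(e1, e2, weights):
    """Weighted sum of per-attribute similarities, normalized by the total weight."""
    total = sum(weights.values())
    return sum(w * text_similarity(e1.get(k), e2.get(k)) for k, w in weights.items()) / total

def is_match(e1, e2, weights, threshold):
    # Strong-association rule: identical stock codes match directly, skipping the similarity step.
    if e1.get("stock_code") and e1.get("stock_code") == e2.get("stock_code"):
        return True
    return overall_similarity(e1, e2, weights) > threshold

# Hypothetical weights: the name counts twice as much as the country.
weights = {"name": 2, "country": 1}
a = {"name": "Alibaba", "country": "CN"}
b = {"name": "alibaba", "country": "CN"}
```

With these sample values `is_match(a, b, weights, 0.8)` holds because the names differ only in case, while two records sharing a stock code match regardless of how their names compare.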
When a validation data set is provided, the results produced by the matching algorithm can be compared against it to verify the accuracy of the matching algorithm. By adjusting the attributes, weight parameters and similarity threshold and comparing the results of multiple runs, accuracy can be improved step by step. For example, suppose two company entities are compared by a weighted combination of name similarity and stock code similarity. If the names are expressed in different languages in the different data sources, the name similarity will be low, so its weight should be turned down and the relative weight of the stock code similarity set somewhat higher.
The entity matching algorithm of the application can be iterated multiple times with adjusted parameters to improve the accuracy of the matching result.
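One way to organize this iteration is to score each parameter setting against the validation set and keep the best one. The sketch below sweeps only the threshold and uses the F1 score as the accuracy measure; both choices, along with the function names and sample scores, are assumptions and not taken from the application.

```python
def precision_recall(predicted, truth):
    """Compare predicted matches with validated matches; both are sets of (id_s, id_t) pairs."""
    tp = len(predicted & truth)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(truth) if truth else 0.0
    return precision, recall

def best_threshold(scored_pairs, truth, thresholds):
    """Pick the threshold whose predicted matches give the highest F1 on the validation set."""
    def f1(th):
        predicted = {pair for pair, score in scored_pairs.items() if score > th}
        p, r = precision_recall(predicted, truth)
        return 2 * p * r / (p + r) if p + r else 0.0
    return max(thresholds, key=f1)

# Hypothetical overall similarities and validated matches.
scored = {("s1", "t1"): 0.95, ("s2", "t2"): 0.7, ("s3", "t9"): 0.6}
truth = {("s1", "t1"), ("s2", "t2")}
chosen = best_threshold(scored, truth, [0.5, 0.65, 0.9])
```

The same loop could be wrapped around different weight settings, re-scoring the pairs each time before picking the threshold.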
Some of the configuration parameters of the entity matching stage (MatchStage) are shown in the table below.
With a custom entity matching algorithm, two entities can be compared to determine whether they refer to the same knowledge representation. The interface form is as follows:
In the example above, a binary classification machine learning model trained in advance takes the vector of per-attribute similarities of the two entities as input and infers the probability that they can be classed as the same entity (1 if they are the same).
The finally matched entity pairs are output to the result set.
4. Entity fusion stage (MergeStage): for data from different data sources that actually refers to the same entity, entity attribute values are supplemented, replaced and standardized according to the fusion algorithm, finally generating a unified entity representation.
A custom fusion algorithm is generally required, with an interface of the following form:
Data fusion can be combined with different business rules: a name can be given multiple aliases, for example, and mailboxes, addresses and the like can use a standardized format. Missing attribute data can be filled in by a crawler or manually, building high-quality data that facilitates applications such as search and analysis on the knowledge graph.
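A minimal sketch of such a fusion rule set follows. The function `merge_entities`, the `aliases` and `email` field names and the "primary source wins on conflict" policy are assumptions for illustration; a real fusion algorithm would encode its own business rules, including when to replace a value.

```python
def merge_entities(primary, secondary, alias_fields=("name",)):
    """Merge two matched entity dicts into one unified representation."""
    merged = dict(primary)
    aliases = set()
    for key, value in secondary.items():
        if value in (None, ""):
            continue
        if merged.get(key) in (None, ""):
            merged[key] = value      # supplement a missing value
        elif key in alias_fields and value != merged[key]:
            aliases.add(value)       # keep the variant spelling as an alias
    if aliases:
        merged["aliases"] = sorted(aliases)
    if merged.get("email"):
        merged["email"] = merged["email"].strip().lower()  # simple standardization rule
    return merged

# Hypothetical matched pair.
a = {"name": "Alibaba", "email": " Info@Example.COM ", "country": None}
b = {"name": "alibaba", "country": "CN"}
unified = merge_entities(a, b)
```

In this sketch the missing country is supplemented from the second record, the differing name is kept as an alias, and the mailbox is normalized to one format.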
In further embodiments, stages with other functions (such as a data processing stage) can be orchestrated into the pipeline in addition to the stages defined above. An interface of the following form can be used:
The data to be processed is passed in through the input configuration parameter; after processing, the result is written to output and passed to the next stage, extending the functionality of the system.
By the above means, the application implements a general-purpose pipeline (Pipeline) for entity fusion in big data scenarios. The pipeline consists of multiple stages (Stage); each stage can be flexibly extended through configuration, and custom stages (CustomStage) can be orchestrated into the pipeline to suit different application scenarios. Except for the data preprocessing stage (InputStage), which has only an output configuration, every stage has an input configuration. The input configuration specifies the entity lists and relationship lists from different data sources that the stage needs in order to run, the data addresses, and the related data meta-information (schema, including table names, column names and so on). A stage reads its input data, runs its algorithm, writes the result to the data platform, and exports all data addresses and metadata through output. The stages can therefore be run in series by chaining output to input, or run individually with explicitly specified input parameters.
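The input/output chaining can be sketched in a few lines. Here a plain dictionary stands in for the data platform and its unified access interface, addresses are dictionary keys, and `DedupStage` is a hypothetical custom stage; none of these names come from the application.

```python
class Stage:
    """A stage reads the addresses given in `inputs`, does its work, and returns output addresses."""
    def run(self, platform, inputs):
        raise NotImplementedError

class InputStage(Stage):
    def __init__(self, triples):
        self.triples = triples

    def run(self, platform, inputs):           # the first stage ignores its (empty) input
        platform["triples"] = list(self.triples)
        return {"triples": "triples"}           # address of the stored data

class DedupStage(Stage):                        # hypothetical custom stage
    def run(self, platform, inputs):
        data = platform[inputs["triples"]]      # read via the address from the previous stage
        platform["deduped"] = sorted(set(data))
        return {"triples": "deduped"}

def run_pipeline(stages, platform):
    """Run stages in series: each stage's output becomes the next stage's input."""
    output = {}
    for stage in stages:
        output = stage.run(platform, output)
    return output

platform = {}
result = run_pipeline([InputStage([("a", "p", "b"), ("a", "p", "b")]), DedupStage()], platform)
```

Since stages exchange only addresses, any stage can also be run on its own by calling `run` with a hand-written input dictionary.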
Fig. 2 shows the flow of the second embodiment of the data fusion method for a knowledge graph of the application. It differs from the first method embodiment in that an attribute alignment stage (Attribute Matching) is added between the data preprocessing stage and the entity partitioning stage. This stage aligns the entities from multiple data sources, stored on the data platform after preprocessing, according to the actual meaning of their attributes; for example, the "address" field of data source A is aligned with the "Address" field of data source B. In the subsequent partitioning and matching stages, aligned fields are treated as fields with the same meaning.
In a concrete implementation, the actual meaning of entity attributes can be set manually, or realized by providing an attribute meaning lookup table in the system; the application places no restriction on this.
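The lookup-table variant can be illustrated with a short sketch. The table contents and the function name are hypothetical; in practice the table would be maintained by hand or configured in the system.

```python
# Hypothetical attribute meaning lookup table: source field name -> canonical name.
ATTRIBUTE_TABLE = {"address": "address", "Address": "address", "addr": "address"}

def align_attributes(entity, table=ATTRIBUTE_TABLE):
    """Rename attributes to their canonical names; unknown names pass through unchanged."""
    return {table.get(name, name): value for name, value in entity.items()}

aligned = align_attributes({"Address": "Shanghai", "name": "X"})
```

After alignment, the partitioning and matching stages can refer to a single canonical field name regardless of which data source an entity came from.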
The application also discloses a storage medium on which a program for executing the above method is recorded. The storage medium includes any mechanism configured to store or transmit information in a form readable by a machine (for example, a computer). For example, storage media include read-only memory (ROM), random access memory (RAM), magnetic disk storage media, optical storage media, flash storage media, and electrical, optical, acoustic or other forms of propagated signals (for example, carrier waves, infrared signals, digital signals and so on).
Fig. 3 shows a structural block diagram of an embodiment of the data fusion device for a knowledge graph of the application, comprising a data platform 10, a data preprocessing module 11, an entity partitioning module 12, an entity matching module 13 and an entity fusion module 14, wherein:
The data platform 10 is configured with a unified access interface and provides computing and storage services for the other modules. The application places no restriction on the architecture of the data platform; to make it easy to scale computing and storage resources as the data volume grows, a Hadoop distributed file system or a cloud computing architecture can be used.
The data preprocessing module 11 is used to process data from different data sources, convert it into triple (S-P-O) format, store it on the data platform 10 through the unified access interface, and receive the graph data index information returned by the data platform 10. The graph data index information can be the storage address of the triple-format graph data on the data platform 10 together with its metadata.
The entity partitioning module 12 is used to partition, through the unified access interface and according to the graph data index information, the entities stored on the data platform 10 by attribute into one or more sub-partitions. In a concrete implementation, the entity partitioning module 12 may include an equivalence partitioning submodule that performs equivalence partitioning of the entities stored on the data platform according to a globally unique partition key generated from entity attributes, a clustering partitioning submodule that partitions the entities stored on the data platform based on a preset clustering model, and/or submodules for other partitioning methods.
The entity matching module 13 is used to perform similarity calculation on candidate entity pairs that fall into the same sub-partition and to filter out the matching entity pairs that satisfy a preset similarity condition.
The entity fusion module 14 is used to supplement and/or replace the entity attribute values of the matching entity pairs to generate a unified entity representation.
The functional modules of the device embodiment have upstream and downstream dependencies on the pipeline, but different modules are constrained only by data format and are decoupled from one another through the unified interface provided by the data platform, so they can be developed independently. The algorithm of each module can be replaced flexibly, and by implementing custom stages, new modules can be inserted between existing ones to accommodate customized requirements. For example, to improve adaptability to various data sources and the accuracy of the subsequent entity partitioning, matching and fusion, an attribute alignment module 15 can be inserted between the data preprocessing module 11 and the entity partitioning module 12. It aligns the entities from different data sources, stored on the data platform 10 after processing by the data preprocessing module 11, according to the actual meaning of their attributes. For example, the "address" field of data source A is aligned with the "Address" field of data source B; in the subsequent partitioning and matching stages, aligned fields are treated as fields with the same meaning.
In a further preferred embodiment, the entity matching module 13 may specifically include a similarity calculation submodule and a comparison submodule. The similarity calculation submodule assigns different weights to the attributes of an entity itself and to the attributes of other entities related to that entity, and computes the overall similarity of a candidate entity pair as a weighted sum. The comparison submodule judges whether the overall similarity of a candidate entity pair in the same sub-partition exceeds a preset similarity threshold and, if so, takes that candidate entity pair as a matching entity pair.
In another preferred embodiment, the device may further include a data processing module, used to process node entity data and edge entity data on the data platform through the unified access interface and to pass the returned processing result to the next module.
The above data processing module can be implemented in the following form:
Here, the data to be processed is passed in through the input configuration parameter; after processing, the result is written to output and passed to the functional module of the next stage, extending the functionality of the device.
All embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the others, and for the same or similar parts the embodiments can be referred to one another. Since the device embodiments of the application are basically similar to the method embodiments, they are described relatively briefly; for related details, see the description of the method embodiments. The device embodiments described above are merely illustrative: the modules described as separate components may or may not be physically separate, that is, they may be located in one place or distributed over multiple network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the embodiment. A person of ordinary skill in the art can understand and implement this without creative effort.
Specific examples are used herein to explain the principle and implementation of the application, and the description of the above embodiments is intended only to help understand the method of the application and its core idea. At the same time, a person skilled in the art may, following the idea of the application, make changes to the specific implementation and scope of application. In summary, the content of this specification should not be construed as limiting the application.

Claims (10)

1. A data fusion method for a knowledge graph, characterized in that the system executing the method comprises a data platform configured with a unified access interface, the method comprising:
processing data from different data sources and converting it into triple format, storing it in the data platform through the unified access interface, and receiving graph data index information returned by the data platform;
dividing, according to the graph data index information, the entities stored in the data platform into one or more sub-partitions by attribute;
performing similarity calculation on candidate entity pairs that fall within the same sub-partition, and selecting the matching entity pairs that satisfy a preset similarity condition;
supplementing and/or replacing the entity attribute values of the matching entity pairs to generate a unified entity representation.
2. The method according to claim 1, characterized in that, before the step of dividing, according to the graph data index information, the entities stored in the data platform into one or more sub-partitions by attribute, the method further comprises: aligning the entities that come from multiple data sources and are stored in the data platform after conversion into triple format, according to the physical meaning of their attributes.
3. The method according to claim 1, characterized in that the sub-partitions are divided either by equivalence partitioning on a globally unique partition key generated from entity attributes, or on the basis of a preset clustering model.
4. the method according to claim 1, wherein to the candidate entity being divided into identical child partition to progress Similarity calculation filters out the matching entities pair for meeting default similarity condition, specifically:
Different weights is respectively set for the attribute and other entity attributes relevant to the entity of entity itself, weighting is asked With the overall similarity for calculating candidate entity pair;
The overall similarity of the candidate entity pair in child partition is more than default similarity threshold if they are the same, then by candidate's entity pair As matching entities pair.
5. The method according to claim 1, characterized in that missing entity attribute values are supplemented either by obtaining them from the network with a crawler or by filling them in manually.
6. The method according to claim 1, characterized in that the graph data index information is the storage address, in the data platform, of the graph data in triple format, together with its metadata.
7. A data fusion device for a knowledge graph, characterized by comprising a data platform, a data preprocessing module, an entity division module, an entity matching module, and an entity fusion module, wherein:
the data platform is configured with a unified access interface;
the data preprocessing module is configured to process data from different data sources and convert it into triple format, store it in the data platform through the unified access interface, and receive graph data index information returned by the data platform;
the entity division module is configured to divide, according to the graph data index information output by the data preprocessing module, the entities stored in the data platform into one or more sub-partitions by attribute;
the entity matching module is configured to perform similarity calculation on candidate entity pairs that the entity division module has placed in the same sub-partition, and to select the matching entity pairs that satisfy a preset similarity condition;
the entity fusion module is configured to supplement and/or replace the entity attribute values of the matching entity pairs selected by the entity matching module, to generate a unified entity representation.
8. The device according to claim 7, characterized in that the entity division module comprises an equivalence partitioning submodule and/or a clustering partitioning submodule;
the equivalence partitioning submodule is configured to perform equivalence partitioning of the entities stored in the data platform on a globally unique partition key generated from entity attributes;
the clustering partitioning submodule is configured to divide the entities stored in the data platform on the basis of a preset clustering model;
the entity matching module specifically comprises a similarity calculation submodule and a comparison submodule;
the similarity calculation submodule is configured to set different weights for the attributes of the entity itself and for the attributes of other entities related to the entity, and to compute the overall similarity of each candidate entity pair as a weighted sum;
the comparison submodule is configured to judge whether the overall similarity of a candidate entity pair within the same sub-partition exceeds a preset similarity threshold and, if so, to take that candidate entity pair as a matching entity pair.
9. The device according to claim 7, characterized in that the device further comprises a data processing module and/or an attribute alignment module;
the data processing module is configured to process the node entity data and edge entity data in the data platform through the unified access interface, and to return the data processing result and pass it to the next module;
the attribute alignment module is configured to align the entities that come from multiple data sources and are stored in the data platform after processing by the data preprocessing module, according to the physical meaning of their attributes.
10. A storage medium, characterized in that the storage medium stores a program for executing the method according to any one of claims 1 to 6.
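The flow recited in claims 1, 3 and 4 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the sample triples, the `city` partition key, the weights, the 0.8 threshold, and all function names are hypothetical. String similarity uses the standard library's `difflib.SequenceMatcher` as a stand-in measure. For brevity the weighted sum covers only an entity's own attributes and omits the attributes of related entities mentioned in claim 4.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical sample data in (entity, attribute, value) triple format.
triples = [
    ("a1", "name", "Acme Corp"), ("a1", "city", "Shanghai"), ("a1", "phone", "021-1234"),
    ("b7", "name", "Acme Corp."), ("b7", "city", "Shanghai"),
    ("c3", "name", "Globex"), ("c3", "city", "Beijing"),
]


def build_entities(rows):
    """Group triples into one attribute dictionary per entity."""
    entities = defaultdict(dict)
    for subject, attribute, value in rows:
        entities[subject][attribute] = value
    return dict(entities)


def partition_by_key(entities, key_attribute):
    """Equivalence partitioning: entities sharing a key value share a sub-partition."""
    partitions = defaultdict(list)
    for entity_id, attrs in entities.items():
        partitions[attrs.get(key_attribute)].append(entity_id)
    return dict(partitions)


def overall_similarity(left, right, weights):
    """Weighted sum of per-attribute string similarities (own attributes only)."""
    total = 0.0
    for attribute, weight in weights.items():
        a, b = left.get(attribute), right.get(attribute)
        if a is not None and b is not None:
            total += weight * SequenceMatcher(None, a, b).ratio()
    return total


def match_pairs(entities, partitions, weights, threshold):
    """Compare candidate pairs only within a sub-partition; keep those above the threshold."""
    matches = []
    for ids in partitions.values():
        for left, right in combinations(sorted(ids), 2):
            if overall_similarity(entities[left], entities[right], weights) > threshold:
                matches.append((left, right))
    return matches


def fuse(primary, secondary):
    """Unified representation: primary values win, gaps are filled from secondary."""
    merged = dict(secondary)
    merged.update(primary)
    return merged


entities = build_entities(triples)
partitions = partition_by_key(entities, "city")
matches = match_pairs(entities, partitions, {"name": 0.7, "city": 0.3}, threshold=0.8)
fused = [fuse(entities[left], entities[right]) for left, right in matches]
print(matches, fused)
```

Partitioning first keeps the pairwise comparison confined to each sub-partition, so the number of similarity calculations grows with the partition sizes rather than with the total number of entities.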
CN201811635696.XA 2018-12-29 2018-12-29 The data fusion method and device of knowledge mapping Pending CN109739939A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201811635696.XA CN109739939A (en) 2018-12-29 2018-12-29 The data fusion method and device of knowledge mapping
PCT/CN2019/124552 WO2020135048A1 (en) 2018-12-29 2019-12-11 Data merging method and apparatus for knowledge graph

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811635696.XA CN109739939A (en) 2018-12-29 2018-12-29 The data fusion method and device of knowledge mapping

Publications (1)

Publication Number Publication Date
CN109739939A true CN109739939A (en) 2019-05-10

Family

ID=66362378

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811635696.XA Pending CN109739939A (en) 2018-12-29 2018-12-29 The data fusion method and device of knowledge mapping

Country Status (2)

Country Link
CN (1) CN109739939A (en)
WO (1) WO2020135048A1 (en)

Cited By (31)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110427415A (en) * 2019-08-02 2019-11-08 泰康保险集团股份有限公司 Knowledge share method, device, system media and electronic equipment
CN110532304A (en) * 2019-09-06 2019-12-03 京东城市(北京)数字科技有限公司 Data processing method and device, computer readable storage medium and electronic equipment
CN110580294A (en) * 2019-09-11 2019-12-17 腾讯科技(深圳)有限公司 Entity fusion method, device, equipment and storage medium
CN110598072A (en) * 2019-09-24 2019-12-20 恩亿科(北京)数据科技有限公司 Feature data aggregation method and device
CN110704635A (en) * 2019-09-16 2020-01-17 金色熊猫有限公司 Conversion method and device for ternary group data in knowledge graph
CN110826316A (en) * 2019-11-06 2020-02-21 北京交通大学 Method for identifying sensitive information applied to referee document
CN110929105A (en) * 2019-11-28 2020-03-27 杭州云徙科技有限公司 User ID (identity) association method based on big data technology
CN111026874A (en) * 2019-11-22 2020-04-17 海信集团有限公司 Data processing method and server of knowledge graph
CN111125376A (en) * 2019-12-23 2020-05-08 秒针信息技术有限公司 Knowledge graph generation method and device, data processing equipment and storage medium
WO2020114022A1 (en) * 2018-12-04 2020-06-11 平安科技(深圳)有限公司 Knowledge base alignment method and apparatus, computer device and storage medium
CN111291196A (en) * 2020-01-22 2020-06-16 腾讯科技(深圳)有限公司 Method and device for improving knowledge graph and method and device for processing data
WO2020135048A1 (en) * 2018-12-29 2020-07-02 颖投信息科技(上海)有限公司 Data merging method and apparatus for knowledge graph
CN111444351A (en) * 2020-03-24 2020-07-24 清华苏州环境创新研究院 Method and device for constructing knowledge graph in industrial process field
CN111475653A (en) * 2019-12-30 2020-07-31 北京国双科技有限公司 Method and device for constructing knowledge graph in oil and gas exploration and development field
CN111522803A (en) * 2020-04-14 2020-08-11 北京仁科互动网络技术有限公司 Tenant interaction method and device of software service platform and electronic equipment
CN111563133A (en) * 2020-05-06 2020-08-21 支付宝(杭州)信息技术有限公司 Method and system for data fusion based on entity relationship
CN111597239A (en) * 2020-04-10 2020-08-28 中科驭数(北京)科技有限公司 Data alignment method and device
CN112182330A (en) * 2020-09-23 2021-01-05 创新奇智(成都)科技有限公司 Knowledge graph construction method and device, electronic equipment and computer storage medium
WO2021082100A1 (en) * 2019-10-30 2021-05-06 平安科技(深圳)有限公司 Method and apparatus for aligning entities of knowledge graph, device, and storage medium
CN112906826A (en) * 2021-03-30 2021-06-04 平安科技(深圳)有限公司 Multi-dimension-based knowledge graph fusion method and device and computer equipment
CN113297213A (en) * 2021-04-29 2021-08-24 军事科学院系统工程研究院网络信息研究所 Dynamic multi-attribute matching method for entity object
CN113392227A (en) * 2021-05-31 2021-09-14 交控科技股份有限公司 Metadata knowledge map engine system facing rail transit field
CN113760995A (en) * 2021-09-09 2021-12-07 上海明略人工智能(集团)有限公司 Entity linking method, system, equipment and storage medium
CN113901264A (en) * 2021-11-12 2022-01-07 央视频融媒体发展有限公司 Method and system for matching periodic entities among movie and television attribute data sources
CN113934866A (en) * 2021-12-17 2022-01-14 鲁班(北京)电子商务科技有限公司 Commodity entity matching method and device based on set similarity
CN114282073A (en) * 2022-03-02 2022-04-05 支付宝(杭州)信息技术有限公司 Data storage method and device and data reading method and device
CN114861818A (en) * 2022-05-25 2022-08-05 平安普惠企业管理有限公司 Main data matching method, device, equipment and storage medium based on artificial intelligence
CN114896363A (en) * 2022-04-19 2022-08-12 北京月新时代科技股份有限公司 Data management method, device, equipment and medium
CN115577318A (en) * 2022-09-30 2023-01-06 北京大数据先进技术研究院 Data fusion evaluation method, system, equipment and storage medium based on semi-physical object
CN117556058A (en) * 2024-01-11 2024-02-13 安徽大学 Knowledge graph enhanced network embedded author name disambiguation method and device
CN117725555A (en) * 2024-02-08 2024-03-19 暗物智能科技(广州)有限公司 Multi-source knowledge tree association fusion method and device, electronic equipment and storage medium

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112699252B (en) * 2021-03-25 2021-07-23 成都数联铭品科技有限公司 Processing method of attribute data applied to knowledge graph and electronic equipment

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107145523A (en) * 2017-04-12 2017-09-08 浙江大学 Large-scale Heterogeneous Knowledge storehouse alignment schemes based on Iterative matching
US20180082183A1 (en) * 2011-02-22 2018-03-22 Thomson Reuters Global Resources Machine learning-based relationship association and related discovery and search engines
CN108647318A (en) * 2018-05-10 2018-10-12 北京航空航天大学 A kind of knowledge fusion method based on multi-source data

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2874073A1 (en) * 2013-11-18 2015-05-20 Fujitsu Limited System, apparatus, program and method for data aggregation
CN105956015A (en) * 2016-04-22 2016-09-21 四川中软科技有限公司 Service platform integration method based on big data
CN107545046B (en) * 2017-08-17 2021-05-25 北京奇安信科技有限公司 Fusion method and device for multi-source heterogeneous data
CN107958086A (en) * 2017-12-18 2018-04-24 北京睿力科技有限公司 The multi-source heterogeneous database data for solving data semantic Heterogeneity integrates method
CN109033129B (en) * 2018-06-04 2021-08-03 桂林电子科技大学 Multi-source information fusion knowledge graph representation learning method based on self-adaptive weight
CN109739939A (en) * 2018-12-29 2019-05-10 颖投信息科技(上海)有限公司 The data fusion method and device of knowledge mapping

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
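Editorial Office of Library and Information Service (《图书情报工作》杂志社), ed.: "Embedded Library Service Innovation for MOOCs" (《面向MOOC的图书馆嵌入式服务创新》), Beijing: China Ocean Press (海洋出版社), pages: 154-155 *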

Cited By (47)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020114022A1 (en) * 2018-12-04 2020-06-11 平安科技(深圳)有限公司 Knowledge base alignment method and apparatus, computer device and storage medium
WO2020135048A1 (en) * 2018-12-29 2020-07-02 颖投信息科技(上海)有限公司 Data merging method and apparatus for knowledge graph
CN110427415A (en) * 2019-08-02 2019-11-08 泰康保险集团股份有限公司 Knowledge share method, device, system media and electronic equipment
CN110532304A (en) * 2019-09-06 2019-12-03 京东城市(北京)数字科技有限公司 Data processing method and device, computer readable storage medium and electronic equipment
CN110580294B (en) * 2019-09-11 2022-11-29 腾讯科技(深圳)有限公司 Entity fusion method, device, equipment and storage medium
CN110580294A (en) * 2019-09-11 2019-12-17 腾讯科技(深圳)有限公司 Entity fusion method, device, equipment and storage medium
CN110704635A (en) * 2019-09-16 2020-01-17 金色熊猫有限公司 Conversion method and device for ternary group data in knowledge graph
CN110704635B (en) * 2019-09-16 2023-12-12 金色熊猫有限公司 Method and device for converting triplet data in knowledge graph
CN110598072A (en) * 2019-09-24 2019-12-20 恩亿科(北京)数据科技有限公司 Feature data aggregation method and device
CN110598072B (en) * 2019-09-24 2022-03-01 恩亿科(北京)数据科技有限公司 Feature data aggregation method and device
WO2021082100A1 (en) * 2019-10-30 2021-05-06 平安科技(深圳)有限公司 Method and apparatus for aligning entities of knowledge graph, device, and storage medium
CN110826316B (en) * 2019-11-06 2021-08-10 北京交通大学 Method for identifying sensitive information applied to referee document
CN110826316A (en) * 2019-11-06 2020-02-21 北京交通大学 Method for identifying sensitive information applied to referee document
CN111026874A (en) * 2019-11-22 2020-04-17 海信集团有限公司 Data processing method and server of knowledge graph
CN110929105A (en) * 2019-11-28 2020-03-27 杭州云徙科技有限公司 User ID (identity) association method based on big data technology
CN110929105B (en) * 2019-11-28 2022-11-29 广东云徙智能科技有限公司 User ID (identity) association method based on big data technology
CN111125376A (en) * 2019-12-23 2020-05-08 秒针信息技术有限公司 Knowledge graph generation method and device, data processing equipment and storage medium
CN111125376B (en) * 2019-12-23 2023-08-29 秒针信息技术有限公司 Knowledge graph generation method and device, data processing equipment and storage medium
CN111475653A (en) * 2019-12-30 2020-07-31 北京国双科技有限公司 Method and device for constructing knowledge graph in oil and gas exploration and development field
CN111475653B (en) * 2019-12-30 2021-03-02 北京国双科技有限公司 Method and device for constructing knowledge graph in oil and gas exploration and development field
CN111291196A (en) * 2020-01-22 2020-06-16 腾讯科技(深圳)有限公司 Method and device for improving knowledge graph and method and device for processing data
CN111291196B (en) * 2020-01-22 2024-03-22 腾讯科技(深圳)有限公司 Knowledge graph perfecting method and device, and data processing method and device
CN111444351B (en) * 2020-03-24 2023-09-12 清华苏州环境创新研究院 Knowledge graph construction method and device in industrial process field
CN111444351A (en) * 2020-03-24 2020-07-24 清华苏州环境创新研究院 Method and device for constructing knowledge graph in industrial process field
CN111597239A (en) * 2020-04-10 2020-08-28 中科驭数(北京)科技有限公司 Data alignment method and device
CN111522803B (en) * 2020-04-14 2023-05-19 北京仁科互动网络技术有限公司 Tenant interaction method and device of software service platform and electronic equipment
CN111522803A (en) * 2020-04-14 2020-08-11 北京仁科互动网络技术有限公司 Tenant interaction method and device of software service platform and electronic equipment
CN111563133A (en) * 2020-05-06 2020-08-21 支付宝(杭州)信息技术有限公司 Method and system for data fusion based on entity relationship
CN112182330A (en) * 2020-09-23 2021-01-05 创新奇智(成都)科技有限公司 Knowledge graph construction method and device, electronic equipment and computer storage medium
CN112906826A (en) * 2021-03-30 2021-06-04 平安科技(深圳)有限公司 Multi-dimension-based knowledge graph fusion method and device and computer equipment
CN113297213B (en) * 2021-04-29 2023-09-12 军事科学院系统工程研究院网络信息研究所 Dynamic multi-attribute matching method for entity object
CN113297213A (en) * 2021-04-29 2021-08-24 军事科学院系统工程研究院网络信息研究所 Dynamic multi-attribute matching method for entity object
CN113392227B (en) * 2021-05-31 2024-04-19 交控科技股份有限公司 Metadata knowledge graph engine system oriented to rail transit field
CN113392227A (en) * 2021-05-31 2021-09-14 交控科技股份有限公司 Metadata knowledge map engine system facing rail transit field
CN113760995A (en) * 2021-09-09 2021-12-07 上海明略人工智能(集团)有限公司 Entity linking method, system, equipment and storage medium
CN113901264A (en) * 2021-11-12 2022-01-07 央视频融媒体发展有限公司 Method and system for matching periodic entities among movie and television attribute data sources
CN113934866A (en) * 2021-12-17 2022-01-14 鲁班(北京)电子商务科技有限公司 Commodity entity matching method and device based on set similarity
CN114282073B (en) * 2022-03-02 2022-07-15 支付宝(杭州)信息技术有限公司 Data storage method and device and data reading method and device
CN114282073A (en) * 2022-03-02 2022-04-05 支付宝(杭州)信息技术有限公司 Data storage method and device and data reading method and device
CN114896363A (en) * 2022-04-19 2022-08-12 北京月新时代科技股份有限公司 Data management method, device, equipment and medium
CN114861818A (en) * 2022-05-25 2022-08-05 平安普惠企业管理有限公司 Main data matching method, device, equipment and storage medium based on artificial intelligence
CN115577318B (en) * 2022-09-30 2023-07-21 北京大数据先进技术研究院 Semi-physical-based data fusion evaluation method, system, equipment and storage medium
CN115577318A (en) * 2022-09-30 2023-01-06 北京大数据先进技术研究院 Data fusion evaluation method, system, equipment and storage medium based on semi-physical object
CN117556058A (en) * 2024-01-11 2024-02-13 安徽大学 Knowledge graph enhanced network embedded author name disambiguation method and device
CN117556058B (en) * 2024-01-11 2024-05-24 安徽大学 Knowledge graph enhanced network embedded author name disambiguation method and device
CN117725555A (en) * 2024-02-08 2024-03-19 暗物智能科技(广州)有限公司 Multi-source knowledge tree association fusion method and device, electronic equipment and storage medium
CN117725555B (en) * 2024-02-08 2024-06-11 暗物智能科技(广州)有限公司 Multi-source knowledge tree association fusion method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
WO2020135048A1 (en) 2020-07-02

Similar Documents

Publication Publication Date Title
CN109739939A (en) The data fusion method and device of knowledge mapping
US11599714B2 (en) Methods and systems for modeling complex taxonomies with natural language understanding
CN110990638B (en) Large-scale data query acceleration device and method based on FPGA-CPU heterogeneous environment
CN110347719B (en) Enterprise foreign trade risk early warning method and system based on big data
CN107391677B (en) Method and device for generating Chinese general knowledge graph with entity relation attributes
US20210097089A1 (en) Knowledge graph building method, electronic apparatus and non-transitory computer readable storage medium
CN110750649A (en) Knowledge graph construction and intelligent response method, device, equipment and storage medium
US20160217189A1 (en) Augmenting queries when searching a semantic database
EP2973038A1 (en) Classifying resources using a deep network
CN111414491A (en) Power grid industry knowledge graph construction method, device and equipment
US11423018B1 (en) Multivariate analysis replica intelligent ambience evolving system
US11809506B1 (en) Multivariant analyzing replicating intelligent ambience evolving system
CN102123172A (en) Implementation method of Web service discovery based on neural network clustering optimization
KR20180129001A (en) Method and System for Entity summarization based on multilingual projected entity space
CN114996549A (en) Intelligent tracking method and system based on active object information mining
CN115344698A (en) Label processing method, label processing device, computer equipment, storage medium and program product
CN117149804A (en) Data processing method, device, electronic equipment and storage medium
CN115713386A (en) Multi-source information fusion commodity recommendation method and system
WO2023278154A1 (en) Apparatus and method for transforming unstructured data sources into both relational entities and machine learning models that support structured query language queries
CN116702784B (en) Entity linking method, entity linking device, computer equipment and storage medium
CN110032574A (en) The processing method and processing device of SQL statement
CN112966084B (en) Knowledge graph-based answer query method, device, equipment and storage medium
CN116523041A (en) Knowledge graph construction method, retrieval method and system for equipment field and electronic equipment
KR102454261B1 (en) Collaborative partner recommendation system and method based on user information
CN114519106A (en) Document level entity relation extraction method and system based on graph neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190510
