
CN112732937A - Hidden relation acquisition method, device, equipment and medium based on knowledge graph - Google Patents

Info

Publication number
CN112732937A
Authority
CN
China
Prior art keywords
enterprise
data
processed
relation
classification
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110037710.1A
Other languages
Chinese (zh)
Inventor
马宁亚
陈奕安
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Asset Management Co Ltd
Original Assignee
Ping An Asset Management Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Asset Management Co Ltd
Priority to CN202110037710.1A
Publication of CN112732937A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/355 Class or cluster creation or modification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/12 Accounting
    • G06Q40/125 Finance or payroll

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Business, Economics & Management (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Finance (AREA)
  • Accounting & Taxation (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Business, Economics & Management (AREA)
  • Economics (AREA)
  • Development Economics (AREA)
  • Technology Law (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Strategic Management (AREA)
  • Marketing (AREA)
  • Probability & Statistics with Applications (AREA)
  • Quality & Reliability (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application relates to the field of artificial intelligence, in particular to a hidden relation acquisition method, device, equipment and medium based on a knowledge graph. The method comprises the following steps: acquiring enterprise data to be processed, and extracting basic attribute information from the enterprise data to be processed; comparing the basic attribute information to acquire data with a hidden relation from the enterprise data to be processed; processing, according to the knowledge graph, the enterprise data to be processed for which no hidden relation could be acquired based on the basic attribute information, so as to determine the enterprise data to be processed that has an indirect relation; acquiring the enterprise classification and the indirect relation type of the enterprise data to be processed that has an indirect relation; and cleaning the enterprise data to be processed that has an indirect relation according to the enterprise classification and the indirect relation type to determine the data with the hidden relation. In addition, the invention also relates to blockchain technology, and the privacy information of the user can be stored in a blockchain node. By adopting the method, the hidden relation among enterprises can be accurately extracted.

Description

Hidden relation acquisition method, device, equipment and medium based on knowledge graph
Technical Field
The present application relates to the field of artificial intelligence technologies, and in particular, to a hidden relationship acquisition method, apparatus, device, and medium based on a knowledge graph.
Background
Data mining refers to the process of algorithmically searching a large amount of data for information hidden therein. In practice, big data needs to be mined to determine the hidden information between data items. For example, in enterprise credit risk assessment, the credit of an enterprise needs to be determined according to its relationships with other enterprises. Enterprises may be related either directly or through a hidden relationship. In the traditional technology, a direct relationship can be obtained from disclosed financial statements and the like, but there is no corresponding way to obtain a hidden relationship.
Therefore, a method for acquiring the hidden relationship between enterprises is urgently needed.
Disclosure of Invention
In view of the foregoing, it is necessary to provide a method, an apparatus, a device, and a medium for acquiring hidden relations based on a knowledge graph, which can accurately extract hidden relations between enterprises.
A hidden relation obtaining method based on knowledge graph includes:
acquiring enterprise data to be processed, and extracting basic attribute information in the enterprise data to be processed;
comparing the basic attribute information to acquire data with hidden relations from the enterprise data to be processed;
processing, according to a knowledge graph, the enterprise data to be processed for which no hidden relation can be acquired based on the basic attribute information, so as to determine the enterprise data to be processed with an indirect relation;
acquiring the enterprise classification and the indirect relation type of the enterprise data to be processed with the indirect relation;
and cleaning the to-be-processed enterprise data with the indirect relation according to the enterprise classification and the indirect relation type to determine the data with the hidden relation.
In one embodiment, the obtaining of the enterprise classification of the to-be-processed enterprise data having indirect relationship includes:
reading the classification field of the enterprise data to be processed with indirect relation;
when the classification field has an enterprise classification, acquiring the enterprise classification in the classification field;
when the classification field does not have an enterprise classification, extracting the operation data of each position from the to-be-processed enterprise data with indirect relation;
and obtaining the enterprise classification of the enterprise data to be processed according to the operation data of each position.
In one embodiment, the obtaining the enterprise classification of the to-be-processed enterprise data according to the operation data of each position includes:
performing word segmentation processing on the operation data of each position, and acquiring the position of each word segment in the operation data;
calculating an inverse document frequency value of each word segment according to the word segment and its corresponding position;
acquiring the word segments whose inverse document frequency values are greater than a preset value;
and obtaining the enterprise classification of the enterprise data to be processed with indirect relation through the acquired word segments.
In one embodiment, the manner of obtaining the indirect relationship type includes:
acquiring enterprise nodes corresponding to a plurality of enterprise data to be processed connected through an associated node;
and obtaining the indirect relation type according to the acquired knowledge graph structures of the enterprise nodes and the associated nodes.
In one embodiment, the basic attribute information comprises at least one of an enterprise name, contact information, an enterprise address and an enterprise personnel name; the comparing the basic attribute information to obtain data with hidden relations from the enterprise data to be processed includes:
acquiring to-be-processed enterprise data with the similarity of the enterprise name larger than a similarity threshold value as data with a hidden relation; or
acquiring to-be-processed enterprise data with the same contact information as data with a hidden relation; or
acquiring longitude and latitude information based on the enterprise address, and acquiring enterprise data to be processed with the difference value of the longitude and latitude information within a preset range as data with a hidden relation; or
acquiring data with hidden relations from the enterprise data to be processed through the enterprise personnel names.
In one embodiment, the obtaining data with a hidden relationship from the to-be-processed enterprise data through the enterprise personnel name includes:
extracting resume information corresponding to the enterprise personnel name;
extracting a first enterprise name from the resume information;
comparing a second enterprise name in the to-be-processed enterprise data associated with the enterprise personnel name with the first enterprise name;
and when the similarity between the second enterprise name and the first enterprise name is greater than an enterprise name threshold value, determining that the to-be-processed enterprise data associated with the enterprise personnel name is data with a hidden relation.
A hidden relation acquisition apparatus based on a knowledge-graph, the apparatus comprising:
the first data acquisition module is used for acquiring enterprise data to be processed and extracting basic attribute information in the enterprise data to be processed;
the comparison module is used for comparing the basic attribute information to acquire data with hidden relations from the enterprise data to be processed;
the knowledge graph processing module is used for processing, according to a knowledge graph, the enterprise data to be processed for which no hidden relation can be acquired based on the basic attribute information, so as to determine the enterprise data to be processed with the indirect relation;
the second data acquisition module is used for acquiring the enterprise classification and the indirect relationship type of the enterprise data to be processed with the indirect relationship;
and the hidden relation acquisition module is used for cleaning the to-be-processed enterprise data with the indirect relation according to the enterprise classification and the indirect relation type so as to determine the data with the hidden relation.
In one embodiment, the second data acquisition module includes:
the reading unit is used for reading the classification field of the enterprise data to be processed with the indirect relation;
the first enterprise classification acquisition unit is used for acquiring enterprise classifications in the classification fields when the classification fields have enterprise classifications;
the operation data acquisition unit is used for extracting operation data of each position from the to-be-processed enterprise data with indirect relation when the classification field does not have enterprise classification;
and the second enterprise classification acquisition unit is used for acquiring the enterprise classification of the enterprise data to be processed according to the operation data of each position.
A computer device comprising a memory storing a computer program and a processor implementing the steps of the method in any of the above embodiments when executing the computer program.
A computer storage medium having stored thereon a computer program which, when executed by a processor, carries out the steps of the method in any of the above embodiments.
According to the above method, device, equipment and medium for acquiring hidden relations based on a knowledge graph, the enterprise data to be processed is first processed through the basic attributes. This processing has a low computing power requirement and is fast, and it reduces the amount of data that remains to be processed. The remaining data is then processed through the knowledge graph, which runs faster because the data volume has already been reduced, so the overall processing speed is increased. Processing the basic attributes first improves efficiency, and the subsequent knowledge graph processing improves accuracy.
Drawings
FIG. 1 is a diagram illustrating an application scenario of a hidden relationship obtaining method based on a knowledge graph in an embodiment;
FIG. 2 is a schematic flow chart of a hidden relationship acquisition method based on a knowledge graph in an embodiment;
FIG. 3 is a flowchart of the enterprise category acquisition step in one embodiment;
FIG. 4 is a block diagram of an apparatus for obtaining hidden relationships based on knowledge-graphs in one embodiment;
FIG. 5 is a diagram illustrating an internal structure of a computer device according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The hidden relation obtaining method based on the knowledge graph can be applied to the application environment shown in fig. 1, in which the terminal 102 communicates with the server 104 via a network. The terminal 102 sends the enterprise data to be processed to the server 104. The server 104 obtains the enterprise data to be processed and extracts basic attribute information from it; compares the basic attribute information to acquire data with a hidden relation from the enterprise data to be processed; processes, according to the knowledge graph, the to-be-processed enterprise data for which no hidden relation could be acquired based on the basic attribute information, so as to determine the to-be-processed enterprise data with the indirect relation; acquires the enterprise classification and indirect relation type of the enterprise data to be processed with indirect relation; and cleans the to-be-processed enterprise data with the indirect relation according to the enterprise classification and the indirect relation type so as to determine the data with the hidden relation. The enterprise data to be processed is first processed through the basic attributes, which has a low computing power requirement, is fast, and reduces the amount of data that remains to be processed. The remaining data is then processed through the knowledge graph, which runs faster because the data volume has already been reduced, so the overall processing speed is increased. Processing the basic attributes first improves efficiency, and the subsequent knowledge graph processing improves accuracy.
The terminal 102 may be, but not limited to, various personal computers, notebook computers, smart phones, tablet computers, and portable wearable devices, and the server 104 may be implemented by an independent server or a server cluster formed by a plurality of servers.
In one embodiment, as shown in fig. 2, a hidden relation obtaining method based on a knowledge graph is provided, which is described by taking the method as an example applied to the server in fig. 1, and includes the following steps:
S202: acquiring enterprise data to be processed, and extracting basic attribute information in the enterprise data to be processed.
Specifically, the to-be-processed enterprise data may be preprocessed enterprise data, that is, enterprise data from which the data having an association relationship disclosed by financial statements and the like has been removed; that part of the data has a direct relationship and does not need to be acquired.
In addition, direct relationships can also be identified through the knowledge graph: the to-be-processed enterprise data whose nodes are directly connected by edges in the knowledge graph is removed, and the top five customers, suppliers, accounts receivable and fund exchanges of each enterprise are taken as the candidate set for hidden relationship mining, because hidden relationships are more likely to exist in these four kinds of relationship and are more meaningful in practical application.
The basic attribute information is basic information in the enterprise data to be processed, which may include, but is not limited to, at least one of an enterprise name, contact information, an enterprise address, and an enterprise personnel name.
S204: comparing the basic attribute information to acquire the data with the hidden relation from the enterprise data to be processed.
Specifically, the server may determine the enterprises with the same basic attribute information by comparing the basic attribute information, so that the to-be-processed enterprise data with the same basic attribute information is data with a hidden relationship, and the same basic attribute information is used as the hidden relationship.
For example, the server may simultaneously compare the basic attribute information of each type in a parallel manner, and finally output the to-be-processed enterprise data with the hidden relationship, while continuing to perform the next processing on the to-be-processed enterprise data without the hidden relationship.
S206: processing, according to the knowledge graph, the to-be-processed enterprise data for which no hidden relationship can be acquired based on the basic attribute information, so as to determine the to-be-processed enterprise data with the indirect relationship.
Specifically, the judgment is carried out through the knowledge graph, that is, the related to-be-processed enterprise data is determined through the nodes and edges in the knowledge graph. An indirect relationship means that the nodes are separated by at least 2 hops rather than directly connected by an edge, that is, it covers the to-be-processed enterprise data corresponding to nodes that are connected to a given node through at least two edges. The server inputs the to-be-processed enterprise data for which no hidden relationship could be acquired based on the basic attribute information into the knowledge graph, so as to acquire the to-be-processed enterprise data corresponding to the nodes connected to a given node through at least two edges.
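As an illustration of step S206, the following minimal sketch finds enterprise pairs that are not directly connected but share an intermediate node. It only covers the exactly-two-hop case, and the undirected adjacency map, the function names `build_graph` and `two_hop_pairs`, and the sample edges are assumptions made for this example rather than details given in the application.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an undirected adjacency map from (node, node) pairs."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)


def two_hop_pairs(graph, enterprises):
    """Map each pair of enterprises that share an intermediate node, but have
    no direct edge between them, to the set of intermediate nodes."""
    pairs = defaultdict(set)
    for middle, neighbours in graph.items():
        members = sorted(n for n in neighbours if n in enterprises)
        for i, first in enumerate(members):
            for second in members[i + 1:]:
                # Skip pairs that already have a direct relationship.
                if second not in graph.get(first, set()):
                    pairs[(first, second)].add(middle)
    return dict(pairs)


# Hypothetical sample data: A and B both link to Bank1; A and D both link to C.
edges = [("A", "Bank1"), ("B", "Bank1"), ("A", "C"), ("C", "D")]
graph = build_graph(edges)
indirect = two_hop_pairs(graph, {"A", "B", "C", "D"})
print(indirect)
```

Longer paths (three or more hops) would need a bounded breadth-first search instead of the neighbour-pair enumeration used here.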
S208: acquiring the enterprise classification and the indirect relation type of the enterprise data to be processed with the indirect relation.
S210: cleaning the to-be-processed enterprise data with the indirect relation according to the enterprise classification and the indirect relation type so as to determine the data with the hidden relation.
In particular, the enterprise category refers to an industry to which the enterprise belongs, such as a financial enterprise and the like. The indirect relationship type refers to a type of a structure formed by nodes connected with one node through at least two edges, such as a V-type, a star-type, and the like.
Specifically, expert experience shows that enterprises connected through a financial enterprise are only weakly associated. Financial enterprises may therefore be identified according to the enterprise classification and deleted from the to-be-processed enterprise data corresponding to the nodes connected to a given node through at least two edges. The to-be-processed enterprise data of a preset indirect relationship type is also deleted. For example, the to-be-processed enterprise data with a V-type relationship is deleted: when enterprises A and B both guarantee enterprise C, expert experience shows that the association between A and B is weak in this structure. Finally, the remaining related to-be-processed enterprise data is acquired as the data with the hidden relationship.
Optionally, after the to-be-processed enterprise data having the indirect relationship is cleaned according to the enterprise classification and the indirect relationship type, the to-be-processed enterprise data corresponding to an authority may also be deleted, because many enterprises are associated with the authority.
According to the hidden relation obtaining method based on the knowledge graph, the enterprise data to be processed is first processed through the basic attributes. This processing has a low computing power requirement and is fast, and it reduces the amount of data that remains to be processed. The remaining data is then processed through the knowledge graph, which runs faster because the data volume has already been reduced, so the overall processing speed is increased. Processing the basic attributes first improves efficiency, and the subsequent knowledge graph processing improves accuracy.
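The cleaning in step S210 can be pictured as a simple filter. The sketch below is only one possible reading: the `IndirectRelation` record, the category labels "financial" and "authority", and the choice to drop a relation when any of its three nodes falls in an excluded category are all assumptions of this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndirectRelation:
    enterprise_a: str
    enterprise_b: str
    via: str            # intermediate (associated) node
    relation_type: str  # hypothetical labels such as "V" or "closed_triangle"


def clean_relations(relations, categories,
                    excluded_categories=frozenset({"financial", "authority"}),
                    excluded_types=frozenset({"V"})):
    """Keep only relations that survive the category and structure filters."""
    kept = []
    for rel in relations:
        nodes = (rel.enterprise_a, rel.enterprise_b, rel.via)
        # Drop relations that involve an excluded category of enterprise.
        if any(categories.get(node) in excluded_categories for node in nodes):
            continue
        # Drop relations whose structure is a preset weak type.
        if rel.relation_type in excluded_types:
            continue
        kept.append(rel)
    return kept


# Hypothetical sample data.
categories = {"Bank1": "financial", "C": "manufacturing"}
relations = [
    IndirectRelation("A", "B", "Bank1", "closed_triangle"),  # via a bank
    IndirectRelation("A", "D", "C", "V"),                    # weak V structure
    IndirectRelation("A", "E", "C", "closed_triangle"),      # kept
]
hidden = clean_relations(relations, categories)
print(hidden)
```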
In one embodiment, obtaining the enterprise classification of the to-be-processed enterprise data with indirect relationship comprises: reading a classification field of the enterprise data to be processed with the indirect relation; when an enterprise classification exists in the classification field, acquiring the enterprise classification in the classification field; when the classification field does not have an enterprise classification, extracting the operation data of each position from the to-be-processed enterprise data with indirect relation; and obtaining the enterprise classification of the enterprise data to be processed according to the operation data of each position.
In one embodiment, obtaining the enterprise classification of the enterprise data to be processed according to the operation data of each position comprises: performing word segmentation processing on the operation data of each position, and acquiring the position of each word segment in the operation data; calculating an inverse document frequency value of each word segment according to the word segment and its corresponding position; acquiring the word segments whose inverse document frequency values are greater than a preset value; and obtaining the enterprise classification of the enterprise data to be processed with indirect relation through the acquired word segments.
Specifically, referring to fig. 3, fig. 3 is a flowchart illustrating the enterprise classification obtaining step in one embodiment. The classification field is a field in the to-be-processed enterprise data that represents the industry of the enterprise, and may comprise a plurality of fields. If an enterprise classification exists in the classification field, it can be obtained directly. If not, the classification needs to be determined from the operation data in the to-be-processed enterprise data. The operation data may include the enterprise name, the text of the business scope, the names of enterprises directly associated with the enterprise, and the like; the enterprise name, the business scope text and the names of directly associated enterprises also serve as the positions of the operation data.
The server first performs word segmentation on the operation data, preferably using industry keywords, and then obtains the position of each word segment, so that an inverse document frequency value is calculated from the word segments and their corresponding positions, for example according to the following formula for the inverse document frequency value:
idf(i)=log(M1/M2+1)
wherein M1 represents the number of documents containing the i-th word in the industry category in which that word appears most often, and M2 represents the number of documents containing the i-th word in the industry category in which it appears second most often. Then a calculation is performed according to the preset probability of each word segment's position, for example 60% for the enterprise name (the maximum), 30% for the business scope text, and 10% for the names of enterprises directly associated with the enterprise, to determine the enterprise classification of the enterprise data to be processed.
In this embodiment, the enterprise classification is determined by introducing the inverse document frequency of the word segments and the positions of the word segments, so that the calculation of the enterprise classification is more accurate.
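The following sketch shows one way the inverse document frequency value and the position weights might be combined. Several points are assumptions of this example and are not stated in the application: the formula is read with ordinary operator precedence as log(M1/M2 + 1), M2 is floored at 1 to avoid division by zero, each word segment votes for the industry category where it appears most often, and the votes are summed as weight times IDF. The per-category document counts are hypothetical.

```python
import math

# Position weights taken from the example figures in the text.
POSITION_WEIGHTS = {"name": 0.6, "business_scope": 0.3, "related_names": 0.1}


def idf(counts_by_category):
    """Return (top_category, idf) for one word segment.

    counts_by_category maps an industry category to the number of documents
    in that category containing the word segment.
    """
    ranked = sorted(counts_by_category.items(), key=lambda kv: kv[1], reverse=True)
    top_category, m1 = ranked[0]
    m2 = ranked[1][1] if len(ranked) > 1 else 0
    m2 = max(m2, 1)  # assumption: floor M2 at 1 to avoid division by zero
    return top_category, math.log(m1 / m2 + 1)


def classify(segments_by_position, doc_counts, min_idf=1.0):
    """Pick the category with the largest position-weighted IDF score."""
    scores = {}
    for position, segments in segments_by_position.items():
        weight = POSITION_WEIGHTS.get(position, 0.0)
        for segment in segments:
            if segment not in doc_counts:
                continue
            category, value = idf(doc_counts[segment])
            if value > min_idf:  # keep only sufficiently distinctive segments
                scores[category] = scores.get(category, 0.0) + weight * value
    return max(scores, key=scores.get) if scores else None


# Hypothetical document counts per industry category.
doc_counts = {
    "bank": {"finance": 90, "retail": 10},
    "trading": {"retail": 40, "finance": 35},
    "steel": {"manufacturing": 50, "retail": 5},
}
segments = {"name": ["steel", "trading"], "business_scope": ["bank"]}
category = classify(segments, doc_counts)
print(category)
```

Here "trading" is filtered out because its counts are nearly equal across categories, so it says little about the industry, while "steel" in the enterprise name outweighs "bank" in the business scope text.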
In one embodiment, the manner of obtaining the indirect relationship type includes: acquiring enterprise nodes corresponding to a plurality of enterprise data to be processed connected through a related node; and obtaining the indirect relation type according to the acquired knowledge graph structures of the enterprise nodes and the associated nodes.
Specifically, a two-hop structure is mainly identified here, that is, a structure in which enterprise nodes are connected through two edges and one associated node. The server may obtain the two-hop enterprise nodes and then construct the knowledge graph structure of the enterprise nodes and the associated node, such as a V-shape or a closed triangle, and use the type of this structure as the indirect relationship type.
In the above embodiment, the indirect relationship type is directly obtained through the knowledge graph structure, which is more direct and has strong visualization.
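A minimal sketch of deriving the indirect relationship type from the local graph structure is given below. It only distinguishes the two shapes named in the text, and the adjacency-map representation, the function name `relation_type` and the string labels are assumptions of this example.

```python
def relation_type(graph, enterprise_a, enterprise_b, associated):
    """Classify the structure formed by two enterprises and one associated node.

    Returns "closed_triangle" when the two enterprises are also directly
    connected, "V" when they are connected only through the associated node,
    and None when the associated node does not link both enterprises.
    """
    neighbours = graph.get(associated, set())
    if enterprise_a not in neighbours or enterprise_b not in neighbours:
        return None
    if enterprise_b in graph.get(enterprise_a, set()):
        return "closed_triangle"
    return "V"


# Hypothetical undirected adjacency map.
graph = {
    "A": {"C", "B"},
    "B": {"C", "A"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}
print(relation_type(graph, "A", "B", "C"))  # A-B edge exists as well
print(relation_type(graph, "A", "D", "C"))  # only connected through C
```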
In one embodiment, the basic attribute information comprises at least one of an enterprise name, contact information, an enterprise address, and an enterprise personnel name; comparing the basic attribute information to acquire the data with the hidden relation from the enterprise data to be processed comprises: acquiring to-be-processed enterprise data with the similarity of the enterprise names larger than a similarity threshold value as data with hidden relations; or acquiring the enterprise data to be processed with the same contact information as the data with the hidden relation; or acquiring longitude and latitude information based on the enterprise address, and acquiring enterprise data to be processed with the difference value of the longitude and latitude information within a preset range as data with a hidden relation; or acquiring the data with the hidden relation from the enterprise data to be processed through the name of the enterprise personnel.
Specifically, for the contact information, the server directly checks whether the mailboxes, telephone numbers and fax numbers of the two companies are the same, and if so, the corresponding data is taken as data with a hidden relation.
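The contact comparison can be sketched as an inverted index from contact values to enterprises. The record layout, the field names and the light normalisation are assumptions of this example, and it treats a match on any single channel as sufficient, which the text does not state explicitly.

```python
from collections import defaultdict
from itertools import combinations


def shared_contact_pairs(records):
    """Map each enterprise pair to the contact channels they have in common.

    records maps an enterprise name to a dict with optional "email",
    "phone" and "fax" entries.
    """
    index = defaultdict(set)
    for name, contacts in records.items():
        for kind in ("email", "phone", "fax"):
            value = contacts.get(kind)
            if value:
                # Light normalisation so trivially different spellings match.
                index[(kind, value.strip().lower())].add(name)
    pairs = defaultdict(set)
    for (kind, _value), names in index.items():
        for first, second in combinations(sorted(names), 2):
            pairs[(first, second)].add(kind)
    return dict(pairs)


# Hypothetical sample records.
records = {
    "A": {"email": "info@example.com", "phone": "021-1234"},
    "B": {"email": "INFO@example.com ", "phone": "021-9999"},
    "C": {"phone": "021-1234", "fax": None},
}
matches = shared_contact_pairs(records)
print(matches)
```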
For the enterprise names, the similarity of the names of the two companies is calculated, using the Jaro-Winkler distance and the cosine similarity of the word embedding vectors, and companies belonging to the same group are excluded.
For the enterprise address, the similarity of the registered addresses of the companies is calculated in the same way, that is, using the Jaro-Winkler distance and the cosine similarity of the word embedding vectors, and companies belonging to the same group are excluded. Preferably this is combined with GIS map data: the longitude and latitude information corresponding to the registered address is acquired, and the enterprise data to be processed whose longitude and latitude differences fall within a preset range is then acquired as the data with the hidden relation.
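To make the name comparison concrete, here is a small pure-Python sketch of Jaro-Winkler similarity applied to enterprise names. The embedding-based cosine similarity mentioned in the text is left out, and the 0.9 threshold, the `group_of` mapping used to skip same-group companies, and the sample names are assumptions of this example.

```python
from itertools import combinations


def jaro(s1, s2):
    """Jaro similarity between two strings, in the range [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order // 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1, s2, prefix_scale=0.1):
    """Jaro similarity boosted for a common prefix of up to four characters."""
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return base + prefix * prefix_scale * (1.0 - base)


def similar_name_pairs(names, group_of, threshold=0.9):
    """Pairs of names above the threshold that are not in the same group."""
    pairs = []
    for first, second in combinations(sorted(names), 2):
        group = group_of.get(first)
        if group is not None and group == group_of.get(second):
            continue  # same corporate group: already a known relationship
        if jaro_winkler(first, second) > threshold:
            pairs.append((first, second))
    return pairs


names = ["Acme Steel Trading Co", "Acme Steel Tradng Co", "Northwind Foods Ltd"]
print(similar_name_pairs(names, group_of={}))
```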
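The longitude and latitude check can be sketched as follows, assuming the registered addresses have already been geocoded elsewhere. The coordinate values, the 0.001 degree tolerance and the function name `nearby_pairs` are assumptions of this example; the application only says that the difference must fall within a preset range.

```python
from itertools import combinations


def nearby_pairs(coords, max_delta_deg=0.001):
    """Pairs of enterprises whose latitude and longitude both differ by at
    most max_delta_deg degrees.

    coords maps an enterprise name to a (latitude, longitude) tuple.
    """
    pairs = []
    for (first, (lat1, lon1)), (second, (lat2, lon2)) in combinations(
            sorted(coords.items()), 2):
        if abs(lat1 - lat2) <= max_delta_deg and abs(lon1 - lon2) <= max_delta_deg:
            pairs.append((first, second))
    return pairs


# Hypothetical geocoded registered addresses.
coords = {
    "A": (31.2304, 121.4737),
    "B": (31.2306, 121.4739),
    "C": (39.9042, 116.4074),
}
close = nearby_pairs(coords)
print(close)
```

A per-axis tolerance in degrees is the simplest reading of "difference value within a preset range"; a great-circle distance in metres would be an alternative if a physical radius is wanted.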
In one embodiment, the obtaining of the data with the hidden relationship from the to-be-processed enterprise data by the name of the enterprise personnel includes: extracting resume information corresponding to the names of the enterprise personnel; extracting a first enterprise name from the calendar information; comparing a second enterprise name in the to-be-processed enterprise data associated with the enterprise personnel name with the first enterprise name; and when the similarity between the second enterprise name and the first enterprise name is greater than the enterprise name threshold value, determining that the to-be-processed enterprise data associated with the enterprise personnel name is data with a hidden relation.
Specifically, resume NER extraction and a synonym identification submodel are used to determine whether the two enterprises share, currently or historically, the same senior personnel (legal representative, director, senior executive, or board secretary); if so, the association is determined directly. By the same reasoning, kinship or classmate relationships between key personnel are handled in the same way.
In practical application, NER is performed on the personal resume based on a BERT-BiLSTM-CNN-CRF framework to extract company names; BERT embedding vectors of the company names are then calculated, and cosine similarity is computed over the extracted company name lists. The two records are judged to refer to the same senior executive if the cosine similarity is greater than a threshold (at least 2 past-company similarities are greater than the threshold, or 1 similarity is greater than the threshold and the year of birth is the same), so that the to-be-processed enterprise data associated with the enterprise personnel name is determined to be data with a hidden relation.
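The decision rule in the parenthesis above can be sketched as follows, under one reading of the text: two same-named personnel records are treated as the same executive when at least two past-company names match, or when one matches and the birth years agree. The embeddings are passed in as plain lists of floats; producing them with BERT, the NER step, the threshold value 0.9 and the function names are assumptions outside this sketch.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def same_executive(history_a, history_b, birth_year_a=None, birth_year_b=None,
                   threshold=0.9):
    """Decide whether two resume records describe the same person.

    history_a / history_b: embedding vectors of the company names extracted
    from each resume (hypothetical input format).
    """
    hits = 0
    used = set()  # each company in history_b may be matched only once
    for u in history_a:
        for j, v in enumerate(history_b):
            if j not in used and cosine(u, v) > threshold:
                used.add(j)
                hits += 1
                break
    if hits >= 2:
        return True
    same_birth_year = birth_year_a is not None and birth_year_a == birth_year_b
    return hits == 1 and same_birth_year
```

Requiring two matches, or one match plus a second independent signal, reduces false positives from common personal names.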
It is emphasized that, in order to further ensure the privacy and security of the to-be-processed enterprise data and the data with hidden relationships, the to-be-processed enterprise data and the data with hidden relationships may also be stored in nodes of a blockchain.
It should be understood that although the steps in the flowcharts of fig. 2 and 3 are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated otherwise, the steps are not limited to the exact order shown and may be performed in other orders. Moreover, at least some of the steps in fig. 2 and 3 may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
In one embodiment, as shown in fig. 4, there is provided a hidden relation obtaining apparatus based on a knowledge-graph, including: a first data acquisition module 100, a comparison module 200, a knowledge graph processing module 300, a second data acquisition module 400, and a hidden relationship acquisition module 500, wherein:
the first data acquisition module 100 is configured to acquire enterprise data to be processed and extract basic attribute information in the enterprise data to be processed;
a comparison module 200, configured to compare the basic attribute information to obtain data with a hidden relationship from the to-be-processed enterprise data;
the knowledge graph processing module 300 is configured to process to-be-processed enterprise data, for which a hidden relationship cannot be obtained based on basic attribute information, according to a knowledge graph, so as to determine to-be-processed enterprise data having an indirect relationship;
a second data obtaining module 400, configured to obtain an enterprise classification and an indirect relationship type of the to-be-processed enterprise data having an indirect relationship;
and the hidden relationship obtaining module 500 is configured to clean the to-be-processed enterprise data with the indirect relationship according to the enterprise classification and the indirect relationship type, so as to determine the data with the hidden relationship.
In one embodiment, the second data obtaining module 400 includes:
the reading unit is used for reading the classification field of the enterprise data to be processed with the indirect relation;
the first enterprise classification acquisition unit is used for acquiring enterprise classifications in the classification fields when the enterprise classifications exist in the classification fields;
the operation data acquisition unit is used for extracting operation data of each position from the to-be-processed enterprise data with indirect relation when the classification field does not have enterprise classification;
and the second enterprise classification acquisition unit is used for acquiring enterprise classification of the enterprise data to be processed according to the operation data of each position.
In one embodiment, the second enterprise category acquiring unit includes:
the word segmentation subunit is used for carrying out word segmentation processing on the operation data of each position and acquiring the word segmentation position of each word in the operation data;
the inverse document frequency calculation subunit is used for calculating the inverse document frequency value of each segmented word according to the segmented word and its corresponding position;
the word selection subunit is used for acquiring the segmented words whose inverse document frequency values are greater than a preset value;
and the enterprise classification determining subunit is used for obtaining, from the selected segmented words, the enterprise classification of the to-be-processed enterprise data with an indirect relation.
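The keyword-selection step performed by these subunits can be sketched as below. The sketch uses the plain textbook definition of inverse document frequency, log(N / df), and assumes the operation data has already been segmented into word lists; how the word position enters the calculation is not specified in this section and is therefore omitted. The function names and sample data are hypothetical.

```python
import math
from collections import Counter


def inverse_document_frequency(documents):
    """Map each term to log(N / df), where df is the number of documents containing it."""
    n = len(documents)
    df = Counter()
    for tokens in documents:
        df.update(set(tokens))  # count each term at most once per document
    return {term: math.log(n / count) for term, count in df.items()}


def select_keywords(tokens, idf, min_idf):
    """Keep the distinct tokens whose IDF exceeds min_idf, in first-seen order."""
    selected = []
    for token in tokens:
        if idf.get(token, 0.0) > min_idf and token not in selected:
            selected.append(token)
    return selected


# Usage with made-up, already-segmented operation data:
docs = [
    ["bank", "loan", "services"],
    ["steel", "manufacturing", "services"],
    ["bank", "insurance", "services"],
]
idf = inverse_document_frequency(docs)
keywords = select_keywords(docs[1], idf, min_idf=0.5)
```

Terms that occur in every record, such as "services" here, receive an IDF of zero and are filtered out, so the surviving words are the ones that distinguish one enterprise's line of business from another's.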
In one embodiment, the second data obtaining module 400 includes:
the node determining unit is used for acquiring enterprise nodes corresponding to a plurality of enterprise data to be processed which are connected through one associated node;
and the relationship type determining unit is used for obtaining the indirect relationship type according to the acquired knowledge graph structures of the enterprise nodes and the associated nodes.
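A minimal sketch of how these two units could work on a graph stored as labelled edges: enterprises that attach to the same associated node are paired, and the pair of edge labels serves as the indirect relation type. The edge-tuple layout, the labels and the function name are hypothetical, since this section does not fix a graph schema.

```python
from collections import defaultdict
from itertools import combinations


def indirect_relations(edges):
    """Find enterprise pairs connected through one shared associated node.

    edges: iterable of (enterprise_id, edge_label, associated_node_id).
    Returns sorted tuples (enterprise_1, enterprise_2, node, relation_type),
    where relation_type is the sorted pair of edge labels.
    """
    by_node = defaultdict(list)
    for enterprise, label, node in edges:
        by_node[node].append((enterprise, label))

    found = []
    for node, links in by_node.items():
        for (e1, l1), (e2, l2) in combinations(sorted(links), 2):
            if e1 != e2:  # skip two edges from the same enterprise
                found.append((e1, e2, node, tuple(sorted((l1, l2)))))
    return sorted(found)
```

The resulting relation type, for example a shareholder of one enterprise acting as guarantor of another, is what the later cleaning step combines with the enterprise classification.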
In one embodiment, the basic attribute information comprises at least one of an enterprise name, contact information, an enterprise address, and an enterprise personnel name. The comparison module 200 is configured to acquire to-be-processed enterprise data whose enterprise-name similarity is greater than a similarity threshold as data with a hidden relation; or acquire to-be-processed enterprise data with the same contact information as data with a hidden relation; or acquire longitude and latitude information based on the enterprise address, and acquire to-be-processed enterprise data whose longitude and latitude differ by an amount within a preset range as data with a hidden relation; or acquire the data with a hidden relation from the to-be-processed enterprise data through the enterprise personnel name.
In one embodiment, the comparison module 200 includes:
the extraction unit is used for extracting resume information corresponding to the enterprise personnel names and extracting a first enterprise name from the resume information;
the comparison unit is used for comparing a second enterprise name in the to-be-processed enterprise data associated with the enterprise personnel name with the first enterprise name;
and the output unit is used for determining that the to-be-processed enterprise data associated with the enterprise personnel names is data with a hidden relation when the similarity between the second enterprise name and the first enterprise name is greater than the enterprise name threshold value.
For specific limitations of the hidden relation acquisition apparatus based on the knowledge graph, reference may be made to the above limitations of the hidden relation acquisition method based on the knowledge graph, which are not repeated here. All or part of the modules in the apparatus can be implemented by software, hardware, or a combination thereof. The modules can be embedded in, or independent of, a processor in the computer device in hardware form, or can be stored in a memory of the computer device in software form, so that the processor can call them and execute the corresponding operations.
In one embodiment, a computer device is provided, which may be a server, the internal structure of which may be as shown in fig. 5. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The database of the computer device is used for storing enterprise data to be processed. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a method of knowledge-graph based hidden relationship acquisition.
Those skilled in the art will appreciate that the architecture shown in fig. 5 is merely a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computing devices to which the disclosed aspects apply, as particular computing devices may include more or less components than those shown, or may combine certain components, or have a different arrangement of components.
In one embodiment, there is provided a computer device comprising a memory storing a computer program and a processor implementing the following steps when the processor executes the computer program: acquiring enterprise data to be processed, and extracting basic attribute information in the enterprise data to be processed; comparing the basic attribute information to acquire data with a hidden relation from the enterprise data to be processed; processing the to-be-processed enterprise data which cannot acquire the hidden relation based on the basic attribute information according to the knowledge graph so as to determine the to-be-processed enterprise data with the indirect relation; acquiring enterprise classification and indirect relation type of the enterprise data to be processed with indirect relation; and cleaning the to-be-processed enterprise data with the indirect relation according to the enterprise classification and the indirect relation type so as to determine the data with the hidden relation.
In one embodiment, when the processor executes the computer program, acquiring the enterprise classification of the to-be-processed enterprise data with an indirect relation comprises: reading a classification field of the to-be-processed enterprise data with the indirect relation; when an enterprise classification exists in the classification field, acquiring the enterprise classification from the classification field; when no enterprise classification exists in the classification field, extracting the operation data of each position from the to-be-processed enterprise data with the indirect relation; and obtaining the enterprise classification of the to-be-processed enterprise data according to the operation data of each position.
In one embodiment, when the processor executes the computer program, obtaining the enterprise classification of the to-be-processed enterprise data according to the operation data of each position comprises: performing word segmentation on the operation data of each position, and acquiring the position of each segmented word in the operation data; calculating an inverse document frequency value of each segmented word according to the segmented word and its corresponding position; acquiring the segmented words whose inverse document frequency values are greater than a preset value; and obtaining, from the selected segmented words, the enterprise classification of the to-be-processed enterprise data with an indirect relation.
In one embodiment, when the processor executes the computer program, the indirect relation type is obtained by: acquiring enterprise nodes corresponding to a plurality of to-be-processed enterprise data connected through one associated node; and obtaining the indirect relation type according to the knowledge graph structure of the acquired enterprise nodes and the associated node.
In one embodiment, the basic attribute information involved when the processor executes the computer program comprises at least one of an enterprise name, contact information, an enterprise address, and an enterprise personnel name. Comparing the basic attribute information to acquire the data with a hidden relation from the to-be-processed enterprise data comprises: acquiring to-be-processed enterprise data whose enterprise-name similarity is greater than a similarity threshold as data with a hidden relation; or acquiring to-be-processed enterprise data with the same contact information as data with a hidden relation; or acquiring longitude and latitude information based on the enterprise address, and acquiring to-be-processed enterprise data whose longitude and latitude differ by an amount within a preset range as data with a hidden relation; or acquiring the data with a hidden relation from the to-be-processed enterprise data through the enterprise personnel name.
In one embodiment, when the processor executes the computer program, acquiring the data with a hidden relation from the to-be-processed enterprise data through the enterprise personnel name comprises: extracting resume information corresponding to the enterprise personnel name; extracting a first enterprise name from the resume information; comparing a second enterprise name in the to-be-processed enterprise data associated with the enterprise personnel name with the first enterprise name; and when the similarity between the second enterprise name and the first enterprise name is greater than an enterprise name threshold, determining that the to-be-processed enterprise data associated with the enterprise personnel name is data with a hidden relation.
In one embodiment, a computer storage medium is provided, having a computer program stored thereon, the computer program, when executed by a processor, implementing the steps of: acquiring enterprise data to be processed, and extracting basic attribute information in the enterprise data to be processed; comparing the basic attribute information to acquire data with a hidden relation from the enterprise data to be processed; processing the to-be-processed enterprise data which cannot acquire the hidden relation based on the basic attribute information according to the knowledge graph so as to determine the to-be-processed enterprise data with the indirect relation; acquiring enterprise classification and indirect relation type of the enterprise data to be processed with indirect relation; and cleaning the to-be-processed enterprise data with the indirect relation according to the enterprise classification and the indirect relation type so as to determine the data with the hidden relation.
In one embodiment, when the computer program is executed by the processor, acquiring the enterprise classification of the to-be-processed enterprise data with an indirect relation comprises: reading a classification field of the to-be-processed enterprise data with the indirect relation; when an enterprise classification exists in the classification field, acquiring the enterprise classification from the classification field; when no enterprise classification exists in the classification field, extracting the operation data of each position from the to-be-processed enterprise data with the indirect relation; and obtaining the enterprise classification of the to-be-processed enterprise data according to the operation data of each position.
In one embodiment, when the computer program is executed by the processor, obtaining the enterprise classification of the to-be-processed enterprise data according to the operation data of each position comprises: performing word segmentation on the operation data of each position, and acquiring the position of each segmented word in the operation data; calculating an inverse document frequency value of each segmented word according to the segmented word and its corresponding position; acquiring the segmented words whose inverse document frequency values are greater than a preset value; and obtaining, from the selected segmented words, the enterprise classification of the to-be-processed enterprise data with an indirect relation.
In one embodiment, when the computer program is executed by the processor, the indirect relation type is obtained by: acquiring enterprise nodes corresponding to a plurality of to-be-processed enterprise data connected through one associated node; and obtaining the indirect relation type according to the knowledge graph structure of the acquired enterprise nodes and the associated node.
In one embodiment, the basic attribute information involved when the computer program is executed by the processor comprises at least one of an enterprise name, contact information, an enterprise address, and an enterprise personnel name. Comparing the basic attribute information to acquire the data with a hidden relation from the to-be-processed enterprise data comprises: acquiring to-be-processed enterprise data whose enterprise-name similarity is greater than a similarity threshold as data with a hidden relation; or acquiring to-be-processed enterprise data with the same contact information as data with a hidden relation; or acquiring longitude and latitude information based on the enterprise address, and acquiring to-be-processed enterprise data whose longitude and latitude differ by an amount within a preset range as data with a hidden relation; or acquiring the data with a hidden relation from the to-be-processed enterprise data through the enterprise personnel name.
In one embodiment, when the computer program is executed by the processor, acquiring the data with a hidden relation from the to-be-processed enterprise data through the enterprise personnel name comprises: extracting resume information corresponding to the enterprise personnel name; extracting a first enterprise name from the resume information; comparing a second enterprise name in the to-be-processed enterprise data associated with the enterprise personnel name with the first enterprise name; and when the similarity between the second enterprise name and the first enterprise name is greater than an enterprise name threshold, determining that the to-be-processed enterprise data associated with the enterprise personnel name is data with a hidden relation.
The block chain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, a consensus mechanism, an encryption algorithm and the like. A block chain (Blockchain), which is essentially a decentralized database, is a series of data blocks associated by using a cryptographic method, and each data block contains information of a batch of network transactions, so as to verify the validity (anti-counterfeiting) of the information and generate a next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
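The idea of data blocks linked by a cryptographic method can be illustrated with a toy hash chain, in which each block stores the hash of its predecessor so that altering an earlier block invalidates the chain. This is only a sketch of the linking principle using SHA-256 from the Python standard library: it has no consensus mechanism, networking or encryption, and the block layout, the all-zero genesis hash and the function names are assumptions.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder predecessor hash for the first block


def make_block(records, prev_hash):
    """Create a block whose hash covers its records and the previous block's hash."""
    payload = json.dumps({"records": records, "prev_hash": prev_hash}, sort_keys=True)
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return {"records": records, "prev_hash": prev_hash, "hash": digest}


def build_chain(batches):
    """Chain one block per batch of records, in order."""
    chain = []
    prev_hash = GENESIS_HASH
    for records in batches:
        block = make_block(records, prev_hash)
        chain.append(block)
        prev_hash = block["hash"]
    return chain


def verify_chain(chain):
    """Check that every block's hash is correct and links to its predecessor."""
    prev_hash = GENESIS_HASH
    for block in chain:
        expected = make_block(block["records"], block["prev_hash"])["hash"]
        if block["prev_hash"] != prev_hash or block["hash"] != expected:
            return False
        prev_hash = block["hash"]
    return True
```

Storing hidden-relation records this way makes later modification detectable, which corresponds to the validity (anti-counterfeiting) check mentioned above.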
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), Programmable ROM (PROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchronous Link DRAM (SLDRAM), Rambus Direct RAM (RDRAM), Direct Rambus Dynamic RAM (DRDRAM), and Rambus Dynamic RAM (RDRAM).
The technical features of the above embodiments can be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the above embodiments are not described, but should be considered as the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The above-mentioned embodiments express only several implementations of the present application, and their description is relatively specific and detailed, but they should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, all of which fall within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (10)

1. A hidden relation obtaining method based on knowledge graph is characterized by comprising the following steps:
acquiring enterprise data to be processed, and extracting basic attribute information in the enterprise data to be processed;
comparing the basic attribute information to acquire data with hidden relations from the enterprise data to be processed;
processing the enterprise data to be processed, which cannot acquire a hidden relation based on the basic attribute information, according to a knowledge graph to determine the enterprise data to be processed with an indirect relation;
acquiring the enterprise classification and the indirect relation type of the enterprise data to be processed with the indirect relation;
and cleaning the to-be-processed enterprise data with the indirect relation according to the enterprise classification and the indirect relation type to determine the data with the hidden relation.
2. The method of claim 1, wherein the obtaining the enterprise classification of the enterprise data to be processed with indirect relationship comprises:
reading the classification field of the enterprise data to be processed with indirect relation;
when the classification field has an enterprise classification, acquiring the enterprise classification in the classification field;
when the classification field does not have an enterprise classification, extracting the operation data of each position from the to-be-processed enterprise data with indirect relation;
and obtaining the enterprise classification of the enterprise data to be processed according to the operation data of each position.
3. The method of claim 2, wherein the obtaining the enterprise classification of the enterprise data to be processed according to the operation data of each position comprises:
performing word segmentation processing on the operation data of each position, and acquiring the position of each segmented word in the operation data;
calculating an inverse document frequency value of each segmented word according to the segmented word and its corresponding position;
acquiring the segmented words whose inverse document frequency values are greater than a preset value;
and obtaining the enterprise classification of the enterprise data to be processed with indirect relation through the obtained word segmentation.
4. The method according to claim 1, wherein the indirect relationship type is obtained by:
acquiring enterprise nodes corresponding to a plurality of enterprise data to be processed connected through one associated node;
and obtaining the indirect relation type according to the acquired knowledge graph structures of the enterprise nodes and the associated nodes.
5. The method of any one of claims 1 to 4, wherein the basic attribute information comprises at least one of an enterprise name, contact information, an enterprise address, and an enterprise personnel name; the comparing the basic attribute information to obtain data with hidden relations from the enterprise data to be processed includes:
acquiring to-be-processed enterprise data with the similarity of the enterprise name larger than a similarity threshold value as data with a hidden relation; or
acquiring to-be-processed enterprise data with the same contact information as data with a hidden relation; or
acquiring longitude and latitude information based on the enterprise address, and acquiring enterprise data to be processed with the difference value of the longitude and latitude information within a preset range as data with a hidden relation; or
acquiring data with hidden relations from the enterprise data to be processed through the enterprise personnel name.
6. The method according to claim 5, wherein the obtaining data with hidden relationships from the to-be-processed enterprise data by the enterprise personnel name comprises:
extracting resume information corresponding to the enterprise personnel name;
extracting a first enterprise name from the resume information;
comparing a second enterprise name in the to-be-processed enterprise data associated with the enterprise personnel name with the first enterprise name;
and when the similarity between the second enterprise name and the first enterprise name is greater than an enterprise name threshold value, determining that the to-be-processed enterprise data associated with the enterprise personnel name is data with a hidden relation.
7. A hidden relation acquisition apparatus based on a knowledge graph, the apparatus comprising:
the first data acquisition module is used for acquiring enterprise data to be processed and extracting basic attribute information in the enterprise data to be processed;
the comparison module is used for comparing the basic attribute information to acquire data with hidden relations from the enterprise data to be processed;
the knowledge graph processing module is used for processing the enterprise data to be processed, which cannot acquire the hidden relation based on the basic attribute information, according to a knowledge graph so as to determine the enterprise data to be processed with the indirect relation;
the second data acquisition module is used for acquiring the enterprise classification and the indirect relationship type of the enterprise data to be processed with the indirect relationship;
and the hidden relation acquisition module is used for cleaning the to-be-processed enterprise data with the indirect relation according to the enterprise classification and the indirect relation type so as to determine the data with the hidden relation.
8. The apparatus of claim 7, wherein the second data acquisition module comprises:
the reading unit is used for reading the classification field of the enterprise data to be processed with the indirect relation;
the first enterprise classification acquisition unit is used for acquiring enterprise classifications in the classification fields when the classification fields have enterprise classifications;
the operation data acquisition unit is used for extracting operation data of each position from the to-be-processed enterprise data with indirect relation when the classification field does not have enterprise classification;
and the second enterprise classification acquisition unit is used for acquiring the enterprise classification of the enterprise data to be processed according to the operation data of each position.
9. A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor implements the steps of the method of any one of claims 1 to 6 when executing the computer program.
10. A computer storage medium on which a computer program is stored, characterized in that the computer program, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 6.
CN202110037710.1A 2021-01-12 2021-01-12 Hidden relation acquisition method, device, equipment and medium based on knowledge graph Pending CN112732937A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110037710.1A CN112732937A (en) 2021-01-12 2021-01-12 Hidden relation acquisition method, device, equipment and medium based on knowledge graph

Publications (1)

Publication Number Publication Date
CN112732937A true CN112732937A (en) 2021-04-30

Family

ID=75590513

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110037710.1A Pending CN112732937A (en) 2021-01-12 2021-01-12 Hidden relation acquisition method, device, equipment and medium based on knowledge graph

Country Status (1)

Country Link
CN (1) CN112732937A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113361962A (en) * 2021-06-30 2021-09-07 支付宝(杭州)信息技术有限公司 Method and device for identifying enterprise risk based on block chain network

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109189867A (en) * 2018-10-23 2019-01-11 中山大学 Relationship discovery method, apparatus and storage medium based on Corporate Intellectual map
CN109523153A (en) * 2018-11-12 2019-03-26 平安科技(深圳)有限公司 Acquisition methods, device, computer equipment and the storage medium of illegal fund collection enterprise
CN110297918A (en) * 2019-06-25 2019-10-01 深圳市酷开网络科技有限公司 A kind of method, intelligent terminal and storage medium calculating movie and television contents degree of correlation
CN110825882A (en) * 2019-10-09 2020-02-21 西安交通大学 Knowledge graph-based information system management method
CN111310469A (en) * 2020-01-16 2020-06-19 北京明略软件系统有限公司 Method and device for searching invisible relationship between entities, electronic equipment and storage medium

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109189867A (en) * 2018-10-23 2019-01-11 中山大学 Relationship discovery method, apparatus and storage medium based on Corporate Intellectual map
CN109523153A (en) * 2018-11-12 2019-03-26 平安科技(深圳)有限公司 Acquisition methods, device, computer equipment and the storage medium of illegal fund collection enterprise
CN110297918A (en) * 2019-06-25 2019-10-01 深圳市酷开网络科技有限公司 A kind of method, intelligent terminal and storage medium calculating movie and television contents degree of correlation
CN110825882A (en) * 2019-10-09 2020-02-21 西安交通大学 Knowledge graph-based information system management method
CN111310469A (en) * 2020-01-16 2020-06-19 北京明略软件系统有限公司 Method and device for searching invisible relationship between entities, electronic equipment and storage medium

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113361962A (en) * 2021-06-30 2021-09-07 Alipay (Hangzhou) Information Technology Co., Ltd. Method and device for identifying enterprise risk based on blockchain network

Similar Documents

Publication Publication Date Title
CN110489520B (en) Knowledge graph-based event processing method, device, equipment and storage medium
CN108509485B (en) Data preprocessing method and device, computer equipment and storage medium
CN109189367B (en) Data processing method, device, server and storage medium
CN108876133A (en) Risk assessment processing method, device, server and medium based on business information
CN110781246A (en) Enterprise association relationship construction method and system
CN109325118B (en) Unbalanced sample data preprocessing method and device and computer equipment
CN111784392A (en) Abnormal user group detection method, device and equipment based on isolated forest
CN110990390B (en) Data cooperative processing method, device, computer equipment and storage medium
WO2019148712A1 (en) Phishing website detection method, device, computer equipment and storage medium
CN112613917A (en) Information pushing method, device and equipment based on user portrait and storage medium
CN111858467B (en) File data processing method, device, equipment and medium based on artificial intelligence
CN108334625B (en) User information processing method and device, computer equipment and storage medium
CN109886719B (en) Data mining processing method and device based on grid and computer equipment
CN110336786B (en) Message sending method, device, computer equipment and storage medium
CN112417315A (en) User portrait generation method, device, equipment and medium based on website registration
CN112559526A (en) Data table export method and device, computer equipment and storage medium
CN110825817B (en) Enterprise suspected association judgment method and system
CN116089620A (en) Electronic archive data management method and system
CN114495137B (en) Bill abnormity detection model generation method and bill abnormity detection method
CN112732937A (en) Hidden relation acquisition method, device, equipment and medium based on knowledge graph
CN109460500B (en) Hotspot event discovery method and device, computer equipment and storage medium
CN117313058A (en) Information identification method, apparatus, computer device and storage medium
CN110598124A (en) Numerical value attribute mining method and device, computer equipment and storage medium
CN108647288A (en) Method, device, computer equipment and storage medium for mining enterprise relationships
CN111339373B (en) Atlas feature extraction method, atlas feature extraction system, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20210430