CN118446662B - Information management method and system based on data fusion - Google Patents
- Publication number
- CN118446662B (application number CN202410897206.2A)
- Authority
- CN
- China
- Prior art keywords
- entity
- post
- semantic vector
- demand
- key
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/10—Office automation; Time management
- G06Q10/105—Human resources
- G06Q10/1053—Employment or hiring
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/367—Ontology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computing arrangements using knowledge-based models
- G06N5/02—Knowledge representation; Symbolic representation
- G06N5/022—Knowledge engineering; Knowledge acquisition
Abstract
The application relates to an information management method and system based on data fusion, in the technical field of recruitment information management. The method comprises: establishing a knowledge triplet set and constructing a domain knowledge graph; generating a candidate entity alias set for each post demand text and constructing a candidate entity set for each candidate entity alias; training a TransE model to output an entity semantic vector matrix and a relation semantic vector matrix; selecting the entity type with the highest semantic relevance score in the candidate entity set as the result entity; calculating the overall semantic vector of each post demand text; clustering all post demand texts to obtain post demand clusters; extracting the key entities of each post demand cluster and constructing a personalized post image; constructing a job seeker image and obtaining the post demand cluster the job seeker is most interested in; calculating the recommendation weight of each key entity; and recommending to the job seeker the posts corresponding to the M key entities with the highest recommendation weights, thereby supporting transnational recruitment.
Description
Technical Field
The application relates to the technical field of recruitment information management, in particular to an information management method and system based on data fusion.
Background
With the continued development of economic globalization and the internet, information has grown explosively, and the number and scale of transnational enterprises have kept increasing. As transnational enterprises grow, effectively managing massive recruitment data from different countries and regions has become an important topic in their human resource management. Traditional recruitment information management methods struggle to meet the demands of the big data era, and innovative technical means are needed to raise the informatization and intelligence level of transnational recruitment.
Existing technical schemes for transnational recruitment information management mainly rely on traditional machine learning models such as collaborative filtering and matrix decomposition, which can only make recommendations from shallow features of candidates or posts and ignore the deep associations among multidimensional information such as skills, industry, and enterprise context.
Different types of recruitment posts place very different requirements on candidates. How to deeply mine the degree of match between candidates and enterprises from post descriptions and enterprise requirements, automatically construct a personalized candidate evaluation index system, dynamically optimize the index weights with a machine learning algorithm, and improve the accuracy of person-post matching is a problem of practical application value.
The Chinese patent with publication number CN117893184A, titled "Human resource information management method based on big data", discloses the following steps: establishing a human resource information management database according to the recruitment requirements of recruiting companies and personal resume data; presetting an input matching data standard for the recruitment requirements and the personal resume data; retrieving the corresponding data in the human resource information management database according to the recruitment requirements and personal resume data that meet the input standard; and performing correlation analysis on the various data in the database to comprehensively determine the human resource information with a high matching degree. That prior art can rapidly analyze the correlation of various data and obtain the human resource information the user most wants. However, it matches recruitment requirements and resumes by keyword retrieval and ignores semantic-level correlation, so its matching accuracy and diversity are limited.
Disclosure of Invention
The present application aims to solve, at least to some extent, one of the technical problems in the related art. To this end, the application provides an information management method and system based on data fusion that make transnational recruitment intelligent and refined.
One aspect of the present application provides an information management method based on data fusion, including:
Step S100: collecting post demand texts of various posts, defining an entity type set ST and a relation type set GX, establishing a knowledge triplet set between entity types and relation types, and constructing a domain knowledge graph according to the post demand texts; the set of entity types includes entity types, and the set of relationship types includes relationship types;
The specific method for collecting the post demand texts of various posts, defining the entity type set ST and the relation type set GX, and constructing the domain knowledge graph according to the post demand texts comprises the following steps:
Step S110: collecting post demand texts of various posts from an enterprise transnational recruitment management database, wherein each post demand text comprises: post description, job requirements and working experience;
Step S120: defining an entity type set $ST = \{st_1, st_2, \ldots, st_B\}$, where $st_B$ represents the B-th entity type in the entity type set and B represents the total number of entity types;
Step S130: defining a relation type set $GX = \{gx_1, gx_2, \ldots, gx_C\}$ of the entity types, where $gx_C$ represents the C-th relation type in the relation type set and C represents the total number of relation types;
Step S140: establishing, according to the post demand texts, a knowledge triplet set $\{(h, gx_c, t)\}$ between entity types and relation types, where $h$ represents the head entity type, $t$ represents the tail entity type, and $gx_c$ represents the c-th relation type in the relation type set, $c \in \{1, \ldots, C\}$; and constructing a domain knowledge graph from the entity type set, the relation type set and the knowledge triplet set;
The construction method of the domain knowledge graph is as follows: entity types are taken as nodes and relation types as edges; the knowledge triplet set $\{(h, gx_c, t)\}$ is constructed from the relation types between entity types extracted from the post demand texts, where each knowledge triplet comprises two entity types $h$ and $t$ and the relation type $gx_c$ connecting them; the head entity type $h$ is connected to the tail entity type $t$ through the c-th relation type $gx_c$, the two entity types being distinguished as head entity type and tail entity type according to the relation type; the domain knowledge graph is thereby obtained;
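As an illustration of the construction just described, the following minimal Python sketch stores entity types as nodes and relation types as labelled directed edges. The triplets and the function name `build_knowledge_graph` are hypothetical examples introduced here, not taken from the application.

```python
from collections import defaultdict

# Hypothetical example triplets: (head entity type, relation type, tail entity type).
TRIPLETS = [
    ("data engineer", "requires", "python"),
    ("data engineer", "requires", "bachelor degree"),
    ("python", "related_to", "machine learning"),
]


def build_knowledge_graph(triplets):
    """Return (nodes, edges): entity types as nodes, relation types as directed edges."""
    nodes = set()
    edges = defaultdict(list)  # head -> [(relation, tail), ...]
    for head, relation, tail in triplets:
        nodes.update((head, tail))
        edges[head].append((relation, tail))
    return nodes, dict(edges)


nodes, edges = build_knowledge_graph(TRIPLETS)
```

An adjacency mapping keyed by head entity type keeps the head/tail direction of each triplet explicit, which matters later when the triplets are fed to a translation-based embedding model.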
Step S200: generating a candidate entity alias set for the post demand text, constructing a candidate entity set for each candidate entity alias, training a TransE model using the knowledge triplet set, outputting an entity semantic vector matrix and a relation semantic vector matrix, selecting the entity type with the highest semantic relevance score in the candidate entity set as the result entity of the candidate entity alias, and calculating the overall semantic vector of the post demand text using the entity semantic vector matrix; the candidate entity alias set includes candidate entity aliases;
The specific method for generating the candidate entity alias set, constructing the candidate entity sets, training the TransE model, determining the result entity of each candidate entity alias, and calculating the overall semantic vector of the post demand text comprises the following steps:
Step S210: defining an entity alias dictionary $\{(st_i, \{m_{i,1}, m_{i,2}, \ldots, m_{i,A}\})\}$, where $st_i$ represents the i-th entity type of the entity type set in the domain knowledge graph, $m_{i,A}$ represents the A-th entity alias of the i-th entity type, and A represents the total number of entity aliases;
Step S220: for the post demand text, generating a candidate entity alias set by dictionary matching; for each candidate entity alias $m$, searching the entity alias dictionary for all entity types that match it to obtain a candidate entity set, and constructing entity links $(m, st_i)$;
Step S230: obtaining negative sample triplets from the knowledge triplet set by negative sampling, training a TransE model with the knowledge triplet set and the negative sample triplets, and outputting an entity semantic vector matrix and a relation semantic vector matrix;
Step S240: according to the entity semantic vector matrix output by the TransE model, for a candidate entity alias $m$, retrieving the entity semantic vector $e_i$ of each entity type $st_i$ in its candidate entity set from the entity semantic vector matrix, calculating the semantic relevance score between $m$ and each entity type in the candidate entity set, comparing the scores, and selecting the entity type with the highest semantic relevance score as the result entity of the candidate entity alias;
The semantic relevance score between the candidate entity alias $m$ and the i-th entity type is calculated as $\mathrm{score}(m, st_i) = \cos(v_m, e_i) = \frac{v_m \cdot e_i}{\|e_i\| \, \|v_m\|}$, where $v_m$ represents the semantic vector of the candidate entity alias, $\cos(\cdot,\cdot)$ represents the cosine similarity function, and $\|e_i\|$ and $\|v_m\|$ represent the modulus (norm) of the entity semantic vector $e_i$ and of the semantic vector $v_m$ of the candidate entity alias, respectively;
Step S250: for each candidate entity alias in the post demand text, obtaining the entity type with the highest semantic relevance score in its candidate entity set, and retrieving the entity semantic vector of each such entity type from the entity semantic vector matrix to obtain the entity semantic vector set $\{e_1, e_2, \ldots, e_D\}$ of the post demand text, where $e_D$ is the entity semantic vector of the D-th entity type; the overall semantic vector of the entity semantic vector set is calculated as $V = \frac{1}{D}\sum_{d=1}^{D} e_d$, where D is the total number of entity types in the candidate entity set;
Step S260: calculating the overall semantic vectors of all post demand texts to obtain an overall semantic vector set $\{V_1, V_2, \ldots, V_E\}$, where $V_E$ represents the overall semantic vector of the E-th post demand text;
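Steps S210 to S250 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the alias dictionary, the two-dimensional entity vectors (stand-ins for TransE output), and the alias vector passed to `link` are hypothetical, since the text does not say how the alias semantic vector is obtained; the overall vector is taken as the arithmetic mean of the entity vectors, which is an assumption of this sketch.

```python
import math

# Hypothetical alias dictionary: entity type -> list of aliases.
ALIAS_DICT = {
    "python": ["python", "py"],
    "pytorch": ["pytorch", "torch", "py"],
}
# Hypothetical 2-d entity semantic vectors standing in for TransE output.
ENTITY_VECS = {"python": [1.0, 0.0], "pytorch": [0.0, 1.0]}


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def candidate_entities(alias):
    """Dictionary matching: every entity type that lists this alias."""
    return [entity for entity, aliases in ALIAS_DICT.items() if alias in aliases]


def link(alias, alias_vec):
    """Pick the candidate entity type with the highest cosine score, or None."""
    candidates = candidate_entities(alias)
    if not candidates:
        return None
    return max(candidates, key=lambda e: cosine(alias_vec, ENTITY_VECS[e]))


def overall_vector(entities):
    """Component-wise mean of the entity vectors (assumed aggregation)."""
    vecs = [ENTITY_VECS[e] for e in entities]
    return [sum(col) / len(vecs) for col in zip(*vecs)]
```

For example, the ambiguous alias "py" matches both hypothetical entity types, and an alias vector close to the first axis resolves it to "python".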
Step S300: taking the overall semantic vector as a classification feature, clustering all post demand texts by a K-means clustering algorithm to obtain K post demand clusters, extracting the key entities in each post demand cluster, and constructing a personalized post image for each post demand cluster;
The specific method for clustering all post demand texts by the K-means clustering algorithm, with the overall semantic vector as the classification feature, to obtain K post demand clusters comprises the following steps:
Step S310: for the overall semantic vector set $\{V_1, V_2, \ldots, V_E\}$ of all post demand texts, taking the overall semantic vectors as classification features and randomly selecting H overall semantic vectors as the cluster centers of the initial post demand clusters;
Step S320: for each overall semantic vector, calculating its Euclidean distance to each cluster center and assigning it to the post demand cluster of the nearest cluster center, until every overall semantic vector has been assigned;
Step S330: calculating the mean of all overall semantic vectors in each post demand cluster and taking it as the new cluster center; repeating Steps S320 to S330 until the overall semantic vectors in each post demand cluster no longer change, thereby obtaining K post demand clusters $\{C_1, C_2, \ldots, C_K\}$, where K = H;
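Steps S310 to S330 describe the standard K-means loop, which can be sketched in plain Python as follows. The function name `kmeans`, the `seed` and `max_iter` parameters, and the handling of empty clusters (the old center is kept) are choices of this sketch rather than details given in the text.

```python
import math
import random


def kmeans(vectors, h, seed=0, max_iter=100):
    """Cluster vectors into h groups; returns (assignment, centers)."""
    rng = random.Random(seed)
    # S310: randomly pick h vectors as the initial cluster centers.
    centers = [list(v) for v in rng.sample(vectors, h)]
    assignment = None
    for _ in range(max_iter):
        # S320: assign every vector to its nearest center (Euclidean distance).
        new_assignment = [
            min(range(h), key=lambda c: math.dist(v, centers[c])) for v in vectors
        ]
        # Stop once cluster membership no longer changes.
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # S330: move each center to the mean of its members.
        for c in range(h):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centers


# Usage with four hypothetical overall semantic vectors forming two groups.
demo_vectors = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
demo_assignment, demo_centers = kmeans(demo_vectors, h=2)
```

Because the initial centers are chosen at random, cluster indices can differ between seeds even when the grouping itself is the same.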
The specific method for extracting the key entities in each post demand cluster and constructing a personalized post image for each post demand cluster comprises the following steps:
Step S340: for the k-th post demand cluster $C_k$, counting the frequency of the result entities of all its post demand texts and extracting the F result entities with the highest frequency as the key entity set $\{ke_{k,1}, ke_{k,2}, \ldots, ke_{k,F}\}$ of the post demand cluster, where $ke_{k,F}$ represents the F-th key entity of the post demand cluster $C_k$;
Step S350: for the f-th key entity $ke_{k,f}$, taking $ke_{k,f}$ as the central node in the domain knowledge graph and mining its adjacent one-hop subgraph as the key entity subgraph $G_{k,f}$ centered on $ke_{k,f}$; obtaining all key entity subgraphs $\{G_{k,1}, G_{k,2}, \ldots, G_{k,F}\}$ of the post demand cluster $C_k$, and merging all the key entity subgraphs to obtain the personalized post image $P_k$ of the post demand cluster $C_k$;
Step S360: calculating the importance weight $w_{k,f}$ of the node corresponding to the key entity $ke_{k,f}$ as $w_{k,f} = \frac{n_{k,f}}{\sum_{f'=1}^{F} n_{k,f'}}$, where $n_{k,f}$ is the frequency of the key entity $ke_{k,f}$ in all post demand texts and $\sum_{f'=1}^{F} n_{k,f'}$ is the sum of the frequencies of all key entities;
Step S370: the relation types between key entities serve as edges; the weight $w_{fh}$ of the edge between the f-th key entity and the h-th key entity is calculated from the entity semantic vectors $e_f$ and $e_h$ and the relation semantic vector $r_{fh}$ using the modulus operator $\|\cdot\|$, where $e_f$ is the entity semantic vector of the f-th key entity, $e_h$ is the entity semantic vector of the h-th key entity, and $r_{fh}$ is the relation semantic vector of the edge connecting the f-th key entity and the h-th key entity, obtained from the relation semantic vector matrix output by the TransE model;
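Steps S340 to S360 amount to frequency counting followed by normalization, plus a one-hop neighborhood lookup in the knowledge graph. The sketch below is a minimal illustration; the function names and the sample data are hypothetical, frequencies are counted within the cluster's own texts (an assumption of this sketch), and the edge weight of Step S370 is deliberately not implemented because its formula is not available here.

```python
from collections import Counter


def key_entities(cluster_result_entities, f):
    """Top-f result entities of one cluster with normalized importance weights.

    cluster_result_entities: one list of result entities per post demand text.
    """
    counts = Counter(e for text in cluster_result_entities for e in text)
    top = counts.most_common(f)
    total = sum(n for _, n in top)  # sum of frequencies of the key entities
    return {entity: n / total for entity, n in top}


def one_hop_subgraph(triplets, center):
    """All (head, relation, tail) triplets that touch the center node."""
    return [(h, r, t) for h, r, t in triplets if center in (h, t)]


# Hypothetical cluster of three post demand texts.
demo_cluster = [["python", "sql"], ["python", "spark"], ["python", "sql"]]
demo_weights = key_entities(demo_cluster, f=2)

demo_triplets = [
    ("data engineer", "requires", "python"),
    ("python", "related_to", "machine learning"),
    ("data engineer", "requires", "sql"),
]
demo_subgraph = one_hop_subgraph(demo_triplets, "python")
```

Merging the one-hop subgraphs of all key entities of a cluster (for instance as a set union of triplets) then gives the cluster's personalized post image.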
Step S400: constructing a job seeker image according to the job seeker's resume; acquiring the post demand texts to which the job seeker has historically delivered resumes, to obtain the post demand cluster of greatest interest; calculating the recommendation weight of each key entity using the personalized post image of that post demand cluster; and recommending the posts corresponding to the M key entities with the highest recommendation weights to the job seeker;
The specific method for constructing the job seeker image, obtaining the post demand cluster of greatest interest, calculating the recommendation weight of each key entity, and recommending posts to the job seeker comprises the following steps:
Step S410: extracting the attribute text in the job seeker's resume, including professional skills, academic level and working experience; expressing the attribute text as the corresponding entity types and relation types in the domain knowledge graph to obtain the job seeker entity types and the job seeker relation types; and constructing the job seeker image;
Step S420: acquiring information on the post demand texts to which the job seeker has historically delivered resumes, counting the number of historical deliveries to each post demand cluster, and recording the number of historical deliveries to the k-th post demand cluster as $n_k$; calculating the interest degree of the job seeker in the k-th post demand cluster as $I_k = \frac{n_k}{N}$, where $N$ is the job seeker's total number of historical deliveries;
Step S430: selecting the post demand cluster with the highest interest degree as the job seeker's most interested post demand cluster, denoted $C_{k^*}$;
Step S440: using the personalized post image $P_{k^*}$ corresponding to the most interested post demand cluster $C_{k^*}$, and denoting the f-th key entity in $C_{k^*}$ as $ke_{k^*,f}$, calculating the semantic relevance score between each job seeker entity type $q$ and the key entity $ke_{k^*,f}$ as $\mathrm{score}(q, ke_{k^*,f}) = \frac{e_q \cdot e_f}{\|e_q\| \, \|e_f\|}$, where $e_q$ and $e_f$ represent the entity semantic vectors of the job seeker entity type $q$ and the key entity $ke_{k^*,f}$, respectively, and $\|e_q\|$ and $\|e_f\|$ represent the modulus (norm) of those entity semantic vectors;
Step S450: taking the job seeker entity types whose semantic relevance score is above a score threshold $\theta$ as the job seeker entity types semantically related to the key entity $ke_{k^*,f}$, to obtain a related entity set $\{q_1, q_2, \ldots, q_J\}$, where J is the total number of job seeker entity types semantically related to the key entity $ke_{k^*,f}$ and $q_J$ is the J-th such job seeker entity type in the related entity set;
Step S460: calculating, from the job seeker image and the personalized post image $P_{k^*}$ of the most interested post demand cluster $C_{k^*}$, the recommendation weight of each key entity in the personalized post image; the calculation uses $w_{k^*,f}$, the importance weight of the f-th key entity of the post demand cluster $C_{k^*}$; $w_j$, the importance weight of the j-th job seeker entity type semantically related to that key entity; and $e_f$ and $e_j$, the entity semantic vectors of the f-th key entity of $C_{k^*}$ and of the j-th semantically related job seeker entity type, respectively;
Step S470: ranking the key entities in the personalized post image from high to low by recommendation weight, selecting the first M key entities, and recommending the posts corresponding to these M key entities to the job seeker.
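The bookkeeping parts of Steps S420 to S470 can be sketched as below: the interest degree as a share of historical deliveries, the thresholded cosine filter, and the final top-M ranking. All names and sample values are hypothetical. The recommendation weight of Step S460 is treated as a given input, because its formula is not available here and this sketch does not attempt to guess it.

```python
import math
from collections import Counter


def most_interesting_cluster(delivered_cluster_ids):
    """S420-S430: interest degree n_k / N per cluster, and the top cluster."""
    counts = Counter(delivered_cluster_ids)
    total = len(delivered_cluster_ids)
    interest = {k: n / total for k, n in counts.items()}
    return max(interest, key=interest.get), interest


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def related_entities(key_vec, seeker_vecs, threshold):
    """S440-S450: job seeker entity types scoring above the threshold."""
    return [
        name for name, vec in seeker_vecs.items()
        if cosine(key_vec, vec) > threshold
    ]


def top_m(recommend_weights, m):
    """S470: key entities ranked by (precomputed) recommendation weight."""
    return sorted(recommend_weights, key=recommend_weights.get, reverse=True)[:m]


# Hypothetical history: cluster ids of the posts the job seeker applied to.
best_cluster, interest = most_interesting_cluster([2, 0, 2, 1, 2])
related = related_entities(
    [1.0, 0.0], {"python": [1.0, 0.1], "cooking": [0.0, 1.0]}, threshold=0.5
)
ranked = top_m({"sql": 0.2, "python": 0.7, "spark": 0.5}, m=2)
```

Keeping the weight computation behind a plain dictionary makes it easy to swap in whatever weighting formula is actually used without touching the ranking step.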
One aspect of the present application provides an information management system based on data fusion, including:
The triplet and graph construction module is used for collecting post demand texts of various posts, defining an entity type set ST and a relation type set GX, establishing a knowledge triplet set between the entity types and the relation types, and constructing a domain knowledge graph according to the post demand texts; the set of entity types includes entity types, and the set of relationship types includes relationship types;
The overall semantic vector calculation module is used for generating a candidate entity alias set for the post demand text, constructing a candidate entity set for each candidate entity alias, training a TransE model using the knowledge triplet set, outputting an entity semantic vector matrix and a relation semantic vector matrix, selecting the entity type with the highest semantic relevance score in the candidate entity set as the result entity of the candidate entity alias, and calculating the overall semantic vector of the post demand text using the entity semantic vector matrix; the candidate entity alias set includes candidate entity aliases;
The clustering and post image construction module is used for clustering all post demand texts by a K-means clustering algorithm, with the overall semantic vector as the classification feature, to obtain K post demand clusters, extracting the key entities in each post demand cluster, and constructing a personalized post image for each post demand cluster;
The job seeker post recommendation module is used for constructing a job seeker image according to the job seeker's resume, acquiring the post demand texts to which the job seeker has historically delivered resumes to obtain the post demand cluster of greatest interest, calculating the recommendation weight of each key entity using the personalized post image of that post demand cluster, and recommending the posts corresponding to the M key entities with the highest recommendation weights to the job seeker.
An aspect of the present application provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the computer program, when executed by the processor, performs the steps of the information management method based on data fusion.
An aspect of the present application provides a readable storage medium storing a computer program adapted to be loaded by a processor to perform the steps of the information management method based on data fusion.
Compared with the prior art, the information management method based on data fusion provided by the application has the following advantages:
Entity types and relation types are defined from the post demand texts in the enterprise transnational recruitment management database, and knowledge triplets are constructed, thereby establishing a domain knowledge graph that comprehensively reflects post demand characteristics. Compared with the traditional post recommendation method based on keyword matching, the knowledge graph represents the semantic information of post demands more deeply and accurately.
For the candidate entity aliases in the post demand text, the application generates a candidate entity set using the entity alias dictionary and calculates the semantic relevance between each candidate entity alias and the candidate entities using a TransE model, thereby linking and disambiguating candidate entities. Compared with rule-based or statistics-based entity linking methods, the method measures the correlation between entity aliases and entities in the semantic space of the knowledge graph, and achieves higher linking and disambiguation accuracy.
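For readers unfamiliar with TransE, the negative sampling and training objective behind Step S230 can be sketched as follows. TransE models a valid triplet as head + relation ≈ tail, so the distance ‖h + r − t‖ should be small for true triplets and larger for corrupted ones. The margin-based ranking loss shown is the usual TransE objective; the text does not state which loss or margin is used, so both, along with all names and sample vectors, are assumptions of this sketch. The gradient update loop is omitted.

```python
import math
import random


def transe_distance(h, r, t):
    """L2 norm of h + r - t; smaller means the triplet is more plausible."""
    return math.sqrt(sum((a + b - c) ** 2 for a, b, c in zip(h, r, t)))


def corrupt(triplet, entities, known, rng):
    """Negative sampling: swap head or tail for a random entity.

    Resamples until the corrupted triplet is not a known positive; this
    assumes at least one such corruption exists.
    """
    h, r, t = triplet
    while True:
        e = rng.choice(entities)
        negative = (e, r, t) if rng.random() < 0.5 else (h, r, e)
        if negative not in known:
            return negative


def margin_loss(pos, neg, ent_vecs, rel_vecs, margin=1.0):
    """Hinge loss pushing positives at least `margin` closer than negatives."""
    d_pos = transe_distance(ent_vecs[pos[0]], rel_vecs[pos[1]], ent_vecs[pos[2]])
    d_neg = transe_distance(ent_vecs[neg[0]], rel_vecs[neg[1]], ent_vecs[neg[2]])
    return max(0.0, margin + d_pos - d_neg)


# Hypothetical 2-d embeddings in which ("a", "r", "b") holds exactly.
demo_ent = {"a": [0.0, 0.0], "b": [1.0, 0.0], "c": [5.0, 5.0]}
demo_rel = {"r": [1.0, 0.0]}
demo_known = {("a", "r", "b")}
demo_negative = corrupt(("a", "r", "b"), list(demo_ent), demo_known, random.Random(0))
```

In a full training loop, the entity and relation vectors would be updated by gradient descent on this loss over many sampled positive/negative pairs, and the learned vectors would form the entity and relation semantic vector matrices.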
The application performs K-means clustering on the overall semantic vectors of the post demand texts to obtain semantically similar post demand clusters. On this basis, the key entities of each cluster are extracted, and a personalized post image is constructed using the relation information in the knowledge graph to describe the key characteristics of each type of post demand. Compared with simple keyword or topic clustering, the method has clear advantages in semantic similarity and knowledge richness, so the clustering result is more accurate and the personalized post image contains more valuable information.
According to the application, a job seeker image is constructed from the job seeker's resume, the post demand cluster of interest is determined from the job seeker's historical delivery behavior, and the personalized post image of that cluster is used for post recommendation. The recommendation process jointly considers the semantic relevance, the entity importance and the structural relations between the job seeker entities and the post key entities, so that the recommendation results are both personalized and relevant. Compared with collaborative filtering or rule-based recommendation, the method fuses information from the job seeker image, the post characteristics and the knowledge graph, and achieves a better recommendation effect.
Drawings
FIG. 1 is a flow chart of the information management method based on data fusion provided by the application;
FIG. 2 is a functional block diagram of the information management system based on data fusion provided by the application;
FIG. 3 is a schematic structural diagram of the electronic device provided by the application;
FIG. 4 is a schematic structural diagram of the readable storage medium provided by the application.
Detailed Description
For a better understanding of the application, various aspects of the application will be described in more detail with reference to the accompanying drawings. It should be understood that the detailed description is merely illustrative of exemplary embodiments of the application and is not intended to limit the scope of the application in any way. Like reference numerals refer to like elements throughout the specification. The expression "and/or" includes any and all combinations of one or more of the associated listed items.
In the drawings, the size, dimensions and shape of elements have been slightly adjusted for convenience of description. The figures are merely examples and are not drawn to scale. As used herein, the terms "about," "approximately," and similar terms are used as terms of approximation, not as terms of degree, and are intended to account for inherent deviations in measured or calculated values that will be recognized by one of ordinary skill in the art. In addition, in the present application, the order in which the steps are described does not necessarily indicate the order in which the steps occur in actual practice, unless explicitly defined otherwise or inferable from the context.
It will be further understood that terms such as "comprises," "comprising," "includes," "including," "having," "contains," and/or "containing" are open-ended rather than closed-ended terms: they specify the presence of the stated features, elements, and/or components, but do not preclude the presence or addition of one or more other features, elements, components, and/or groups thereof. Furthermore, when an expression such as "at least one of" appears after a list of features, it modifies the entire list rather than the individual elements in the list. Furthermore, when describing embodiments of the application, the use of "may" means "one or more embodiments of the application." Also, the term "exemplary" is intended to refer to an example or illustration.
Unless otherwise defined, all terms (including engineering and technical terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which the present application pertains. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
In addition, the embodiments of the present application and the features of the embodiments may be combined with each other provided that they do not conflict. The present application will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
Example 1
As shown in fig. 1, the information management method based on data fusion provided by the application includes:
Step S100: collecting post demand texts of various posts, defining an entity type set ST and a relation type set GX, establishing a knowledge triplet set between entity types and relation types, and constructing a domain knowledge graph according to the post demand texts; the set of entity types includes entity types, and the set of relationship types includes relationship types;
The entity type set refers to a set covering all categories in the enterprise transnational recruitment information, and the entity types can include, for example: post name, expertise, and academic level;
The set of relationship types is a set that characterizes semantic associations between entities, and the relationship types may include, for example: requirements, belongings, correlations;
The specific method for collecting the post demand texts of various posts, defining the entity type set ST and the relation type set GX, establishing the knowledge triplet set between entity types and relation types, and constructing the domain knowledge graph according to the post demand texts comprises the following steps:
step S110: collecting post demand texts of various posts from an enterprise trans-national recruitment management database, wherein the post demand texts comprise the following contents: post description, job requirements and working experience;
Step S120: a set of entity types ST is defined, ,Representing a B-th entity type in the entity type set; b represents the total number of entity types;
Step S130: a set of relationship types GX of entity types is defined, ,Representing a C-th relationship type in the set of relationship types; c represents the total number of relationship types;
Step S140: establishing knowledge triplet set between entity type and relation type according to post demand text ,Representing the type of the entity of the header,Representing the type of tail entity,Representing the C-th relationship type in the set of relationship types, C ε {1, … …, C }; constructing a domain knowledge graph among entity type sets, relationship type sets and knowledge triplet sets;
The construction method of the domain knowledge graph is as follows: entity types are taken as nodes and relationship types as edges, and the knowledge triplet set $T$ is constructed according to the relationship types between the entity types extracted from the post demand text. Each knowledge triplet comprises two entity types $st_h$ and $st_t$ and the relationship type $gx_c$ connecting them: the head entity type $st_h$ is connected to the tail entity type $st_t$ through the c-th relationship type $gx_c$, the two entity types being divided into head entity type and tail entity type according to the relationship type. The domain knowledge graph is thereby obtained;
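By way of non-limiting illustration, the node-and-edge construction described above may be sketched in Python as follows. The example triples and the function name `build_graph` are hypothetical and are not taken from the application; the adjacency-map representation is one possible choice among many.

```python
from collections import defaultdict

# Hypothetical knowledge triples: (head entity type, relationship type, tail entity type).
triples = [
    ("software engineer", "requirement", "Python"),
    ("software engineer", "requirement", "bachelor degree"),
    ("Python", "correlation", "machine learning"),
]

def build_graph(triples):
    """Build an adjacency map: head node -> list of (relationship, tail node) edges."""
    graph = defaultdict(list)
    for head, relation, tail in triples:
        graph[head].append((relation, tail))
        graph.setdefault(tail, [])  # nodes that only appear as tails are still nodes
    return dict(graph)

graph = build_graph(triples)
```

Each key of `graph` is a node (entity type) and each list entry is a directed edge labelled with its relationship type.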
Step S200: generating a candidate entity alias set for the post demand text, constructing a candidate entity set for each candidate entity alias, training TransE models by using the knowledge triplet set, outputting entity semantic vector matrixes and relation semantic vector matrixes, calculating to obtain the entity type with the highest semantic relevance score in the candidate entity set as a result entity of the candidate entity alias, and calculating the overall semantic vector of the post demand text by using the entity semantic vector matrixes; the candidate entity alias set includes candidate entity aliases;
The specific method for generating the candidate entity alias set for the post demand text, constructing the candidate entity set for each candidate entity alias, training the TransE model using the knowledge triplet set, outputting the entity semantic vector matrix and the relation semantic vector matrix, calculating the entity type with the highest semantic relevance score in the candidate entity set as the result entity of the candidate entity alias, and calculating the overall semantic vector of the post demand text using the entity semantic vector matrix comprises the following steps:
Step S210: defining an entity alias dictionary $Dict = \{st_i : (a_{i,1}, a_{i,2}, \ldots, a_{i,A})\}$, where $st_i$ represents the i-th entity type of the entity type set in the domain knowledge graph, $a_{i,A}$ represents the A-th entity alias of the i-th entity type, and A represents the total number of entity aliases;
Step S220: for the post demand text, generating the candidate entity alias set by a dictionary matching method; for each candidate entity alias $m$, searching the entity alias dictionary for all entity types matching the alias to obtain the candidate entity set, and constructing an entity link $(m, st_i)$;
The candidate entity alias refers to an entity which is automatically extracted from the post requirement text and possibly has ambiguity or uncertainty, and needs to be disambiguated and linked to determine the reference of the entity in the domain knowledge graph;
The candidate entity refers to entity name representation in the domain knowledge graph;
For example, the candidate entity alias "software engineer" is matched from the post demand text by the dictionary matching method, and, according to the relation between candidate entity aliases and entity types, the candidate entity set {"software engineer", "software development engineer", "software test engineer"} matching "software engineer" is obtained from the entity alias dictionary;
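The dictionary matching of steps S210 and S220 may be illustrated by the following minimal Python sketch. The dictionary contents, the sample sentence and the function names are hypothetical, and plain case-insensitive substring matching is assumed here only for brevity; the application does not specify the matching procedure in this detail.

```python
# Hypothetical alias dictionary: entity type -> aliases under which it may appear in text.
alias_dict = {
    "software engineer": ["software engineer", "SWE"],
    "software development engineer": ["software engineer", "developer"],
    "software test engineer": ["software engineer", "QA engineer"],
    "data analyst": ["data analyst"],
}

def candidate_aliases(text, alias_dict):
    """Return every dictionary alias that occurs in the text (naive substring match)."""
    lowered = text.lower()
    aliases = {a for names in alias_dict.values() for a in names}
    return sorted(a for a in aliases if a.lower() in lowered)

def candidate_entities(alias, alias_dict):
    """Return all entity types whose alias list contains the given alias."""
    return [etype for etype, names in alias_dict.items() if alias in names]

text = "We are hiring a software engineer with three years of experience."
found = candidate_aliases(text, alias_dict)
# Entity links: one (alias, candidate entity type) pair per candidate.
links = [(a, e) for a in found for e in candidate_entities(a, alias_dict)]
```

Here a single alias yields three candidate entity types, which is the ambiguity that the subsequent disambiguation step resolves.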
step S230: a negative sample triplet is obtained from the knowledge triplet set by adopting a negative sampling method, a TransE model is obtained by training the knowledge triplet set and the negative sample triplet, and an entity semantic vector matrix and a relationship semantic vector matrix are obtained by output;
The specific process of obtaining negative sample triplets from the knowledge triplet set by the negative sampling method and training the TransE model with the knowledge triplet set and the negative sample triplets is as follows:
Step S231: using the knowledge triplet set $T$ as training samples, and, for each knowledge triplet $(h, r, t)$, generating a corresponding negative sampling triplet $(h', r, t')$ by the negative sampling method, thereby constructing a negative sample set $T'$;
Step S232: initializing the semantic vectors of each entity type and relation type in the knowledge triplet set, and defining the loss function of the TransE model as:
$L = \sum_{(h,r,t) \in T} \sum_{(h',r,t') \in T'} \left[\gamma + f(h,r,t) - f(h',r,t')\right]_+$,
where $\gamma$ is the margin hyperparameter, $[\cdot]_+$ represents the hinge loss function, that is, $[x]_+ = \max(0, x)$ compares 0 and $x$ and outputs the larger value, $f(h,r,t)$ represents the scoring function of a training sample, $f(h',r,t')$ represents the scoring function of a negative sampling triplet, and L is the loss function;
The calculation formula of the scoring function of the training sample is:
$f(h,r,t) = \lVert \mathbf{h} + \mathbf{r} - \mathbf{t} \rVert$, where $\mathbf{h}$, $\mathbf{r}$ and $\mathbf{t}$ are the semantic vectors of the head entity type, the relation type and the tail entity type;
The calculation formula of the scoring function of the negative sampling triplet is:
$f(h',r,t') = \lVert \mathbf{h}' + \mathbf{r} - \mathbf{t}' \rVert$;
step S233: taking the minimized loss function L as a training target of TransE model, enabling the scoring function of the training sample to be minimum and the scoring function of the negative sampling triplet to be maximum, and outputting to obtain an entity semantic vector matrix and a relationship semantic vector matrix;
the entity semantic vector matrix represents the distribution of all entity types in the domain knowledge graph in a low-dimensional semantic space;
The relation semantic vector matrix represents the distribution of all relation types in the domain knowledge graph in a low-dimensional semantic space;
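A minimal NumPy sketch of a TransE training loop with a margin-based hinge loss is given below for illustration. Several points are assumptions rather than features of the application: entity types and relation types are encoded as integer indices, the score uses the L2 norm, only the tail of a triple is corrupted during negative sampling, plain stochastic gradient descent is used, and the hyperparameter values and the function name `train_transe` are hypothetical.

```python
import numpy as np

def train_transe(triples, n_entities, n_relations, dim=8, gamma=1.0,
                 lr=0.05, epochs=200, seed=0):
    """Train TransE with score f(h, r, t) = ||h + r - t||_2 and a hinge loss."""
    rng = np.random.default_rng(seed)
    ent = rng.normal(scale=0.1, size=(n_entities, dim))   # entity semantic vector matrix
    rel = rng.normal(scale=0.1, size=(n_relations, dim))  # relation semantic vector matrix
    known = set(triples)
    for _ in range(epochs):
        for h, r, t in triples:
            # Negative sampling (assumption: corrupt the tail until the triple is unseen).
            t_neg = int(rng.integers(n_entities))
            while (h, r, t_neg) in known:
                t_neg = int(rng.integers(n_entities))
            d_pos = ent[h] + rel[r] - ent[t]
            d_neg = ent[h] + rel[r] - ent[t_neg]
            f_pos, f_neg = np.linalg.norm(d_pos), np.linalg.norm(d_neg)
            if gamma + f_pos - f_neg <= 0:
                continue  # hinge is inactive, so the gradient is zero
            g_pos = d_pos / (f_pos + 1e-9)  # gradient of ||d_pos|| w.r.t. d_pos
            g_neg = d_neg / (f_neg + 1e-9)  # gradient of ||d_neg|| w.r.t. d_neg
            ent[h] -= lr * (g_pos - g_neg)
            rel[r] -= lr * (g_pos - g_neg)
            ent[t] += lr * g_pos
            ent[t_neg] -= lr * g_neg
    return ent, rel

# Hypothetical toy data: 4 entity types, 2 relation types.
triples = [(0, 0, 1), (0, 0, 2), (1, 1, 3)]
ent, rel = train_transe(triples, n_entities=4, n_relations=2)
```

The two returned arrays play the role of the entity semantic vector matrix and the relation semantic vector matrix, one row per entity type or relation type.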
Step S240: according to the entity semantic vector matrix output by the TransE model, for a candidate entity alias $m$, retrieving from the entity semantic vector matrix the entity semantic vector $\mathbf{e}_i$ of each entity type $st_i$ in its candidate entity set, calculating the semantic relevance score between the candidate entity alias $m$ and each entity type in the candidate entity set, and selecting the entity type with the highest semantic relevance score as the result entity of the candidate entity alias;
The calculation formula of the semantic relevance score between the candidate entity alias $m$ and the i-th entity type is: $s(m, st_i) = \cos(\mathbf{v}_m, \mathbf{e}_i) = \frac{\mathbf{v}_m \cdot \mathbf{e}_i}{\lVert \mathbf{e}_i \rVert \, \lVert \mathbf{v}_m \rVert}$, where $\mathbf{v}_m$ represents the semantic vector of the candidate entity alias, $\cos(\cdot,\cdot)$ represents the cosine similarity function, and $\lVert \mathbf{e}_i \rVert$ and $\lVert \mathbf{v}_m \rVert$ represent the moduli of the entity semantic vector $\mathbf{e}_i$ and of the semantic vector $\mathbf{v}_m$ of the candidate entity alias, respectively;
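The disambiguation of step S240 amounts to a cosine-similarity argmax over the candidate entity set, which may be sketched as follows. The two-dimensional vectors are made-up values for illustration, and how the semantic vector of the alias itself is produced is not specified in this sketch; it is simply assumed to be given.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity: dot product divided by the product of the vector moduli."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def link_alias(alias_vec, candidates, entity_vecs):
    """Score every candidate entity type and return the best one with all scores."""
    scores = {c: cosine(alias_vec, entity_vecs[c]) for c in candidates}
    return max(scores, key=scores.get), scores

# Hypothetical entity semantic vectors (rows of the entity semantic vector matrix).
entity_vecs = {
    "software engineer": np.array([1.0, 0.0]),
    "software test engineer": np.array([0.0, 1.0]),
    "software development engineer": np.array([1.0, 1.0]),
}
alias_vec = np.array([2.0, 0.1])  # assumed semantic vector of the alias
best, scores = link_alias(alias_vec, list(entity_vecs), entity_vecs)
```

Because cosine similarity ignores vector length, the alias vector is matched on direction only.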
Step S250: for each candidate entity alias in the post demand text, acquiring the entity type with the highest semantic relevance score in the candidate entity set, and acquiring the entity semantic vector of each entity type in the candidate entity set from the entity semantic vector matrix, to obtain the entity semantic vector set $\{\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_D\}$ of the post demand text, where $\mathbf{e}_D$ is the entity semantic vector of the D-th entity type; and calculating the overall semantic vector of the post demand text from the entity semantic vector set, where D is the total number of entity types in the candidate entity set;
Step S260: calculating the overall semantic vectors of all post demand texts to obtain an overall semantic vector set $Z = \{\mathbf{z}_1, \mathbf{z}_2, \ldots, \mathbf{z}_E\}$, where $\mathbf{z}_E$ represents the overall semantic vector of the E-th post demand text;
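One simple way to aggregate the D entity semantic vectors of a post demand text into a single overall semantic vector is the arithmetic mean. The sketch below assumes mean pooling purely for illustration; the aggregation actually used by the application is not reproduced here, and the function name and vectors are hypothetical.

```python
import numpy as np

def overall_vector(entity_vectors):
    """Aggregate D entity semantic vectors into one vector (assumption: mean pooling)."""
    return np.mean(np.stack(entity_vectors), axis=0)

# Hypothetical entity semantic vectors of one post demand text (D = 3).
doc_vectors = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([1.0, 1.0])]
z = overall_vector(doc_vectors)
```

Applying `overall_vector` to every post demand text yields the overall semantic vector set used for clustering.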
Step S300: taking the whole semantic vector as a classification feature, clustering all post demand texts by adopting a K-means clustering algorithm to obtain K post demand clusters, extracting key entities in each post demand cluster, and constructing a personalized post image for each post demand cluster;
The specific method for clustering all post demand texts with the K-means clustering algorithm, taking the overall semantic vector as the classification feature, to obtain K post demand clusters comprises the following steps:
Step S310: for the overall semantic vector set of all post demand texts, taking the overall semantic vectors as classification features and randomly selecting H overall semantic vectors as the cluster centers of the initial post demand clusters;
Step S320: for each overall semantic vector, calculating its Euclidean distance to each cluster center and assigning it to the post demand cluster corresponding to the nearest cluster center, until every overall semantic vector has been assigned;
Step S330: calculating the mean of all overall semantic vectors in each post demand cluster, taking the mean as the new cluster center, and repeating steps S320 and S330 until the overall semantic vectors in each post demand cluster no longer change, thereby obtaining K post demand clusters $\{C_1, C_2, \ldots, C_K\}$, where K = H;
Each post demand cluster represents a category of post demands with similar semantics;
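Steps S310 to S330 correspond to the standard K-means iteration, which may be sketched in NumPy as follows. The sample vectors, the iteration cap `max_iter` and the handling of empty clusters are assumptions added for the sake of a runnable example.

```python
import numpy as np

def kmeans(vectors, h, seed=0, max_iter=100):
    """K-means: random initial centers, Euclidean assignment, mean update until stable."""
    rng = np.random.default_rng(seed)
    # S310: pick H distinct vectors as the initial cluster centers.
    centers = vectors[rng.choice(len(vectors), size=h, replace=False)]
    labels = None
    for _ in range(max_iter):
        # S320: Euclidean distance from every vector to every center.
        dists = np.linalg.norm(vectors[:, None, :] - centers[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments no longer change
        labels = new_labels
        # S330: move each center to the mean of its members.
        for k in range(h):
            members = vectors[labels == k]
            if len(members):  # keep the old center if a cluster becomes empty
                centers[k] = members.mean(axis=0)
    return labels, centers

# Hypothetical overall semantic vectors of four post demand texts.
vectors = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
labels, centers = kmeans(vectors, h=2)
```

The result depends on the random initial centers in general; on this well-separated toy data every initialization converges to the same two groups.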
The specific method for extracting the key entities in each post demand cluster and constructing a personalized post image for each post demand cluster comprises the following steps:
Step S340: for the k-th post demand cluster $C_k$, counting the frequency of the result entities of all its post demand texts, and extracting the F result entities with the highest frequency as the key entity set $\{ke_1, ke_2, \ldots, ke_F\}$ of the post demand cluster, where $ke_F$ represents the F-th key entity of the post demand cluster $C_k$;
Step S350: for the f-th key entity $ke_f$, taking $ke_f$ as the central node in the domain knowledge graph and mining its adjacent one-hop subgraph as the key entity subgraph $G_f$ centered on $ke_f$; obtaining all key entity subgraphs $\{G_1, G_2, \ldots, G_F\}$ of the post demand cluster $C_k$, and merging all the key entity subgraphs to obtain the personalized post image $P_k$ of the post demand cluster $C_k$;
The adjacent one-hop subgraph refers to all nodes in the domain knowledge graph that can be reached from the central node via a single edge, together with the relationship types represented by those edges, and represents the key entity and its directly related knowledge structure;
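The extraction of key entities and the merging of their one-hop subgraphs may be sketched as follows. The triples, the per-text result entities and the function names are hypothetical, and the sketch assumes that an edge belongs to the one-hop subgraph when the central node is either its head or its tail.

```python
from collections import Counter

def key_entities(result_entities_per_text, f):
    """S340: return the F most frequent result entities of a cluster and the raw counts."""
    counts = Counter(e for ents in result_entities_per_text for e in ents)
    return [e for e, _ in counts.most_common(f)], counts

def one_hop_subgraph(triples, center):
    """S350: all edges that touch the central node (assumption: either direction)."""
    return {(h, r, t) for h, r, t in triples if h == center or t == center}

def post_profile(triples, keys):
    """Merge the one-hop subgraphs of all key entities into one edge set."""
    profile = set()
    for k in keys:
        profile |= one_hop_subgraph(triples, k)
    return profile

# Hypothetical domain knowledge graph and result entities of one cluster's texts.
triples = [
    ("software engineer", "requirement", "Python"),
    ("software engineer", "requirement", "bachelor degree"),
    ("Python", "correlation", "machine learning"),
    ("data analyst", "requirement", "SQL"),
]
cluster_texts = [
    ["software engineer", "Python"],
    ["software engineer", "Python", "bachelor degree"],
    ["software engineer"],
]
keys, counts = key_entities(cluster_texts, 2)
profile = post_profile(triples, keys)
```

Edges shared by two key entity subgraphs appear only once in the merged set.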
Step S360: calculating the importance weight $w_f$ of the node corresponding to the key entity $ke_f$, the calculation formula being: $w_f = \frac{n_f}{\sum_{f'=1}^{F} n_{f'}}$, where $n_f$ is the frequency of the key entity $ke_f$ in all post demand texts, and $\sum_{f'=1}^{F} n_{f'}$ is the sum of the frequencies of all key entities;
Step S370: taking the relationship types between key entities as edges, the weight of the edge between the f-th key entity and the h-th key entity is calculated from the entity semantic vector of the f-th key entity, the entity semantic vector of the h-th key entity, and the relation semantic vector of the edge connecting the f-th key entity and the h-th key entity, the relation semantic vector being obtained from the relation semantic vector matrix output by the TransE model, and the calculation using the modulus operator;
Step S400: constructing a job seeker image according to the job seeker resume, acquiring a post demand text which is historically delivered by the job seeker to obtain a post demand cluster which is most interesting, calculating the recommended weight of each key entity by utilizing the personalized post image of the post demand cluster, and recommending posts corresponding to M key entities with the highest recommended weights to the job seeker;
The specific method for constructing the job seeker image according to the job seeker resume, acquiring the post demand texts historically delivered by the job seeker to obtain the post demand cluster of most interest, calculating the recommendation weight of each key entity using the personalized post image of that post demand cluster, and recommending the posts corresponding to the M key entities with the highest recommendation weights to the job seeker comprises the following steps:
Step S410: extracting the attribute text in the job seeker resume, including professional skills, academic level and working experience; representing the attribute text as the corresponding entity types and relationship types in the domain knowledge graph to obtain the job seeker entity types and the job seeker relationship types; and constructing the job seeker image;
The method for representing the attribute text as the corresponding entity type and the relationship type in the domain knowledge graph is the same as the method for representing the candidate entity alias in the post demand text as the result entity;
The specific method for representing the attribute text as the corresponding entity type and relationship type in the domain knowledge graph comprises the following steps: obtaining a candidate attribute entity alias set according to the attribute text, wherein the candidate attribute entity alias set is a set containing all entity aliases in the attribute text, which is directly obtained according to the attribute text; finding out the entity type corresponding to each candidate attribute entity alias according to the entity alias dictionary, and generating a candidate attribute entity set, wherein the candidate attribute entity set is a set formed by all entity types corresponding to the candidate attribute entity alias in the entity alias dictionary; obtaining entity semantic vectors corresponding to each entity type in the candidate attribute entity set from an entity semantic vector matrix output by the TransE model, calculating semantic relevance scores of each candidate attribute entity alias and each entity type, selecting the entity type with the highest semantic relevance score as the corresponding entity type of the candidate attribute entity alias in the domain knowledge graph, and taking the corresponding edge of the entity type corresponding to the candidate attribute entity alias in the domain knowledge graph as the corresponding relation type of the attribute text in the domain knowledge graph;
Step S420: acquiring the information of the post demand texts historically delivered by the job seeker, counting the number of times the job seeker has historically delivered to each post demand cluster, recording the number of historical deliveries to the k-th post demand cluster as $n_k$, and calculating the interest degree of the job seeker in the k-th post demand cluster, the calculation formula being: $I_k = \frac{n_k}{N}$, where $N$ is the total number of historical deliveries of the job seeker;
Step S430: selecting the post demand cluster with the highest interest degree as the post demand cluster of most interest to the job seeker, denoted $C^*$;
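Steps S420 and S430 reduce to counting deliveries per cluster, normalizing by the total, and taking the maximum. A minimal sketch, in which the list of cluster indices and the function name are hypothetical:

```python
from collections import Counter

def most_interesting_cluster(applied_cluster_ids):
    """Interest degree = deliveries to a cluster / total deliveries; return the argmax."""
    counts = Counter(applied_cluster_ids)
    total = len(applied_cluster_ids)
    interest = {k: n / total for k, n in counts.items()}
    return max(interest, key=interest.get), interest

# Hypothetical history: the cluster index of each post the job seeker delivered to.
best_cluster, interest = most_interesting_cluster([2, 0, 2, 1, 2])
```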
Step S440: position demand cluster using most interesting job seekersCorresponding personalized post imagePost demand clusterThe f-th key entity in (a) isCalculating the entity type of each job seekerAnd key entitySemantic relevance scores between the two, and a calculation formula is as follows: Wherein, the method comprises the steps of, wherein, Respectively represent the entity types of job seekersAnd key entitiesIs a function of the entity semantic vector of (a),Respectively represent the entity types of job seekersAnd key entitiesIs a modular length of the entity semantic vector;
The said Obtaining a relation semantic vector matrix output by the TransE model;
step S450: scoring semantic relevance above a score threshold Is taken as a key entitySemantic-related job seeker entity types to obtain related entity setsJ is the key entityA total number of semantically related job seeker entity types; For the J-th and key entity in the related entity set Semantic-related job seeker entity types;
The score threshold Setting is performed by those skilled in the art according to actual needs and experience;
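The thresholding of steps S440 and S450 may be illustrated as follows: each job seeker entity type is scored against a key entity by cosine similarity and kept only if the score exceeds the threshold. The vectors, the threshold value 0.5 and the function name are hypothetical choices for this sketch.

```python
import numpy as np

def related_entities(key_vec, seeker_vecs, threshold):
    """Keep the job seeker entity types whose cosine score with the key entity exceeds the threshold."""
    related = []
    for name, vec in seeker_vecs.items():
        score = float(np.dot(vec, key_vec) / (np.linalg.norm(vec) * np.linalg.norm(key_vec)))
        if score > threshold:
            related.append((name, score))
    return related

# Hypothetical entity semantic vectors.
key_vec = np.array([1.0, 0.0])  # key entity of the post demand cluster
seeker_vecs = {"Python": np.array([0.9, 0.1]), "cooking": np.array([0.0, 1.0])}
related = related_entities(key_vec, seeker_vecs, threshold=0.5)
```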
Step S460: calculating, for the job seeker image, the recommendation weight of each key entity in the personalized post image $P^*$, the calculation using the importance weight of the f-th key entity of the post demand cluster $C^*$ of most interest, the importance weight of the j-th job seeker entity type semantically related to the key entity $ke_f$, and the entity semantic vectors of the f-th key entity $ke_f$ of the post demand cluster $C^*$ and of the j-th job seeker entity type semantically related to it;
The two importance weights are obtained from the importance weights of the corresponding nodes in the personalized post image and in the job seeker image, respectively;
The semantic vectors referred to above are obtained from the relation semantic vector matrix output by the TransE model;
Step S470: ranking the key entities in the personalized post image $P^*$ from high to low according to recommendation weight, selecting the first M key entities, and recommending the posts corresponding to the M key entities to the job seeker.
Example 2
As shown in fig. 2, the information management system based on data fusion provided by the present application includes:
The triplet and graph construction module is used for collecting post demand texts of various posts, defining an entity type set ST and a relation type set GX, establishing a knowledge triplet set between the entity types and the relation types, and constructing a domain knowledge graph according to the post demand texts; the set of entity types includes entity types, and the set of relationship types includes relationship types;
The overall semantic vector calculation module is used for generating a candidate entity alias set for the post demand text, constructing a candidate entity set for each candidate entity alias, training a TransE model using the knowledge triplet set, outputting an entity semantic vector matrix and a relation semantic vector matrix, calculating the entity type with the highest semantic relevance score in the candidate entity set as the result entity of the candidate entity alias, and calculating the overall semantic vector of the post demand text using the entity semantic vector matrix; the candidate entity alias set includes candidate entity aliases;
The clustering and post image construction module is used for clustering all post demand texts by taking the whole semantic vector as a classification characteristic and adopting a K-means clustering algorithm to obtain K post demand clusters, extracting key entities in each post demand cluster and constructing a personalized post image for each post demand cluster;
The job seeker post recommending module is used for constructing a job seeker image according to the job seeker resume, acquiring post demand texts which are historically delivered by the job seeker to obtain a post demand cluster which is most interesting, calculating the recommending weight of each key entity by utilizing the personalized post image of the post demand cluster, and recommending posts corresponding to M key entities with the highest recommending weight to the job seeker.
Example 3
Fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present application. As shown in fig. 3, the present application further provides an electronic device. The electronic device may include one or more processors and one or more memories. The memory stores computer-readable code which, when executed by the one or more processors, may perform the data fusion-based information management method described above.
The method or system according to embodiments of the application may also be implemented by means of the architecture of the electronic device shown in fig. 3. As shown in fig. 3, the electronic device may include a bus, one or more CPUs, a read-only memory (ROM), a random access memory (RAM), a communication port connected to a network, an input/output component, a hard disk, and the like. A storage device in the electronic device, such as the ROM or the hard disk, may store the data fusion-based information management method provided by the present application. The data fusion-based information management method may include, for example: collecting post demand texts of various posts, defining an entity type set ST and a relation type set GX, establishing a knowledge triplet set between entity types and relation types, and constructing a domain knowledge graph according to the post demand texts, the entity type set including entity types and the relation type set including relation types; generating a candidate entity alias set for the post demand text, the candidate entity alias set including candidate entity aliases, constructing a candidate entity set for each candidate entity alias, training a TransE model using the knowledge triplet set, outputting an entity semantic vector matrix and a relation semantic vector matrix, calculating the entity type with the highest semantic relevance score in the candidate entity set as the result entity of the candidate entity alias, and calculating the overall semantic vector of the post demand text using the entity semantic vector matrix; taking the overall semantic vector as a classification feature, clustering all post demand texts with a K-means clustering algorithm to obtain K post demand clusters, extracting key entities in each post demand cluster, and constructing a personalized post image for each post demand cluster; and constructing a job seeker image according to the job seeker resume, acquiring the post demand texts historically delivered by the job seeker to obtain the post demand cluster of most interest, calculating the recommendation weight of each key entity using the personalized post image of that post demand cluster, and recommending the posts corresponding to the M key entities with the highest recommendation weights to the job seeker. Further, the electronic device may also include a user interface. Of course, the architecture shown in fig. 3 is merely exemplary, and one or more components of the electronic device shown in fig. 3 may be omitted as appropriate when implementing different devices.
Example 4
Fig. 4 is a schematic diagram of a readable storage medium according to an embodiment of the present application. Fig. 4 shows a computer-readable storage medium according to one embodiment of the present application. The computer-readable storage medium has computer-readable instructions stored thereon. When the computer-readable instructions are executed by a processor, the data fusion-based information management method according to the embodiments of the present application described with reference to the above drawings may be performed. Storage media include, but are not limited to, volatile memory and/or non-volatile memory. Volatile memory may include, for example, random access memory (RAM), cache memory (cache), and the like. Non-volatile memory may include, for example, read-only memory (ROM), hard disk, flash memory, and the like.
In addition, according to embodiments of the present application, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, the present application provides a non-transitory machine-readable storage medium storing machine-readable instructions executable by a processor to perform instructions corresponding to the method steps provided by the present application, such as: collecting post demand texts of various posts, defining an entity type set ST and a relation type set GX, establishing a knowledge triplet set between entity types and relation types, and constructing a domain knowledge graph according to the post demand texts, the entity type set including entity types and the relation type set including relation types; generating a candidate entity alias set for the post demand text, the candidate entity alias set including candidate entity aliases, constructing a candidate entity set for each candidate entity alias, training a TransE model using the knowledge triplet set, outputting an entity semantic vector matrix and a relation semantic vector matrix, calculating the entity type with the highest semantic relevance score in the candidate entity set as the result entity of the candidate entity alias, and calculating the overall semantic vector of the post demand text using the entity semantic vector matrix; taking the overall semantic vector as a classification feature, clustering all post demand texts with a K-means clustering algorithm to obtain K post demand clusters, extracting key entities in each post demand cluster, and constructing a personalized post image for each post demand cluster; and constructing a job seeker image according to the job seeker resume, acquiring the post demand texts historically delivered by the job seeker to obtain the post demand cluster of most interest, calculating the recommendation weight of each key entity using the personalized post image of that post demand cluster, and recommending the posts corresponding to the M key entities with the highest recommendation weights to the job seeker. The above-described functions defined in the method of the present application are performed when the computer program is executed by a central processing unit (CPU).
The methods, apparatus and devices of the present application may be implemented in numerous ways. For example, they may be implemented by software, hardware, firmware, or any combination of software, hardware and firmware. The above-described sequence of steps of the method is for illustration only, and the steps of the method of the present application are not limited to the sequence specifically described above unless specifically stated otherwise. Furthermore, in some embodiments, the present application may also be embodied as programs recorded in a recording medium, the programs including machine-readable instructions for implementing the methods according to the present application. Thus, the present application also covers a recording medium storing a program for executing the method according to the present application.
In addition, in the foregoing technical solutions provided in the embodiments of the present application, parts consistent with implementation principles of corresponding technical solutions in the prior art are not described in detail, so that redundant descriptions are avoided.
The purpose, technical scheme and beneficial effects of the invention are further described in detail in the detailed description. It is to be understood that the above description is only of specific embodiments of the present invention and is not intended to limit the present invention. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Claims (9)
1. The information management method based on data fusion is characterized by comprising the following steps:
Collecting post demand texts of various posts, defining an entity type set ST and a relation type set GX, establishing a knowledge triplet set between entity types and relation types, and constructing a domain knowledge graph according to the post demand texts;
Generating a candidate entity alias set for the post demand text, constructing a candidate entity set for each candidate entity alias, training TransE models by using the knowledge triplet set, outputting entity semantic vector matrixes and relation semantic vector matrixes, calculating to obtain the entity type with the highest semantic relevance score in the candidate entity set as a result entity of the candidate entity alias, and calculating the overall semantic vector of the post demand text by using the entity semantic vector matrixes;
taking the whole semantic vector as a classification feature, clustering all post demand texts by adopting a K-means clustering algorithm to obtain K post demand clusters, extracting key entities in each post demand cluster, and constructing a personalized post image for each post demand cluster;
constructing a job seeker image according to the job seeker resume, acquiring a post demand text which is historically delivered by the job seeker to obtain a post demand cluster which is most interesting, calculating the recommended weight of each key entity by utilizing the personalized post image of the post demand cluster, and recommending posts corresponding to M key entities with the highest recommended weights to the job seeker;
wherein the specific method for generating the candidate entity alias set for the post demand text, constructing the candidate entity set for each candidate entity alias, training the TransE model using the knowledge triplet set, outputting the entity semantic vector matrix and the relation semantic vector matrix, calculating the entity type with the highest semantic relevance score in the candidate entity set as the result entity of the candidate entity alias, and calculating the overall semantic vector of the post demand text using the entity semantic vector matrix comprises the following steps:
Defining an entity alias dictionary $Dict = \{st_i : (a_{i,1}, a_{i,2}, \ldots, a_{i,A})\}$, where $st_i$ represents the i-th entity type of the entity type set in the domain knowledge graph, $a_{i,A}$ represents the A-th entity alias of the i-th entity type, and A represents the total number of entity aliases;
For the post demand text, generating the candidate entity alias set by a dictionary matching method; for each candidate entity alias $m$, searching the entity alias dictionary for all entity types matching the alias to obtain the candidate entity set, and constructing an entity link;
obtaining negative sample triplets from the knowledge triplet set by negative sampling, training a TransE model on the knowledge triplet set and the negative sample triplets, and outputting an entity semantic vector matrix and a relation semantic vector matrix;
according to the entity semantic vector matrix output by the TransE model, for a candidate entity alias $m$, retrieving the entity semantic vector $e_i$ of entity type $st_i$ from the entity semantic vector matrix, calculating the semantic relevance score between the candidate entity alias $m$ and each entity type in the candidate entity set, comparing the scores, and selecting the entity type with the highest semantic relevance score as the result entity of the candidate entity alias;
the semantic relevance score between the candidate entity alias $m$ and the i-th entity type is calculated as $\mathrm{score}(m, st_i) = \cos(e_i, v_m) = \dfrac{e_i \cdot v_m}{\lVert e_i \rVert \, \lVert v_m \rVert}$, where $v_m$ denotes the semantic vector of the candidate entity alias, $\cos(\cdot, \cdot)$ denotes the cosine similarity function, and $\lVert e_i \rVert$ and $\lVert v_m \rVert$ denote the norms of the entity semantic vector $e_i$ and of the alias semantic vector $v_m$, respectively;
for each candidate entity alias in the post demand text, obtaining the entity type with the highest semantic relevance score in the candidate entity set, and retrieving the entity semantic vector of each entity type in the candidate entity set from the entity semantic vector matrix to obtain the entity semantic vector set $\{e_1, \dots, e_D\}$ of the post demand text, where $e_D$ is the entity semantic vector of the D-th entity type; and calculating the overall semantic vector from the entity semantic vector set, where D is the total number of entity types in the candidate entity set;
calculating the whole semantic vectors of all post demand texts to obtain a whole semantic vector set $\{V_1, \dots, V_E\}$, where $V_E$ denotes the whole semantic vector of the E-th post demand text.
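By way of non-limiting illustration, the dictionary matching, entity linking, and vector aggregation steps of claim 1 may be sketched as follows. The toy vectors, the alias dictionary contents, and the function names (`cosine`, `link_alias`, `overall_vector`) are hypothetical. The sketch also assumes that the alias semantic vectors are already available and that the overall semantic vector is the element-wise mean of the result entity vectors; the claim text here does not fix either choice.

```python
import math

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def link_alias(alias_vec, candidates, entity_vecs):
    """Select the candidate entity type with the highest relevance score."""
    return max(candidates, key=lambda ent: cosine(entity_vecs[ent], alias_vec))

def overall_vector(vectors):
    """Element-wise mean of equal-length vectors (assumed aggregation)."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

# Hypothetical entity vectors, standing in for rows of a TransE output matrix.
entity_vecs = {"Python": [1.0, 0.0], "Java": [0.0, 1.0], "SQL": [1.0, 1.0]}
# Hypothetical alias dictionary: alias -> candidate entity types.
alias_dict = {"py": ["Python", "Java"], "database": ["SQL"]}
# Hypothetical alias semantic vectors.
alias_vecs = {"py": [0.9, 0.1], "database": [0.5, 0.5]}

text = "needs py and database skills"
# Dictionary matching: keep every alias that occurs in the text.
candidate_aliases = [alias for alias in alias_dict if alias in text]
result_entities = [
    link_alias(alias_vecs[alias], alias_dict[alias], entity_vecs)
    for alias in candidate_aliases
]
text_vector = overall_vector([entity_vecs[e] for e in result_entities])
```

Substring matching is the simplest form of dictionary matching; a production system would more likely tokenize the text first so that short aliases do not match inside unrelated words.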
2. The information management method based on data fusion according to claim 1, wherein the steps of collecting post demand texts of various posts, defining an entity type set ST and a relation type set GX, establishing a knowledge triplet set between entity types and relation types, and constructing a domain knowledge graph according to the post demand texts are as follows:
collecting post demand texts of various posts from an enterprise transnational recruitment management database, the post demand texts comprising: post descriptions, job requirements, and work experience;
defining an entity type set $ST = \{st_1, \dots, st_B\}$, where $st_B$ denotes the B-th entity type in the entity type set and B denotes the total number of entity types;
defining a relation type set $GX = \{gx_1, \dots, gx_C\}$ of the entity types, where $gx_C$ denotes the C-th relation type in the relation type set and C denotes the total number of relation types;
establishing, according to the post demand texts, a knowledge triplet set $T = \{(st_h, gx_c, st_t)\}$ between entity types and relation types, where $st_h$ denotes the head entity type, $st_t$ denotes the tail entity type, and $gx_c$ denotes the c-th relation type in the relation type set, $c \in \{1, \dots, C\}$; and constructing a domain knowledge graph $G$ from the entity type set, the relation type set, and the knowledge triplet set.
3. The information management method based on data fusion according to claim 2, wherein the domain knowledge graph is constructed as follows: taking entity types as nodes and relation types as edges, a knowledge triplet set $T$ is constructed according to the relation types between the entity types extracted from the post demand texts; each knowledge triplet comprises two entity types $st_h$ and $st_t$ and a relation type $gx_c$ connecting the two entity types, the head entity type $st_h$ being connected to the tail entity type $st_t$ through the c-th relation type $gx_c$; the two entity types are designated as the head entity type $st_h$ and the tail entity type $st_t$ according to the relation type, thereby obtaining the domain knowledge graph $G$.
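By way of non-limiting illustration, a domain knowledge graph of the kind recited in claims 2 and 3 may be held as a node set plus an adjacency map built from (head, relation, tail) triplets. The entity and relation names below and the function name `build_graph` are hypothetical examples, not values taken from the patent.

```python
from collections import defaultdict

def build_graph(triplets):
    """Return (nodes, adjacency); adjacency[head] lists (relation, tail) pairs."""
    nodes = set()
    adjacency = defaultdict(list)
    for head, relation, tail in triplets:
        # Entity types become nodes, relation types become directed edges.
        nodes.update((head, tail))
        adjacency[head].append((relation, tail))
    return nodes, dict(adjacency)

# Hypothetical knowledge triplets extracted from post demand texts.
triplets = [
    ("Data Engineer", "requires_skill", "Python"),
    ("Data Engineer", "requires_skill", "SQL"),
    ("Python", "related_to", "SQL"),
]
nodes, adjacency = build_graph(triplets)
```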
4. The information management method based on data fusion according to claim 3, wherein the specific steps of clustering all post demand texts by using the whole semantic vector as a classification feature and adopting a K-means clustering algorithm to obtain K post demand clusters are as follows:
for the whole semantic vector set $\{V_1, \dots, V_E\}$ of all post demand texts, taking the whole semantic vectors as classification features and randomly selecting H whole semantic vectors as the cluster centers of the initial post demand clusters;
for each whole semantic vector, calculating the Euclidean distance from the whole semantic vector to each cluster center, and assigning the whole semantic vector to the post demand cluster corresponding to the nearest cluster center, until every whole semantic vector has been assigned;
calculating the average of all the whole semantic vectors in each post demand cluster, taking the average as the new cluster center, and repeating the above steps until the whole semantic vectors in each post demand cluster no longer change, thereby obtaining K post demand clusters $\{P_1, \dots, P_K\}$, where K = H.
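By way of non-limiting illustration, the clustering loop of claim 4 (random initial centers, nearest-center assignment by Euclidean distance, mean update, stop when assignments no longer change) may be sketched in plain Python. The function name `kmeans`, the `seed` and `max_iter` parameters, and the toy vectors are assumptions introduced for this sketch.

```python
import math
import random

def kmeans(vectors, k, seed=0, max_iter=100):
    """Cluster vectors into k groups; returns (assignment, centers)."""
    rng = random.Random(seed)
    # Randomly pick k of the input vectors as the initial cluster centers.
    centers = [list(v) for v in rng.sample(vectors, k)]
    assignment = None
    for _ in range(max_iter):
        # Assign each vector to the nearest center by Euclidean distance.
        new_assignment = [
            min(range(k), key=lambda c: math.dist(v, centers[c]))
            for v in vectors
        ]
        if new_assignment == assignment:
            break  # cluster membership no longer changes
        assignment = new_assignment
        # Move each center to the mean of its members (skip empty clusters).
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centers

# Hypothetical whole semantic vectors of four post demand texts.
vectors = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
assignment, centers = kmeans(vectors, k=2)
```

Cluster labels depend on which vectors are drawn as initial centers, so downstream code should compare cluster membership rather than label values.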
5. The information management method based on data fusion according to claim 4, wherein the specific method for extracting the key entity in each post demand cluster and constructing the personalized post image for each post demand cluster is as follows:
for the k-th post demand cluster $P_k$, counting the frequency of the result entities of all post demand texts, and extracting the F result entities with the highest frequency as the key entity set $\{ke_1, \dots, ke_F\}$ of the post demand cluster, where $ke_F$ denotes the F-th key entity of post demand cluster $P_k$;
for the f-th key entity $ke_f$, taking it as the central node in the domain knowledge graph $G$, mining its adjacent one-hop subgraph as the key entity subgraph centered on $ke_f$, obtaining all key entity subgraphs of post demand cluster $P_k$, and merging all the key entity subgraphs to obtain the personalized post image of post demand cluster $P_k$;
calculating the importance weight $w_f$ of the node corresponding to key entity $ke_f$ as $w_f = \dfrac{n_f}{\sum_{f'=1}^{F} n_{f'}}$, where $n_f$ is the frequency of key entity $ke_f$ in all post demand texts and $\sum_{f'=1}^{F} n_{f'}$ is the sum of the frequencies of all key entities;
the relation types between the key entities serve as edges, and the weight of the edge between the f-th key entity and the h-th key entity is calculated from $e_f$, $e_h$, and $r_{fh}$, where $e_f$ is the entity semantic vector of the f-th key entity, $e_h$ is the entity semantic vector of the h-th key entity, $r_{fh}$ is the relation semantic vector of the edge connecting the f-th key entity and the h-th key entity, obtained from the relation semantic vector matrix output by the TransE model, and $\lVert \cdot \rVert$ denotes the norm.
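By way of non-limiting illustration, the key entity extraction, importance weighting, and one-hop subgraph merging of claim 5 may be sketched as follows. The sketch assumes the importance weight is a key entity's frequency divided by the summed frequency of all key entities, counts frequencies within a single cluster, and does not cover edge weights. All names and data are hypothetical.

```python
from collections import Counter

def key_entity_counts(cluster_result_entities, top_f):
    """Return the top_f most frequent result entities with their counts."""
    counts = Counter(e for entities in cluster_result_entities for e in entities)
    return counts.most_common(top_f)

def importance_weights(key_counts):
    """Frequency of each key entity divided by the total key entity frequency."""
    total = sum(count for _, count in key_counts)
    return {entity: count / total for entity, count in key_counts}

def one_hop_subgraph(triplets, center):
    """Triplets in which the center entity appears as head or tail."""
    return {t for t in triplets if t[0] == center or t[2] == center}

# Hypothetical result entities of three post demand texts in one cluster.
cluster_result_entities = [["Python", "SQL"], ["Python", "Spark"], ["Python", "SQL"]]
# Hypothetical knowledge triplets of the domain knowledge graph.
triplets = [
    ("Data Engineer", "requires_skill", "Python"),
    ("Data Engineer", "requires_skill", "SQL"),
    ("Python", "related_to", "SQL"),
]

key_counts = key_entity_counts(cluster_result_entities, top_f=2)
weights = importance_weights(key_counts)
# Merge the one-hop subgraphs of all key entities into one post image.
post_image = set().union(*(one_hop_subgraph(triplets, e) for e, _ in key_counts))
```

Representing the post image as a set of triplets makes the merge step a plain set union, so edges shared by two key entity subgraphs are stored once.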
6. The information management method based on data fusion according to claim 5, wherein the specific method for constructing job seeker images according to job seeker resume, obtaining post demand texts historically delivered by job seekers to obtain post demand clustering clusters of greatest interest, calculating recommendation weights of each key entity by using personalized post images of the post demand clustering clusters, and recommending posts corresponding to M key entities with highest recommendation weights to the job seekers is as follows:
extracting attribute text from the job seeker resume, including professional skills, education level, and work experience; expressing the attribute text as the corresponding entity types and relation types in the domain knowledge graph to obtain the job seeker entity types and the job seeker relation types; and constructing the job seeker image;
acquiring information on the post demand texts historically delivered by the job seeker, counting the number of times the job seeker has historically delivered to each post demand cluster, denoting the number of historical deliveries to the k-th post demand cluster as $n_k$, and calculating the job seeker's interest degree in the k-th post demand cluster as $I_k = n_k / N$, where $N$ is the job seeker's total number of historical deliveries;
selecting the post demand cluster with the highest interest degree as the job seeker's most interesting post demand cluster, denoted $P^*$;
using the personalized post image corresponding to the job seeker's most interesting post demand cluster $P^*$, with the f-th key entity in post demand cluster $P^*$ denoted $ke_f$, calculating the semantic relevance score between each job seeker entity type $q$ and key entity $ke_f$ as $\cos(e_q, e_f) = \dfrac{e_q \cdot e_f}{\lVert e_q \rVert \, \lVert e_f \rVert}$, where $e_q$ and $e_f$ denote the entity semantic vectors of the job seeker entity type $q$ and of key entity $ke_f$, respectively, and $\lVert e_q \rVert$ and $\lVert e_f \rVert$ denote the norms of those entity semantic vectors;
taking each job seeker entity type whose semantic relevance score exceeds a score threshold as a job seeker entity type semantically related to key entity $ke_f$, to obtain a related entity set $\{q_1, \dots, q_J\}$, where J is the total number of job seeker entity types semantically related to key entity $ke_f$ and $q_J$ is the J-th job seeker entity type in the related entity set that is semantically related to key entity $ke_f$;
calculating, from the job seeker image and the personalized post image, the recommendation weight of each key entity in the personalized post image using $w_f$, $w_j$, $e_f$, and $e_j$, where $w_f$ is the importance weight of the f-th key entity of post demand cluster $P^*$, $w_j$ is the importance weight of the j-th job seeker entity type semantically related to key entity $ke_f$, and $e_f$ and $e_j$ are the entity semantic vectors of the f-th key entity of post demand cluster $P^*$ and of the j-th semantically related job seeker entity type, respectively;
ranking the key entities in the personalized post image from high to low by recommendation weight, selecting the first M key entities, and recommending the posts corresponding to the M key entities to the job seeker.
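By way of non-limiting illustration, the interest degree calculation and the final top-M selection of claim 6 may be sketched as follows. The sketch assumes the interest degree is the share of a job seeker's historical deliveries that fall in each cluster, and it takes the recommendation weights as given inputs rather than computing them. The function names, the delivery history, and the weight values are hypothetical.

```python
from collections import Counter

def interest_degrees(delivered_clusters):
    """Share of historical deliveries per post demand cluster."""
    counts = Counter(delivered_clusters)
    total = len(delivered_clusters)
    return {cluster: n / total for cluster, n in counts.items()}

def top_m(recommend_weights, m):
    """Key entities sorted by recommendation weight, highest first."""
    ranked = sorted(recommend_weights.items(), key=lambda kv: kv[1], reverse=True)
    return [entity for entity, _ in ranked[:m]]

# Hypothetical history: cluster index of each post the job seeker delivered to.
history = [2, 0, 2, 1, 2]
interest = interest_degrees(history)
favorite_cluster = max(interest, key=interest.get)

# Hypothetical, precomputed recommendation weights of the key entities
# in the personalized post image of the favorite cluster.
recommend_weights = {"Python": 0.42, "SQL": 0.31, "Spark": 0.08}
recommended = top_m(recommend_weights, m=2)
```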
7. An information management system based on data fusion, the system being configured to implement the information management method based on data fusion according to any one of claims 1 to 6, comprising:
The triplet and atlas construction module is used for collecting post demand texts of various posts, defining an entity type set ST and a relation type set GX, establishing a knowledge triplet set between the entity type and the relation type, and constructing a domain knowledge atlas according to the post demand texts; the set of entity types includes entity types, and the set of relationship types includes relationship types;
The whole semantic vector calculation module is used for generating a candidate entity alias set for the post demand text, constructing a candidate entity set for each candidate entity alias, training a TransE model using the knowledge triplet set, outputting an entity semantic vector matrix and a relation semantic vector matrix, selecting the entity type with the highest semantic relevance score in the candidate entity set as the result entity of the candidate entity alias, and calculating the whole semantic vector of the post demand text using the entity semantic vector matrix; the candidate entity alias set includes candidate entity aliases;
The clustering and post image construction module is used for clustering all post demand texts by taking the whole semantic vector as a classification characteristic and adopting a K-means clustering algorithm to obtain K post demand clusters, extracting key entities in each post demand cluster and constructing a personalized post image for each post demand cluster;
The job seeker post recommending module is used for constructing a job seeker image according to the job seeker resume, acquiring post demand texts which are historically delivered by the job seeker to obtain a post demand cluster which is most interesting, calculating the recommending weight of each key entity by utilizing the personalized post image of the post demand cluster, and recommending posts corresponding to M key entities with the highest recommending weight to the job seeker.
8. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of the information management method based on data fusion according to any one of claims 1 to 6.
9. A readable storage medium, characterized in that it stores a computer program adapted to be loaded by a processor for performing the steps of the data fusion based information management method according to any of claims 1-6.
Priority Applications (1)
Application Number | Publication Number | Priority Date | Filing Date | Title |
---|---|---|---|---|
CN202410897206.2A | CN118446662B (en) | 2024-07-05 | 2024-07-05 | Information management method and system based on data fusion |
Publications (2)
Publication Number | Publication Date |
---|---|
CN118446662A (en) | 2024-08-06 |
CN118446662B (en) | 2024-10-01 |
Family
ID=92314734
Family Applications (1)
Application Number | Status | Publication Number | Priority Date | Filing Date | Title |
---|---|---|---|---|---|
CN202410897206.2A | Active | CN118446662B (en) | 2024-07-05 | 2024-07-05 | Information management method and system based on data fusion |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN118446662B (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113886604A (en) * | 2021-10-20 | 2022-01-04 | 前锦网络信息技术(上海)有限公司 | Job knowledge map generation method and system |
CN116127186A (en) * | 2022-12-09 | 2023-05-16 | 之江实验室 | Knowledge-graph-based individual matching recommendation method and system for person sentry |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115018453B (en) * | 2022-05-23 | 2024-04-09 | 电子科技大学 | Automatic post talent portrait generation method |
US20240020645A1 (en) * | 2022-07-12 | 2024-01-18 | iCIMS, Inc. | Methods and apparatus for generating behaviorally anchored rating scales (bars) for evaluating job interview candidate |
CN115098791B (en) * | 2022-08-24 | 2023-01-10 | 中建电子商务有限责任公司 | Real-time post recommendation method and system |
CN116028722B (en) * | 2023-03-31 | 2023-06-16 | 广州南方学院 | Post recommendation method and device based on word vector and computer equipment |
CN117541202A (en) * | 2023-11-14 | 2024-02-09 | 中电科新型智慧城市研究院有限公司 | Employment recommendation system based on multi-mode knowledge graph and pre-training large model fusion |
Also Published As
Publication number | Publication date |
---|---|
CN118446662A (en) | 2024-08-06 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN110502621B (en) | Question answering method, question answering device, computer equipment and storage medium | |
CN110162593B (en) | Search result processing and similarity model training method and device | |
CN106202256B (en) | Web image retrieval method based on semantic propagation and mixed multi-instance learning | |
CN111444320B (en) | Text retrieval method and device, computer equipment and storage medium | |
WO2022142027A1 (en) | Knowledge graph-based fuzzy matching method and apparatus, computer device, and storage medium | |
CN108038183B (en) | Structured entity recording method, device, server and storage medium | |
US11768869B2 (en) | Knowledge-derived search suggestion | |
Bian et al. | Multimedia summarization for trending topics in microblogs | |
CN110059160B (en) | End-to-end context-based knowledge base question-answering method and device | |
CN104615767B (en) | Training method, search processing method and the device of searching order model | |
WO2020062770A1 (en) | Method and apparatus for constructing domain dictionary, and device and storage medium | |
Kuo et al. | Unsupervised semantic feature discovery for image object retrieval and tag refinement | |
CN112819023B (en) | Sample set acquisition method, device, computer equipment and storage medium | |
CN107180045A (en) | A kind of internet text contains the abstracting method of geographical entity relation | |
CN111310023B (en) | Personalized search method and system based on memory network | |
CN111400584A (en) | Association word recommendation method and device, computer equipment and storage medium | |
CN111291177A (en) | Information processing method and device and computer storage medium | |
CN110647904A (en) | Cross-modal retrieval method and system based on unmarked data migration | |
CN109145143A (en) | Sequence constraints hash algorithm in image retrieval | |
CN108595546B (en) | Semi-supervision-based cross-media feature learning retrieval method | |
CN109948140B (en) | Word vector embedding method and device | |
CN106570196B (en) | Video program searching method and device | |
CN112434533A (en) | Entity disambiguation method, apparatus, electronic device, and computer-readable storage medium | |
CN118503547B (en) | Artificial intelligence-based person post intelligent matching method and system | |
CN114491079A (en) | Knowledge graph construction and query method, device, equipment and medium |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |