CN118446662B - Information management method and system based on data fusion

Info

Publication number: CN118446662B
Application number: CN202410897206.2A
Authority: CN
Inventors: 郭伟; 王闽东
Original assignee: Hangzhou Jingjia Technology Co ltd
Current assignee: Hangzhou Jingjia Technology Co ltd
Priority date: 2024-07-05
Filing date: 2024-07-05
Publication date: 2024-10-01
Anticipated expiration: 2044-07-05
Also published as: CN118446662A

Abstract

The application relates to an information management method and system based on data fusion, in the technical field of recruitment information management. The method comprises: establishing a knowledge triplet set and constructing a domain knowledge graph; generating a candidate entity alias set for each post demand text and constructing a candidate entity set for each candidate entity alias; outputting an entity semantic vector matrix and a relation semantic vector matrix by using a TransE model; selecting the entity type with the highest semantic relevance score in the candidate entity set as the result entity; calculating the overall semantic vector of each post demand text; clustering all post demand texts to obtain post demand clusters; extracting the key entities of each post demand cluster and constructing a personalized post image; constructing a job seeker image; acquiring the post demand cluster in which the job seeker is most interested; calculating the recommendation weight of each key entity; and recommending the posts corresponding to the M key entities with the highest recommendation weights to the job seeker, thereby supporting transnational recruitment.

Description

Information management method and system based on data fusion

Technical Field

The application relates to the technical field of recruitment information management, in particular to an information management method and system based on data fusion.

Background

With the continuous development of economic globalization and the internet, information has grown explosively, and one important aspect of economic globalization is the continuous growth in the number and scale of transnational enterprises. As transnational enterprises grow, how to effectively manage massive recruitment data from different countries and regions has become an important topic in their human resource management. Traditional recruitment information management methods struggle to meet the requirements of the big data era, and innovative technical means are needed to raise the informatization and intelligence level of transnational recruitment.

Conventional technical schemes for transnational recruitment information management mainly rely on traditional machine learning models such as collaborative filtering and matrix factorization. These models can only make recommendations using shallow features of candidates or posts, and ignore deep associations between multidimensional information such as skills, industry and enterprise context.

Different types of recruitment posts place very different requirements on candidates. How to deeply mine the degree of matching between candidates and enterprises from post descriptions and enterprise requirements, automatically construct a personalized candidate evaluation index system, dynamically optimize index weights with machine learning algorithms, and improve the accuracy of person-post matching is a problem of practical application value.

The Chinese patent with publication number CN117893184A, entitled "a human resource information management method based on big data", discloses the following: establishing a human resource information management database according to the recruitment requirements of recruiting companies and personal resume data; presetting an input matching data standard for the recruitment requirements and the personal resume data; retrieving corresponding data in the human resource information management database according to the recruitment requirements and personal resume data that meet the input standard; and performing correlation analysis on the various data in the database to comprehensively identify the human resource information data with a high matching degree. This prior art can rapidly analyze the correlation of various data and obtain the human resource information the user most wants. However, it matches recruitment requirements and resumes by keyword retrieval, ignores correlation at the semantic level, and therefore has limited matching accuracy and diversity.

Disclosure of Invention

The present application aims to solve at least one of the technical problems in the related art to some extent. Therefore, the application aims to provide the information management method and the system based on data fusion, which realize the intellectualization and refinement of the transnational recruitment.

One aspect of the present application provides an information management method based on data fusion, including:

Step S100: collecting post demand texts of various posts, defining an entity type set ST and a relation type set GX, establishing a knowledge triplet set between entity types and relation types, and constructing a domain knowledge graph according to the post demand texts; the set of entity types includes entity types, and the set of relationship types includes relationship types;

The specific method for collecting the post demand texts of various posts, defining the entity type set ST and the relation type set GX, and constructing the domain knowledge graph according to the post demand texts comprises the following steps:

step S110: collecting post demand texts of various posts from an enterprise trans-national recruitment management database, wherein the post demand texts comprise the following contents: post description, job requirements and working experience;

Step S120: defining the entity type set $ST = \{st_1, st_2, \dots, st_B\}$, where $st_B$ represents the B-th entity type in the entity type set and B represents the total number of entity types;

Step S130: defining the relationship type set $GX = \{gx_1, gx_2, \dots, gx_C\}$ of the entity types, where $gx_C$ represents the C-th relationship type in the relationship type set and C represents the total number of relationship types;

Step S140: establishing, according to the post demand texts, the knowledge triplet set $T = \{(h, gx_c, t)\}$ between entity types and relationship types, where $h$ represents the head entity type, $t$ represents the tail entity type, and $gx_c$ represents the c-th relationship type in the relationship type set, $c \in \{1, \dots, C\}$; and constructing the domain knowledge graph $G = (ST, GX, T)$ from the entity type set, the relationship type set and the knowledge triplet set;

The construction method of the domain knowledge graph is as follows: taking entity types as nodes and relationship types as edges, the knowledge triplet set $T$ is constructed from the relationship types between the entity types extracted from the post demand texts. Each knowledge triplet comprises two entity types $h$ and $t$ and a relationship type $gx_c$ connecting them: the head entity type $h$ is connected to the tail entity type $t$ through the c-th relationship type $gx_c$, the two entity types being divided into head entity type and tail entity type according to the relationship type. The domain knowledge graph $G$ is thereby obtained;
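Steps S120 to S140 can be illustrated with a minimal sketch. The entity types, relationship types and triplets below are hypothetical sample values, and the adjacency-map representation and the helper name `build_graph` are assumptions made for illustration; the application does not prescribe a data structure.

```python
from collections import defaultdict

# Hypothetical entity type set ST and relationship type set GX (sample values).
ST = {"software engineer", "Python", "bachelor's degree"}
GX = {"requires", "belongs_to", "related_to"}

# Knowledge triplets: (head entity type, relationship type, tail entity type).
triplets = [
    ("software engineer", "requires", "Python"),
    ("software engineer", "requires", "bachelor's degree"),
]

def build_graph(entity_types, relation_types, triplets):
    """Build an adjacency map head -> [(relation, tail)], validating each triplet."""
    graph = defaultdict(list)
    for head, rel, tail in triplets:
        if head not in entity_types or tail not in entity_types:
            raise ValueError(f"unknown entity type in {(head, rel, tail)}")
        if rel not in relation_types:
            raise ValueError(f"unknown relationship type: {rel}")
        graph[head].append((rel, tail))
    return dict(graph)

graph = build_graph(ST, GX, triplets)
```

Entity types act as nodes and relationship types as labelled, directed edges, which mirrors the head/tail split described in step S140.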

Step S200: generating a candidate entity alias set for the post demand text, constructing a candidate entity set for each candidate entity alias, training TransE models by using the knowledge triplet set, outputting entity semantic vector matrixes and relation semantic vector matrixes, calculating to obtain the entity type with the highest semantic relevance score in the candidate entity set as a result entity of the candidate entity alias, and calculating the overall semantic vector of the post demand text by using the entity semantic vector matrixes; the candidate entity alias set includes candidate entity aliases;

The specific method for generating the candidate entity alias set for the post demand text, constructing the candidate entity set for each candidate entity alias, training the TransE model with the knowledge triplet set, outputting the entity semantic vector matrix and the relation semantic vector matrix, selecting the entity type with the highest semantic relevance score in the candidate entity set as the result entity of the candidate entity alias, and calculating the overall semantic vector of the post demand text using the entity semantic vector matrix comprises the following steps:

Step S210: defining an entity alias dictionary $Dict = \{(st_i, \{a_{i,1}, \dots, a_{i,A}\})\}$, where $st_i$ represents the i-th entity type of the entity type set in the domain knowledge graph, $a_{i,A}$ represents the A-th entity alias of the i-th entity type, and A represents the total number of entity aliases;

Step S220: for the post demand text, generating the candidate entity alias set by a dictionary matching method; for each candidate entity alias $m$, searching the entity alias dictionary for all entity types that match it to obtain the candidate entity set, and constructing an entity link $(m, st_i)$ between the alias and each matched entity type;

Step S230: a negative sample triplet is obtained from the knowledge triplet set by adopting a negative sampling method, a TransE model is obtained by training the knowledge triplet set and the negative sample triplet, and an entity semantic vector matrix and a relationship semantic vector matrix are obtained by output;

Step S240: according to the entity semantic vector matrix output by the TransE model, for a candidate entity alias $m$, retrieving from the entity semantic vector matrix the entity semantic vector $e_i$ of each entity type $st_i$ in its candidate entity set, calculating the semantic relevance score between $m$ and each entity type in the candidate entity set, and selecting the entity type with the highest semantic relevance score as the result entity of the candidate entity alias;

The semantic relevance score between the candidate entity alias $m$ and the i-th entity type is calculated as:

$$score(m, st_i) = \cos(e_i, v_m) = \frac{e_i \cdot v_m}{\lVert e_i \rVert \, \lVert v_m \rVert}$$

where $v_m$ represents the semantic vector of the candidate entity alias, $\cos(\cdot,\cdot)$ represents the cosine similarity function, and $\lVert e_i \rVert$ and $\lVert v_m \rVert$ respectively represent the norms of the entity semantic vector $e_i$ and of the alias semantic vector $v_m$;

Step S250: for each candidate entity alias in the post demand text, acquiring the entity type with the highest semantic relevance score in the candidate entity set, and acquiring the entity semantic vector of each such entity type from the entity semantic vector matrix to obtain the entity semantic vector set $\{e_1, \dots, e_D\}$ of the post demand text, where $e_D$ is the entity semantic vector of the D-th entity type; and calculating the overall semantic vector $z$ of the post demand text from the entity semantic vector set, where D is the total number of entity types in the candidate entity set;

Step S260: calculating the overall semantic vectors of all post demand texts to obtain the overall semantic vector set $Z = \{z_1, \dots, z_E\}$, where $z_E$ represents the overall semantic vector of the E-th post demand text;
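Steps S240 to S260 can be sketched as follows. The cosine score follows the description above. Two points are assumptions: the aggregation in step S250 is taken to be the arithmetic mean of the entity semantic vectors, since the formula itself is not legible in this text, and the alias semantic vector is simply passed in, since the text does not state how it is obtained. All names and the two-dimensional sample vectors are hypothetical.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def link_alias(alias_vec, candidates, entity_vecs):
    """Return the candidate entity type with the highest cosine score (S240)."""
    scores = {c: cosine(entity_vecs[c], alias_vec) for c in candidates}
    best = max(scores, key=scores.get)
    return best, scores

def overall_vector(result_entities, entity_vecs):
    """ASSUMPTION: aggregate result-entity vectors by their mean (S250)."""
    return np.mean([entity_vecs[e] for e in result_entities], axis=0)

# Hypothetical 2-d entity semantic vectors, standing in for TransE output.
entity_vecs = {
    "software development engineer": np.array([1.0, 0.0]),
    "software test engineer": np.array([0.0, 1.0]),
    "Python": np.array([1.0, 1.0]),
}
alias_vec = np.array([0.9, 0.1])  # hypothetical vector for the alias "software engineer"
candidates = ["software development engineer", "software test engineer"]

best, scores = link_alias(alias_vec, candidates, entity_vecs)
z = overall_vector([best, "Python"], entity_vecs)
```

Here the alias resolves to "software development engineer" because its vector points in nearly the same direction as the alias vector, and `z` summarizes the post demand text by its linked entities.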

Step S300: taking the whole semantic vector as a classification feature, clustering all post demand texts by adopting a K-means clustering algorithm to obtain K post demand clusters, extracting key entities in each post demand cluster, and constructing a personalized post image for each post demand cluster;

the specific method for clustering all post demand texts by using the whole semantic vector as a classification feature and adopting a K-means clustering algorithm to obtain K post demand clusters comprises the following steps:

Step S310: for the overall semantic vector set $Z$ of all post demand texts, taking the overall semantic vectors as classification features and randomly selecting H overall semantic vectors as the cluster centers of the initial post demand clusters;

Step S320: for each overall semantic vector, calculating the Euclidean distance from the vector to each cluster center, and assigning the vector to the post demand cluster of the nearest cluster center, until every overall semantic vector has been assigned;

Step S330: calculating the mean of all overall semantic vectors in each post demand cluster and taking the mean as the new cluster center; repeating steps S320 to S330 until the overall semantic vectors in each post demand cluster no longer change, thereby obtaining K post demand clusters $\{CL_1, \dots, CL_K\}$, with $K = H$;
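A minimal NumPy sketch of steps S310 to S330 follows. The function name `kmeans`, the fixed random seed, the iteration cap and the sample vectors are assumptions added for illustration; an empty cluster simply keeps its previous center, a case the text does not address.

```python
import numpy as np

def kmeans(X, H, seed=0, max_iter=100):
    """Plain K-means: random initial centers, Euclidean assignment, mean update."""
    rng = np.random.default_rng(seed)
    # S310: pick H distinct vectors as the initial cluster centers.
    centers = X[rng.choice(len(X), size=H, replace=False)]
    labels = None
    for _ in range(max_iter):
        # S320: assign every vector to its nearest center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments no longer change
        labels = new_labels
        # S330: recompute each center as the mean of its members.
        for k in range(H):
            members = X[labels == k]
            if len(members):
                centers[k] = members.mean(axis=0)
    return labels, centers

# Hypothetical overall semantic vectors of four post demand texts.
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
labels, centers = kmeans(X, H=2)
```

The two nearby pairs of vectors end up in separate clusters, and the number of clusters K equals the number H of initial centers, as stated in step S330.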

The specific method for extracting the key entity in each post demand cluster and constructing the personalized post image for each post demand cluster comprises the following steps:

Step S340: for the k-th post demand cluster $CL_k$, counting the frequency of the result entities of all its post demand texts, and extracting the F result entities with the highest frequency as the key entity set $KE_k = \{ke_1, \dots, ke_F\}$ of the post demand cluster, where $ke_F$ represents the F-th key entity of the post demand cluster $CL_k$;

Step S350: for the f-th key entity $ke_f$, taking $ke_f$ as the central node in the domain knowledge graph and mining its adjacent one-hop subgraph as the key entity subgraph $SG_f$ centered on $ke_f$; obtaining all key entity subgraphs $\{SG_1, \dots, SG_F\}$ of the post demand cluster $CL_k$, and merging all the key entity subgraphs to obtain the personalized post image $PI_k$ of the post demand cluster $CL_k$;

Step S360: calculating the importance weight $w_f$ of the node corresponding to the key entity $ke_f$ as:

$$w_f = \frac{n_f}{\sum_{f'=1}^{F} n_{f'}}$$

where $n_f$ is the frequency of the key entity $ke_f$ in all post demand texts, and the denominator is the sum of the frequencies of all key entities;

Step S370: the relationship types between key entities serve as edges; the weight of the edge between the f-th key entity and the h-th key entity is calculated from the entity semantic vector of the f-th key entity, the entity semantic vector of the h-th key entity, and the relation semantic vector of the edge connecting the two key entities, using the norm operator $\lVert \cdot \rVert$; the relation semantic vector is taken from the relation semantic vector matrix output by the TransE model;

Step S400: constructing a job seeker image according to the job seeker resume, acquiring a post demand text which is historically delivered by the job seeker to obtain a post demand cluster which is most interesting, calculating the recommended weight of each key entity by utilizing the personalized post image of the post demand cluster, and recommending posts corresponding to M key entities with the highest recommended weights to the job seeker;

The job seeker image is constructed according to the job seeker resume, the post demand text which is historically delivered by the job seeker is obtained to obtain the post demand cluster which is most interesting, the recommendation weight of each key entity is calculated by utilizing the personalized post image of the post demand cluster, and the posts corresponding to M key entities with the highest recommendation weights are recommended to job seekers by the specific method comprising the following steps:

Step S410: extracting the attribute texts in the job seeker resume, including professional skills, education level and work experience; representing the attribute texts as the corresponding entity types and relationship types in the domain knowledge graph to obtain the job seeker entity types and job seeker relationship types; and constructing the job seeker image;

Step S420: acquiring the post demand texts historically delivered by the job seeker, counting the number of historical deliveries of the job seeker to each post demand cluster, recording the number of historical deliveries to the k-th post demand cluster as $n_k$, and calculating the interest degree of the job seeker in the k-th post demand cluster as:

$$I_k = \frac{n_k}{N}$$

where N is the total number of historical deliveries of the job seeker;

Step S430: selecting the post demand cluster with the highest interest degree as the post demand cluster in which the job seeker is most interested, denoted $CL^{*}$;

Step S440: using the personalized post image $PI^{*}$ corresponding to the job seeker's most interesting post demand cluster $CL^{*}$, and denoting the f-th key entity in $CL^{*}$ as $ke_f$, calculating the semantic relevance score between each job seeker entity type $q_j$ and the key entity $ke_f$ as:

$$score(q_j, ke_f) = \frac{e_{q_j} \cdot e_{ke_f}}{\lVert e_{q_j} \rVert \, \lVert e_{ke_f} \rVert}$$

where $e_{q_j}$ and $e_{ke_f}$ respectively represent the entity semantic vectors of the job seeker entity type $q_j$ and of the key entity $ke_f$, and $\lVert e_{q_j} \rVert$ and $\lVert e_{ke_f} \rVert$ respectively represent the norms of these entity semantic vectors;

Step S450: taking the job seeker entity types whose semantic relevance score is above the score threshold $\theta$ as the job seeker entity types semantically related to the key entity $ke_f$, to obtain the related entity set $RE_f = \{q_1, \dots, q_J\}$, where J is the total number of job seeker entity types semantically related to the key entity $ke_f$, and $q_J$ is the J-th such job seeker entity type in the related entity set;

Step S460: calculating, from the job seeker image, the recommendation weight of each key entity in the personalized post image $PI^{*}$. The calculation is defined over the importance weight of the f-th key entity of the post demand cluster $CL^{*}$, the importance weight of the j-th job seeker entity type semantically related to the key entity $ke_f$, and the entity semantic vectors of the f-th key entity $ke_f$ and of the j-th semantically related job seeker entity type;

Step S470: ranking the key entities in the personalized post image $PI^{*}$ from high to low by recommendation weight, selecting the first M key entities, and recommending the posts corresponding to these M key entities to the job seeker.
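Steps S440 to S470 can be sketched as follows. The cosine score and the threshold filter follow the text. The recommendation weight itself is a stated ASSUMPTION: the formula of step S460 is not legible in this text, so the sketch multiplies the key entity's importance weight by the sum, over related job seeker entity types, of their importance weight times the cosine score. The function name `recommend` and all sample values are hypothetical.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def recommend(key_entities, seeker_entities, threshold, M):
    """Both inputs map name -> (importance weight, entity semantic vector)."""
    weights = {}
    for ke, (w_f, e_f) in key_entities.items():
        total = 0.0
        for w_j, e_j in seeker_entities.values():
            s = cosine(e_f, e_j)              # S440: semantic relevance score
            if s > threshold:                 # S450: keep related entity types only
                total += w_j * s              # ASSUMED combination rule (S460)
        weights[ke] = w_f * total
    ranked = sorted(weights, key=weights.get, reverse=True)  # S470
    return ranked[:M], weights

# Hypothetical personalized post image and job seeker image.
key_entities = {
    "Python": (0.6, np.array([1.0, 0.0])),
    "SQL": (0.4, np.array([0.0, 1.0])),
}
seeker_entities = {
    "Python": (0.7, np.array([1.0, 0.0])),
    "Excel": (0.3, np.array([0.6, 0.8])),
}
top, weights = recommend(key_entities, seeker_entities, threshold=0.5, M=1)
```

With these sample values the key entity "Python" ranks first, so posts associated with it would be recommended. A different combination rule in step S460 would change the weights but not the overall filter-then-rank structure.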

One aspect of the present application provides an information management system based on data fusion, including:

The triplet and graph construction module is used for collecting post demand texts of various posts, defining an entity type set ST and a relation type set GX, establishing a knowledge triplet set between the entity types and the relation types, and constructing a domain knowledge graph according to the post demand texts; the set of entity types includes entity types, and the set of relationship types includes relationship types;

The overall semantic vector calculation module is used for generating a candidate entity alias set for the post demand text, constructing a candidate entity set for each candidate entity alias, training a TransE model with the knowledge triplet set, outputting an entity semantic vector matrix and a relation semantic vector matrix, selecting the entity type with the highest semantic relevance score in the candidate entity set as the result entity of the candidate entity alias, and calculating the overall semantic vector of the post demand text using the entity semantic vector matrix; the candidate entity alias set includes candidate entity aliases;

The clustering and post image construction module is used for clustering all post demand texts by taking the whole semantic vector as a classification characteristic and adopting a K-means clustering algorithm to obtain K post demand clusters, extracting key entities in each post demand cluster and constructing a personalized post image for each post demand cluster;

The job seeker post recommending module is used for constructing a job seeker image according to the job seeker resume, acquiring post demand texts which are historically delivered by the job seeker to obtain a post demand cluster which is most interesting, calculating the recommending weight of each key entity by utilizing the personalized post image of the post demand cluster, and recommending posts corresponding to M key entities with the highest recommending weight to the job seeker.

An aspect of the present application provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, which when executed by the processor, performs steps in a data fusion based information management method.

An aspect of the present application provides a readable storage medium storing a computer program adapted to be loaded by a processor for performing steps in a data fusion based information management method.

Compared with the prior art, the information management method based on data fusion provided by the application has the following advantages:

The application defines entity types and relationship types from the post demand texts in the enterprise transnational recruitment management database and constructs knowledge triplets, thereby establishing a domain knowledge graph that comprehensively reflects post demand characteristics. Compared with the traditional post recommendation method based on keyword matching, the knowledge graph can represent the semantic information of post requirements more deeply and accurately.

For the candidate entity aliases in the post demand text, the application generates a candidate entity set by using the entity aliases dictionary, and calculates the semantic relevance between the candidate entity aliases and the candidate entity by using TransE model, thereby realizing the link and disambiguation of the candidate entity. Compared with a rule or statistics-based entity linking method, the method can measure the correlation between the entity aliases and the entities in the semantic space of the knowledge graph, and the link and disambiguation accuracy is higher.

The application utilizes the whole semantic vector of the post demand text to carry out K-means clustering, and obtains the post demand cluster with similar semantic. On the basis, the key entity of each cluster is extracted, and a personalized post image is constructed by utilizing the relation information in the knowledge graph, so that the key characteristics of each type of post requirements are described. Compared with simple keyword or topic clustering, the method has great advantages in semantic similarity and knowledge richness, so that the clustering result is more accurate, and the personalized post image contains more valuable information.

The application constructs a job seeker image from the job seeker resume, determines the post demand cluster of interest from the job seeker's historical delivery behavior, and uses the personalized post image of that cluster for post recommendation. In the recommendation process, the semantic relevance, entity importance and structural relations between the job seeker entities and the post key entities are considered together, so that the recommendation results are both personalized and relevant. Compared with collaborative filtering or rule-based recommendation, the method of the application fuses information from the job seeker image, the post characteristics and the knowledge graph, and achieves a better recommendation effect.

Drawings

FIG. 1 is a flow chart of the data fusion-based information management method provided by the application;

FIG. 2 is a functional block diagram of an information management system based on data fusion according to the present application;

FIG. 3 is a schematic structural diagram of an electronic device according to the present application;

FIG. 4 is a schematic structural diagram of a readable storage medium according to the present application.

Detailed Description

For a better understanding of the application, various aspects of the application will be described in more detail with reference to the accompanying drawings. It should be understood that the detailed description is merely illustrative of exemplary embodiments of the application and is not intended to limit the scope of the application in any way. Like reference numerals refer to like elements throughout the specification. The expression "and/or" includes any and all combinations of one or more of the associated listed items.

In the drawings, the size, dimensions and shape of elements have been slightly adjusted for convenience of description. The figures are merely examples and are not drawn to scale. As used herein, the terms "about," "approximately," and similar terms are used as terms of approximation, not as terms of degree, and are intended to account for inherent deviations in measured or calculated values that will be recognized by one of ordinary skill in the art. In addition, in the present application, the order in which the steps are described does not necessarily indicate the order in which the steps occur in actual practice, unless explicitly defined otherwise or inferable from the context.

It will be further understood that terms such as "comprises," "comprising," "includes," "including," "having," "contains," and/or "containing" are open-ended, rather than closed-ended, terms that specify the presence of the stated features, elements, and/or components, but do not preclude the presence or addition of one or more other features, elements, components, and/or groups thereof. Furthermore, when a statement such as "at least one of the following" appears after a list of features, it modifies the entire list rather than the individual elements in the list. Furthermore, when describing embodiments of the application, the use of "may" refers to "one or more embodiments of the application". Also, the term "exemplary" is intended to refer to an example or illustration.

Unless otherwise defined, all terms (including engineering and technical terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which the present application pertains. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.

In addition, the embodiments of the present application and the features of the embodiments may be combined with each other without collision. The present application will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.

Example 1

As shown in fig. 1, the information management method based on data fusion provided by the application includes:

The entity type set refers to a set covering all categories in the enterprise transnational recruitment information, and the entity types can include, for example: post name, expertise, and academic level;

The set of relationship types is a set that characterizes semantic associations between entities, and the relationship types may include, for example: requirements, belongings, correlations;

The candidate entity alias refers to an entity which is automatically extracted from the post requirement text and possibly has ambiguity or uncertainty, and needs to be disambiguated and linked to determine the reference of the entity in the domain knowledge graph;

The candidate entity refers to entity name representation in the domain knowledge graph;

For example, the dictionary matching method may match the candidate entity alias "software engineer" in a post demand text, and the candidate entity set {"software engineer", "software development engineer", "software test engineer"} matching "software engineer" can then be obtained from the entity alias dictionary according to the relation between candidate entity aliases and entity types;
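The "software engineer" example can be reproduced with a small sketch of dictionary matching. The dictionary contents, the helper names and the use of case-insensitive substring matching are assumptions; the text does not specify how aliases are located in the post demand text.

```python
# Hypothetical entity alias dictionary: entity type -> list of aliases.
alias_dict = {
    "software engineer": ["software engineer"],
    "software development engineer": ["software engineer", "developer"],
    "software test engineer": ["software engineer", "tester"],
}

def invert(alias_dict):
    """Map each alias to the set of entity types that list it."""
    inverted = {}
    for entity_type, aliases in alias_dict.items():
        for alias in aliases:
            inverted.setdefault(alias, set()).add(entity_type)
    return inverted

def candidate_sets(text, alias_dict):
    """Return {alias found in text: candidate entity set} via substring matching."""
    lowered = text.lower()
    return {alias: types
            for alias, types in invert(alias_dict).items()
            if alias in lowered}

text = "We are hiring a Software Engineer with 3 years of experience."
links = candidate_sets(text, alias_dict)
```

The single alias found in the text maps to three candidate entity types, which is the ambiguity that the semantic relevance score is later used to resolve.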

The specific process of obtaining negative sample triplets from the knowledge triplet set by a negative sampling method, and training the TransE model with the knowledge triplet set and the negative sample triplets, is as follows:

Step S231: using the knowledge triplet set $T$ as training samples, for each knowledge triplet $(h, gx_c, t)$ generating a corresponding negative sampling triplet $(h', gx_c, t')$ by the negative sampling method, and constructing the negative sample set $T'$;

Step S232: initializing the semantic vectors of each entity type and relationship type in the knowledge triplet set, and defining the loss function of the TransE model as:

$$L = \sum_{(h, gx_c, t) \in T} \; \sum_{(h', gx_c, t') \in T'} \left[ \gamma + f(h, gx_c, t) - f(h', gx_c, t') \right]_+$$

where $\gamma$ is the margin hyperparameter, $[\cdot]_+$ represents the hinge loss function, $[x]_+ = \max(0, x)$ compares 0 with $x$ and outputs the larger value, $f(h, gx_c, t)$ represents the scoring function of a training sample, $f(h', gx_c, t')$ represents the scoring function of a negative sampling triplet, and L is the loss function;

The scoring function of a training sample is calculated as:

$$f(h, gx_c, t) = \lVert \mathbf{h} + \mathbf{gx}_c - \mathbf{t} \rVert$$

The scoring function of a negative sampling triplet is calculated as:

$$f(h', gx_c, t') = \lVert \mathbf{h}' + \mathbf{gx}_c - \mathbf{t}' \rVert$$

where bold symbols denote the semantic vectors of the corresponding entity types and relationship type;

Step S233: taking minimization of the loss function L as the training target of the TransE model, so that the scores of the training samples are driven down and the scores of the negative sampling triplets are driven up, and outputting the entity semantic vector matrix and the relation semantic vector matrix;

the entity semantic vector matrix represents the distribution of all entity types in the domain knowledge graph in a low-dimensional semantic space;

The relation semantic vector matrix represents the distribution of all relation types in the domain knowledge graph in a low-dimensional semantic space;
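Steps S231 to S233 can be sketched with a small NumPy training loop. Several choices are assumptions not stated in the text: negatives are formed by replacing either the head or the tail with a random entity, the score uses the squared Euclidean norm so that gradients are simple, and training uses per-sample stochastic gradient descent. The function name, hyperparameters and integer-indexed sample triplets are hypothetical.

```python
import numpy as np

def train_transe(triplets, n_entities, n_relations, dim=8, gamma=1.0,
                 lr=0.01, epochs=200, seed=0):
    """Margin-based TransE training; returns entity and relation matrices."""
    rng = np.random.default_rng(seed)
    E = rng.normal(scale=0.1, size=(n_entities, dim))    # entity semantic vectors
    R = rng.normal(scale=0.1, size=(n_relations, dim))   # relation semantic vectors
    for _ in range(epochs):
        for h, r, t in triplets:
            # S231: negative sampling by corrupting the head or the tail.
            if rng.random() < 0.5:
                h_neg, t_neg = int(rng.integers(n_entities)), t
            else:
                h_neg, t_neg = h, int(rng.integers(n_entities))
            if (h_neg, t_neg) == (h, t):
                continue  # corruption reproduced the positive triplet
            d_pos = E[h] + R[r] - E[t]
            d_neg = E[h_neg] + R[r] - E[t_neg]
            # S232: hinge loss with ASSUMED squared-L2 scoring function.
            if gamma + d_pos @ d_pos - d_neg @ d_neg <= 0:
                continue  # margin already satisfied, no update
            g_pos, g_neg = 2 * d_pos, 2 * d_neg
            # S233: gradient step lowers the positive score, raises the negative.
            E[h] -= lr * g_pos
            E[t] += lr * g_pos
            E[h_neg] += lr * g_neg
            E[t_neg] -= lr * g_neg
            R[r] -= lr * (g_pos - g_neg)
    return E, R

# Hypothetical triplets as (head index, relation index, tail index).
triplets = [(0, 0, 1), (0, 0, 2), (3, 1, 1)]
E, R = train_transe(triplets, n_entities=4, n_relations=2)
```

Row i of `E` plays the role of the entity semantic vector of the i-th entity type, and row c of `R` the relation semantic vector of the c-th relationship type. The sketch does not filter out corrupted triplets that happen to be other true triplets, which a fuller implementation would likely do.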

Step S330: calculating the mean of all overall semantic vectors in each post demand cluster and taking the mean as the new cluster center; repeating the assignment and update steps until the overall semantic vectors in each post demand cluster no longer change, thereby obtaining K post demand clusters $\{CL_1, \dots, CL_K\}$, with $K = H$;

Each post requirement cluster represents a category of post requirements with similar semanteme;

The adjacent one-hop subgraphs refer to all nodes which can be reached by one side in the domain knowledge graph from the central node and the relationship types represented by the sides, and represent key entities and directly related knowledge structures thereof;
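The one-hop subgraph and the merge into a personalized post image can be sketched as follows. Treating edges as traversable in both directions is an assumption, as are the triplet-list representation, the helper names and the sample data.

```python
def one_hop_subgraph(triplets, center):
    """All triplets with the center node as head or tail (edges read as undirected)."""
    return [(h, r, t) for (h, r, t) in triplets if h == center or t == center]

def personalized_image(triplets, key_entities):
    """Merge the one-hop subgraphs of all key entities into one set of triplets."""
    merged = set()
    for key_entity in key_entities:
        merged.update(one_hop_subgraph(triplets, key_entity))
    return merged

# Hypothetical domain knowledge graph as a list of triplets.
triplets = [
    ("software engineer", "requires", "Python"),
    ("data analyst", "requires", "Python"),
    ("data analyst", "requires", "SQL"),
    ("Python", "related_to", "pandas"),
]
image = personalized_image(triplets, ["software engineer", "SQL"])
```

Using a set for the merged result means that a triplet shared by two key entity subgraphs appears only once in the personalized post image.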

Step S360: calculating the importance weight $w_f$ of the node corresponding to the key entity $ke_f$ as:

$$w_f = \frac{n_f}{\sum_{f'=1}^{F} n_{f'}}$$

where $n_f$ is the frequency of the key entity $ke_f$ in all post demand texts, and the denominator is the sum of the frequencies of all key entities;
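Key entity extraction and the importance weight can be sketched together. The sketch reads "the sum of the frequencies of all key entities" as the sum over the F selected key entities, which is an interpretation; the function name and sample data are hypothetical.

```python
from collections import Counter

def key_entities_with_weights(result_entities_per_text, F):
    """Pick the F most frequent result entities and normalize their frequencies."""
    counts = Counter(e for text in result_entities_per_text for e in text)
    top = counts.most_common(F)               # F highest-frequency result entities
    total = sum(n for _, n in top)            # sum of key entity frequencies
    return {entity: n / total for entity, n in top}

# Hypothetical result entities of three post demand texts in one cluster.
cluster_texts = [["Python", "SQL"], ["Python", "Java"], ["Python", "SQL"]]
weights = key_entities_with_weights(cluster_texts, F=2)
```

With this normalization the importance weights of the key entities in a cluster sum to 1, so they can be compared directly across entities.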

The method for representing the attribute text as the corresponding entity type and the relationship type in the domain knowledge graph is the same as the method for representing the candidate entity alias in the post demand text as the result entity;

The specific method for representing the attribute text as the corresponding entity type and relationship type in the domain knowledge graph comprises the following steps: obtaining a candidate attribute entity alias set according to the attribute text, wherein the candidate attribute entity alias set is a set containing all entity aliases in the attribute text, which is directly obtained according to the attribute text; finding out the entity type corresponding to each candidate attribute entity alias according to the entity alias dictionary, and generating a candidate attribute entity set, wherein the candidate attribute entity set is a set formed by all entity types corresponding to the candidate attribute entity alias in the entity alias dictionary; obtaining entity semantic vectors corresponding to each entity type in the candidate attribute entity set from an entity semantic vector matrix output by the TransE model, calculating semantic relevance scores of each candidate attribute entity alias and each entity type, selecting the entity type with the highest semantic relevance score as the corresponding entity type of the candidate attribute entity alias in the domain knowledge graph, and taking the corresponding edge of the entity type corresponding to the candidate attribute entity alias in the domain knowledge graph as the corresponding relation type of the attribute text in the domain knowledge graph;

Step S420: acquiring information on the post demand texts historically delivered by the job seeker, counting the number of times the job seeker has historically delivered to each post demand cluster, recording the number of historical deliveries to the k-th post demand cluster, and calculating the interest degree of the job seeker in the k-th post demand cluster by a formula whose terms are that number and the total number of historical deliveries of the job seeker;
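As an illustration of step S420, the sketch below reads each "delivery" as one submitted application and assumes the interest degree is the share of the job seeker's deliveries that fall in a given cluster; the formula is missing from this text, so the ratio is an assumption. `interest_degrees` and the sample history are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def interest_degrees(delivered_cluster_ids: List[int]) -> Dict[int, float]:
    """Map each post demand cluster id to the fraction of historical
    deliveries that went to it (assumed form of the interest degree)."""
    counts = Counter(delivered_cluster_ids)
    total = len(delivered_cluster_ids)
    return {cluster: n / total for cluster, n in counts.items()}


# Cluster id of every post the job seeker applied to, in chronological order.
history = [2, 0, 2, 2, 1]
degrees = interest_degrees(history)
most_interested = max(degrees, key=degrees.get)
```

The cluster with the largest interest degree is then the one whose personalized post image drives the recommendation.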

The relation semantic vector is obtained from the relation semantic vector matrix output by the TransE model;

The score threshold is set by those skilled in the art according to actual needs and experience;

The two importance weights are obtained from the importance weights of the corresponding nodes in the personalized post image and the job seeker image, respectively;

Example 2

As shown in fig. 2, the information management system based on data fusion provided by the present application includes:

Example 3

Fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present application. As shown in fig. 3, the present application further provides an electronic device. The electronic device may include one or more processors and one or more memories, wherein the memory stores computer readable code which, when executed by the one or more processors, may perform the data fusion-based information management method described above.

The method or system according to embodiments of the application may also be implemented by means of the architecture of the electronic device shown in fig. 3. As shown in fig. 3, the electronic device may include a bus, one or more CPUs, read-only memory (ROM), random access memory (RAM), a communication port connected to a network, an input/output component, a hard disk, and the like. A storage device in the electronic device, such as the ROM or the hard disk, may store the data fusion-based information management method provided by the present application. The data fusion-based information management method may include, for example: collecting post demand texts of various posts, defining an entity type set ST and a relation type set GX, establishing a knowledge triplet set between entity types and relation types, and constructing a domain knowledge graph according to the post demand texts; the set of entity types includes entity types, and the set of relationship types includes relationship types; generating a candidate entity alias set for the post demand text, constructing a candidate entity set for each candidate entity alias, training a TransE model by using the knowledge triplet set, outputting an entity semantic vector matrix and a relation semantic vector matrix, calculating the entity type with the highest semantic relevance score in the candidate entity set as the result entity of the candidate entity alias, and calculating the overall semantic vector of the post demand text by using the entity semantic vector matrix; the candidate entity alias set includes candidate entity aliases; taking the overall semantic vector as a classification feature, clustering all post demand texts by adopting a K-means clustering algorithm to obtain K post demand clusters, extracting key entities in each post demand cluster, and constructing a personalized post image for each post demand cluster; building a job seeker image according to the job seeker resume, acquiring the post demand texts historically delivered by the job seeker to obtain the post demand cluster of greatest interest, calculating the recommendation weight of each key entity by utilizing the personalized post image of that post demand cluster, and recommending posts corresponding to the M key entities with the highest recommendation weights to the job seeker. Further, the electronic device may also include a user interface. Of course, the architecture shown in fig. 3 is merely exemplary, and one or more components of the electronic device shown in fig. 3 may be omitted as needed when implementing different devices.

Example 4

Fig. 4 is a schematic diagram of a readable storage medium according to an embodiment of the present application. As shown in fig. 4, the computer readable storage medium has computer readable instructions stored thereon. When the computer readable instructions are executed by a processor, the information management method based on data fusion according to the embodiments of the present application described with reference to the above drawings may be performed. Storage media include, but are not limited to, volatile memory and/or nonvolatile memory. Volatile memory can include, for example, random access memory (RAM), cache memory, and the like. Non-volatile memory may include, for example, read-only memory (ROM), hard disk, flash memory, and the like.

In addition, according to embodiments of the present application, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, the present application provides a non-transitory machine-readable storage medium storing machine-readable instructions executable by a processor to perform instructions corresponding to the method steps provided by the present application, such as: collecting post demand texts of various posts, defining an entity type set ST and a relation type set GX, establishing a knowledge triplet set between entity types and relation types, and constructing a domain knowledge graph according to the post demand texts; the set of entity types includes entity types, and the set of relationship types includes relationship types; generating a candidate entity alias set for the post demand text, constructing a candidate entity set for each candidate entity alias, training a TransE model by using the knowledge triplet set, outputting an entity semantic vector matrix and a relation semantic vector matrix, calculating the entity type with the highest semantic relevance score in the candidate entity set as the result entity of the candidate entity alias, and calculating the overall semantic vector of the post demand text by using the entity semantic vector matrix; the candidate entity alias set includes candidate entity aliases; taking the overall semantic vector as a classification feature, clustering all post demand texts by adopting a K-means clustering algorithm to obtain K post demand clusters, extracting key entities in each post demand cluster, and constructing a personalized post image for each post demand cluster; building a job seeker image according to the job seeker resume, acquiring the post demand texts historically delivered by the job seeker to obtain the post demand cluster of greatest interest, calculating the recommendation weight of each key entity by utilizing the personalized post image of that post demand cluster, and recommending posts corresponding to the M key entities with the highest recommendation weights to the job seeker. The above-described functions defined in the method of the present application are performed when the computer program is executed by a central processing unit (CPU).

The methods, apparatus, and devices of the present application may be implemented in numerous ways. For example, they may be implemented by software, hardware, firmware, or any combination of software, hardware, and firmware. The above-described sequence of steps for the method is for illustration only, and the steps of the method of the present application are not limited to the sequence specifically described above unless specifically stated otherwise. Furthermore, in some embodiments, the present application may also be embodied as programs recorded in a recording medium, the programs including machine-readable instructions for implementing the methods according to the present application. Thus, the present application also covers a recording medium storing a program for executing the method according to the present application.

In addition, in the foregoing technical solutions provided in the embodiments of the present application, parts consistent with implementation principles of corresponding technical solutions in the prior art are not described in detail, so that redundant descriptions are avoided.

The purpose, technical scheme and beneficial effects of the invention are further described in detail in the detailed description. It is to be understood that the above description is only of specific embodiments of the present invention and is not intended to limit the present invention. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims

1. The information management method based on data fusion is characterized by comprising the following steps:

Collecting post demand texts of various posts, defining an entity type set ST and a relation type set GX, establishing a knowledge triplet set between entity types and relation types, and constructing a domain knowledge graph according to the post demand texts;

generating a candidate entity alias set for the post demand text, constructing a candidate entity set for each candidate entity alias, training a TransE model by using the knowledge triplet set, outputting an entity semantic vector matrix and a relation semantic vector matrix, calculating the entity type with the highest semantic relevance score in the candidate entity set as the result entity of the candidate entity alias, and calculating the overall semantic vector of the post demand text by using the entity semantic vector matrix;

taking the whole semantic vector as a classification feature, clustering all post demand texts by adopting a K-means clustering algorithm to obtain K post demand clusters, extracting key entities in each post demand cluster, and constructing a personalized post image for each post demand cluster;

constructing a job seeker image according to the job seeker resume, acquiring a post demand text which is historically delivered by the job seeker to obtain a post demand cluster which is most interesting, calculating the recommended weight of each key entity by utilizing the personalized post image of the post demand cluster, and recommending posts corresponding to M key entities with the highest recommended weights to the job seeker;

defining an entity alias dictionary, wherein each entry of the dictionary associates the i-th entity type of the entity type set in the domain knowledge graph with its entity aliases, up to the A-th entity alias of the i-th entity type, A representing the total number of entity aliases;

for the post demand text, generating a candidate entity alias set by a dictionary matching method, and, for each candidate entity alias, searching the entity alias dictionary for all entity types matching that alias to obtain a candidate entity set and construct an entity link;
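The dictionary matching step can be illustrated with a small lookup. This is a sketch under stated assumptions: the alias dictionary is modelled as a mapping from entity type to a list of aliases, matching is a case-insensitive substring test, and `build_candidate_sets` and all sample entries are hypothetical.

```python
from typing import Dict, List, Set


def build_candidate_sets(text: str, alias_dict: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Return, for every alias found in `text`, the set of entity types
    that list this alias in the dictionary (the candidate entity set)."""
    lowered = text.lower()
    candidates: Dict[str, Set[str]] = {}
    for entity_type, aliases in alias_dict.items():
        for alias in aliases:
            # Assumption: plain case-insensitive substring matching.
            if alias.lower() in lowered:
                candidates.setdefault(alias, set()).add(entity_type)
    return candidates


alias_dict = {
    "programming_language": ["Python", "Java"],
    "place_name": ["Java"],
    "database_skill": ["SQL"],
}
candidates = build_candidate_sets("Requires Python, Java and SQL experience.", alias_dict)
```

An alias such as "Java" that maps to more than one entity type is exactly the case the later semantic relevance scoring has to disambiguate. Substring matching would also fire on "JavaScript", so a production matcher would likely need tokenization.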

obtaining negative sample triplets from the knowledge triplet set by a negative sampling method, training a TransE model on the knowledge triplet set and the negative sample triplets, and outputting an entity semantic vector matrix and a relation semantic vector matrix;
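A compact way to picture TransE training with negative sampling is the NumPy sketch below. It is not the training procedure of the application: the squared L2 distance, tail-only corruption, plain stochastic gradient descent, and every hyperparameter are assumptions, and `train_transe` is a hypothetical name. Entities and relations are referred to by integer index.

```python
import numpy as np


def train_transe(triples, n_entities, n_relations, dim=16, margin=1.0,
                 lr=0.01, epochs=200, seed=0):
    """Learn entity and relation matrices so that head + relation is close
    to tail for known triples and farther away for corrupted ones."""
    rng = np.random.default_rng(seed)
    ent = rng.normal(scale=0.1, size=(n_entities, dim))
    rel = rng.normal(scale=0.1, size=(n_relations, dim))
    known = set(triples)
    for _ in range(epochs):
        for h, r, t in triples:
            # Negative sampling: replace the tail with a random entity.
            t_neg = int(rng.integers(n_entities))
            if (h, r, t_neg) in known:
                continue
            pos = ent[h] + rel[r] - ent[t]
            neg = ent[h] + rel[r] - ent[t_neg]
            # Margin ranking loss on squared L2 distances; skip if satisfied.
            if margin + pos @ pos - neg @ neg <= 0:
                continue
            shared = 2 * (pos - neg)      # gradient w.r.t. head and relation
            ent[h] -= lr * shared
            rel[r] -= lr * shared
            ent[t] += lr * 2 * pos        # pull the true tail closer
            ent[t_neg] -= lr * 2 * neg    # push the corrupted tail away
        # Keep entity vectors inside the unit ball.
        norms = np.linalg.norm(ent, axis=1, keepdims=True)
        ent /= np.maximum(norms, 1.0)
    return ent, rel


triples = [(0, 0, 1), (2, 0, 1), (1, 1, 3)]
entity_matrix, relation_matrix = train_transe(triples, n_entities=4, n_relations=2)
```

The two returned arrays play the role of the entity semantic vector matrix and the relation semantic vector matrix: row i of `entity_matrix` is the vector of entity type i.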

according to the entity semantic vector matrix output by the TransE model, for each candidate entity alias, retrieving from the entity semantic vector matrix the entity semantic vector of each entity type in its candidate entity set, calculating the semantic relevance score between the candidate entity alias and each entity type in the candidate entity set, comparing the semantic relevance scores, and selecting the entity type with the highest semantic relevance score as the result entity of the candidate entity alias;

for each candidate entity alias in the post demand text, acquiring the entity type with the highest semantic relevance score in the candidate entity set, and acquiring the entity semantic vector of each entity type in the candidate entity set from the entity semantic vector matrix to obtain the entity semantic vector set of the post demand text, the set running up to the entity semantic vector of the D-th entity type; calculating the overall semantic vector of the entity semantic vector set by a calculation formula in which D is the total number of entity types in the candidate entity set;

calculating the whole semantic vector of every post demand text to obtain a whole semantic vector set, whose E-th element represents the whole semantic vector of the E-th post demand text.
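The formula for the whole semantic vector is missing from this text. Since it is defined over D entity semantic vectors and D is the only other term named, one plausible reading is the arithmetic mean of those vectors. The sketch below assumes that reading; `overall_semantic_vector` and the toy matrix are hypothetical.

```python
import numpy as np


def overall_semantic_vector(entity_matrix, result_entity_ids):
    """Average the entity semantic vectors of a text's result entities
    (assumed form of the whole semantic vector)."""
    vectors = entity_matrix[np.asarray(result_entity_ids, dtype=int)]
    return vectors.mean(axis=0)


# Toy entity semantic vector matrix: one row per entity type.
entity_matrix = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
doc_vector = overall_semantic_vector(entity_matrix, [0, 2])
```

Stacking one such vector per post demand text gives the feature matrix used for clustering.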

2. The information management method based on data fusion according to claim 1, wherein the steps of collecting post demand texts of various posts, defining an entity type set ST and a relation type set GX, establishing a knowledge triplet set between entity types and relation types, and constructing a domain knowledge graph according to the post demand texts are as follows:

collecting post demand texts of various posts from an enterprise trans-national recruitment management database, wherein the post demand texts comprise the following contents: post description, job requirements and working experience;

defining a set of entity types ST, whose B-th element represents the B-th entity type in the entity type set, B representing the total number of entity types;

defining a set of relationship types GX between entity types, whose C-th element represents the C-th relationship type in the set of relationship types, C representing the total number of relationship types;

establishing, according to the post demand text, a knowledge triplet set between entity types and relationship types, each knowledge triplet consisting of a head entity type, a tail entity type, and the c-th relationship type in the set of relationship types, c ∈ {1, ..., C}; and constructing a domain knowledge graph from the entity type set, the relationship type set, and the knowledge triplet set.

3. The information management method based on data fusion according to claim 2, wherein the domain knowledge graph is constructed as follows: taking entity types as nodes and relationship types as edges, a knowledge triplet set is constructed from the relationship types between the entity types extracted from the post demand text; each knowledge triplet comprises two entity types and a relationship type connecting the two entity types, the head entity type being connected to the tail entity type through the c-th relationship type; the two entity types are divided into the head entity type and the tail entity type according to the relationship type, and the domain knowledge graph is thereby constructed.

4. The information management method based on data fusion according to claim 3, wherein the specific steps of clustering all post demand texts by using the whole semantic vector as a classification feature and adopting a K-means clustering algorithm to obtain K post demand clusters are as follows:

for the whole semantic vector set of all post demand texts, taking the whole semantic vectors as classification features and randomly selecting H whole semantic vectors as the initial clustering centers of the post demand clusters;

for each whole semantic vector, calculating the Euclidean distance from the whole semantic vector to each clustering center, and assigning the whole semantic vector to the post demand cluster corresponding to the nearest clustering center, until every whole semantic vector has been assigned;

calculating the average value of all the whole semantic vectors in each post demand cluster, taking the average value as the new clustering center, and repeating the above steps until the whole semantic vectors in each post demand cluster no longer change, thereby obtaining K post demand clusters, where K = H.
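The three clustering steps above (random initial centers, nearest-center assignment by Euclidean distance, mean update until assignments stop changing) can be sketched in NumPy as follows. The function name `kmeans`, the iteration cap `max_iter`, the fixed seed, and the toy vectors are assumptions added for the illustration.

```python
import numpy as np


def kmeans(vectors, h, seed=0, max_iter=100):
    """Cluster row vectors into `h` groups following the claimed steps."""
    rng = np.random.default_rng(seed)
    # Randomly select h vectors as the initial clustering centers.
    centers = vectors[rng.choice(len(vectors), size=h, replace=False)].copy()
    labels = None
    for _ in range(max_iter):
        # Euclidean distance from every vector to every center.
        dists = np.linalg.norm(vectors[:, None, :] - centers[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        # Stop once no vector changes cluster.
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Move each center to the mean of its members.
        for k in range(h):
            members = vectors[labels == k]
            if len(members):
                centers[k] = members.mean(axis=0)
    return labels, centers


vectors = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
labels, centers = kmeans(vectors, h=2)
```

A center that loses all its members is simply left where it was in this sketch; the text does not say how empty clusters are handled.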

5. The information management method based on data fusion according to claim 4, wherein the specific method for extracting the key entity in each post demand cluster and constructing the personalized post image for each post demand cluster is as follows:

for the k-th post demand cluster, counting the frequency of the result entities of all post demand texts, and extracting the F result entities with the highest frequency as the key entity set of the post demand cluster, whose F-th element represents the F-th key entity of the post demand cluster;

for the f-th key entity, taking the key entity as the central node in the domain knowledge graph, mining its adjacent one-hop subgraph as the key entity subgraph centered on that key entity, obtaining all key entity subgraphs of the post demand cluster, and merging all the key entity subgraphs to obtain the personalized post image of the post demand cluster;

calculating the importance weight of the node corresponding to each key entity, wherein the calculation formula takes as its terms the frequency of the key entity in all post demand texts and the sum of the frequencies of all key entities;

for the relationship types between key entities, calculating the weight of the edge between the f-th key entity and the h-th key entity by a formula whose terms are the entity semantic vector of the f-th key entity, the entity semantic vector of the h-th key entity, and the relation semantic vector of the edge connecting the f-th key entity and the h-th key entity, the relation semantic vector being obtained from the relation semantic vector matrix output by the TransE model, and the formula using the modulus (vector norm) operator.

6. The information management method based on data fusion according to claim 5, wherein the specific method for constructing job seeker images according to job seeker resume, obtaining post demand texts historically delivered by job seekers to obtain post demand clustering clusters of greatest interest, calculating recommendation weights of each key entity by using personalized post images of the post demand clustering clusters, and recommending posts corresponding to M key entities with highest recommendation weights to the job seekers is as follows:

extracting attribute text from the job seeker resume, including professional skills, academic level, and working experience; representing the attribute text as the corresponding entity types and relationship types in the domain knowledge graph to obtain the job seeker entity types and the job seeker relationship types; and constructing the job seeker image;

acquiring information on the post demand texts historically delivered by the job seeker, counting the number of times the job seeker has historically delivered to each post demand cluster, recording the number of historical deliveries to the k-th post demand cluster, and calculating the interest degree of the job seeker in the k-th post demand cluster by a formula whose terms are that number and the total number of historical deliveries of the job seeker;

selecting the post demand cluster with the highest interest degree as the post demand cluster of greatest interest to the job seeker;

using the personalized post image corresponding to the post demand cluster of greatest interest to the job seeker, and taking the f-th key entity in that post demand cluster, calculating the semantic relevance score between each job seeker entity type and the key entity by a formula whose terms are the entity semantic vectors of the job seeker entity type and of the key entity and the moduli of those entity semantic vectors;

taking the job seeker entity types whose semantic relevance score is above a score threshold as the job seeker entity types semantically related to the key entity, to obtain a related entity set, wherein J is the total number of job seeker entity types semantically related to the key entity, and the J-th element of the related entity set is the J-th job seeker entity type semantically related to the key entity;
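The scoring formula is not legible here, but its terms (two entity semantic vectors and their moduli) match cosine similarity, so the sketch below assumes cosine similarity followed by the threshold filter. `cosine_relevance`, `related_entities`, the threshold value, and the toy vectors are all hypothetical.

```python
import numpy as np


def cosine_relevance(u, v):
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom else 0.0


def related_entities(key_vec, seeker_vecs, threshold):
    """Names of job seeker entity types scoring above `threshold`
    against one key entity."""
    return [name for name, vec in seeker_vecs.items()
            if cosine_relevance(vec, key_vec) > threshold]


key_vec = np.array([1.0, 0.0])
seeker_vecs = {
    "Python": np.array([0.9, 0.1]),
    "Cooking": np.array([0.0, 1.0]),
}
related = related_entities(key_vec, seeker_vecs, threshold=0.5)
```

Repeating the filter for every key entity of the chosen cluster yields one related entity set per key entity.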

calculating, from the job seeker image and the personalized post image, the recommendation weight of each key entity in the personalized post image by a formula whose terms are the importance weight of the f-th key entity of the post demand cluster, the importance weight of the j-th job seeker entity type semantically related to the key entity, and the entity semantic vectors of the f-th key entity of the post demand cluster and of the j-th semantically related job seeker entity type;

ranking the key entities in the personalized post image from high to low by recommendation weight, selecting the first M key entities, and recommending the posts corresponding to the M key entities to the job seeker.
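The final ranking step is a plain top-M selection. In the sketch below the recommendation weights are invented placeholder numbers, since the weight formula is not reproduced in this text, and `top_m_key_entities` is a hypothetical name.

```python
from typing import Dict, List


def top_m_key_entities(recommendation_weights: Dict[str, float], m: int) -> List[str]:
    """Return the m key entities with the highest recommendation weight,
    ordered from highest to lowest."""
    ranked = sorted(recommendation_weights.items(), key=lambda kv: kv[1], reverse=True)
    return [entity for entity, _ in ranked[:m]]


# Placeholder weights for illustration only.
recommendation_weights = {"Python": 0.42, "SQL": 0.17, "Spark": 0.31}
top = top_m_key_entities(recommendation_weights, m=2)
```

The posts attached to the returned key entities are the ones recommended to the job seeker.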

7. An information management system based on data fusion, the system being configured to implement the information management method based on data fusion according to any one of claims 1 to 6, comprising:

8. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the steps in the data fusion based information management method according to any of claims 1-6.

9. A readable storage medium, characterized in that it stores a computer program adapted to be loaded by a processor for performing the steps of the data fusion based information management method according to any of claims 1-6.