CN111737495B

CN111737495B - Middle-high-end talent intelligent recommendation system and method based on domain self-classification

Info

Publication number: CN111737495B
Application number: CN202010595817.3A
Authority: CN
Inventors: 黄丽丽; 姚智振; 游河仁; 石宝玉; 王绍兰
Original assignee: Fuzhou Institute Of Data Technology Co ltd
Current assignee: Fuzhou Institute Of Data Technology Co ltd
Priority date: 2020-06-28
Filing date: 2020-06-28
Publication date: 2022-12-06
Anticipated expiration: 2040-06-28
Also published as: CN111737495A

Abstract

The invention discloses a middle- and high-end talent intelligent recommendation system and method based on domain self-classification. Using knowledge graph technology, it mines and analyzes science and technology big data, constructs a large-scale, multi-domain science and technology knowledge graph, and performs knowledge-graph-based domain matching, thereby automatically associating portraits of middle- and high-end talents with a multi-level science and technology knowledge graph. This enables automatic domain classification of massive talent data and talent retrieval and recommendation based on 'small fields', providing talent introduction institutions with an effective tool for introducing talents and for gathering and retrieving talent information.

Description

Middle-high-end talent intelligent recommendation system and method based on domain self-classification

Technical Field

The invention relates to the field of talent recommendation systems, in particular to a middle-high-end talent intelligent recommendation system and a method thereof based on domain self-classification.

Background

Scientific and technological talents are the primary driving force of innovation-driven economic development. How to discover, cultivate, retain, attract, and use talents is an important issue in today's increasingly fierce international competition. As industrial restructuring changes the industrial structure, social demand for talents changes structurally as well. With talent supply insufficient and lagging, demand for talents in specific fields rises, making such talents scarcer; at the same time, economic development places higher requirements on talent quality, so highly educated, high-quality, and highly skilled talents are increasingly difficult to recruit.

As talent competition intensifies, talent introduction institutions reach target talents through headhunting, recruitment, cooperation with universities, and other channels, which makes talent introduction costly. If middle- and high-end talents in various fields are identified in advance through big data analysis and mining, and a reserve of middle- and high-end talents is built, the cost of field-specific talent recommendation can be much lower. Several invention patents related to talent recommendation systems already exist in China.

For example, patent No. 201510109074.3 discloses a "job recommendation system based on a knowledge base". It builds a proprietary human resources knowledge base by crawling and entity-analyzing HR-related knowledge, combines this with Internet information extraction and fusion to construct proprietary talent profiles and job information, and models the mapping between jobs and user requirements to produce recommendations. However, its HR knowledge base mainly supplements talent and job information; it lacks accurate analysis and positioning of talents' fields and cannot establish fast, effective paths for expanding and matching talents in specific small fields.

Likewise, patent No. 201610329208.7 discloses "an industrial design talent level evaluation method and system". It uses text word segmentation to obtain talent attribute features from resumes, questionnaires, and talent system login logs, divides talents into fields with an automatic classification algorithm based on predefined industrial design fields, and computes talent value and capability from the classification results and attributes to produce recommendations. However, this scheme relies only on the most basic resume information and does not mine or fuse Internet information such as publications, achievements, news, and blogs; moreover, its field division is limited to predefined categories, so it cannot meet users' needs for talents in fine-grained fields.

Disclosure of Invention

The invention aims to provide a middle-high-end talent intelligent recommendation system and method based on domain self-classification.

The technical scheme adopted by the invention is as follows:

The middle- and high-end talent intelligent recommendation system based on domain self-classification comprises the following modules:

talent information mining and fusion module: acquires multi-source heterogeneous talent data and performs talent data fusion and disambiguation;

science and technology field automatic classification module: constructs a full-field science and technology knowledge graph and, on that basis, uses machine learning to automatically classify talent data by expert field;

talent capability and post evaluation portrait module: builds a rich, detailed evaluation portrait of each expert to analyze and evaluate the talent's capability in the field; uses public opinion big data to analyze and evaluate the security risks of talents to be introduced; and comprehensively establishes talent-post matching evaluation indexes from talent information to evaluate talent-post suitability;

talent retrieval and intelligent recommendation module: provides knowledge-graph-based retrieval and query, mines subdivided domains according to user requirements, recommends expert talents in those subdivided domains, and pushes subscribed high-end talent information in the domain, together with the latest activities of the domain's leading experts, to talent introduction institutions.

Furthermore, the talent information mining and fusion module uses RiMOM (Risk Minimization based Ontology Mapping) for data integration. The model integrates multiple mapping strategies, including name-similarity-based mapping, instance-based machine learning, and structure-based mapping, to realize metadata mapping and thereby fuse expert information.

The intelligent medium-high-end talent recommendation method based on the domain self-classification comprises the following steps:

Step 1: talent information mining and fusion: acquire multi-source heterogeneous talent data, then fuse and disambiguate it to form a middle- and high-end talent database;

Step 2: science and technology field automatic classification: construct a full-field science and technology knowledge graph and, on that basis, use machine learning to automatically classify talent data by expert field;

Step 3: talent capability and post evaluation portrait: build a rich, detailed evaluation portrait of each expert to analyze and evaluate the talent's capability in the field; use public opinion big data to analyze and evaluate the security risks of talents to be introduced; and comprehensively establish talent-post matching evaluation indexes from talent information to evaluate talent-post suitability;

Step 4: talent retrieval and intelligent recommendation: take the retrieval text entered by a talent introduction institution and expand it through associations in the full-discipline knowledge graph to correct the domain query results; meanwhile, mine subdivided fields from the post requirement text, recommend expert talents in those fields from the talent reserve library, and push subscribed high-end talent information and the latest activities of the field's top experts to the talent introduction institution.

In this way, the method provides knowledge-graph-based retrieval and query, mines subdivided fields according to user requirements, recommends expert talents in those fields, and supplies talent introduction institutions with subscribed high-end talent information and the latest activities of the field's top experts.

Further, in step 1, crawler and parallelization strategies are used to obtain basic information on academic experts from top journals and conferences in each field, and basic information on industry experts from well-known websites in each field (official websites of well-known enterprises, universities, and associations). To enrich the dimensions of talent portraits, dynamic information such as experts' project achievements, awards and titles, and science and technology news is collected from channels such as open knowledge bases, science and technology forums, news sites, and blogs, forming multi-dimensional basic data on domain experts.

Further, step 1 adopts a semantic information extraction method based on a conditional random field (CRF) model with a dynamic graph structure: experts' or scholars' background information is extracted from their personal homepages, and expert collaboration relationships and achievement text are extracted from achievement information. Dependency edges are generated dynamically according to the labeling results of the instance nodes, which effectively incorporates users' prior knowledge and improves the labeling precision of semantic information, overcoming the shortcomings of traditional manual and semi-automatic labeling.

Further, the full-field science and technology knowledge graph in step 2 is constructed as follows: from science and technology big data such as data on scientific talents, scientific literature, and science and technology activity news, extract text keywords, terms, concepts, and entity names, as well as the concept taxonomy and the relations among concepts and entities; combine machine learning and natural language processing to establish associations among the data and build a network graph of data nodes; and thereby generate an ultra-large-scale knowledge graph (on the order of tens of millions) that covers the disciplines of the National Natural Science Foundation of China (NSFC), with balanced and aligned Chinese and English data. The specific steps are as follows:

Step 2-11, large-scale subject keyword extraction: use an unsupervised keyword extraction tool to extract large-scale science and technology keywords from large-scale scientific literature;

Step 2-12, keyword relation extraction: obtain vector representations of the keywords with word embedding techniques, then perform semantic association and cluster analysis on them to produce quantitative semantic relations among the large-scale keywords. In one embodiment, word2vec is used for keyword association and hierarchical clustering is used to cluster the keywords;

Step 2-13, term concept expansion: automatically expand term concepts based on the keyword semantic relation extraction results and external knowledge sources to discover more concept terms and semantic relations among concepts, thereby automatically expanding and updating the graph;

Step 2-14, graph representation learning: apply existing graph representation learning methods to the expanded graph at large scale and across disciplines, and use the learning results to support link prediction on the graph.

Further, the specific steps of machine-learning-based automatic domain classification in step 2 are as follows:

Step 2-21, preprocessing: segment the talent data text into words and remove stop words and meaningless characters;

Step 2-22, expert information text labeling: label the expert information texts with the subclass disciplines of the full-field science and technology knowledge graph. During labeling, the expert's major discipline is determined first, and the associated subclass disciplines are then determined from the knowledge graph; only the subclass disciplines are saved as labels.

Step 2-23, expert information text vectorization: before being input into the model, the text data is vectorized, i.e. converted into numerical data. Text vectorization consists of two stages, word segmentation and word vector conversion; the module uses the TorchText library for word vector conversion.

Step 2-24, domain classification model construction: build a classification model based on a convolutional neural network with four layers: an input layer, a convolution layer, a pooling layer, and a fully connected layer;

the input layer is essentially a lookup table: given the index of a word in the dictionary, it returns the corresponding word vector. It is implemented with torch.nn.Embedding(V, D), where V is the number of words in the dictionary and D is the dimension of the word vectors, and pre-trained word vectors are loaded with from_pretrained(vectors).

The convolution layer is implemented with torch.nn.Conv2d(Ci, Co, (K, D)), where Ci is the number of input channels, Co is the number of output channels (i.e. the number of convolution kernels), K is the kernel size, and D is the word vector dimension. A ReLU (rectified linear unit) activation is applied after the convolution.

The pooling layer uses the max_pool1d() function, i.e. one-dimensional max pooling, which keeps the maximum value of each row (each convolution output feature map) as that row's feature.

The outputs of the three convolutional layers are concatenated into one layer, which is connected through a fully connected layer to the final layer to predict the text's field discipline; the number of output neurons equals the number of predicted categories (subclass disciplines in the knowledge graph). The fully connected layer is implemented with torch.nn.Linear() and dropout with torch.nn.Dropout(), completing the convolutional neural network model.
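The four layers described above can be sketched in PyTorch as follows. This is a minimal illustration, not the patent's code: the kernel sizes (3, 4, 5), 100 kernels per size, and a dropout rate of 0.5 are assumptions borrowed from common TextCNN settings, and the class name TextCNN is hypothetical. Each kernel size gets its own Conv2d(1, Co, (K, D)) branch, which is one way to read the "three convolutional layers" whose outputs are concatenated.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TextCNN(nn.Module):
    """Embedding -> parallel Conv2d branches -> max-over-time pooling -> dropout -> Linear."""

    def __init__(self, vectors, num_classes, kernel_sizes=(3, 4, 5), num_kernels=100, dropout=0.5):
        super().__init__()
        dim = vectors.size(1)  # D: word vector dimension; vectors has shape (V, D)
        # Input layer: lookup table initialised from pre-trained vectors (fine-tuned here).
        self.embedding = nn.Embedding.from_pretrained(vectors, freeze=False)
        # Convolution layer: one Conv2d(Ci=1, Co=num_kernels, (K, D)) branch per kernel size K.
        self.convs = nn.ModuleList([nn.Conv2d(1, num_kernels, (k, dim)) for k in kernel_sizes])
        self.dropout = nn.Dropout(dropout)
        # Fully connected layer: one output neuron per subclass discipline.
        self.fc = nn.Linear(num_kernels * len(kernel_sizes), num_classes)

    def forward(self, token_ids):
        x = self.embedding(token_ids).unsqueeze(1)  # (N, 1, L, D): a single input channel
        pooled = []
        for conv in self.convs:
            h = F.relu(conv(x)).squeeze(3)             # (N, Co, L - K + 1)
            h = F.max_pool1d(h, h.size(2)).squeeze(2)  # (N, Co): max of each feature map
            pooled.append(h)
        features = torch.cat(pooled, dim=1)     # merge the branch outputs into one layer
        return self.fc(self.dropout(features))  # unnormalised class scores (logits)
```

For example, with random stand-in vectors, `TextCNN(torch.randn(1000, 50), num_classes=20)` applied to an (8, 40) batch of token indices returns logits of shape (8, 20). Input sequences must be at least as long as the largest kernel size.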

Step 2-25, classification model training: load the sample data in batches, split it into a training set and a validation set, train with forward and backward propagation using the Adam optimizer to obtain an optimized classification model, and use that model for automatic domain classification.

Furthermore, the talent evaluation portrait in step 3 includes the expert's basic information, educational background, work experience, research fields and interests, awards, achievements (papers, patents, projects, etc.), academic evaluation, relationship network, and real-time activities. The real-time activity view shows up-to-date news about the expert, such as academic activities, conferences and forums attended, and major achievements and awards, so that users can keep track of the latest information on the expert and follow the expert's development path.

Furthermore, in step 3, public opinion big data is used to analyze and evaluate the security of talents to be introduced (especially foreign talents). Information such as criminal records, as well as violent statements and cultural background published on social platforms, is monitored to identify political, religious, and legal issues concerning foreign talents to be introduced; early warnings are issued for risks such as crime, fraud, leakage of confidential information, and departure, and such talents are excluded from the introduction list. A talent risk level index is established to form an early warning mechanism, giving deep insight into talent information.

Furthermore, talent-post matching evaluation indexes are comprehensively established in step 3 according to information such as talent research fields, work skills, work years, work units, project achievements, cultural differences, award titles and the like, and mainly comprise post matching degree analysis indexes and introduction difficulty analysis indexes.

Further, in step 4, the text entered by the talent introduction institution is expanded through associations in the full-discipline knowledge graph so as to correct the domain query results.

Further, recommending experts in subdivided fields for the post requirement text in step 4 specifically comprises the following steps:

Step 4-1: from the post requirement text, build a representation model that expresses the core problem or technical meaning of the post in a cross-domain computable semantic space, so that the semantic representation of each individual post requirement text is extracted accurately;

Step 4-2: quickly determine each expert's field and skills from the science and technology knowledge graph and the automatic expert-field classification results;

Step 4-3: vectorize the subject keywords of the post requirement text and the expert profiles, compute cosine similarity in the same vector space, and sort the candidate experts by similarity from high to low to form an expert recommendation list.

Related experts are matched automatically according to the knowledge graph and the keyword vectors to generate the expert recommendation list. The list can be further screened: results can be sorted by priority, filter and constraint conditions can be displayed, and statistics can be computed by those conditions. After screening, the final list of recommended experts is generated.
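Step 4-3 and the screening above amount to ranking candidates by cosine similarity and then filtering. The following is a minimal pure-Python sketch, assuming the post requirement and each expert profile have already been vectorized into the same space. The function names, the `min_score` filter, and the `top_n` cut-off are hypothetical stand-ins for the unspecified screening conditions.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def recommend_experts(post_vec, expert_vecs, top_n=10, min_score=0.0):
    """Rank experts by cosine similarity to the post vector, highest first.

    expert_vecs maps a (hypothetical) expert id to a vector in the same space as post_vec.
    min_score is an illustrative filter condition applied before truncating to top_n.
    """
    scored = [(eid, cosine(post_vec, vec)) for eid, vec in expert_vecs.items()]
    scored = [pair for pair in scored if pair[1] >= min_score]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]
```

For instance, with experts `{"E1": [1, 0, 1], "E2": [0, 1, 0], "E3": [1, 1, 1]}` and post vector `[1, 0, 1]`, the ranking is E1, then E3, then E2 (whose similarity is 0).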

Further, the specific steps for extracting keywords from the post requirement text in step 4-1 are as follows:

Step 4-1-1: first, segment the requirement text into words; then, for each n-gram (n = 3 to 10), extract it if it appears in the knowledge graph entity library;

Step 4-1-2: assign different weights to keywords extracted from different parts of the requirement text, merge the weighted keywords, and then analyze the keyword-subject mapping to generate a subject probability distribution, expressed by the following formula:

where D is the requirement document library, dj is a specific requirement text, wi is an extracted keyword, and k is the number of subject categories; in practice, k is generally set to about 4/5 of the total number of keywords.
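Steps 4-1-1 and 4-1-2 can be approximated as follows. This is a sketch under stated assumptions, not a reconstruction of the patent's formula: segmentation is skipped and n is taken to count characters (the source does not say whether n counts characters or segmented words), the section names and weights in `SECTION_WEIGHTS` are hypothetical, and a plain keyword-to-subject lookup stands in for the keyword-subject mapping analysis. The weighted subject counts are simply normalized into a probability distribution.

```python
from collections import defaultdict

# Hypothetical weights: the source weights keywords by where they appear
# in the requirement text but does not give values.
SECTION_WEIGHTS = {"title": 3.0, "duties": 2.0, "requirements": 1.0}


def extract_entity_ngrams(text, entity_library, n_min=3, n_max=10):
    """Step 4-1-1 sketch: character n-grams (n_min..n_max) found in the entity library."""
    found = []
    for start in range(len(text)):
        for n in range(n_min, n_max + 1):
            gram = text[start:start + n]
            if len(gram) < n:
                break  # ran past the end of the text
            if gram in entity_library and gram not in found:
                found.append(gram)
    return found


def subject_distribution(sections, entity_library, keyword_subjects, weights=SECTION_WEIGHTS):
    """Step 4-1-2 sketch: weight keywords by section, merge, map to subjects, normalise.

    sections: {section name: text}; keyword_subjects: {keyword: subject} (hypothetical,
    e.g. looked up in the knowledge graph). Unmapped keywords are ignored.
    """
    merged = defaultdict(float)
    for section, text in sections.items():
        for kw in extract_entity_ngrams(text, entity_library):
            merged[kw] += weights.get(section, 1.0)
    mass = defaultdict(float)
    for kw, w in merged.items():
        if kw in keyword_subjects:
            mass[keyword_subjects[kw]] += w
    total = sum(mass.values())
    return {s: m / total for s, m in mass.items()} if total else {}
```

For example, if "知识图谱" appears in the title (weight 3) and "深度学习" and "催化剂" appear in the requirements (weight 1 each), with the first two mapped to computer science and the third to chemistry, the resulting distribution is 0.8 for computer science and 0.2 for chemistry.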

By adopting the above technical scheme, the invention addresses two problems: rapidly building a full-field high-end talent base, and intelligently retrieving and recommending talents in small fields. Using knowledge graph technology, it mines and analyzes science and technology big data, constructs a large-scale multi-field science and technology knowledge graph, and performs knowledge-graph-based field matching, thereby automatically associating high-end talent portraits with a multi-level science and technology knowledge graph. This enables automatic field classification of massive talent data and small-field talent retrieval and recommendation, providing talent introduction institutions with an effective tool for talent introduction and for aggregating and retrieving talent information.

Drawings

The invention is described in further detail below with reference to the accompanying drawings and the detailed description.

FIG. 1 is a schematic structural diagram of a middle-high-end talent intelligent recommendation system based on domain self-classification according to the present invention;

FIG. 2 is a schematic diagram of a dependency graph structure of a dynamic conditional random field model;

FIG. 3 is a schematic diagram of the operation flow of text vectorization using torchtext;

FIG. 4 is an exemplary diagram of a convolutional neural network model;

FIG. 5 is a diagram of a multi-dimensional expert portrait;

FIG. 6 is a diagram of domain expansion based on knowledge-graphs;

FIG. 7 is a diagram illustrating expert intelligent recommendation based on semantic similarity.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application.

To rapidly build a full-field high-end talent base and to intelligently retrieve and recommend talents in small fields, the invention adopts knowledge graph technology and, through the key steps of mining and analyzing science and technology big data, constructing a large-scale multi-field science and technology knowledge graph, and performing knowledge-graph-based field matching, automatically associates high-end talent portraits with a multi-level science and technology knowledge graph. This realizes automatic field classification of massive talent data and small-field talent retrieval and recommendation, and provides talent introduction organizations with an effective tool for talent introduction and for gathering and retrieving talent information. As shown in FIGS. 1 to 7, the invention discloses a middle- and high-end talent intelligent recommendation system and method based on domain self-classification.

To achieve the above object, the middle- and high-end talent intelligent recommendation system based on domain self-classification proposed by the invention comprises the following modules; the system flow chart is shown in FIG. 1:

1. Talent information mining and fusion module, which implements the following functions:

1) Multi-source heterogeneous talent information data acquisition

The middle- and high-end talent introduction reserve library is a dataset of domain expert information. Efficient, accurate crawlers and parallelization strategies are used to obtain basic information on academic experts from top journals and conferences in each field, and basic information on industry experts from the websites of well-known enterprises, universities, and associations in each field. To enrich the dimensions of talent portraits, dynamic information such as experts' project achievements, awards, and science and technology news is collected from channels such as open knowledge bases, science and technology forums, news sites, and blogs, forming multi-dimensional basic data on domain experts.

Specifically, the module adopts a semantic information extraction method based on a conditional random field model with a dynamic graph structure: it extracts relevant background information from the personal homepages of experts or scholars, and extracts expert collaboration relationships and achievement text from achievement information. The model dynamically generates dependency edges according to the labeling results of instance nodes, which effectively incorporates users' prior knowledge and improves the labeling precision of semantic information, overcoming the shortcomings of traditional manual and semi-automatic labeling. Its structure is shown in FIG. 2.
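The dynamic-graph CRF described above builds dependency edges on the fly and is beyond a short example. As a simplified illustration only, the sketch below shows Viterbi decoding for an ordinary linear-chain CRF, i.e. how the most likely label sequence is read off once emission and transition scores are known. The label names and scores are hypothetical; in a real model they would be learned.

```python
def viterbi(emissions, transitions, labels):
    """Return the highest-scoring label sequence under a linear-chain CRF.

    emissions:   one {label: score} dict per token (log-space scores)
    transitions: {(previous_label, label): score}; missing pairs score 0.0
    labels:      list of possible labels
    """
    if not emissions:
        return []
    # best[t][y]: best total score of any label path that ends with label y at token t
    best = [{y: emissions[0][y] for y in labels}]
    back = [{}]  # back[t][y]: previous label on that best path
    for t in range(1, len(emissions)):
        best.append({})
        back.append({})
        for y in labels:
            prev, score = max(
                ((p, best[t - 1][p] + transitions.get((p, y), 0.0)) for p in labels),
                key=lambda pair: pair[1],
            )
            best[t][y] = score + emissions[t][y]
            back[t][y] = prev
    # Follow the back-pointers from the best final label.
    path = [max(labels, key=lambda y: best[-1][y])]
    for t in range(len(emissions) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]
```

A strong ORG-to-ORG transition score can pull an earlier token into the ORG label even when its own emission score favors O, which is the kind of context dependence a CRF is meant to capture.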

2) Data fusion and homonym disambiguation

Because expert information is obtained from different Internet data sources, the resulting expert data is inherently multi-source. The first key problems to solve are therefore how to integrate multi-source heterogeneous expert information into a complete portrait and how to fuse knowledge about the same expert.

Metadata integration enables sharing and interaction among heterogeneous knowledge described by different metadata, and metadata mapping is the key to knowledge fusion. The module uses RiMOM, an ontology mapping model based on risk minimization, for data integration. The model integrates multiple mapping strategies, including name-similarity-based mapping, instance-based machine learning, and structure-based mapping, to realize metadata mapping and thereby fuse expert information.
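RiMOM combines several strategies and uses risk minimization to choose mappings; the sketch below does not implement RiMOM. It only illustrates the general idea of combining two strategy scores for metadata fields: a name similarity (the ratio from Python's `difflib.SequenceMatcher`) and an instance-overlap similarity (Jaccard overlap of sample values, used here as a simple stand-in for the instance-based machine learning strategy). The weight `w_name`, the `threshold`, and the greedy one-to-one assignment are assumptions.

```python
from difflib import SequenceMatcher


def name_similarity(a, b):
    """Similarity of two field names in [0, 1] (SequenceMatcher ratio, case-insensitive)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def instance_similarity(values_a, values_b):
    """Jaccard overlap of the instance values observed under each field."""
    a, b = set(values_a), set(values_b)
    return len(a & b) / len(a | b) if a | b else 0.0


def match_fields(schema_a, schema_b, w_name=0.5, threshold=0.5):
    """Greedy one-to-one mapping between two metadata schemas.

    schema_x maps a field name to a list of sample instance values.
    w_name and threshold are hypothetical tuning parameters.
    """
    candidates = []
    for fa, va in schema_a.items():
        for fb, vb in schema_b.items():
            score = w_name * name_similarity(fa, fb) + (1 - w_name) * instance_similarity(va, vb)
            candidates.append((score, fa, fb))
    mapping, used_a, used_b = {}, set(), set()
    for score, fa, fb in sorted(candidates, reverse=True):  # best-scoring pairs first
        if score >= threshold and fa not in used_a and fb not in used_b:
            mapping[fa] = fb
            used_a.add(fa)
            used_b.add(fb)
    return mapping
```

Fields whose names differ completely (such as "affiliation" and "organization") can still be mapped when their instance values overlap, which is why combining strategies is more robust than name matching alone.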

Meanwhile, when talent background information is extracted from basic talent information, project and achievement information from papers and patents, and news and activity information from media sources, network semantic relationship information must be used to resolve the ambiguity of same-name entities (entities with the same name obtained from different data sources but with different meanings). As a key link and core technique of the data extraction layer, the module performs homonym disambiguation on expert information with a method based on probabilistic graphical models, yielding an accurate talent portrait database.
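The source uses a probabilistic graphical model for homonym disambiguation. As a much simpler, rule-based stand-in, the sketch below links same-name records that share a coauthor or an affiliation and merges linked records transitively with union-find. The record fields (`name`, `coauthors`, `affiliation`) and the linking rule are hypothetical.

```python
def disambiguate(records):
    """Group same-name records into clusters that likely refer to one person.

    records: list of dicts with 'name', 'coauthors' and 'affiliation' keys (hypothetical schema).
    Two records with the same name are linked if they share a coauthor or an affiliation;
    linked records are merged transitively with union-find.
    Returns a sorted list of clusters, each a list of record indices in ascending order.
    """
    parent = list(range(len(records)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps the trees shallow
            i = parent[i]
        return i

    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            a, b = records[i], records[j]
            if a["name"] != b["name"]:
                continue
            if set(a["coauthors"]) & set(b["coauthors"]) or a["affiliation"] == b["affiliation"]:
                parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(records)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())
```

A learned model would weigh many more signals (topics, venues, time) probabilistically; the hard rule here only shows how the final grouping step can work.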

2. Science and technology field automatic classification module, which implements the following functions:

2-1) Construction of the full-field science and technology knowledge graph

Constructing the full-field science and technology knowledge graph uses science and technology big data such as data on scientific talents, scientific literature, and science and technology activity news to extract text keywords, terms, concepts, and entity names, as well as the concept taxonomy and the relations among concepts and entities. Machine learning and natural language processing are combined to establish associations among the data and build a network graph of data nodes, generating an ultra-large-scale knowledge graph (on the order of tens of millions) that covers the NSFC disciplines, with balanced and aligned Chinese and English data. The graph is built and mined with a fine-grained discipline classification covering all fields, supports the addition of child nodes and edges, can dynamically expand fine-grained subfields, and thus characterizes science and technology fields at fine granularity. The specific steps are as follows:

2-11. Large-scale subject keyword extraction

An unsupervised keyword extraction tool is used to extract large-scale science and technology keywords from large-scale scientific literature; beyond ordinary keyword extraction, it can discover and extract complex phrases, long entity names, and emerging new words.
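The source does not name its unsupervised keyword extraction tool. TF-IDF is one of the simplest unsupervised keyword scoring methods, so the sketch below uses it as an assumed stand-in; it does not cover the phrase and new-word discovery mentioned above. Documents are assumed to be already segmented into token lists, and the smoothed IDF formula is one common choice among several.

```python
import math
from collections import Counter


def tfidf_keywords(docs, top_k=5):
    """Return the top_k TF-IDF keywords of each tokenised document.

    docs: list of token lists (already segmented, stop words removed).
    Uses smoothed IDF: log((1 + N) / (1 + df)) + 1. Ties are broken alphabetically.
    """
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))  # document frequency
    results = []
    for doc in docs:
        tf = Counter(doc)
        scores = {
            term: (count / len(doc)) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        }
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        results.append(ranked[:top_k])
    return results
```

A term frequent in one paper but rare across the corpus (such as "embedding" below) outranks a term such as "model" that appears everywhere.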

2-12. Keyword relation extraction

Vector representations of these keywords are obtained with word embedding techniques. On this basis, keyword semantic association, cluster analysis, and related functions are implemented to provide quantitative semantic relations among large-scale keywords.

Keyword association uses word2vec: the deep learning tool word2vec converts each keyword into a word vector in an N-dimensional space, and the cosine of the angle between two vectors is taken as their similarity. Word2vec builds on log-linear language models; one of its two architectures is CBOW, which predicts the current word w_t from its surrounding context. Combined with the hierarchical softmax algorithm, this probability can be computed efficiently:

$$p(w_t \mid \mathrm{context}) = p(w_t \mid w_{t-k}, w_{t-k+1}, \ldots, w_{t-1}, w_{t+1}, \ldots, w_{t+k})$$

Keywords are clustered with hierarchical clustering, as follows:

Input: number of clusters K, keyword set W

I. Initially, treat each node (keyword) as a separate cluster.

II. Find the pair of current clusters with the highest similarity and merge them.

III. Compute the similarity between the newly formed cluster and each remaining cluster.

IV. Check the current number of clusters; if it is less than or equal to K, stop; otherwise, repeat II and III.

The similarity between clusters is computed with average linkage, i.e.

$$\mathrm{sim}(C_i, C_j) = \frac{1}{|C_i|\,|C_j|} \sum_{x \in C_i} \sum_{y \in C_j} \mathrm{sim}(x, y)$$

where the node similarity sim(x, y) is the keyword similarity computed by word2vec.

In practical applications, the value of K is about 4/5 of the total number of keywords.
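Steps I to IV with average linkage can be sketched as below. The keyword vectors are assumed to be precomputed (for example by word2vec) and are passed in as a plain dictionary, with cosine similarity standing in for the word2vec keyword similarity. The implementation recomputes linkages every round, which is fine for illustration but far too slow for millions of keywords.

```python
import math


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def average_linkage_clusters(vectors, k):
    """Agglomerative clustering of keywords with average-linkage similarity.

    vectors: {keyword: embedding}, assumed precomputed (e.g. by word2vec).
    k: stop once the number of clusters is <= k (at least one cluster is kept).
    """
    words = list(vectors)
    sim = {(a, b): cosine(vectors[a], vectors[b]) for a in words for b in words}
    clusters = [[w] for w in words]  # step I: every keyword starts as its own cluster

    def linkage(c1, c2):
        # Average of all pairwise keyword similarities between the two clusters.
        return sum(sim[(a, b)] for a in c1 for b in c2) / (len(c1) * len(c2))

    while len(clusters) > max(k, 1):  # step IV: stop at <= k clusters
        # Step II: find the most similar pair of clusters and merge it.
        i, j = max(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = clusters[i] + clusters[j]
        # Step III: linkage() recomputes similarities to the new cluster on the next pass.
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)] + [merged]
    return clusters
```

With K set to about 4/5 of the keyword count, as above, only about one fifth of the possible merges happen, so most keywords stay in singleton clusters and only the closest pairs are grouped.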

2-13. Term concept extension

Term concepts are expanded automatically based on the keyword relation extraction results and external knowledge sources (such as Wikipedia and search engines), discovering more concept terms and semantic relations among concepts and thereby automatically expanding and updating the graph.

2-14. Graph representation learning

Based on current graph representation learning methods such as the TransE family, DistMult, and ConvE, a learning tool for large-scale cross-disciplinary knowledge graph representation learning is developed, and its results support tasks such as link prediction on the graph.
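Of the methods named above, TransE is the simplest to illustrate: it embeds entities and relations as vectors so that head + relation ≈ tail for true triples. The sketch below shows its distance function, the margin ranking loss used to train it, and tail prediction by distance ranking. The embeddings in the example are hypothetical toy values rather than trained ones, and training itself (negative sampling and gradient updates) is omitted.

```python
import math


def transe_distance(h, r, t, norm=1):
    """TransE dissimilarity ||h + r - t|| (L1 or L2); lower means more plausible."""
    diffs = [hi + ri - ti for hi, ri, ti in zip(h, r, t)]
    if norm == 1:
        return sum(abs(d) for d in diffs)
    return math.sqrt(sum(d * d for d in diffs))


def margin_loss(pos, neg, gamma=1.0, norm=1):
    """Margin ranking loss for one (true, corrupted) pair of (h, r, t) embedding triples."""
    return max(0.0, gamma + transe_distance(*pos, norm=norm) - transe_distance(*neg, norm=norm))


def predict_tail(h, r, entity_vecs, norm=1):
    """Link prediction: rank candidate tail entities by TransE distance, best first."""
    return sorted(entity_vecs, key=lambda e: transe_distance(h, r, entity_vecs[e], norm=norm))
```

Ranking every entity as a candidate tail, as `predict_tail` does, is the basic form of the link prediction task that the learned representations support.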

2-2) Machine-learning-based automatic domain classification:

Using the constructed science and technology knowledge graph, the module builds a classification model with machine learning algorithms to classify experts' professional fields automatically. After multi-source heterogeneous talent data has been acquired, cleaned, fused, and disambiguated, the resulting expert information texts go through preprocessing, text labeling, text vectorization, and classification model construction and training, so that they are matched to fields in the science and technology knowledge graph and classified automatically. The specific steps are as follows:

2-21. Preprocessing

The text must be segmented into words, and stop words and meaningless characters must be removed. The segmentation tool is jieba, which supports word segmentation, part-of-speech tagging, keyword extraction, and other functions for Chinese text, as well as user-defined dictionaries. Segmentation produces many meaningless words such as "main", "and", and "etc.". The module uses the Harbin Institute of Technology (HIT) stop word list, the Baidu stop word list, and the stop word library of the Machine Intelligence Laboratory of Sichuan University; each segmented word is checked against these lists and removed if it matches a stop word.
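A minimal preprocessing sketch follows. `jieba.lcut` is jieba's list-returning segmentation call; if jieba is not installed, the code falls back to whitespace splitting, which is only a placeholder and not a Chinese segmenter. The tiny stop-word set and the regular expression that defines "meaningless characters" are assumptions for illustration; in practice the HIT, Baidu, and Sichuan University lists would be loaded instead.

```python
import re

# Tiny illustrative stop-word set ("main", "and", "etc.", "of"); real lists are far larger.
STOPWORDS = {"主要", "和", "等", "的"}
# Assumed rule: a token is meaningful if it contains a CJK ideograph, ASCII letter or digit.
MEANINGFUL = re.compile(r"[\u4e00-\u9fffA-Za-z0-9]")


def clean_tokens(tokens, stopwords=STOPWORDS):
    """Drop stop words and tokens made only of meaningless characters (punctuation, spaces)."""
    kept = []
    for tok in tokens:
        tok = tok.strip()
        if tok and MEANINGFUL.search(tok) and tok not in stopwords:
            kept.append(tok)
    return kept


def preprocess(text, stopwords=STOPWORDS):
    """Segment Chinese text with jieba (if installed), then clean the tokens."""
    try:
        import jieba  # third-party package: pip install jieba
        tokens = jieba.lcut(text)
    except ImportError:
        tokens = text.split()  # placeholder fallback only; not a real Chinese segmenter
    return clean_tokens(tokens, stopwords)
```

Keeping `clean_tokens` separate from segmentation makes it easy to swap in a user-defined dictionary or a different segmenter without touching the filtering rules.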

2-22. Expert information text labeling

A convolutional neural network is a supervised learning algorithm and needs a large labeled dataset for training. The data collected by the web crawler is plain, unlabeled text, so manual labeling is required. The module labels part of the expert information texts with the subclass disciplines of the constructed full-field science and technology knowledge graph. During labeling, the expert's major discipline is determined first, and the associated subclass disciplines are then determined from the knowledge graph; only the subclass disciplines are saved as labels.

2-23. Vectorizing expert information texts

Before text data are fed into the model, they must be vectorized, that is, converted into numerical data. Text vectorization has two stages: word segmentation and word vector conversion. The module uses the TorchText library for the word vector conversion; the flow of text vectorization with TorchText is shown in Fig. 3.
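Because the TorchText API has changed substantially between versions, the sketch below shows the same idea in plain Python rather than through TorchText itself: build a word-to-index dictionary, then turn each segmented text into a fixed-length index sequence that an embedding layer (for example `torch.nn.Embedding`) converts into word vectors. The reserved `<pad>`/`<unk>` tokens, `min_freq` and `max_len` are assumptions for illustration.

```python
from collections import Counter

PAD, UNK = "<pad>", "<unk>"


def build_vocab(tokenized_docs, min_freq=1):
    """Map each word to an integer index; 0 and 1 are reserved for padding and unknown words."""
    counts = Counter(tok for doc in tokenized_docs for tok in doc)
    vocab = {PAD: 0, UNK: 1}
    # Sort by descending frequency, then alphabetically, so indices are deterministic.
    for tok, freq in sorted(counts.items(), key=lambda item: (-item[1], item[0])):
        if freq >= min_freq:
            vocab[tok] = len(vocab)
    return vocab


def encode(tokens, vocab, max_len):
    """Convert a token list to exactly max_len indices (truncating or padding)."""
    ids = [vocab.get(tok, vocab[UNK]) for tok in tokens[:max_len]]
    return ids + [vocab[PAD]] * (max_len - len(ids))
```

For example, with the documents `[["deep", "learning", "vision"], ["deep", "learning"], ["hydrology"]]`, `encode(["deep", "robotics"], vocab, 4)` yields `[2, 1, 0, 0]`: a known word, an unknown word, and two padding slots.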

2-24. Constructing the domain classification model

A classification model based on a convolutional neural network is built with PyTorch. The model has four layers: an input layer, a convolution layer, a pooling layer and a fully connected layer.

At the input layer, the position index of each word in the dictionary is used to look up its word vector. The convolution layer is implemented with torch.nn.Conv2d(Ci, Co, (K, D)), where Ci is the number of input channels, Co is the number of output channels (equal to the number of convolution kernels), K is the height of the convolution kernel (the number of consecutive words it covers), and D is the dimension of the word vector. After the convolution, a ReLU (rectified linear unit) is applied as the activation function.

The output of each of the three convolution layers passes through a pooling layer that keeps the maximum value of each row as that row's feature. The three pooled results are then merged into one layer, which is connected to the output layer through a fully connected layer to predict the text's discipline; the number of output neurons equals the number of predicted categories (the subclass disciplines of the knowledge graph). The fully connected layer is implemented with torch.nn.Linear(), and dropout with torch.nn.Dropout(). An example of the resulting convolutional neural network model is shown in Fig. 4.
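The architecture described above can be sketched in PyTorch as follows. The kernel heights (3, 4, 5), 100 kernels per layer, a dropout rate of 0.5 and padding index 0 are assumptions chosen for illustration; the text fixes only the layer types, the `Conv2d(Ci, Co, (K, D))` shape with Ci = 1, ReLU activation, max pooling, concatenation of the three branches, and one output neuron per subclass discipline.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TextCNN(nn.Module):
    """Sketch of the four-layer classifier: input, convolution, pooling, fully connected."""

    def __init__(self, vocab_size, embed_dim, num_classes,
                 kernel_sizes=(3, 4, 5), num_kernels=100, dropout=0.5):
        super().__init__()
        # Input layer: dictionary index -> word vector of dimension D (= embed_dim).
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        # Three convolution layers, each nn.Conv2d(Ci=1, Co=num_kernels, (K, D)).
        self.convs = nn.ModuleList(
            [nn.Conv2d(1, num_kernels, (k, embed_dim)) for k in kernel_sizes]
        )
        self.dropout = nn.Dropout(dropout)
        # Fully connected layer: one output neuron per subclass discipline.
        self.fc = nn.Linear(num_kernels * len(kernel_sizes), num_classes)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len); seq_len must be >= the largest kernel height.
        x = self.embedding(token_ids).unsqueeze(1)          # (batch, 1, seq_len, D)
        pooled = []
        for conv in self.convs:
            h = F.relu(conv(x)).squeeze(3)                  # (batch, Co, seq_len - K + 1)
            pooled.append(h.max(dim=2).values)              # max pooling -> (batch, Co)
        merged = self.dropout(torch.cat(pooled, dim=1))     # merge the three branches
        return self.fc(merged)                              # logits: (batch, num_classes)
```

Using a kernel width equal to D means each kernel spans whole word vectors, so the convolution slides only along the word sequence; taking the maximum over positions then gives one feature per kernel regardless of text length.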

2-25. Training the classification model

In PyTorch, training data are usually fed to the model batch by batch: several samples are packed into one batch, and the number of samples per batch is the batch size. Model training consists roughly of three stages: importing a batch of sample data, forward propagation and backward propagation. When the data set is imported, 90% of the data are randomly assigned to the training set and the remaining 10% to the validation set. Only the training set is used to train the model; the validation set is used to evaluate its accuracy. The optimizer used during training is Adam.
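A minimal training-loop sketch under these settings follows: a random 90/10 split, mini-batches via `DataLoader`, the Adam optimizer, and the three stages (import batch, forward, backward) marked in comments. Cross-entropy loss, the learning rate, the epoch count and the seed are assumptions not stated in the text. Any module that maps a `(batch, seq_len)` index tensor to `(batch, num_classes)` logits, such as the CNN described in 2-24, can be passed in.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset, random_split


def train_classifier(model, token_ids, labels, epochs=5, batch_size=32, lr=1e-3, seed=0):
    """Train with Adam on a random 90% split and return accuracy on the other 10%."""
    torch.manual_seed(seed)
    dataset = TensorDataset(token_ids, labels)
    n_train = int(0.9 * len(dataset))
    train_set, val_set = random_split(
        dataset, [n_train, len(dataset) - n_train],
        generator=torch.Generator().manual_seed(seed),
    )
    train_loader = DataLoader(train_set, batch_size=batch_size, shuffle=True)
    val_loader = DataLoader(val_set, batch_size=batch_size)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()

    for _ in range(epochs):
        model.train()
        for xb, yb in train_loader:            # stage 1: import a batch of samples
            loss = loss_fn(model(xb), yb)      # stage 2: forward propagation
            optimizer.zero_grad()
            loss.backward()                    # stage 3: backward propagation
            optimizer.step()                   # Adam parameter update

    # Only the validation set is used to measure accuracy.
    model.eval()
    correct = 0
    with torch.no_grad():
        for xb, yb in val_loader:
            correct += (model(xb).argmax(dim=1) == yb).sum().item()
    return correct / max(len(val_set), 1)
```

Seeding both the global generator and the split generator makes the 90/10 partition reproducible, so validation accuracy can be compared fairly across hyperparameter settings.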

2-26. The fields of expert information texts are classified automatically with the optimized classification model.
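At inference time the trained model outputs one score (logit) per subclass discipline for each text. The sketch below converts those scores to probabilities with a softmax and returns the most likely disciplines. Returning more than one (`top_k > 1`) is an assumed option, motivated by the later remark that an expert may span several research domains; the label names are placeholders.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(logits, labels, top_k=1):
    """Return the top_k (subclass discipline, probability) pairs for one text."""
    probs = softmax(logits)
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:top_k]
```

Subtracting the maximum logit before exponentiating leaves the probabilities unchanged but prevents overflow for large scores.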

3. Talent capability-post evaluation profile module, which implements the following functions:

1) Talent field ability analysis and evaluation

The talent bank builds a rich, detailed evaluation profile for each expert. The profile contains the expert's basic information, educational background, work experience, research fields and interests, award list, output information (papers, patents, projects, etc.), academic evaluation, relationship network and real-time updates, as shown in Fig. 5. The profile dynamically displays real-time news about the expert, including science and technology news on the academic activities, conferences and forums the expert has taken part in and on major awards won for the expert's results, so that users can keep up with the expert's latest information and track the expert's development path.
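The profile described above is essentially a record with several list-valued sections plus a news feed. A minimal sketch as Python dataclasses follows; all field and method names are hypothetical, chosen only to mirror the listed contents.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class NewsItem:
    published: date
    kind: str    # e.g. "academic activity", "conference/forum", "award"
    title: str


@dataclass
class ExpertProfile:
    name: str                                                # basic information
    affiliation: str = ""
    education: List[str] = field(default_factory=list)
    work_experience: List[str] = field(default_factory=list)
    research_fields: List[str] = field(default_factory=list)
    awards: List[str] = field(default_factory=list)
    outputs: List[str] = field(default_factory=list)         # papers, patents, projects
    academic_evaluation: str = ""
    collaborators: List[str] = field(default_factory=list)   # relationship network
    news: List[NewsItem] = field(default_factory=list)       # real-time updates

    def add_news(self, item: NewsItem) -> None:
        """Record a news item, keeping the feed ordered newest first."""
        self.news.append(item)
        self.news.sort(key=lambda n: n.published, reverse=True)

    def latest_news(self, n: int = 5) -> List[NewsItem]:
        """Return the n most recent news items."""
        return self.news[:n]
```

Keeping the feed sorted on insertion makes the common "show latest updates" query a simple slice.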

2) Intelligent evaluation based on public opinion big data

Public opinion big data are used to analyze and evaluate the safety of talents to be introduced (especially foreign talents). Criminal records of the persons to be introduced, violent statements they have published on social platforms, their cultural background and other information are monitored to uncover political, religious, law-violation and similar issues concerning foreign talents to be introduced. Risks such as crime, fraud, disclosure of secrets and deputy are flagged in advance and the persons concerned are excluded from the talent introduction list; a talent risk grade index is established to form an early-warning mechanism, giving deep insight into talents.

3) Talent-post suitability evaluation

Talent-post matching evaluation indexes are established comprehensively from information such as the talent's research field, work skills, years of work, employers, project achievements, cultural differences, and awards and titles. They mainly comprise a post matching degree analysis index and an introduction difficulty analysis index.
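The text lists the inputs to the two indexes but not how they are combined. One simple possibility, shown here purely as a hypothetical sketch, is a weighted average of sub-scores normalized to [0, 1]; the weight values and sub-score names are invented for illustration and are not from the patent.

```python
# Hypothetical weights; the source does not specify how the indexes are combined.
MATCH_WEIGHTS = {
    "research_field": 0.35,
    "skills": 0.25,
    "experience": 0.15,
    "achievements": 0.15,
    "awards": 0.10,
}

DIFFICULTY_WEIGHTS = {
    "current_employer_level": 0.4,
    "cultural_difference": 0.3,
    "seniority": 0.3,
}


def weighted_index(scores, weights):
    """Weighted average of sub-scores in [0, 1]; missing sub-scores count as 0."""
    total_weight = sum(weights.values())
    return sum(w * scores.get(name, 0.0) for name, w in weights.items()) / total_weight
```

Dividing by the weight total keeps each index in [0, 1] even if the weights are later tuned and no longer sum to exactly one.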

4. Talent retrieval and intelligent recommendation module, which implements the following functions:

1) Knowledge graph-based retrieval query

The user's talent query is still natural-language text: one or more keywords, an expert's name, and so on. Understanding the user's query intent is therefore critical. A large-scale knowledge graph covering the knowledge of all disciplines is built at the bottom layer of the system. Through this graph the system can interpret the user's query, expand it and offer suggestions in real time, and associate and align the domain query results with related knowledge domains, thereby correcting those results. See Fig. 6.
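One common way to realize the query expansion described above is a breadth-first walk over related concepts in the knowledge graph. The sketch below assumes the graph is available as an adjacency dictionary (the `KG` fragment is hypothetical) and that expansion is limited to a fixed number of hops; neither detail is specified in the text.

```python
from collections import deque

# Hypothetical knowledge-graph fragment: concept -> related concepts.
KG = {
    "artificial intelligence": ["machine learning", "knowledge graph"],
    "machine learning": ["deep learning", "artificial intelligence"],
    "deep learning": ["convolutional neural network"],
}


def expand_query(keywords, graph, max_hops=1):
    """Expand query keywords with concepts reachable within max_hops in the graph."""
    seen = {kw: 0 for kw in keywords}    # concept -> hop distance from the query
    queue = deque(keywords)
    while queue:
        node = queue.popleft()
        if seen[node] >= max_hops:
            continue
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen[neighbor] = seen[node] + 1
                queue.append(neighbor)
    # Original keywords first, then expansions ordered by hop distance.
    return sorted(seen, key=lambda kw: seen[kw])
```

Limiting the number of hops keeps the expansion close to the user's intent; each extra hop pulls in broader and less specific concepts.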

2) Intelligent recommendation based on semantic analysis

The middle-high-end talent reserve database, linked to the science and technology knowledge base, is the data foundation of the recommendation system. The system mines subdivided fields according to users' requirements and intelligently recommends experts in those subdivided fields. Recommendations can be made in two ways: by pushing experts related to the subdivided-field keywords the user queries, or by semantically analyzing the user's technical requirement text, mining the requirement keywords, and automatically matching and recommending experts according to the semantic similarity between subdivided fields and those keywords. This improves the efficiency and objectivity of expert screening.

The post requirement text contains rich semantic information that precisely describes the science and technology fields of the post and its specific technical details. Given the post requirement text, candidate expert information and other relevant conditions, the system automatically recommends the best-matching experts to the talent introduction unit through core techniques such as semantic representation modeling and learning of posts and experts and an intelligent expert recommendation algorithm; it also avoids high-risk experts based on the results of the public opinion big data analysis. The specific steps are as follows:

4-1. Semantic representation modeling and learning of post content

Starting from the post requirement text, a representation model of the post's core problems or technical meaning is built in a cross-domain computable semantic space, so that the semantic representation of each individual post requirement text is extracted accurately. The specific steps are keyword extraction and discipline distribution mapping.

Keyword extraction has two steps: segmentation and extraction. First, the requirement text is segmented. Each n-gram in the text (n from 3 to 10) is extracted if it appears in the knowledge graph entity library. Because finding the optimal keyword combination is computationally expensive, a greedy strategy is used that extracts longer n-grams first. Long keywords express the text's semantics more clearly, so preferring them conveys the text's intent better than splitting it directly into a series of short words. For example, "natural disaster monitoring" is preferred over "natural disaster".
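A sketch of the longest-first greedy extraction follows. It scans n from the largest allowed length down to the smallest and accepts a candidate only if it is in the entity library and does not overlap an already-accepted keyword. Whether n counts characters or segmented words is not stated, so the function works on any unit sequence (characters with `joiner=""`, words with `joiner=" "`); the overlap rule and the names are assumptions.

```python
def extract_keywords(units, entity_library, n_min=3, n_max=10, joiner=""):
    """Greedily extract the longest n-grams found in the entity library, without overlaps."""
    covered = [False] * len(units)
    found = []  # (start position, keyword)
    # Longer n-grams are tried first, so they win over shorter overlapping ones.
    for n in range(min(n_max, len(units)), n_min - 1, -1):
        for start in range(len(units) - n + 1):
            if any(covered[start:start + n]):
                continue
            candidate = joiner.join(units[start:start + n])
            if candidate in entity_library:
                found.append((start, candidate))
                for j in range(start, start + n):
                    covered[j] = True
    # Report keywords in the order they appear in the text.
    return [kw for _, kw in sorted(found)]
```

With the characters of "自然灾害监测系统" and an entity library containing both "自然灾害" and "自然灾害监测", only the longer "自然灾害监测" (natural disaster monitoring) is extracted, matching the example in the text.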

Discipline distribution mapping produces a rough discipline probability distribution from the requirement keywords, which makes it easier to determine the main disciplines of the required talents. Keywords extracted from different parts of the requirement text are given different weights and merged according to those weights; the keyword-discipline mapping is then analyzed to generate the discipline probability distribution:

where D is the requirement document library, d_j is a specific requirement text, w_i is an extracted keyword, and k is the number of discipline categories; in practice, k is typically set to 4/5 of the total number of keywords.
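The formula itself is not recoverable from the text, so the following is only a hypothetical weighted-count sketch of the described procedure, not the patent's formula. Keyword weights from different text parts are summed, each keyword's weight is spread over the disciplines it maps to, and the result is normalized. The per-part weights, keyword-discipline association scores, and the reading of k (= 4/5 of the keyword count) as "keep the k strongest disciplines" are all assumptions.

```python
from collections import defaultdict


def discipline_distribution(keywords_by_part, part_weights, keyword_disciplines, k_ratio=0.8):
    """Hypothetical sketch: weighted keyword -> discipline probability distribution."""
    # 1) Merge keywords from different parts of the text using per-part weights.
    merged = defaultdict(float)
    for part, keywords in keywords_by_part.items():
        w = part_weights.get(part, 1.0)
        for kw in keywords:
            merged[kw] += w
    # 2) Spread each keyword's weight over the disciplines it maps to.
    scores = defaultdict(float)
    for kw, weight in merged.items():
        for discipline, assoc in keyword_disciplines.get(kw, {}).items():
            scores[discipline] += weight * assoc
    # 3) Keep the k strongest disciplines (assumed reading of k) and normalize.
    k = max(1, int(k_ratio * len(merged)))
    top = sorted(scores.items(), key=lambda item: item[1], reverse=True)[:k]
    total = sum(s for _, s in top)
    if total == 0:
        return {}
    return {d: s / total for d, s in top}
```

For example, with a title weight of 2.0 and a body weight of 1.0, a keyword appearing in both parts contributes three times as much as one appearing only in the body.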

4-2. Semantic representation modeling and learning of experts

This task is similar to the semantic representation modeling and learning of post content, except that experts have numerous project and output records in addition to their personal profile information, and an expert may work in several research domains at once. The science and technology knowledge graph and the automatic expert domain classification algorithm built by the system can quickly determine an expert's fields and related skills, which supports accurate expert screening.

4-3. Intelligent expert recommendation based on semantic similarity

Building on the first two tasks, the discipline keywords of the post requirement text and the expert profiles are vectorized, and their cosine similarity in the same vector space is computed to rank the candidate experts quantitatively: the higher the similarity, the higher the rank. Each extracted keyword is vectorized with word2vec, and all the vectors are averaged to obtain one vector representing the text. Vectors are generated in the same way from all the expanded keywords of each expert. The most similar experts can then be selected by their cosine similarity in the vector space, see Fig. 7.
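The averaging-and-cosine ranking can be sketched in plain Python. `embeddings` stands in for a trained word2vec model: any mapping that supports `word in embeddings` and `embeddings[word]` works, and a gensim `KeyedVectors` object (for example `model.wv`) supports both operations. The toy vectors and expert names below are hypothetical.

```python
import math


def average_vector(words, embeddings):
    """Average the vectors of the words that have embeddings; None if none do."""
    vecs = [embeddings[w] for w in words if w in embeddings]
    if not vecs:
        return None
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity of two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def rank_experts(post_keywords, expert_keywords, embeddings, top_n=10):
    """Rank experts by cosine similarity between averaged keyword vectors."""
    query = average_vector(post_keywords, embeddings)
    if query is None:
        return []
    scored = []
    for expert, keywords in expert_keywords.items():
        vec = average_vector(keywords, embeddings)
        if vec is not None:
            scored.append((expert, cosine(query, vec)))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]
```

Words missing from the vocabulary are skipped rather than treated as zero vectors, so an out-of-vocabulary keyword does not pull the average toward the origin.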

3) Domain high-end talent subscription and recommendation

By entering skill keywords for the fields of interest and setting screening conditions such as the talent's years of work, academic degree, graduating school and title, a talent introduction institution can receive subscription information on high-end talents in those fields. At the same time, the system intelligently recommends the latest updates of the top experts in the field, helping the institution track high-end talents in the field and obtain first-hand information.
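A subscription can be modeled as a set of skill keywords plus optional screening conditions, and matching talents are those that satisfy all of them. The sketch below is a hypothetical illustration of that filtering step; the record fields, the case-insensitive keyword match, and the "None means any" convention are assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Talent:
    name: str
    skills: List[str]
    years_of_work: int
    degree: str
    school: str
    title: str


@dataclass
class Subscription:
    keywords: List[str]
    min_years: int = 0
    degrees: Optional[Set[str]] = None   # None means "any degree"
    schools: Optional[Set[str]] = None   # None means "any school"
    titles: Optional[Set[str]] = None    # None means "any title"


def matches(talent: Talent, sub: Subscription) -> bool:
    """True if the talent has a subscribed skill and meets every screening condition."""
    skills = {s.lower() for s in talent.skills}
    if not any(k.lower() in skills for k in sub.keywords):
        return False
    if talent.years_of_work < sub.min_years:
        return False
    for allowed, value in ((sub.degrees, talent.degree),
                           (sub.schools, talent.school),
                           (sub.titles, talent.title)):
        if allowed is not None and value not in allowed:
            return False
    return True


def notify(talents: List[Talent], sub: Subscription) -> List[str]:
    """Names of talents to push to the subscribing institution."""
    return [t.name for t in talents if matches(t, sub)]
```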

It is to be understood that the embodiments described are only a few embodiments of the present application and not all embodiments. The embodiments and features of the embodiments in the present application may be combined with each other without conflict. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, the detailed description of the embodiments of the present application is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

Claims

1. A method for intelligent recommendation of middle-high-end talents implemented by the domain self-classification-based middle-high-end talent intelligent recommendation system, characterized in that the system comprises the following modules:

talent capability-post evaluation profile module: builds a talent evaluation profile for each expert to form the talent domain capability analysis and evaluation; analyzes and evaluates the safety of talents to be introduced using public opinion big data to form the evaluation based on public opinion big data; and establishes talent-post matching evaluation indexes according to talent information to form the talent-post suitability evaluation;

talent retrieval and intelligent recommendation module: provides retrieval and query based on a knowledge graph, subdivides fields according to users' requirements, recommends experts in the subdivided fields, and pushes high-end talent subscription information for the fields of interest and the latest updates of top experts in those fields to talent introduction institutions; the method comprises the following steps:

step 1: talent information mining and fusion: acquire multi-source heterogeneous talent data, and fuse and rank the talent data to form a middle-high-end talent database;

step 3: talent capability-post evaluation profiling: build a talent evaluation profile for each expert to form the talent domain capability analysis and evaluation; analyze and evaluate the safety of talents to be introduced using public opinion big data to form the evaluation based on public opinion big data; and establish talent-post matching evaluation indexes according to talent information to form the talent-post suitability evaluation;

step 4: talent retrieval and intelligent recommendation: acquire the retrieval text input by a talent introduction institution and expand it by association with the knowledge graph of all disciplines so as to correct the domain query results; mine subdivided fields from the post requirement text and recommend experts in those subdivided fields from the talent reserve database; meanwhile, push high-end talent subscription information and the latest updates of top experts in the field to the talent introduction institution; the specific steps in step 4 of recommending experts in subdivided fields for a post requirement text are as follows:

step 4-1: starting from the post requirement text, construct a representation model of the post's core problems or technical semantic content in a cross-domain computable semantic space, so that the semantic representation of each individual post requirement text is extracted accurately; the specific steps in step 4-1 of extracting keywords from the post requirement text are as follows:

step 4-1-1: first segment the requirement text, then extract each n-gram in the text, with n ranging from 3 to 10, that appears in the knowledge graph entity library;

step 4-1-2: assign different weights to the keywords extracted from different parts of the requirement text, merge the keywords according to those weights, and then analyze the keyword-discipline mapping to generate a discipline probability distribution; the specific expression is as follows:

where D is the requirement document library, d_j is a specific requirement text, w_i is an extracted keyword, and k is the number of discipline categories;

step 4-2: rapidly determine the expert's fields and skills from the science and technology knowledge graph and the automatic expert domain classification results;

step 4-3: vectorize the discipline keywords of the post requirement text and the expert profiles, compute their cosine similarity in the same vector space, and rank the candidate experts by similarity from high to low to form an expert recommendation list.

2. The method for intelligent recommendation of middle-high-end talents implemented by the domain self-classification-based middle-high-end talent intelligent recommendation system according to claim 1, characterized in that the talent information mining and fusion module integrates data with RiMOM, a risk-minimization-based ontology mapping model, and performs metadata mapping to fuse expert information.

3. The method for intelligent recommendation of middle-high-end talents implemented by the domain self-classification-based middle-high-end talent intelligent recommendation system according to claim 1, characterized in that in step 1, basic information on academic experts in each field is obtained from the top journals and conferences of that field using crawlers and a parallel strategy, and basic information on industry experts in the field is obtained from well-known websites in the field; expert project achievements, awards and titles, and dynamic science and technology news are acquired from open knowledge bases, science and technology forums, and news and blog channels to build multi-dimensional basic data on domain experts; and a semantic information extraction method based on a conditional random field model with a dynamic graph structure is used to extract background information from the personal homepages of experts or scholars and to extract expert collaboration relationships and output text information from the achievement information, dynamically generating dependency edges according to the different labeling results of instance nodes and effectively incorporating the user's prior knowledge.

4. The method for intelligent recommendation of middle-high-end talents implemented by the domain self-classification-based middle-high-end talent intelligent recommendation system according to claim 1, characterized in that constructing the full-domain science and technology knowledge graph in step 2 specifically comprises the following steps:

step 2-11, extracting discipline keywords: extract science and technology keywords from science and technology literature with an unsupervised keyword extraction tool;

step 2-12, extracting keyword relations: obtain vector representations of the keywords with word embedding, then perform semantic association and cluster analysis on the keywords to generate quantitative semantic relationships between them;

step 2-14: perform interdisciplinary learning on the expanded graph with a graph representation learning method, and use the learned representations to support link prediction on the graph.

5. The method for intelligent recommendation of middle-high-end talents implemented by the domain self-classification-based middle-high-end talent intelligent recommendation system according to claim 1, characterized in that the specific steps of machine-learning-based automatic domain classification in step 2 are as follows:

step 2-21, preprocessing: segment the talent data texts into words and remove stop words and meaningless characters;

step 2-22, labeling the expert information texts: label the expert information texts with the subclass disciplines of the full-domain science and technology knowledge graph;

step 2-23, vectorizing the expert information texts: before being input into the model, the text data are vectorized, that is, converted into numerical data; text vectorization comprises two stages, word segmentation and word vector conversion;

step 2-24, constructing the domain classification model: construct a classification model based on a convolutional neural network with four layers arranged in sequence: an input layer, a convolution layer, a pooling layer and a fully connected layer; at the input layer, the position index of each word in the dictionary is input to obtain the corresponding word vector; there are three convolution layers, and each applies a ReLU (rectified linear unit) as the activation function after its convolution operation; the output of each convolution layer is connected to a pooling layer, which selects the maximum value of each row to represent that row's feature; the results of the three convolution layers are merged into one layer, which is connected to the last layer through a fully connected layer to predict the text's discipline, the number of neurons being the number of subclass disciplines of the knowledge graph to be predicted;

step 2-25, training the classification model: import batch sample data, set up a training set and a validation set, and train with the Adam optimizer through forward and backward propagation to obtain the optimized classification model;

6. The method for intelligent recommendation of middle-high-end talents implemented by the domain self-classification-based middle-high-end talent intelligent recommendation system according to claim 1, characterized in that step 3 specifically comprises the following:

the talent evaluation profile comprises the expert's basic information, educational background, work experience, research fields and interests, award list, output information, academic evaluation, relationship network and real-time updates, and displays real-time news about the expert, including science and technology news on the academic activities, conferences and forums the expert has taken part in and on major awards won for the expert's results;

public opinion big data are used to analyze and evaluate the safety of talents to be introduced: the criminal records of the persons to be introduced, violent statements they have published on social platforms, and their cultural background information are monitored to uncover political, religious and law-violation issues concerning the talents to be introduced; risks of crime, fraud, disclosure of secrets and deputy are flagged in advance; a talent risk grade index is established to form an early-warning mechanism, realizing deep insight into talent information; and talent-post matching evaluation indexes, including a post matching degree analysis index and an introduction difficulty analysis index, are established according to the talent's research field, work skills, years of work, employers, project achievements, cultural differences, and awards and titles.

7. The method for intelligent recommendation of middle-high-end talents implemented by the domain self-classification-based middle-high-end talent intelligent recommendation system according to claim 1, characterized in that in step 4, the text information input by the talent introduction institution is expanded by association with the knowledge graph of all disciplines so as to correct the domain query results.