CN111242387A - Talent departure prediction method and device, electronic equipment and storage medium - Google Patents
- Publication number
- CN111242387A (CN202010071754.1A)
- Authority
- CN
- China
- Prior art keywords
- prediction
- sample information
- forest model
- historical
- talent
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/243—Classification techniques relating to the number of classes
- G06F18/24323—Tree-organised classifiers
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/10—Office automation; Time management
- G06Q10/105—Human resources
- G06Q10/1053—Employment or hiring
Abstract
The embodiment of the invention discloses a talent departure prediction method and device, electronic equipment and a storage medium. Information to be predicted of a current talent is acquired and input into a trained prediction combined forest model, and the departure prediction result of the current talent is determined according to the output of the prediction combined forest model. The prediction combined forest model comprises a first prediction forest model and a second prediction forest model; the first prediction forest model is obtained by training according to historical original sample information and a first standard prediction result, and the second prediction forest model is obtained by training according to historical key sample information and a second standard prediction result. This solves the problem of inaccurate talent departure prediction in the prior art and improves the reliability of the departure prediction result.
Description
Technical Field
The embodiment of the invention relates to computer technology, in particular to a talent departure prediction method and device, electronic equipment and a storage medium.
Background
In recent years, the Chinese economy has been in a stage of steady development, talent mobility has been relatively high, and enterprises in China face new problems in talent management. In many industries, talent loss is becoming increasingly serious, and the departure of key talents can have a great negative effect on the development of an enterprise. Analyzing the main factors behind talent loss and reducing recruitment cost has become an urgent task for many enterprises. If an enterprise can predict in advance that key core talents intend to leave and prepare a sound response plan, operating losses can be greatly reduced. Enterprise managers therefore need tools and methods to predict talent departure risk so that measures can be taken in time.
In the prior art, a generational analysis is generally performed on employees, and the result of that analysis is combined with a random forest ensemble learning algorithm; that is, employee departure is predicted as a supervised classification problem. However, this prediction method does not take into account that, in the data set, the number of departed employees is far smaller than the number of employees currently on the job, and this imbalance makes the departure prediction results inaccurate.
Therefore, the talent departure prediction methods adopted in the prior art give inaccurate prediction results and need to be improved.
Disclosure of Invention
The embodiment of the invention provides a talent departure prediction method and device, electronic equipment and a storage medium, and aims to achieve the effect of improving accuracy of talent departure prediction.
In a first aspect, an embodiment of the present invention provides a talent departure prediction method, where the method includes:
acquiring information to be predicted of current talents;
inputting the information to be predicted to a trained prediction combination forest model, and determining the departure prediction result of the current talent according to the output result of the prediction combination forest model, wherein the prediction combination forest model comprises a first prediction forest model and a second prediction forest model, the first prediction forest model is obtained by training according to historical original sample information and a first standard prediction result, and the second prediction forest model is obtained by training according to historical key sample information and a second standard prediction result.
In a second aspect, an embodiment of the present invention further provides a talent departure prediction apparatus, where the apparatus includes:
the information to be predicted acquisition module is used for acquiring information to be predicted of the current talent;
and the departure prediction result determining module is used for inputting the information to be predicted to a trained prediction combined forest model and determining a departure prediction result of the current talent according to an output result of the prediction combined forest model, wherein the prediction combined forest model comprises a first prediction forest model and a second prediction forest model, the first prediction forest model is obtained by training according to historical original sample information and a first standard prediction result, and the second prediction forest model is obtained by training according to historical key sample information and a second standard prediction result.
In a third aspect, an embodiment of the present invention further provides an electronic device, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor, when executing the computer program, implements the talent departure prediction method according to any one of the first aspect.
In a fourth aspect, embodiments of the present invention also provide a storage medium containing computer-executable instructions, where the computer-executable instructions, when executed by a computer processor, implement the talent departure prediction method according to any one of the first aspect.
According to the technical scheme provided by the embodiment of the invention, the information to be predicted of the current talent is acquired, the information to be predicted is input into a prediction combined forest model after training, and the departure prediction result of the current talent is determined according to the output result of the prediction combined forest model, wherein the prediction combined forest model comprises a first prediction forest model and a second prediction forest model, the first prediction forest model is obtained by training according to historical original sample information and a first standard prediction result, and the second prediction forest model is obtained by training according to historical key sample information and a second standard prediction result. Because the historical key samples can improve the proportion of talents leaving the job, the data balance in the historical key samples can be improved, the problem of inaccurate talent leaving prediction in the prior art is solved, and the effect of improving the reliability of the leaving prediction result is realized.
Drawings
Fig. 1 is a schematic flowchart of a talent departure prediction method according to a first embodiment of the present invention;
fig. 2 is a schematic flowchart of a talent departure prediction method according to a second embodiment of the present invention;
fig. 3 is a schematic structural diagram of a talent departure prediction apparatus according to a third embodiment of the present invention;
fig. 4 is a schematic structural diagram of an electronic device according to a fourth embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.
Example one
Fig. 1 is a schematic flowchart of a talent departure prediction method according to a first embodiment of the present invention. This embodiment is applicable to cases where a departure prediction result for a current talent needs to be output accurately. The method may be executed by a talent departure prediction apparatus, which may be implemented by software and/or hardware and is generally integrated in a terminal or an electronic device. Referring specifically to fig. 1, the method may include the following steps:
S110, acquiring information to be predicted of the current talent.
The current talent may be a current employee of any enterprise. The information to be predicted can be understood as human resource information of the current talent and may include basic information, evaluation data, behavior data, education information, job information and the like. The evaluation data may include performance assessments, evaluation assessments and the like, and the behavior data may include attendance data, work data and the like.
S120, inputting the information to be predicted into the trained prediction combined forest model, and determining the departure prediction result of the current talent according to the output result of the prediction combined forest model.
The prediction combined forest model comprises a first prediction forest model and a second prediction forest model. The first prediction forest model is obtained by training according to historical original sample information and a first standard prediction result, and the second prediction forest model is obtained by training according to historical key sample information and a second standard prediction result.
The first prediction forest model is denoted $RF_1$ and has size $s_1$; the second prediction forest model is denoted $RF_2$ and has size $s_2$; the prediction combined forest model is denoted $RF$. Then $RF = RF_1 + RF_2$, the combined forest has size $s$, and $RF_2$ makes up a proportion $p$ ($0 \le p \le 1$) of the combined forest, that is, $s_2 = p \cdot s$ and $s_1 = (1 - p) \cdot s$. A higher value of $p$ means that more decision trees are learned on the sample set $T_c$ and fewer on the data set $T$.
The historical original sample information may include historical departure sample information of departed talents and historical on-the-job sample information of on-the-job talents, and may be stored in the data set $T$. The historical key sample information may include the historical departure sample information and, for each piece of it, the $m$ closest pieces of historical on-the-job sample information, and may be stored in the data set $T_c$. By construction, the proportion of historical departure sample information in $T_c$ is considerably higher than the proportion of historical departure sample information in $T$.
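As a minimal sketch of how the combined forest might be divided between the two data sets, the following helper splits a forest of s trees according to the proportion p. The function name `forest_sizes` and the rounding of p·s to a whole number of trees are assumptions made for illustration; the text does not say how fractional tree counts are handled.

```python
from typing import Tuple


def forest_sizes(s: int, p: float) -> Tuple[int, int]:
    """Split a combined forest of s trees into (s1, s2).

    s2 trees are trained on the key sample set T_c, s1 trees on the
    original data set T. Rounding p * s is an assumption of this sketch.
    """
    if s < 0:
        raise ValueError("s must be non-negative")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    s2 = round(p * s)  # trees learned on T_c
    s1 = s - s2        # remaining trees learned on T
    return s1, s2


sizes = forest_sizes(100, 0.3)
print(sizes)  # (70, 30)
```

With p = 0 the combined forest reduces to a forest trained only on T, and with p = 1 it is trained only on T_c.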
It can be understood that, in the data set $T$ of historical original sample information, the number of departed-talent samples is far smaller than the number of on-the-job talent samples; that is, the historical departure information and the historical on-the-job information in the historical original sample information are imbalanced. If a neural network model were trained directly on the historical original sample information and then used to predict talent departure, the prediction accuracy would be poor. In this embodiment, the historical original sample information may be used to train the first prediction forest model in the prediction combined forest model, and the historical key sample information may be used to train the second prediction forest model, so that the prediction combined forest model is optimized and the accuracy of talent departure prediction is improved.
Optionally, the first prediction forest model splits its nodes on the feature value with the minimum Gini coefficient among the features of the historical original sample information, and the second prediction forest model splits its nodes on the feature value with the minimum Gini coefficient among the features of the historical key sample information.
The prediction combined forest model may be a set of decision tree classifiers $\{h(x, \theta_k),\ k = 1, 2, \ldots, n\}$. Each base classifier $h(x, \theta_k)$ is an unpruned classification and regression tree constructed by the CART algorithm, $x$ is the input vector, and the parameter sets $\theta_k$ are independent and identically distributed random vectors that determine the growth process of each single tree. The output of the prediction combined forest model is obtained by combining the classification results of all trees by majority voting. The construction process of the prediction combined forest model may include the following steps:
step (a): based on the data set $T$, sampling is performed with the bootstrap method to generate $N$ sub-data sets. Because bootstrap sampling is sampling with replacement, each sub-data set leaves some samples undrawn; these are called out-of-bag (OOB) samples;
step (b): on each sub-data set, an unpruned CART tree model $h_i(x)$, $i = 1, 2, \ldots, N$, is trained. During tree generation, $p$ features are randomly selected from all $P$ features ($p < P$, with $p$ held constant), and the pre-calculated Gini coefficients of all values of each feature are used to select the split with the minimum Gini coefficient;
step (c): each tree grows to its maximum extent without pruning. The class of an input talent sample $x$ with multiple feature dimensions is obtained by a majority voting mechanism, and the output sample class can be obtained by the following formula:
$$O_{RF} = \arg\max_{O} \sum_{i=1}^{N} I(y_i = O) \tag{1}$$
wherein $O_{RF}$ is the output sample class, $O$ is the output variable, $I$ is the indicator function, and $y_i$ is the sample class given by the $i$-th tree.
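The sampling and voting parts of steps (a) to (c) can be illustrated with a short standard-library sketch. It does not train CART trees; it only shows bootstrap sampling with out-of-bag indices, random selection of a feature subset, and majority voting over per-tree outputs. All function names, the seed and the example sizes are hypothetical.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple


def bootstrap_with_oob(n_samples: int, rng: random.Random) -> Tuple[List[int], List[int]]:
    """Draw n_samples indices with replacement; return (in_bag, out_of_bag)."""
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    drawn = set(in_bag)
    out_of_bag = [i for i in range(n_samples) if i not in drawn]
    return in_bag, out_of_bag


def random_feature_subset(n_features: int, n_selected: int, rng: random.Random) -> List[int]:
    """Pick n_selected distinct feature indices out of n_features."""
    return sorted(rng.sample(range(n_features), n_selected))


def majority_vote(tree_outputs: Sequence[int]) -> int:
    """Return the class predicted by the most trees.

    Ties go to the class seen first, which is a choice of this sketch.
    """
    return Counter(tree_outputs).most_common(1)[0][0]


rng = random.Random(0)                       # fixed seed, illustration only
in_bag, oob = bootstrap_with_oob(10, rng)    # one sub-data set and its OOB samples
features = random_feature_subset(8, 3, rng)  # 3 of 8 features for one tree
label = majority_vote([1, 0, 1, 1, 0])       # 1 = departure, 0 = on the job
print(label)  # 1
```

In a full implementation each in-bag index list would be used to fit one unpruned tree, and the out-of-bag indices could serve as a held-out set for that tree.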
Optionally, the Gini coefficient may be used to measure purity: the smaller the Gini coefficient, the higher the purity of the sample data and the better the feature, that is, the higher the probability that the samples belong to a single class. Assuming that the sample data has $K$ classes in total and the probability of the $k$-th class is $p_k$, the Gini coefficient of the probability distribution is:
$$Gini(p) = \sum_{k=1}^{K} p_k (1 - p_k) = 1 - \sum_{k=1}^{K} p_k^2 \tag{2}$$
It can be understood that talent departure prediction is a binary classification problem. If the probability that a sample belongs to the first class is $p$, the Gini coefficient of the probability distribution is:
Gini(p)=2p(1-p) (3)
Optionally, for a sample set $D$ containing $|D|$ samples, of which $|C_1|$ are departed-talent samples and $|C_2|$ are on-the-job talent samples, the Gini coefficient of $D$ is:
$$Gini(D) = 1 - \sum_{k=1}^{2} \left( \frac{|C_k|}{|D|} \right)^2 \tag{4}$$
Optionally, if $D$ is divided into subsets $D_1$ and $D_2$ according to a certain value $a$ of feature $A$, with $|D_1|$ and $|D_2|$ samples respectively, then the Gini coefficient of $D$ under the condition of feature $A$ is:
$$Gini(D, A) = \frac{|D_1|}{|D|} Gini(D_1) + \frac{|D_2|}{|D|} Gini(D_2) \tag{5}$$
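The Gini quantities discussed here can be checked numerically with a small sketch. It assumes the standard CART definitions: the Gini coefficient of a label set is one minus the sum of squared class proportions, and the Gini coefficient of a split is the size-weighted average over the two subsets. The feature name `overtime` and the toy data are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def gini_binary(p: float) -> float:
    """Gini coefficient of a two-class distribution: Gini(p) = 2p(1 - p)."""
    return 2.0 * p * (1.0 - p)


def gini_dataset(labels: Sequence[Hashable]) -> float:
    """Gini coefficient of a label set: 1 - sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def gini_split(rows: Sequence[Dict[str, Hashable]], labels: Sequence[Hashable],
               feature: str, value: Hashable) -> float:
    """Weighted Gini coefficient after splitting on rows[i][feature] == value."""
    d1 = [y for x, y in zip(rows, labels) if x[feature] == value]
    d2 = [y for x, y in zip(rows, labels) if x[feature] != value]
    n = len(labels)
    return len(d1) / n * gini_dataset(d1) + len(d2) / n * gini_dataset(d2)


# Toy data: 1 = departed, 0 = on the job; "overtime" is a made-up feature.
rows = [{"overtime": "yes"}, {"overtime": "yes"}, {"overtime": "no"}, {"overtime": "no"}]
labels = [1, 0, 0, 0]
print(gini_dataset(labels))                       # 0.375
print(gini_split(rows, labels, "overtime", "yes"))  # 0.25
```

A tree builder would evaluate `gini_split` for every candidate feature value and keep the one with the smallest result.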
In this embodiment, the prediction combined forest model obtains the output sample class through a majority voting mechanism and thereby the departure prediction result. Compared with prior-art methods that predict employee departure with a clustering algorithm, this avoids falling into a local optimal solution and improves the accuracy of the prediction result.
According to the technical scheme provided by the embodiment of the invention, the information to be predicted of the current talent is acquired, the information to be predicted is input into a prediction combined forest model after training, and the departure prediction result of the current talent is determined according to the output result of the prediction combined forest model, wherein the prediction combined forest model comprises a first prediction forest model and a second prediction forest model, the first prediction forest model is obtained by training according to historical original sample information and a first standard prediction result, and the second prediction forest model is obtained by training according to historical key sample information and a second standard prediction result. Because the historical key samples can improve the proportion of talents leaving the job, the data balance in the historical key samples can be improved, the problem of inaccurate talent leaving prediction in the prior art is solved, and the effect of improving the reliability of the leaving prediction result is realized.
Example two
Fig. 2 is a schematic flowchart of a talent departure prediction method according to a second embodiment of the present invention. The technical scheme of this embodiment refines the above embodiment. Optionally, the training method of the second prediction forest model includes: acquiring the historical key sample information; inputting the historical key sample information into a second original forest model to obtain a first prediction result; and adjusting network parameters of the second original forest model according to the first prediction result and the second standard prediction result to obtain the second prediction forest model. The training method of the first prediction forest model includes: acquiring the historical original sample information; inputting the historical original sample information into a first original forest model to obtain a second prediction result; and adjusting network parameters of the first original forest model according to the second prediction result and the first standard prediction result to obtain the first prediction forest model. For parts not described in detail here, reference is made to the above embodiment. Referring specifically to fig. 2, the method may include the following steps:
S210, acquiring historical original sample information.
S220, inputting the historical original sample information into the first original forest model to obtain a second prediction result, and adjusting the network parameters of the first original forest model according to the second prediction result and the first standard prediction result to obtain a first prediction forest model.
Optionally, the network parameters may include the true positive rate $TP_{rate}$, the accuracy $Acc_{rate}$ and the geometric mean G-mean. The true positive rate $TP_{rate}$ reflects the prediction performance of the prediction combined forest model on departed-talent samples and is calculated as:
$$TP_{rate} = \frac{N_{TP}}{N_P} = \frac{N_{TP}}{N_{TP} + N_{FN}} \tag{6}$$
wherein $N_P$ is the number of departed-talent samples, $N_{TP}$ is the number of departed-talent samples correctly predicted as departed, and $N_{FN}$ is the number of departed-talent samples mispredicted as on the job. The larger $TP_{rate}$ is, the better the model predicts departed-talent samples.
The accuracy $Acc_{rate}$ evaluates the overall performance of the prediction combined forest model and is calculated as:
$$Acc_{rate} = \frac{N_{TP} + N_{TN}}{N_P + N_N} = \frac{N_{TP} + N_{TN}}{N_{TP} + N_{FN} + N_{TN} + N_{FP}} \tag{7}$$
wherein $N_N$ is the number of on-the-job talent samples, $N_{TN}$ is the number of on-the-job talent samples correctly predicted as on the job, and $N_{FP}$ is the number of on-the-job talent samples mispredicted as departed. The larger $Acc_{rate}$ is, the better the overall prediction performance of the model.
The geometric mean G-mean can serve as an important evaluation index for classification performance on imbalanced data and is calculated as:
$$G\text{-}mean = \sqrt{\frac{N_{TP}}{N_{TP} + N_{FN}} \times \frac{N_{TN}}{N_{TN} + N_{FP}}} \tag{8}$$
G-mean combines, as a product, the prediction accuracy on departed-talent samples and the prediction accuracy on on-the-job talent samples. Only when both are large does G-mean become large, which indicates better classification performance of the model on imbalanced data.
In this embodiment, when the network parameters are adjusted, the direction in which G-mean increases is taken as the main adjustment direction, and the prediction combined forest model is determined when the evaluation indices $TP_{rate}$, $Acc_{rate}$ and G-mean simultaneously reach relatively large values.
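A minimal sketch of the three evaluation indices follows. It assumes the usual confusion-matrix definitions with "departed" as the positive class and takes G-mean as the square root of the product of the two per-class accuracies; the function name and the example counts are made up for illustration.

```python
import math
from typing import Tuple


def evaluation_indices(n_tp: int, n_fn: int, n_tn: int, n_fp: int) -> Tuple[float, float, float]:
    """Return (TP_rate, Acc_rate, G_mean) from confusion-matrix counts.

    n_tp: departed, predicted departed      n_fn: departed, predicted on the job
    n_tn: on the job, predicted on the job  n_fp: on the job, predicted departed
    """
    tp_rate = n_tp / (n_tp + n_fn)                     # accuracy on departed samples
    tn_rate = n_tn / (n_tn + n_fp)                     # accuracy on on-the-job samples
    acc = (n_tp + n_tn) / (n_tp + n_fn + n_tn + n_fp)  # overall accuracy
    g_mean = math.sqrt(tp_rate * tn_rate)
    return tp_rate, acc, g_mean


# Imbalanced toy set: 10 departed and 90 on-the-job samples.
balanced_model = evaluation_indices(n_tp=8, n_fn=2, n_tn=81, n_fp=9)
always_on_job = evaluation_indices(n_tp=0, n_fn=10, n_tn=90, n_fp=0)
print(balanced_model)
print(always_on_job)
```

The second call shows why G-mean is useful on imbalanced data: a model that never predicts departure still reaches an accuracy of 0.9, but its G-mean is 0.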
S230, acquiring historical key sample information.
As described in the previous embodiment, the historical key sample information may include the historical departure sample information and at least one piece of historical on-the-job sample information closest to the historical departure sample information. In this embodiment, the historical key sample information may be determined based on the k-nearest neighbors (KNN) algorithm. Optionally, determining the historical key sample information may be implemented as follows:
calculating sample distances between the historical departure sample information of each departed talent and the historical on-the-job sample information of each on-the-job talent in the historical original sample information;
and determining, as the historical key sample information, the historical departure sample information together with at least one piece of target historical on-the-job sample information whose sample distance from the historical departure sample information is within a set distance threshold.
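The two steps above can be sketched with a brute-force search before any tree structure is introduced. The sketch assumes Euclidean distance and interprets the distance threshold as "the m nearest on-the-job samples"; the function names and the toy coordinates are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def euclidean(a: Point, b: Point) -> float:
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))


def key_samples(departed: Sequence[Point], on_job: Sequence[Point],
                m: int) -> Tuple[List[Point], List[Point]]:
    """Return all departed samples plus the union of each one's m nearest on-the-job samples."""
    chosen = set()
    for d in departed:
        # Rank on-the-job samples by distance to this departed sample.
        ranked = sorted(range(len(on_job)), key=lambda i: euclidean(d, on_job[i]))
        chosen.update(ranked[:m])
    return list(departed), [on_job[i] for i in sorted(chosen)]


departed = [(0.0, 0.0)]
on_job = [(1.0, 0.0), (5.0, 5.0), (0.0, 2.0), (9.0, 9.0)]
key_departed, key_on_job = key_samples(departed, on_job, m=2)

ratio_before = len(departed) / (len(departed) + len(on_job))            # share in T
ratio_after = len(key_departed) / (len(key_departed) + len(key_on_job))  # share in T_c
print(key_on_job, ratio_before, ratio_after)
```

In this toy case the share of departed samples rises from 1/5 in the original set to 1/3 in the key sample set, which is the balancing effect the text attributes to the key samples.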
Optionally, determining at least one piece of target historical on-the-job sample information whose sample distance from the historical departure sample information is within a set distance threshold includes:
inputting the historical original sample information into a set binary tree, and determining a target departure node of the set binary tree;
calculating a first distance between the target departure node and any on-the-job child node, and a second distance between the target departure node and a target on-the-job leaf node, wherein the target on-the-job leaf node is the current closest point of the target departure node;
if the first distance is smaller than the second distance, determining the on-the-job child node corresponding to the first distance as the target historical on-the-job sample information; if the first distance is equal to the second distance, determining either the on-the-job child node corresponding to the first distance or the target on-the-job leaf node as the target historical on-the-job sample information;
and if the first distance is greater than the second distance, determining the target on-the-job leaf node corresponding to the second distance as the target historical on-the-job sample information.
The set binary tree can be understood as a KD tree, that is, a binary tree in which every node is a k-dimensional point. A target departure node is a node corresponding to a piece of historical departure sample information and may be a node at any depth of the set binary tree, such as the root node, a leaf node or another node. An on-the-job child node is a node corresponding to a piece of historical on-the-job sample information and may likewise be a node at any depth of the set binary tree. The set distance threshold can be understood as selecting the m pieces of historical on-the-job sample information closest to the historical departure sample information. The target on-the-job leaf node can be understood as the leaf node reached when searching for a given target departure node, which serves as the current closest point of that target departure node. Specifically, the process of determining the target historical on-the-job sample information for a piece of historical departure sample information can be explained by the following steps:
step (1): input the data set $T = \{x_1, x_2, x_3, \ldots, x_N\}$ of $n$-dimensional historical original sample information, where $x_i = (x_i^{(1)}, x_i^{(2)}, \ldots, x_i^{(n)})^T$, $i = 1, 2, \ldots, N$, and construct a root node corresponding to the hyper-rectangular region of the $n$-dimensional space that contains $T$, which starts the construction of the KD tree;
step (2): for a node of depth $j$ in the KD tree, select $x^{(l)}$ as the splitting coordinate axis, where the split dimension of a node of depth $j$ is calculated as:
$$l = j \pmod{k} + 1 \tag{9}$$
where $l$ is the split dimension, $k$ is the number of dimensions of the space (here $k = n$), and $j \pmod{k}$ is the remainder of $j$ divided by $k$. Splitting a node of depth $j$ means taking the median of the $x^{(l)}$ coordinates of all instances in the node's region as the split point and dividing the hyper-rectangular region corresponding to the node into two sub-regions; the division is made by the hyperplane that passes through the split point and is perpendicular to the coordinate axis $x^{(l)}$;
Step (3): repeating step (2) until neither of the two sub-regions obtained by the division contains any instance, thereby completing the region division of the KD tree;
Step (4): starting from the root node, recursively descending the KD tree: if the coordinate of the target departure node in the current dimension is smaller than the coordinate of the dividing point, moving to the left child node, otherwise moving to the right child node, until the child node is a leaf node; taking the on-the-job leaf node reached in this way as the target on-the-job leaf node, and determining the target on-the-job leaf node as the current closest point to the target departure node;
Step (5): if an on-the-job child node is closer to the target departure node than the current closest point, replacing the current closest point with that on-the-job child node. Specifically, checking whether the region corresponding to the other on-the-job child node of the parent of the target on-the-job leaf node contains a closer point, that is, whether that region intersects the hypersphere centered on the target departure node whose radius is the distance from the target departure node to the target on-the-job leaf node;
Step (6): if they intersect, a point closer to the target departure node may exist in the region corresponding to the other on-the-job child node; moving to that child node and continuing the recursive search;
Step (7): if they do not intersect, backing up recursively; the search ends on returning to the root node, and the current closest point is the nearest neighbor of the target departure node. This method is repeated to find, for the historical departure sample information of each departed talent, the m nearest on-the-job talents. Through the above steps, the historical key sample information can be obtained and used for training the second original forest model.
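The construction and search steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not the patent's implementation: the names `Node`, `build`, `sq_dist` and `nearest` are hypothetical, the splitting axis is assumed to cycle over the feature dimensions, and squared Euclidean distance (the p = 2 case) is assumed as the metric.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

Point = Tuple[float, ...]


@dataclass
class Node:
    point: Point                    # instance stored at the dividing point
    axis: int                       # splitting dimension of this node
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build(points: Sequence[Point], depth: int = 0) -> Optional[Node]:
    """Split on the median of the current axis until no instances remain."""
    if not points:
        return None
    axis = depth % len(points[0])   # assumption: cycle over feature dimensions
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return Node(ordered[mid], axis,
                build(ordered[:mid], depth + 1),
                build(ordered[mid + 1:], depth + 1))


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance (assumed metric for this sketch)."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def nearest(node: Optional[Node], target: Point,
            best: Optional[Point] = None) -> Optional[Point]:
    """Descend toward the target, then back up and check the other side."""
    if node is None:
        return best
    if best is None or sq_dist(node.point, target) < sq_dist(best, target):
        best = node.point
    diff = target[node.axis] - node.point[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest(near, target, best)
    # Visit the far side only if the hypersphere around the target
    # (radius = distance to the current closest point) crosses the hyperplane.
    if diff ** 2 < sq_dist(best, target):
        best = nearest(far, target, best)
    return best
```

In this sketch the tree would be built from on-the-job samples and queried once per departed sample; finding the m nearest samples instead of one would need a bounded priority queue in place of the single `best` value.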
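Optionally, the first distance and the second distance are calculated as follows. Suppose there are two points $x_i = (x_i^{(1)}, x_i^{(2)}, \ldots, x_i^{(n)})$ and $x_j = (x_j^{(1)}, x_j^{(2)}, \ldots, x_j^{(n)})$. Then the distance $L_p$ between $x_i$ and $x_j$ is:

$$L_p(x_i, x_j) = \left( \sum_{l=1}^{n} \left| x_i^{(l)} - x_j^{(l)} \right|^{p} \right)^{1/p}$$

where n is the number of dimensions, p is the order, $x_i^{(n)}$ represents the value of the nth dimension of the ith sample, and $x_j^{(n)}$ represents the value of the nth dimension of the jth sample.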
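As a small illustration of the distance of order p between two samples, the following sketch sums the p-th powers of the per-dimension differences and takes the p-th root. The function name `lp_distance` and the restriction p ≥ 1 are assumptions made for this example.

```python
from typing import Sequence


def lp_distance(xi: Sequence[float], xj: Sequence[float], p: float = 2.0) -> float:
    """Distance of order p between two samples with the same number of dimensions."""
    if len(xi) != len(xj):
        raise ValueError("samples must have the same number of dimensions")
    if p < 1:
        raise ValueError("p is assumed to be at least 1 in this sketch")
    return sum(abs(a - b) ** p for a, b in zip(xi, xj)) ** (1.0 / p)
```

With p = 1 this is the Manhattan distance and with p = 2 the Euclidean distance.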
Optionally, before obtaining the historical key sample information, redundant processing and normalization processing may be performed on the historical original sample information to obtain intermediate sample information, so as to determine the historical key sample information according to the intermediate sample information.
Redundancy processing may be understood as handling redundant features. Redundant features are identified by examining the meaning of each feature in the original data set: features that have little influence on talent departure are removed, and features that are closely related to other features, or can even be derived from them, are called redundant features. Feature redundancy increases computational difficulty and degrades prediction efficiency and accuracy. In this embodiment, the correlation p(x, y) between feature x and feature y is calculated as:

$$p(x, y) = \frac{\mathrm{Cov}(x, y)}{\sqrt{D(x)}\,\sqrt{D(y)}}$$

where D(x) and D(y) are the variances of feature x and feature y, respectively, and Cov(x, y) is the covariance between feature x and feature y. The absolute value of p(x, y) lies in [0, 1]; the larger it is, the stronger the correlation between the features, and the smaller it is, the weaker the correlation.
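The correlation-based redundancy check can be sketched as follows. This is a minimal sketch under stated assumptions: the helper names `pearson` and `drop_redundant`, the greedy keep-first strategy, and the 0.9 threshold are all hypothetical choices for illustration and are not specified in the text.

```python
from math import sqrt
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Cov(x, y) / (sqrt(D(x)) * sqrt(D(y))), using population statistics."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    dx = sum((a - mx) ** 2 for a in x) / n
    dy = sum((b - my) ** 2 for b in y) / n
    if dx == 0 or dy == 0:
        return 0.0                  # a constant feature has no defined correlation
    return cov / sqrt(dx * dy)


def drop_redundant(columns: Dict[str, Sequence[float]],
                   threshold: float = 0.9) -> List[str]:
    """Keep a feature only if it is not strongly correlated with one already kept."""
    kept: List[str] = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept
```

For example, a tenure-in-months column would be dropped when a tenure-in-years column has already been kept, because one can be derived from the other.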
Normalization may be understood as scaling all data into a fixed range. If the data span several orders of magnitude, the features with large magnitudes generally dominate and the features with small magnitudes are poorly represented, which reduces the generalization ability of the model and prevents it from fitting the target accurately. Normalization eliminates the differences between orders of magnitude and thereby improves the fitting ability of the model. A common method is maximum-minimum (Max-Min) normalization, which maps the index data into [0, 1], with the maximum value mapped to 1 and the minimum value mapped to 0. The activation function used in the training process is the Sigmoid function, whose value range is (0, 1). The normalization is calculated as:

$$Y_i = \frac{x_i - x_{\min}}{x_{\max} - x_{\min}}$$

where $Y_i$ is the normalized data, $x_i$ is the actual value of the input vector, and $x_{\max}$ and $x_{\min}$ are the maximum and minimum values in the input vector, respectively.
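A minimal sketch of Max-Min normalization for a single feature column is shown below. The function name `min_max_normalize` is hypothetical, and mapping a constant column to all zeros is an assumption made here to avoid division by zero; the text does not say how that case is handled.

```python
from typing import List, Sequence


def min_max_normalize(values: Sequence[float]) -> List[float]:
    """Scale values into [0, 1]: the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]   # assumption: constant column maps to 0
    return [(v - lo) / (hi - lo) for v in values]
```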
S240, inputting the historical key sample information into a second original forest model to obtain a first prediction result, and adjusting network parameters of the second original forest model according to the first prediction result and a second standard prediction result to obtain a second prediction forest model.
S250, determining the forest model comprising the first prediction forest model and the second prediction forest model as the trained prediction combined forest model.
S260, acquiring information to be predicted of the current talent.
S270, inputting the information to be predicted into the trained prediction combined forest model, and determining the departure prediction result of the current talent according to the output result of the prediction combined forest model.
Optionally, before the information to be predicted is input into the trained prediction combined forest model, redundancy processing and normalization processing may be performed on the information to be predicted, so as to improve the accuracy of the departure prediction result.
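The prediction step with two trained models can be pictured with the sketch below. The way the two outputs are combined is not stated in this passage, so averaging the two departure probabilities and thresholding at 0.5 is purely an assumption for illustration; the names `Model` and `predict_departure` are likewise hypothetical, and each model is represented as a plain callable rather than an actual forest.

```python
from typing import Callable, Sequence, Tuple

# A trained model is represented as a callable returning P(departure).
Model = Callable[[Sequence[float]], float]


def predict_departure(first_model: Model, second_model: Model,
                      features: Sequence[float],
                      threshold: float = 0.5) -> Tuple[bool, float]:
    """Combine two model outputs (assumed rule: simple average, then threshold)."""
    p = 0.5 * (first_model(features) + second_model(features))
    return p >= threshold, p
```

Here `first_model` would stand for the model trained on the historical original samples and `second_model` for the one trained on the historical key samples.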
According to the technical scheme provided by this embodiment of the invention, the historical original sample information is acquired and input into the first original forest model to obtain a second prediction result, and the network parameters of the first original forest model are adjusted according to the second prediction result and the first standard prediction result to obtain the first prediction forest model; the historical key sample information is acquired and input into the second original forest model to obtain a first prediction result, and the network parameters of the second original forest model are adjusted according to the first prediction result and the second standard prediction result to obtain the second prediction forest model. In this way, the data balance in the historical key samples is improved, the generalization ability of the prediction combined forest model is improved, and the reliability of the departure prediction result is improved.
EXAMPLE III
Fig. 3 is a schematic structural diagram of a talent departure prediction apparatus according to a third embodiment of the present invention. Referring to Fig. 3, the apparatus includes: a to-be-predicted information acquisition module 31 and a departure prediction result determination module 32.
The to-be-predicted information acquisition module 31 is configured to acquire information to be predicted of a current talent;
and the departure prediction result determination module 32 is configured to input the information to be predicted into a trained prediction combined forest model, and determine a departure prediction result of the current talent according to an output result of the prediction combined forest model, wherein the prediction combined forest model includes a first prediction forest model and a second prediction forest model, the first prediction forest model is obtained by training according to historical original sample information and a first standard prediction result, and the second prediction forest model is obtained by training according to historical key sample information and a second standard prediction result.
On the basis of the above technical solutions, the apparatus further includes: a training module of the second prediction forest model (obtaining module 51), configured to obtain the historical key sample information;
and to input the historical key sample information into the second original forest model to obtain a first prediction result, and adjust network parameters of the second original forest model according to the first prediction result and the second standard prediction result to obtain the second prediction forest model.
On the basis of the above technical solutions, the training module of the second prediction forest model is further configured to calculate the sample distance between the historical departure sample information of each departed talent and the historical on-the-job sample information of each on-the-job talent in the historical original sample information;
and to determine the historical departure sample information and the target historical on-the-job sample information as the historical key sample information, wherein the target historical on-the-job sample information comprises at least one piece of historical on-the-job sample information whose sample distance from the historical departure sample information is within a set distance threshold.
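The selection of historical key samples by distance threshold can be sketched with a brute-force search. This is a simplified illustration rather than the tree-based search the text describes: the function name `select_key_samples` is hypothetical, and Euclidean distance via `math.dist` (Python 3.8+) is assumed as the sample distance.

```python
from math import dist
from typing import List, Sequence, Tuple

Sample = Tuple[float, ...]


def select_key_samples(departed: Sequence[Sample],
                       on_the_job: Sequence[Sample],
                       threshold: float) -> List[Sample]:
    """Keep every departure sample plus each on-the-job sample that lies
    within `threshold` of at least one departure sample."""
    selected = set()
    for d in departed:
        for idx, s in enumerate(on_the_job):
            if dist(d, s) <= threshold:
                selected.add(idx)       # store the index so each sample is kept once
    return list(departed) + [on_the_job[i] for i in sorted(selected)]
```

Because all departure samples are kept while only nearby on-the-job samples survive, the resulting set has a higher share of departure samples than the original data.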
On the basis of the above technical solutions, the training module of the second prediction forest model is further configured to:
input the historical original sample information into a set binary tree, and determine a target departure node of the set binary tree;
calculate a first distance between the target departure node and any on-the-job child node, and calculate a second distance between the target departure node and a target on-the-job leaf node, wherein the target on-the-job leaf node is the current closest point to the target departure node;
if the first distance is smaller than the second distance, determine the on-the-job child node corresponding to the first distance as the target historical on-the-job sample information;
if the first distance is equal to the second distance, determine either the on-the-job child node corresponding to the first distance or the target on-the-job leaf node as the target historical on-the-job sample information;
and if the first distance is greater than the second distance, determine the target on-the-job leaf node corresponding to the second distance as the target historical on-the-job sample information.
On the basis of the above technical solutions, the first prediction forest model splits each node on the feature of the historical original sample information that has the minimum Gini coefficient, and the second prediction forest model splits each node on the feature of the historical key sample information that has the minimum Gini coefficient.
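To make the minimum-Gini splitting criterion concrete, the sketch below computes the Gini impurity of a label set and scans candidate thresholds of one numeric feature for the split with the lowest weighted impurity. The function names `gini` and `best_threshold` and the "value <= threshold goes left" convention are assumptions for this example, in the style of CART decision trees.

```python
from collections import Counter
from typing import List, Optional, Sequence, Tuple


def gini(labels: Sequence[int]) -> float:
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_threshold(values: Sequence[float],
                   labels: Sequence[int]) -> Tuple[float, Optional[float]]:
    """Return (weighted Gini, threshold) of the best binary split on one feature."""
    best: Tuple[float, Optional[float]] = (float("inf"), None)
    for t in sorted(set(values))[:-1]:      # the largest value cannot split the data
        left: List[int] = [l for v, l in zip(values, labels) if v <= t]
        right: List[int] = [l for v, l in zip(values, labels) if v > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[0]:
            best = (score, t)
    return best
```

Running `best_threshold` over every feature and choosing the feature with the smallest score would give the node split; a forest repeats this on bootstrap samples and random feature subsets.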
On the basis of the above technical solutions, the network parameters include at least one of the following evaluation indices: the true positive rate, the accuracy, and the geometric mean.
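The three evaluation indices can be computed from a confusion matrix as in the sketch below. The function name `evaluate` and the label convention (1 = departed, 0 = on the job) are hypothetical, and the geometric mean is assumed here to be the square root of the true positive rate times the true negative rate, which is the usual definition for imbalanced classification; the text itself does not spell the formula out.

```python
from math import sqrt
from typing import Dict, Sequence


def evaluate(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """True positive rate, accuracy and G-mean for binary labels (1 = departed)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tpr = tp / (tp + fn) if tp + fn else 0.0
    tnr = tn / (tn + fp) if tn + fp else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"tpr": tpr, "accuracy": accuracy, "g_mean": sqrt(tpr * tnr)}
```

On imbalanced data, accuracy alone can look good even when most departures are missed, which is why the true positive rate and G-mean are useful alongside it.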
On the basis of the above technical solutions, the apparatus further includes: a preprocessing module; the preprocessing module is used for acquiring the historical original sample information;
and carrying out redundancy processing and normalization processing on the historical original sample information to obtain intermediate sample information so as to determine the historical key sample information according to the intermediate sample information.
According to the technical scheme provided by the embodiment of the invention, the information to be predicted of the current talent is acquired, the information to be predicted is input into a prediction combined forest model after training, and the departure prediction result of the current talent is determined according to the output result of the prediction combined forest model, wherein the prediction combined forest model comprises a first prediction forest model and a second prediction forest model, the first prediction forest model is obtained by training according to historical original sample information and a first standard prediction result, and the second prediction forest model is obtained by training according to historical key sample information and a second standard prediction result. Because the historical key samples can improve the proportion of talents leaving the job, the data balance in the historical key samples can be improved, the problem of inaccurate talent leaving prediction in the prior art is solved, and the effect of improving the reliability of the leaving prediction result is realized.
EXAMPLE IV
Fig. 4 is a schematic structural diagram of an electronic device according to a fourth embodiment of the present invention. FIG. 4 illustrates a block diagram of an exemplary electronic device 12 suitable for use in implementing embodiments of the present invention. The electronic device 12 shown in fig. 4 is only an example and should not bring any limitation to the function and the scope of use of the embodiment of the present invention.
As shown in FIG. 4, electronic device 12 is embodied in the form of a general purpose computing device. The components of electronic device 12 may include, but are not limited to: one or more processors or processing units 16, a system memory 28, and a bus 18 that couples various system components including the system memory 28 and the processing unit 16.
The system memory 28 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM)30 and/or cache memory 32. The electronic device 12 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 34 may be used to read from and write to non-removable, nonvolatile magnetic media (not shown in FIG. 4, and commonly referred to as a "hard drive"). Although not shown in FIG. 4, a magnetic disk drive for reading from and writing to a removable, nonvolatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from or writing to a removable, nonvolatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media) may be provided. In these cases, each drive may be connected to bus 18 by one or more data media interfaces. The memory 28 may include at least one program product having a set of program modules (e.g., a to-be-predicted information acquisition module 31 and a job departure prediction result determination module 32 of the talent departure prediction apparatus) configured to perform the functions of embodiments of the present invention.
A program/utility 44 having a set of program modules 46 (e.g., talent departure prediction apparatus to-be-predicted information acquisition module 31 and departure prediction result determination module 32) may be stored, for example, in memory 28, such program modules 46 including, but not limited to, an operating system, one or more application programs, other program modules, and program data, each of which examples or some combination thereof may include an implementation of a network environment. Program modules 46 generally carry out the functions and/or methodologies of the described embodiments of the invention.
The processing unit 16 executes various functional applications and data processing by running programs stored in the system memory 28, for example, implementing the talent departure prediction method provided by an embodiment of the present invention, the method including:
acquiring information to be predicted of current talents;
inputting the information to be predicted to a trained prediction combination forest model, and determining the departure prediction result of the current talent according to the output result of the prediction combination forest model, wherein the prediction combination forest model comprises a first prediction forest model and a second prediction forest model, the first prediction forest model is obtained by training according to historical original sample information and a first standard prediction result, and the second prediction forest model is obtained by training according to historical key sample information and a second standard prediction result.
Of course, those skilled in the art will appreciate that the processor may also implement the technical solution of the talent departure prediction method provided in any embodiment of the present invention.
EXAMPLE V
An embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements a talent departure prediction method according to an embodiment of the present invention, where the method includes:
acquiring information to be predicted of current talents;
inputting the information to be predicted to a trained prediction combination forest model, and determining the departure prediction result of the current talent according to the output result of the prediction combination forest model, wherein the prediction combination forest model comprises a first prediction forest model and a second prediction forest model, the first prediction forest model is obtained by training according to historical original sample information and a first standard prediction result, and the second prediction forest model is obtained by training according to historical key sample information and a second standard prediction result.
Of course, the computer program stored on the computer-readable storage medium provided by the embodiments of the present invention is not limited to the above method operations, and may also perform related operations in a talent departure prediction method provided by any embodiments of the present invention.
Computer storage media for embodiments of the invention may employ any combination of one or more computer-readable media. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system or device.
A computer readable signal medium may include a propagated data signal carrying, among other things, the information to be predicted, the original sample information, the first standard prediction result, the historical key sample information, and the second standard prediction result, with computer readable program code embodied therein. The propagated signal may take the form of the information to be predicted, the original sample information, the first standard prediction result, the historical key sample information, the second standard prediction result, and the like. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations of the present invention may be written in any combination of one or more programming languages, including object oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
It should be noted that, in the embodiment of the talent departure prediction apparatus, the modules included in the embodiment are only divided according to functional logic, but are not limited to the above division as long as the corresponding functions can be implemented; in addition, specific names of the functional units are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present invention.
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.
Claims (10)
1. A talent departure prediction method, comprising:
acquiring information to be predicted of current talents;
inputting the information to be predicted to a trained prediction combination forest model, and determining the departure prediction result of the current talent according to the output result of the prediction combination forest model, wherein the prediction combination forest model comprises a first prediction forest model and a second prediction forest model, the first prediction forest model is obtained by training according to historical original sample information and a first standard prediction result, and the second prediction forest model is obtained by training according to historical key sample information and a second standard prediction result.
2. The method of claim 1, wherein the training of the second predictive forest model comprises:
acquiring the historical key sample information;
and inputting the historical key sample information into the second original forest model to obtain a first prediction result, and adjusting network parameters of the second original forest model according to the first prediction result and the second standard prediction result to obtain the second prediction forest model.
3. The method of claim 2, wherein the obtaining the historical key sample information comprises:
calculating sample distances between the historical departure sample information of each departed talent and the historical on-the-job sample information of each on-the-job talent in the historical original sample information;
and determining the historical departure sample information, together with at least one piece of target historical on-the-job sample information whose sample distance from the historical departure sample information is within a set distance threshold, as the historical key sample information.
4. The method of claim 3, wherein determining at least one piece of target historical on-the-job sample information whose sample distance from the historical departure sample information is within a set distance threshold comprises:
inputting the historical original sample information into a set binary tree, and determining a target departure node of the set binary tree;
calculating a first distance between the target departure node and any on-the-job child node, and calculating a second distance between the target departure node and a target on-the-job leaf node, wherein the target on-the-job leaf node is the current closest point to the target departure node;
if the first distance is smaller than the second distance, determining the on-the-job child node corresponding to the first distance as the target historical on-the-job sample information;
if the first distance is equal to the second distance, determining either the on-the-job child node corresponding to the first distance or the target on-the-job leaf node as the target historical on-the-job sample information;
and if the first distance is greater than the second distance, determining the target on-the-job leaf node corresponding to the second distance as the target historical on-the-job sample information.
5. The method of claim 2, wherein the first prediction forest model splits each node on the feature of the historical original sample information that has the minimum Gini coefficient, and the second prediction forest model splits each node on the feature of the historical key sample information that has the minimum Gini coefficient.
6. The method of claim 2, wherein the network parameters comprise at least one of the following evaluation indices: the true positive rate, the accuracy, and the geometric mean.
7. The method of claim 2, further comprising, prior to said obtaining said historical key sample information:
acquiring the historical original sample information;
and carrying out redundancy processing and normalization processing on the historical original sample information to obtain intermediate sample information so as to determine the historical key sample information according to the intermediate sample information.
8. An apparatus for predicting talent departure, comprising:
a to-be-predicted information acquisition module, configured to acquire information to be predicted of the current talent;
and a departure prediction result determination module, configured to input the information to be predicted into a trained prediction combined forest model and determine a departure prediction result of the current talent according to an output result of the prediction combined forest model, wherein the prediction combined forest model comprises a first prediction forest model and a second prediction forest model, the first prediction forest model is obtained by training according to historical original sample information and a first standard prediction result, and the second prediction forest model is obtained by training according to historical key sample information and a second standard prediction result.
9. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the talent departure prediction method according to any one of claims 1-7.
10. A storage medium containing computer-executable instructions which, when executed by a computer processor, implement the talent departure prediction method of any one of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010071754.1A CN111242387A (en) | 2020-01-21 | 2020-01-21 | Talent departure prediction method and device, electronic equipment and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010071754.1A CN111242387A (en) | 2020-01-21 | 2020-01-21 | Talent departure prediction method and device, electronic equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111242387A true CN111242387A (en) | 2020-06-05 |
Family
ID=70874858
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010071754.1A Pending CN111242387A (en) | 2020-01-21 | 2020-01-21 | Talent departure prediction method and device, electronic equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111242387A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111738765A (en) * | 2020-06-23 | 2020-10-02 | 京东数字科技控股有限公司 | Data processing method, device, equipment and storage medium |
CN111798059A (en) * | 2020-07-10 | 2020-10-20 | 河北冀联人力资源服务集团有限公司 | System and method for predicting job leaving |
CN111860299A (en) * | 2020-07-17 | 2020-10-30 | 北京奇艺世纪科技有限公司 | Target object grade determining method and device, electronic equipment and storage medium |
CN113537642A (en) * | 2021-08-20 | 2021-10-22 | 日月光半导体制造股份有限公司 | Product quality prediction method, device, electronic equipment and storage medium |
- 2020-01-21 CN CN202010071754.1A patent/CN111242387A/en active Pending
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111738765A (en) * | 2020-06-23 | 2020-10-02 | 京东数字科技控股有限公司 | Data processing method, device, equipment and storage medium |
CN111798059A (en) * | 2020-07-10 | 2020-10-20 | 河北冀联人力资源服务集团有限公司 | System and method for predicting job leaving |
CN111798059B (en) * | 2020-07-10 | 2023-11-24 | 河北冀联人力资源服务集团有限公司 | Off-duty prediction system and method |
CN111860299A (en) * | 2020-07-17 | 2020-10-30 | 北京奇艺世纪科技有限公司 | Target object grade determining method and device, electronic equipment and storage medium |
CN111860299B (en) * | 2020-07-17 | 2023-09-08 | 北京奇艺世纪科技有限公司 | Method and device for determining grade of target object, electronic equipment and storage medium |
CN113537642A (en) * | 2021-08-20 | 2021-10-22 | 日月光半导体制造股份有限公司 | Product quality prediction method, device, electronic equipment and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111242387A (en) | Talent departure prediction method and device, electronic equipment and storage medium | |
CN111343161B (en) | Abnormal information processing node analysis method, abnormal information processing node analysis device, abnormal information processing node analysis medium and electronic equipment | |
CN110059894B (en) | Equipment state evaluation method, device, system and storage medium | |
CN113238922B (en) | Log analysis method and device, electronic equipment and medium | |
US20080082475A1 (en) | System and method for resource adaptive classification of data streams | |
CN113505537A (en) | Building energy consumption detection method and device, computer equipment and storage medium | |
CN113986674B (en) | Time sequence data abnormality detection method and device and electronic equipment | |
US11640558B2 (en) | Unbalanced sample classification method and apparatus | |
CN113159355A (en) | Data prediction method, data prediction device, logistics cargo quantity prediction method, medium and equipment | |
CN111179055B (en) | Credit line adjusting method and device and electronic equipment | |
CN110599200A (en) | Detection method, system, medium and device for false address of OTA hotel | |
CN110728313A (en) | Classification model training method and device for intention classification recognition | |
CN114327964A (en) | Method, device, equipment and storage medium for processing fault reasons of service system | |
CN117608630A (en) | Code quality detection method, device, equipment and storage medium | |
CN111126629B (en) | Model generation method, brush list identification method, system, equipment and medium | |
CN113538154A (en) | Risk object identification method and device, storage medium and electronic equipment | |
CN113591998A (en) | Method, device, equipment and storage medium for training and using classification model | |
US20240184598A1 (en) | Real-time event status via an enhanced graphical user interface | |
CN110263083B (en) | Knowledge graph processing method, device, equipment and medium | |
CN113780675B (en) | Consumption prediction method and device, storage medium and electronic equipment | |
CN112561179A (en) | Stock tendency prediction method and device, computer equipment and storage medium | |
CN115034762A (en) | Post recommendation method and device, storage medium, electronic equipment and product | |
CN110059180B (en) | Article author identity recognition and evaluation model training method and device and storage medium | |
CN115455142A (en) | Text retrieval method, computer device and storage medium | |
CN114020916A (en) | Text classification method and device, storage medium and electronic equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | | Application publication date: 20200605 |