CN111754208A

CN111754208A - Automatic screening method for recruitment resumes

Info

Publication number: CN111754208A
Application number: CN202010619694.2A
Authority: CN
Inventors: 邱继钊; 杨胜华
Original assignee: Chaozhou Zhuoshu Big Data Industry Development Co Ltd
Current assignee: Chaozhou Zhuoshu Big Data Industry Development Co Ltd
Priority date: 2020-07-01
Filing date: 2020-07-01
Publication date: 2020-10-09

Abstract

The invention provides an automatic screening method for recruitment resumes, belonging to the technical field of text extraction. It saves recruiting companies the work and time of manually picking out, from a massive number of resumes, those that fit a post.

Description

Automatic screening method for recruitment resumes

Technical Field

The invention relates to text extraction in natural language processing and to multi-label classification, and in particular to an automatic screening method for recruitment resumes.

Background

In the internet era, online information plays an increasingly prominent role in society and daily life, and it has changed the way people shop, communicate and live. With the rise of recruitment websites, enterprises now publish most of their job openings on such sites, and applicants submit resumes online instead of delivering them offline. Recruiting through these websites is far more convenient for both enterprises and individuals.

Multi-label learning is an active research topic in the international machine learning community, and it originated from the ambiguity encountered in document classification. Under the traditional supervised learning framework, real-world objects and their concept labels are in one-to-one correspondence. The learning problem is generally assumed to be unambiguous and is called single-label classification: each sample has exactly one label. In real-world problems, however, ambiguous objects are common, and a single sample may be associated with several labels; this class of problems is called multi-label classification. Multi-label learning is widely used in practice, for example in automatic video annotation, bioinformatics, Web mining, information retrieval and personalized recommendation.

Association rules are one of the most active research methods in the field of knowledge discovery. They were first proposed by Agrawal et al. in 1993 for mining associations between different commodities (items) in a customer transaction database, and the rules reflect the purchasing behavior patterns of users. A typical example of association rule mining is shopping basket analysis. Association rule research helps to discover the associations between items in a transaction database and the purchasing patterns of customers, such as how buying one commodity influences buying others. The results can be applied to shelf layout and inventory planning, and to classifying users by purchasing pattern.
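The two quantities usually used to score an association rule, support and confidence, can be illustrated with a minimal Python sketch. The transactions, the function names and the thresholds below are hypothetical and chosen only for illustration; the sketch enumerates single-item rules by brute force and is not the mining procedure of the invention.

```python
from itertools import combinations

# Hypothetical toy transactions; item names are illustrative only.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent): support(A and C) / support(A)."""
    base = support(antecedent, transactions)
    if base == 0:
        return 0.0
    return support(set(antecedent) | set(consequent), transactions) / base

def mine_pair_rules(transactions, min_support=0.4, min_confidence=0.6):
    """Enumerate single-item rules lhs -> rhs that pass both thresholds."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(items, 2):
        for lhs, rhs in ((a, b), (b, a)):
            s = support({lhs, rhs}, transactions)
            c = confidence({lhs}, {rhs}, transactions)
            if s >= min_support and c >= min_confidence:
                rules.append((lhs, rhs, s, c))
    return rules

for lhs, rhs, s, c in mine_pair_rules(transactions):
    print(f"{lhs} -> {rhs}: support={s:.2f}, confidence={c:.2f}")
```

In a resume setting the "items" would be resume and post keywords rather than commodities; that mapping is an assumption about how the idea carries over.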

TextRank is a graph-based ranking algorithm for text. Its basic idea derives from Google's PageRank algorithm: a text is divided into constituent units (words or sentences), a graph model is built over them, and the important units are ranked by a voting mechanism, so keyword extraction and summarization can be performed using only the information in a single document. Unlike models such as LDA and HMM, TextRank does not need to be trained on a collection of documents in advance, and it is widely used because it is simple and effective. The algorithm first segments the given sentences into words and puts the resulting words into a set. The importance of a word depends mainly on the number of neighbors that appear before and after it: the more neighbors a word has, the more votes it receives and the higher its weight. A word that occurs repeatedly accumulates more neighbors, and a word in the middle of a text has more neighbors than one at the beginning or end. TextRank has two main applications: extracting the important keywords of a text, and selecting the keywords that occur most often in a passage.
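The voting mechanism described above can be sketched in a few lines of Python. This is a minimal, unweighted variant that assumes the text has already been segmented and filtered into a token list; the function name, the window size, the damping factor of 0.85 and the sample tokens are illustrative assumptions, not values given by the source.

```python
from collections import defaultdict

def textrank_keywords(tokens, window=2, damping=0.85, iterations=50, top_k=5):
    """Rank tokens with a PageRank-style iteration over a co-occurrence graph.

    `tokens` is an already segmented and filtered word list.
    """
    # Build an undirected graph: words within `window` positions are neighbours.
    neighbours = defaultdict(set)
    for i, word in enumerate(tokens):
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            if tokens[j] != word:
                neighbours[word].add(tokens[j])
                neighbours[tokens[j]].add(word)

    scores = {word: 1.0 for word in neighbours}
    for _ in range(iterations):
        new_scores = {}
        for word in neighbours:
            # Each neighbour splits its current score evenly among its own neighbours.
            vote = sum(scores[n] / len(neighbours[n]) for n in neighbours[word])
            new_scores[word] = (1 - damping) + damping * vote
        scores = new_scores

    # Highest score first; ties are broken alphabetically for a stable order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]

# Hypothetical resume fragment, already tokenized.
tokens = "python developer python data analysis python machine learning python sql".split()
for word, score in textrank_keywords(tokens, top_k=3):
    print(word, round(score, 3))
```

For Chinese resumes a word segmenter would have to run before this step; which segmenter to use is left open here.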

In machine learning, training a model involves three main steps: first, preprocessing the data; second, selecting a suitable algorithm model; and third, training the model on the training samples to obtain the best-performing model.

With the spread of the internet, information has gradually moved from paper newspapers and periodicals to online media. As recruitment websites have become widespread, enterprises have shifted from publishing job openings in newspapers to publishing them on recruitment websites. These websites are now the main channel through which enterprises release, and applicants obtain, recruitment information, and both resume delivery and resume screening are completed online. To increase their chances of finding a job, applicants often cast a wide net and send their resumes to many posts. Although this raises an applicant's chance of being hired, it also greatly increases the enterprise's workload, because many of the resumes received do not match the post. How to quickly and accurately find suitable talent in a massive pool of resumes has become a major problem for enterprises.

Disclosure of Invention

To solve the above technical problem, the invention provides an automatic screening method for recruitment resumes. It screens resumes automatically according to the information published by both applicants and recruiters, which greatly reduces the time and cost incurred by enterprises, improves the accuracy of the final result, and lets enterprises find suitable resumes and talent in a short time.

The technical scheme of the invention is as follows:

An automatic screening method for recruitment resumes:

Keywords in the recruitment and resume information are extracted with the Text-Rank algorithm, the resumes are classified with the multi-label classification method ML-KNN, the degree of association between resumes and the corresponding posts is mined on the basis of association rules, and an automatic screening model for recruitment resumes is established.

The method comprises:

1) using a crawler to acquire post and resume data from recruitment websites;

2) extracting keywords from the resumes and the post information with the Text-Rank algorithm, taking the resume keywords as features and the post keywords as labels, to generate training samples;

3) training an ML-KNN algorithm model to obtain the screening model.

Further,

the crawled recruitment and resume information is preprocessed;

and keywords are extracted from the processed data with the Text-Rank algorithm.

Still further,

the method comprises the following specific steps:

Step 1): acquiring data; enterprise recruitment information pages and the resumes delivered to them are crawled; the post requirements and skill requirements in the recruitment information and the personal skills in the resumes are extracted to obtain the initial post data and resume data;

Step 2): extracting keywords; keywords are extracted from the initial post and resume data with the Text-Rank algorithm;

Step 3): verifying the keywords extracted in step 2); if the extraction quality is poor or the accuracy is low, the algorithm model in step 2) is tuned and the extraction is repeated;

Step 4): processing data; the post and resume keyword information is converted and preprocessed to obtain the training samples required for classification;

Step 5): establishing a screening model; a classifier is learned from the training samples of step 4) with the ML-KNN algorithm, and the screening model is established;

Step 6): the screening model is trained multiple times so that its performance becomes more stable.
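Step 3) leaves the verification criterion open. One possible way to quantify extraction quality, assuming a small hand-labelled reference keyword set is available (an assumption, not something the method specifies), is set-based precision and recall. The function names and the 0.6 thresholds below are hypothetical.

```python
def keyword_precision_recall(extracted, reference):
    """Compare extracted keywords against a hand-labelled reference set."""
    extracted, reference = set(extracted), set(reference)
    hits = len(extracted & reference)
    precision = hits / len(extracted) if extracted else 0.0
    recall = hits / len(reference) if reference else 0.0
    return precision, recall

def needs_retuning(extracted, reference, min_precision=0.6, min_recall=0.6):
    """True when extraction quality falls below either (hypothetical) threshold."""
    precision, recall = keyword_precision_recall(extracted, reference)
    return precision < min_precision or recall < min_recall

# Hypothetical example: 2 of 3 extracted keywords are correct, 2 of 4 reference keywords found.
extracted = ["python", "sql", "team"]
reference = ["python", "sql", "pandas", "excel"]
print(keyword_precision_recall(extracted, reference))
print(needs_retuning(extracted, reference))
```

When `needs_retuning` returns true, the extraction parameters of step 2) would be adjusted and the extraction repeated, as step 3) describes.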

The training sample in step 4) is represented as $x_1 = [x_{11}, x_{12}, x_{13}, \dots, x_{1n}]$, with the corresponding result set $y = \{L_1, L_2, \dots, L_m\}$, where each label $L$ takes the value 0 or 1: 0 indicates that the sample does not have the label, and 1 indicates that it does.
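A minimal sketch of how keyword lists might be converted into such vectors is given below. It assumes a binary (multi-hot) encoding over a fixed keyword vocabulary for the features as well as for the labels; the source only fixes the 0/1 form of the labels, so the feature encoding, the function names and the sample keywords are all illustrative assumptions.

```python
def build_vocabulary(keyword_lists):
    """Map each distinct keyword to a fixed column index (sorted for stability)."""
    vocab = sorted({kw for kws in keyword_lists for kw in kws})
    return {kw: i for i, kw in enumerate(vocab)}

def multi_hot(keywords, index):
    """Return a 0/1 vector with a 1 at the position of each known keyword."""
    vec = [0] * len(index)
    for kw in keywords:
        if kw in index:
            vec[index[kw]] = 1
    return vec

# Hypothetical (resume keywords, post keywords) pairs.
pairs = [
    (["python", "sql", "pandas"], ["data analyst", "python"]),
    (["java", "spring", "sql"], ["backend engineer", "java"]),
]
feature_index = build_vocabulary(resume for resume, _ in pairs)
label_index = build_vocabulary(post for _, post in pairs)

X = [multi_hot(resume, feature_index) for resume, _ in pairs]  # features x
Y = [multi_hot(post, label_index) for _, post in pairs]        # labels L1..Lm

print(X)
print(Y)
```

Keywords that are not in the vocabulary are silently ignored, which keeps the vector length fixed when new resumes are encoded later.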

A screening model $Y = f(x)$ is established. The model predicts the suitable posts for an unknown sample $x$ and computes, for each label, the probability that the sample holds it; the larger this probability, the better the resume fits the post.
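The following is a minimal pure-Python sketch of an ML-KNN classifier that returns such per-label probabilities. It follows the commonly published form of the algorithm (smoothed label priors plus neighbour-count likelihoods) and assumes Euclidean distance; the class name, `k=3`, the smoothing constant and the toy data are illustrative assumptions rather than details taken from the source.

```python
import math

def nearest(x, X, k, skip=None):
    """Indices of the k rows of X closest to x (Euclidean distance).

    `skip` excludes one row, so a training sample is not its own neighbour.
    """
    candidates = [i for i in range(len(X)) if i != skip]
    candidates.sort(key=lambda i: (math.dist(x, X[i]), i))
    return candidates[:k]

class MLKNN:
    """Minimal ML-KNN: one smoothed Bayesian decision per label."""

    def __init__(self, k=3, smooth=1.0):
        self.k = k
        self.s = smooth

    def fit(self, X, Y):
        self.X, self.Y = X, Y
        n, m, k, s = len(X), len(Y[0]), self.k, self.s
        # Smoothed prior probability that each label equals 1.
        self.prior = [(s + sum(y[l] for y in Y)) / (2 * s + n) for l in range(m)]
        # with_label[l][j]: samples carrying label l whose neighbours include
        # exactly j carriers of l; without_label is the same for non-carriers.
        with_label = [[0] * (k + 1) for _ in range(m)]
        without_label = [[0] * (k + 1) for _ in range(m)]
        for i in range(n):
            nn = nearest(X[i], X, k, skip=i)
            for l in range(m):
                j = sum(Y[t][l] for t in nn)
                table = with_label if Y[i][l] == 1 else without_label
                table[l][j] += 1

        def smooth_rows(counts):
            return [[(s + c) / (s * (k + 1) + sum(row)) for c in row] for row in counts]

        self.like1 = smooth_rows(with_label)
        self.like0 = smooth_rows(without_label)
        return self

    def predict_proba(self, x):
        """Posterior probability, per label, that x carries the label."""
        nn = nearest(x, self.X, self.k)
        probs = []
        for l, p1 in enumerate(self.prior):
            j = sum(self.Y[t][l] for t in nn)
            yes = p1 * self.like1[l][j]
            no = (1 - p1) * self.like0[l][j]
            probs.append(yes / (yes + no))
        return probs

    def predict(self, x):
        return [int(p > 0.5) for p in self.predict_proba(x)]

# Hypothetical toy data: two feature clusters, two mutually exclusive labels.
X_train = [[0, 0], [0, 1], [1, 0], [1, 1], [5, 5], [5, 6], [6, 5], [6, 6]]
Y_train = [[1, 0]] * 4 + [[0, 1]] * 4
model = MLKNN(k=3).fit(X_train, Y_train)
print(model.predict_proba([0.5, 0.5]))
print(model.predict([0.5, 0.5]))
```

Ranking resumes by the value returned from `predict_proba` for a given post label would give the ordering by fit that the paragraph above describes.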

The screening model can be trained and evaluated with ten-fold cross-validation.
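Ten-fold cross-validation can be sketched as follows. The sketch assumes Hamming loss as the evaluation metric and a caller-supplied `fit_predict` function; both are illustrative choices, since the source names neither a metric nor an interface.

```python
import random

def k_fold_indices(n_samples, n_folds=10, seed=0):
    """Yield (train_idx, test_idx) pairs; every sample is tested exactly once.

    Assumes n_samples >= n_folds so that no fold is empty.
    """
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::n_folds] for f in range(n_folds)]
    for f in range(n_folds):
        test = folds[f]
        train = [i for g, fold in enumerate(folds) if g != f for i in fold]
        yield train, test

def hamming_loss(Y_true, Y_pred):
    """Fraction of label slots predicted incorrectly (lower is better)."""
    total = sum(len(y) for y in Y_true)
    wrong = sum(t != p for yt, yp in zip(Y_true, Y_pred) for t, p in zip(yt, yp))
    return wrong / total

def cross_validate(X, Y, fit_predict, n_folds=10):
    """Average Hamming loss of fit_predict(X_train, Y_train, X_test) over the folds."""
    losses = []
    for train, test in k_fold_indices(len(X), n_folds):
        predictions = fit_predict(
            [X[i] for i in train], [Y[i] for i in train], [X[i] for i in test]
        )
        losses.append(hamming_loss([Y[i] for i in test], predictions))
    return sum(losses) / len(losses)

# Hypothetical usage with a trivial baseline that always predicts the same labels.
X_demo = [[i] for i in range(20)]
Y_demo = [[0, 1]] * 20
baseline = lambda X_tr, Y_tr, X_te: [[0, 1] for _ in X_te]
print(cross_validate(X_demo, Y_demo, baseline))
```

Any classifier can be plugged in through `fit_predict`; the averaged loss over the ten folds is a steadier estimate of model quality than a single train/test split.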

The invention has the following advantages:

1) Keywords are extracted with the Text-Rank algorithm, which is simple and effective and does not need to be trained on a collection of documents in advance;

2) Classification and screening are performed with the ML-KNN algorithm. ML-KNN uses the KNN idea to find the K nearest neighbor samples and uses Bayesian conditional probability to compute the probability that each label is 1 or 0; the value with the higher probability is taken as the final label of the sample;

3) Combining 1) and 2) provides an automatic resume screening service for enterprise recruitment, which effectively saves the enterprise time and cost, improves the accuracy of the final result, and lets the enterprise find suitable resumes and talent in a short time.
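The Bayesian decision mentioned in advantage 2) can be written out explicitly. The formulas below follow the standard published form of ML-KNN and are given as a reference sketch, not as notation taken from the patent; $N$ (number of training samples), $s$ (smoothing constant, commonly 1), $C_l(x)$ (number of the $K$ nearest neighbors of $x$ that carry label $l$), and the counts $c_l[j]$ and $c'_l[j]$ are symbols introduced here for illustration. $H_1^l$ and $H_0^l$ denote the events that $x$ does or does not carry label $l$, and $E_j^l$ the event that exactly $j$ of its $K$ neighbors carry it.

$$
P(H_1^l) = \frac{s + \sum_{i=1}^{N} y_i^l}{2s + N}, \qquad P(H_0^l) = 1 - P(H_1^l)
$$

$$
P(E_j^l \mid H_1^l) = \frac{s + c_l[j]}{s(K+1) + \sum_{p=0}^{K} c_l[p]}, \qquad
P(E_j^l \mid H_0^l) = \frac{s + c'_l[j]}{s(K+1) + \sum_{p=0}^{K} c'_l[p]}
$$

$$
P(H_1^l \mid E_j^l) = \frac{P(H_1^l)\,P(E_j^l \mid H_1^l)}{P(H_1^l)\,P(E_j^l \mid H_1^l) + P(H_0^l)\,P(E_j^l \mid H_0^l)}, \qquad j = C_l(x)
$$

Here $c_l[j]$ counts the training samples that carry label $l$ and have exactly $j$ neighbors carrying it, and $c'_l[j]$ counts the same for training samples that do not carry $l$. Label $l$ is assigned the value 1 when $P(H_1^l \mid E_j^l) > 1/2$ and 0 otherwise.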

Drawings

FIG. 1 is a schematic workflow diagram of the present invention.

Detailed Description

To make the objects, technical solutions and advantages of the embodiments of the invention clearer, the technical solutions in the embodiments are described below with reference to the drawing. The described embodiments are some, but not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the invention.

As shown in FIG. 1, the invention defines an automatic screening method for recruitment resumes based on the ML-KNN multi-label learning algorithm, which mainly comprises the following steps:

Step 1: acquiring data. Enterprise recruitment information pages and the resumes delivered to them are crawled. The post requirements and skill requirements in the recruitment information and the personal skills in the resumes are extracted to obtain the initial post data and resume data.

Step 2: extracting keywords. Keywords are extracted from the initial post and resume data with the Text-Rank algorithm.

Step 3: verifying the keywords extracted in step 2. If the extraction quality is poor or the accuracy is low, the algorithm model in step 2 is tuned and the extraction is repeated.

Step 4: processing data. The post and resume keyword information is converted and preprocessed to obtain the training samples required for classification, $x_1 = [x_{11}, x_{12}, x_{13}, \dots, x_{1n}]$, with the corresponding result set $y = \{L_1, L_2, \dots, L_m\}$ (each label $L$ takes the value 0 or 1: 0 indicates that the sample does not have the label, and 1 indicates that it does).

Step 5: establishing a screening model. A classifier is learned from the training samples of step 4 with the ML-KNN algorithm, and a screening model $Y = f(x)$ is established. The model predicts the suitable posts for an unknown sample $x$ and computes, for each label, the probability that the sample holds it; the larger this probability, the better the resume fits the post.

Step 6: the screening model is trained multiple times with ten-fold cross-validation so that its performance becomes more stable.

The above description is only a preferred embodiment of the present invention, and is only used to illustrate the technical solutions of the present invention, and not to limit the protection scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims

1. An automatic screening method of recruitment resumes is characterized in that,

2. The method of claim 1, comprising

1) Using a crawler to acquire post and resume data in the recruitment website;

3) and training by using an ML-KNN algorithm model to obtain a screening model.

3. The method of claim 2,

and preprocessing the crawled recruitment resume information.

4. The method of claim 3,

5. The method of claim 4,

the method comprises the following specific steps:

6. The method of claim 1,

the training sample in step 4 is represented as $x_1 = [x_{11}, x_{12}, x_{13}, \dots, x_{1n}]$, with the corresponding result set $y = \{L_1, L_2, \dots, L_m\}$, where each label $L$ takes the value 0 or 1: 0 indicates that the sample does not have the label, and 1 indicates that it does.

7. The method of claim 6,

8. The method of claim 5,

and training a screening model according to a ten-fold cross validation mode.