CN111754208A - Automatic screening method for recruitment resumes - Google Patents
- Publication number
- CN111754208A (application number CN202010619694.2A)
- Authority
- CN
- China
- Prior art keywords
- resume
- recruitment
- model
- post
- data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/10—Office automation; Time management
- G06Q10/105—Human resources
- G06Q10/1053—Employment or hiring
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/332—Query formulation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2413—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
- G06F18/24147—Distances to closest patterns, e.g. nearest neighbour classification
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- Business, Economics & Management (AREA)
- General Physics & Mathematics (AREA)
- Human Resources & Organizations (AREA)
- General Engineering & Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Strategic Management (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Databases & Information Systems (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Entrepreneurship & Innovation (AREA)
- Evolutionary Computation (AREA)
- Economics (AREA)
- Marketing (AREA)
- Operations Research (AREA)
- Quality & Reliability (AREA)
- Tourism & Hospitality (AREA)
- General Business, Economics & Management (AREA)
- Mathematical Physics (AREA)
- Computational Linguistics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention provides an automatic screening method for recruitment resumes, belonging to the technical field of text extraction. It saves the recruiting company the work and time of manually screening massive numbers of resumes for those that fit a post.
Description
Technical Field
The invention relates to text extraction in natural language processing and to multi-label classification, and in particular to an automatic screening method for recruitment resumes.
Background
In the internet era, online information plays an increasingly prominent role in society and daily life, and it has changed how people shop, communicate, and live. With the rise of recruitment websites, enterprises now recruit mainly by publishing vacancies on those sites, and applicants submit resumes online instead of delivering them offline. Recruiting through such websites is far more convenient for both enterprises and individuals.
Multi-label learning is a research hotspot in the international machine learning community and originated from the ambiguity encountered in document classification. In the traditional supervised learning framework, each real-world object corresponds to exactly one concept label, so the learning problem is assumed to be unambiguous; this is the single-label classification problem, in which a sample has only one label. In real-world problems, however, ambiguous objects are widespread: one sample may be associated with several labels, and such problems are multi-label classification problems. Multi-label learning is widely applied in practice, for example in automatic video annotation, bioinformatics, Web mining, information retrieval, and personalized recommendation.
Association rules are among the most active research topics in knowledge discovery. They were first proposed by Agrawal et al. in 1993 to mine associations between different commodities (items) in a customer transaction database, and the rules reflect users' purchasing behavior patterns. A typical example of association rule mining is shopping basket analysis. Association rule research helps find the associations between different commodities (items) in a transaction database and reveals customers' purchasing behavior patterns, such as the influence of purchasing one commodity on the purchase of others. The results can be applied to shelf layout and inventory arrangement, and to classifying users by purchasing pattern.
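The two standard measures behind an association rule, support and confidence, can be sketched in a few lines of Python. This is only an illustration: the toy transactions, the `post:` prefix, and the function names are hypothetical, and the patent does not specify how its association degree is computed.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate P(consequent | antecedent) as support(A and C) / support(A)."""
    base = support(transactions, antecedent)
    joint = support(transactions, set(antecedent) | set(consequent))
    return joint / base if base else 0.0


# Hypothetical toy data: each transaction is the keyword set of one resume
# together with the post it was matched to.
transactions = [
    {"python", "sql", "post:data_analyst"},
    {"python", "spark", "post:data_engineer"},
    {"python", "sql", "excel", "post:data_analyst"},
    {"java", "spring", "post:backend"},
]

rule_support = support(transactions, {"python", "sql", "post:data_analyst"})
rule_confidence = confidence(transactions, {"python", "sql"}, {"post:data_analyst"})
print(rule_support, rule_confidence)
```

In this toy data the rule {python, sql} → post:data_analyst holds in half of the transactions and in every transaction that contains both skills.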
The TextRank algorithm is a graph-based ranking algorithm for text. Its basic idea derives from Google's PageRank algorithm: the text is divided into constituent units (words or sentences), a graph model is built over them, and the important components of the text are ranked by a voting mechanism, so that keyword extraction and summarization can be performed using only the information in a single document. Unlike models such as LDA and HMM, TextRank does not need to be trained on a collection of documents in advance, and it is widely used because it is simple and effective. For keyword extraction, TextRank first segments the given sentences into words and collects the resulting words into a set. The importance of a word depends mainly on the number of neighbors before and after it: the more neighbors a word has, the more votes it receives and the higher its weight. A word that recurs throughout the text gains more neighbors, as does a word in the middle of the text compared with one at the beginning or end. TextRank has two main applications: extracting the important keywords of a text, and selecting the most important sentences of a passage to form a summary.
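The voting idea can be made concrete with a minimal sketch of keyword ranking on a co-occurrence graph. It is a simplified, unweighted variant written for illustration, not the implementation used in the patent; the function name, the window size, the damping factor of 0.85, and the pre-segmented token list are all assumptions (Chinese text would first need a word segmenter).

```python
from collections import defaultdict


def textrank_keywords(tokens, window=2, damping=0.85, iterations=50, top_k=5):
    """Rank words by PageRank-style voting on a co-occurrence graph."""
    # Two distinct words are linked when they occur within `window`
    # positions of each other in the token sequence.
    neighbors = defaultdict(set)
    for i, word in enumerate(tokens):
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            if tokens[j] != word:
                neighbors[word].add(tokens[j])
                neighbors[tokens[j]].add(word)

    scores = {word: 1.0 for word in neighbors}
    for _ in range(iterations):
        updated = {}
        for word in neighbors:
            # Each neighbor splits its current score evenly among its links.
            votes = sum(scores[n] / len(neighbors[n]) for n in neighbors[word])
            updated[word] = (1 - damping) + damping * votes
        scores = updated
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


# Hypothetical, already segmented resume text.
tokens = ["python", "data", "python", "sql", "python", "spark", "python", "excel"]
keywords = textrank_keywords(tokens, window=1, top_k=3)
print(keywords)
```

Here "python" neighbors every other word, so it collects the most votes and is ranked first.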
In machine learning, training an algorithm model mainly comprises three steps: first, preprocessing the data; second, selecting a suitable algorithm model; and third, training the model on the training samples to obtain the optimal model.
As the internet has spread, information carriers have gradually shifted from paper newspapers and periodicals to online media, and with the rise of recruitment websites enterprises have moved their job postings from newspapers to those websites. Recruitment websites are now the main channel through which enterprises publish, and applicants obtain, recruitment information, and both resume delivery and post screening are completed online. To increase their chances of finding a job, applicants often cast a wide net and send resumes to many posts. Although this raises the applicant's chances, it undoubtedly increases the enterprise's workload in screening out resumes that do not match the post. How to find talent that meets its requirements quickly and accurately among massive numbers of resumes has become a major problem for enterprises.
Disclosure of Invention
To solve the above technical problem, the invention provides an automatic screening method for recruitment resumes. The method screens resumes automatically according to the information published by both applicants and recruiters, which saves enterprises considerable time and cost, improves the accuracy of the final result, and lets enterprises find suitable resumes and talent in a short time.
The technical scheme of the invention is as follows:
An automatic screening method for recruitment resumes:
Keywords in the recruitment resume information are extracted with the Text-Rank algorithm, the resumes are classified with the multi-label classification method ML-KNN, the degree of association between resumes and the corresponding posts is mined based on association rules, and an automatic screening model for recruitment resumes is established.
The method comprises:
1) using a crawler to acquire post and resume data from a recruitment website;
2) extracting keywords from the resume and post information with the Text-Rank algorithm, and generating training samples with the resume keywords as features and the post keywords as labels;
3) training an ML-KNN model on the training samples to obtain the screening model.
Further,
the crawled recruitment resume information is preprocessed;
and keywords are extracted from the preprocessed recruitment resume data with the Text-Rank algorithm.
Still further,
the method comprises the following specific steps:
step 1): data acquisition; first, the enterprise recruitment information pages and the corresponding submitted resume information are crawled; the post requirement and skill requirement data in the recruitment information and the personal skills in the resumes are extracted to obtain the initial post data and resume data;
step 2): keyword extraction; keywords in the initial post and resume data are extracted with the Text-Rank algorithm;
step 3): keyword verification; the keywords extracted in step 2) are verified, and if the extraction quality is poor or the accuracy is low, the algorithm model of step 2) is tuned and the extraction is repeated;
step 4): data processing; the acquired post and resume keyword information is converted and preprocessed to obtain the training samples required for classification;
step 5): screening model establishment; a classifier is learned from the training samples of step 4) with the ML-KNN algorithm, and the screening model is finally established;
step 6): repeated training; the screening model is trained several times so that its performance becomes more stable.
The training sample in step 4) is represented as $x_1 = [x_{11}, x_{12}, x_{13}, \dots, x_{1n}]$, with the corresponding result set $y = \{L_1, L_2, \dots, L_m\}$, where each label $L$ takes the value 0 or 1: 0 indicates that the sample does not have the label, and 1 indicates that it does.
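One way such a sample could be built from extracted keywords is sketched below. The 0/1 label encoding follows the text; encoding the features as 0/1 indicators over a fixed vocabulary is an assumption, as are the vocabularies and the `encode` helper, which are purely illustrative.

```python
def encode(keywords, vocabulary):
    """Return a 0/1 vector marking which vocabulary entries occur in keywords."""
    present = set(keywords)
    return [1 if term in present else 0 for term in vocabulary]


# Hypothetical vocabularies: resume keywords are features, post keywords are labels.
feature_vocab = ["python", "sql", "spark", "java"]
label_vocab = ["data_analyst", "data_engineer", "backend"]

x1 = encode(["python", "sql"], feature_vocab)  # feature vector of one resume
y1 = encode(["data_analyst"], label_vocab)     # label vector of the same resume
print(x1, y1)
```

Keywords outside the vocabulary are simply ignored by this encoding.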
A screening model $Y = f(x)$ is established. The model predicts the suitable posts for an unknown sample $x$ and computes, for each post label, the probability that the sample holds it (the holding probability); the larger the holding probability, the better the resume fits the post.
The screening model can be trained using ten-fold cross-validation.
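A ten-fold split can be sketched as follows. The helper name, the fixed shuffle seed, and the sample count of 25 are assumptions made for illustration; the patent does not describe how its folds are formed.

```python
import random


def k_fold_indices(n_samples, n_folds=10, seed=0):
    """Yield (train, test) index lists; each sample is held out exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into n_folds nearly equal folds.
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for held_out in range(n_folds):
        test = folds[held_out]
        train = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
        yield train, test


splits = list(k_fold_indices(25))
for train, test in splits:
    # A model would be fitted on `train` and scored on `test` here.
    print(len(train), len(test))
```

Averaging the score over the ten held-out folds gives a steadier estimate of model performance than a single train/test split.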
The invention has the following advantages:
1) keywords are extracted with the Text-Rank algorithm, which is simple and effective and does not need to be trained on multiple documents in advance;
2) classification and screening use the ML-KNN algorithm, which finds the K nearest neighbor samples following the idea of KNN and then uses Bayesian conditional probability to compute the probability that each label is 1 or 0; the value with the higher probability is taken as the final label of the sample;
3) combining 1) and 2) provides an automatic resume screening service for enterprise recruitment, which saves enterprises time and cost, improves the accuracy of the final result, and lets enterprises find suitable resumes and talent in a short time.
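The ML-KNN procedure summarized in advantage 2) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the patent's implementation: features are 0/1 vectors compared by Hamming distance, `k` and the smoothing constant `s` are chosen arbitrarily, the training set must contain more than `k` samples, and every name and the toy data are hypothetical.

```python
def hamming(a, b):
    """Number of positions at which two equal-length vectors differ."""
    return sum(u != v for u, v in zip(a, b))


def nearest(X, query, k, skip=None):
    """Indices of the k training samples closest to query (optionally skipping one)."""
    order = sorted((i for i in range(len(X)) if i != skip),
                   key=lambda i: hamming(X[i], query))
    return order[:k]


def mlknn_fit(X, Y, k=2, s=1.0):
    n, m = len(X), len(Y[0])
    # Smoothed prior probability that a sample holds each label.
    prior = [(s + sum(y[l] for y in Y)) / (2 * s + n) for l in range(m)]
    # has[l][j] / lacks[l][j]: training samples with / without label l whose
    # k nearest neighbors include exactly j samples holding label l.
    has = [[0] * (k + 1) for _ in range(m)]
    lacks = [[0] * (k + 1) for _ in range(m)]
    for i in range(n):
        nbrs = nearest(X, X[i], k, skip=i)
        for l in range(m):
            j = sum(Y[t][l] for t in nbrs)
            (has if Y[i][l] else lacks)[l][j] += 1
    return {"X": X, "Y": Y, "k": k, "s": s,
            "prior": prior, "has": has, "lacks": lacks}


def mlknn_predict_proba(model, query):
    """Posterior probability that query holds each label, by Bayes' rule."""
    k, s = model["k"], model["s"]
    nbrs = nearest(model["X"], query, k)
    probs = []
    for l, p1 in enumerate(model["prior"]):
        j = sum(model["Y"][t][l] for t in nbrs)
        like1 = (s + model["has"][l][j]) / (s * (k + 1) + sum(model["has"][l]))
        like0 = (s + model["lacks"][l][j]) / (s * (k + 1) + sum(model["lacks"][l]))
        probs.append(p1 * like1 / (p1 * like1 + (1 - p1) * like0))
    return probs


# Hypothetical toy data: 4 keyword features, 2 post labels.
X = [[1, 1, 0, 0], [1, 1, 0, 0], [1, 1, 1, 0],
     [0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 1, 1]]
Y = [[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1]]
model = mlknn_fit(X, Y, k=2)
probs = mlknn_predict_proba(model, [1, 1, 0, 1])
print(probs)
```

A label is assigned when its probability exceeds 0.5, and the probabilities themselves can serve to rank posts for a resume.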
Drawings
FIG. 1 is a schematic workflow diagram of the present invention.
Detailed Description
To make the objects, technical solutions, and advantages of the embodiments of the invention clearer, the technical solutions in the embodiments are described below with reference to the accompanying drawings. The described embodiments are evidently some, but not all, of the embodiments of the invention; all other embodiments obtained from them by a person of ordinary skill in the art without creative effort fall within the scope of protection of the invention.
As shown in FIG. 1, the invention defines an automatic screening method for recruitment resumes based on the ML-KNN multi-label learning algorithm, which mainly comprises the following steps:
Step 1: data acquisition. First, the enterprise recruitment information pages and the corresponding submitted resume information are crawled. The post requirement and skill requirement data in the recruitment information and the personal skills in the resumes are extracted to obtain the initial post data and resume data.
Step 2: keyword extraction. Keywords in the initial post and resume data are extracted with the Text-Rank algorithm.
Step 3: keyword verification. The keywords extracted in step 2 are verified; if the extraction quality is poor or the accuracy is low, the algorithm model of step 2 is tuned and the extraction is repeated.
Step 4: data processing. The acquired post and resume keyword information is converted and preprocessed to obtain the training samples required for classification, $x_1 = [x_{11}, x_{12}, x_{13}, \dots, x_{1n}]$, with the corresponding result set $y = \{L_1, L_2, \dots, L_m\}$ (each label $L$ takes the value 0 or 1: 0 indicates that the sample does not have the label, and 1 indicates that it does).
Step 5: screening model establishment. A classifier is learned from the training samples of step 4 with the ML-KNN algorithm, and the screening model $Y = f(x)$ is finally established. The model predicts the suitable posts for an unknown sample $x$ and computes, for each post label, the probability that the sample holds it (the holding probability); the larger the holding probability, the better the resume fits the post.
Step 6: repeated training. The screening model is trained multiple times using ten-fold cross-validation so that its performance becomes more stable.
The above description is only a preferred embodiment of the present invention, and is only used to illustrate the technical solutions of the present invention, and not to limit the protection scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.
Claims (8)
1. An automatic screening method for recruitment resumes, characterized in that
keywords in the recruitment resume information are extracted with the Text-Rank algorithm, the resumes are classified with the multi-label classification method ML-KNN, the degree of association between resumes and the corresponding posts is mined based on association rules, and an automatic screening model for recruitment resumes is established.
2. The method of claim 1, comprising:
1) using a crawler to acquire post and resume data from a recruitment website;
2) extracting keywords from the resume and post information with the Text-Rank algorithm, and generating training samples with the resume keywords as features and the post keywords as labels;
3) training an ML-KNN model on the training samples to obtain the screening model.
3. The method of claim 2, wherein
the crawled recruitment resume information is preprocessed.
4. The method of claim 3, wherein
keywords are extracted from the preprocessed recruitment resume data with the Text-Rank algorithm.
5. The method of claim 4, wherein
the method comprises the following specific steps:
step 1): data acquisition; first, the enterprise recruitment information pages and the corresponding submitted resume information are crawled; the post requirement and skill requirement data in the recruitment information and the personal skills in the resumes are extracted to obtain the initial post data and resume data;
step 2): keyword extraction; keywords in the initial post and resume data are extracted with the Text-Rank algorithm;
step 3): keyword verification; the keywords extracted in step 2) are verified, and if the extraction quality is poor or the accuracy is low, the algorithm model of step 2) is tuned and the extraction is repeated;
step 4): data processing; the acquired post and resume keyword information is converted and preprocessed to obtain the training samples required for classification;
step 5): screening model establishment; a classifier is learned from the training samples of step 4) with the ML-KNN algorithm, and the screening model is finally established;
step 6): repeated training; the screening model is trained several times so that its performance becomes more stable.
6. The method of claim 1, wherein
the training sample in step 4) is represented as $x_1 = [x_{11}, x_{12}, x_{13}, \dots, x_{1n}]$, with the corresponding result set $y = \{L_1, L_2, \dots, L_m\}$, where each label $L$ takes the value 0 or 1: 0 indicates that the sample does not have the label, and 1 indicates that it does.
7. The method of claim 6, wherein
a screening model $Y = f(x)$ is established; the model predicts the suitable posts for an unknown sample $x$ and computes, for each post label, the probability that the sample holds it (the holding probability); the larger the holding probability, the better the resume fits the post.
8. The method of claim 5, wherein
the screening model is trained using ten-fold cross-validation.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010619694.2A CN111754208A (en) | 2020-07-01 | 2020-07-01 | Automatic screening method for recruitment resumes |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010619694.2A CN111754208A (en) | 2020-07-01 | 2020-07-01 | Automatic screening method for recruitment resumes |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111754208A (en) | 2020-10-09 |
Family
ID=72678619
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010619694.2A (Withdrawn) CN111754208A (en) | 2020-07-01 | 2020-07-01 | Automatic screening method for recruitment resumes |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111754208A (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113506084A (en) * | 2021-06-23 | 2021-10-15 | 上海师范大学 | False recruitment position detection method based on deep learning |
CN113342983A (en) * | 2021-06-30 | 2021-09-03 | 中国平安人寿保险股份有限公司 | Resume distribution method, device and equipment based on machine learning and storage medium |
CN113342983B (en) * | 2021-06-30 | 2023-02-07 | 中国平安人寿保险股份有限公司 | Resume distribution method, device and equipment based on machine learning and storage medium |
CN115879901A (en) * | 2023-02-22 | 2023-03-31 | 陕西湘秦衡兴科技集团股份有限公司 | Intelligent personnel self-service platform |
CN115879901B (en) * | 2023-02-22 | 2023-07-28 | 陕西湘秦衡兴科技集团股份有限公司 | Intelligent personnel self-service platform |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WW01 | Invention patent application withdrawn after publication | Application publication date: 20201009 |