CN113627182B - Data matching method, device, computer equipment and storage medium - Google Patents
- Publication number
- CN113627182B, CN202110912447.6A
- Authority
- CN
- China
- Prior art keywords
- resume
- matching
- resume file
- word segmentation
- candidate
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/216—Parsing using statistical methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- General Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- General Health & Medical Sciences (AREA)
- Databases & Information Systems (AREA)
- Probability & Statistics with Applications (AREA)
- Data Mining & Analysis (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a data matching method, a device, computer equipment and a storage medium, wherein the method comprises the following steps: acquiring a resume file to be processed, and determining the matching degree between the resume file to be processed and the target post; determining at least one resume file to be processed as the candidate resume file according to the matching degree; acquiring a history recording resume file corresponding to a history post with the same attribute as the target post; determining the similarity between the candidate resume file and the resume file for history; and determining at least one candidate resume file as a target resume file according to the similarity. According to the invention, by combining the matching degree of the resume and the post and the similarity of the resume and the resume for history recording, a more accurate resume screening result is ensured, intelligent matching of the recruitment post and the delivery resume is realized, the post matching is more accurate, and the talent screening efficiency of recruiters is improved.
Description
Technical Field
The present invention relates to the field of artificial intelligence, and in particular, to a data matching method, apparatus, computer device, and storage medium.
Background
In the prior art, recruitment was typically handled by experienced recruiters within a company. With the rapid development of the Internet, demand for job hunting and recruitment has grown quickly. Faced with massive numbers of job-hunting resume files, manual classification is often affected by subjective factors and is prone to repeated or missing entries. An artificial-intelligence-based person-post matching assistance system can reduce the manual screening workload of recruiters and thereby reduce the company's labor cost.
However, current artificial-intelligence post matching programs consider only the resume and the post information currently being matched, and ignore the history recording resume files for the same post. This history information can guide the current post matching, and ignoring it may lead to inaccurate post matching results.
Therefore, how to additionally match the current resume against the history information, on top of matching the current resume against the post information, so as to ensure accurate post matching is a difficult problem that needs to be solved.
Disclosure of Invention
The invention aims to provide a data matching method, a data matching device, computer equipment and a storage medium, which are used for solving the problems existing in the prior art.
In order to achieve the above object, the present invention provides a data matching method, including:
Acquiring a history recording resume file corresponding to a history post with the same attribute as the target post;
Respectively inputting the candidate resume file and the resume file for the history record into a preset word segmentation model to obtain a candidate word segmentation text and a history word segmentation text after word segmentation processing;
Respectively constructing co-occurrence matrixes according to the candidate word segmentation texts and the history word segmentation texts, and determining first keywords of the candidate word segmentation texts and second keywords of the history word segmentation texts based on the co-occurrence matrixes;
Converting the first keyword and the second keyword into a first vector and a second vector respectively, and obtaining similarity values of the first vector and the second vector;
according to the similarity value, determining the similarity between the candidate resume file and the resume file for history;
And determining at least one candidate resume file as a target resume file according to the similarity.
Preferably, before the step of obtaining the history record resume file corresponding to the history post with the same attribute as the target post, the method further comprises:
acquiring a resume file to be processed, and determining the matching degree between the resume file to be processed and the target post;
And determining at least one resume file to be processed as a candidate resume file according to the matching degree.
Preferably, obtaining a resume file to be processed, and determining a matching degree between the resume file to be processed and the target post, including:
extracting characteristic information of the resume file to be processed and requirement information of the target post;
Inputting the characteristic information and the demand information into a preset matching model to perform matching degree identification; a plurality of matching lists are stored in the matching model in advance, the matching lists correspond to different target posts respectively, and the matching lists store the corresponding relations of the requirement information, the scores and the weight values;
determining a matching score between the characteristic information and the demand information according to the matching model output result;
And obtaining the matching degree of the resume file to be processed and the target post according to the matching score and the weight value corresponding to the requirement information.
Preferably, the extracting the feature information of the resume file to be processed and the requirement information of the target post includes:
judging whether the format of the resume file to be processed meets a preset format or not;
If the preset format is met, reading text information in the resume file to be processed, and extracting the characteristic information from the text information;
If the preset format is not met, reading resume information in the resume file to be processed, generating a conversion file with the preset format, reading conversion text information from the conversion file, and extracting the characteristic information from the conversion text information.
Preferably, the determining, according to the matching degree, that at least one resume file to be processed is a candidate resume file includes:
taking the resume files to be processed whose matching degree satisfies a threshold value as candidate resume files.
Preferably, the constructing co-occurrence matrix according to the candidate word segmentation text and the history word segmentation text, and determining the first keyword of the candidate word segmentation text and the second keyword of the history word segmentation text based on the co-occurrence matrix includes:
Respectively constructing the co-occurrence matrix according to the occurrence times of each word in the candidate word segmentation text and the historical word segmentation text;
extracting word frequency and degree of each word in the candidate word segmentation text and the historical word segmentation text according to the co-occurrence matrix;
Obtaining the score of each word in the candidate word segmentation text and the historical word segmentation text according to the word frequency and the word degree;
And outputting each word in the candidate word segmentation text and the historical word segmentation text in a descending order according to the score to obtain a first keyword of the candidate word segmentation text and a second keyword of the historical word segmentation text.
Preferably, the converting the first keyword and the second keyword into a first vector and a second vector, respectively, and obtaining similarity values of the first vector and the second vector includes:
calculating cosine distances between the first vector and the second vector;
And determining the similarity value of the first vector and the second vector according to the cosine distance.
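As a minimal sketch of this step, the following assumes that the cosine distance is defined as 1 minus the cosine similarity and that the similarity value is recovered as 1 minus the distance; the text does not fix either definition, and the function names are hypothetical.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # assumption: a zero vector is treated as dissimilar
    return dot / (norm_u * norm_v)

def cosine_distance(u, v):
    # Assumed definition: distance = 1 - similarity.
    return 1.0 - cosine_similarity(u, v)

def similarity_value(u, v):
    # The similarity value is derived from the cosine distance.
    return 1.0 - cosine_distance(u, v)
```

Under these assumptions, parallel keyword vectors give a similarity value of 1 and orthogonal vectors give 0.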
In order to achieve the above object, the present invention further provides a data matching apparatus, including:
the acquisition module is used for acquiring a history recording resume file corresponding to a history post with the same attribute as the target post;
the word segmentation module is used for respectively inputting the candidate resume file and the history recording resume file into a preset word segmentation model to obtain a candidate word segmentation text and a history word segmentation text after word segmentation processing;
The construction module is used for respectively constructing co-occurrence matrixes according to the candidate word segmentation texts and the history word segmentation texts, and determining first keywords of the candidate word segmentation texts and second keywords of the history word segmentation texts based on the co-occurrence matrixes;
The conversion module is used for converting the first keyword and the second keyword into a first vector and a second vector respectively and obtaining similarity values of the first vector and the second vector;
the first determining module is used for determining the similarity between the candidate resume file and the history recording resume file according to the similarity value;
and the second determining module is used for determining at least one candidate resume file as a target resume file according to the similarity.
To achieve the above object, the present invention also provides a computer apparatus comprising:
A memory storing at least a computer program; and
And a processor executing the computer program stored in the memory to implement the data matching method of any one of the above.
To achieve the above object, the present invention also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the data matching method of any one of the above.
The beneficial effects of the technical scheme are that:
According to the data matching method, device, computer equipment and storage medium provided by the invention, the characteristic information of the resume file to be processed and the requirement information of the target post are extracted to determine the matching degree between them, and the candidate resume files with a higher matching degree with the target post are determined according to the matching degree. The history recording resume files of history posts with the same attribute as the target post are then acquired, the similarity between each candidate resume file and the history recording resume files is determined, and at least one candidate resume file is selected from the candidate resume files as the target resume file according to the matching degree and the similarity. By combining the matching degree between resume and post with the similarity between the resume and the history recording resumes, the invention ensures a more accurate resume screening result, realizes intelligent matching between recruitment posts and submitted resumes, makes post matching more accurate, and improves the talent screening efficiency of recruiters.
Drawings
FIG. 1 is a schematic flow chart of a matching degree determination according to an embodiment of the present invention;
FIG. 2 is a flow chart of feature information extraction according to an embodiment of the invention;
FIG. 3 is a schematic diagram illustrating a similarity determination process according to an embodiment of the present invention;
FIG. 4 is a functional block diagram of a data matching device according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a computer device according to an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the embodiments described herein are merely illustrative of the present invention and are not intended to limit the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Example 1
With reference to FIG. 1 and FIG. 2, the method specifically comprises the following steps:
S001: Acquire the resume file to be processed, and determine the matching degree between the resume file to be processed and the target post.
In an exemplary embodiment, the server collects, through a big data platform, the job-seeking requests received by each recruitment channel and organizes the resume information submitted by job seekers into a document set, thereby obtaining the resume files to be processed. These files are stored in a resume library in formats such as doc, docx, PDF, and HTML. A resume file to be processed includes a plurality of feature information, for example: talent basic information (name, age, contact information, residence, etc.), educational experience (school, major, in-school experience, etc.), work experience (company, working time, post, project experience, work content, etc.), desired salary, desired work location, etc. In this embodiment, the server may be an independent server, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, content delivery networks (CDN), and big data and artificial intelligence platforms.
Further, the target post is used to characterize the recruitment post in the recruiter's recruitment request. When sending a recruitment request to the server, the recruiter can attach requirement information that adds job requirements to the recruitment post, so as to limit the range of applicants. The requirement information includes, but is not limited to: gender, age, education level, working years, work responsibilities, work skills, etc.
The matching degree between the resume file to be processed and the target post is determined by matching the characteristic information in the resume file to be processed against the requirement information of the target post.
Fig. 1 is a schematic flow chart of a matching degree determination in the present embodiment, which is specifically as follows:
S01: Extract the characteristic information of the resume file to be processed and the requirement information of the target post.
When the server receives a recruitment request from a recruiter, it obtains a plurality of resume files to be processed from the resume library and then extracts the characteristic information of each of them.
Fig. 2 is a schematic flow chart of feature information extraction according to the present embodiment, which is specifically as follows:
S01A: Judge whether the format of the resume file to be processed meets a preset format.
When the resume file to be processed is obtained, its extension is read and compared with the extension of the preset format to judge whether the file is in the preset format. For example: the JSON (JavaScript Object Notation) format is set as the preset format of this embodiment, so the extension of the preset format is .json. If the resume file read has the extension .doc, then .doc is compared with .json, and whether the file is in the preset format is judged by whether the two are consistent.
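The extension comparison can be sketched as follows. This is only an illustration: the constant and function names are hypothetical, and the case-insensitive comparison is an assumption not stated in the text.

```python
from pathlib import Path

# Assumed preset extension, following the .json example in this embodiment.
PRESET_EXTENSION = ".json"

def meets_preset_format(filename: str) -> bool:
    """Return True when the file extension equals the preset extension."""
    # Path.suffix returns the final extension, including the leading dot.
    return Path(filename).suffix.lower() == PRESET_EXTENSION
```

For instance, `meets_preset_format("resume.doc")` is false, so that file would go through the conversion branch (S01C).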
S01B: If the preset format is met, read the text information in the resume file to be processed, and extract the characteristic information from the text information.
When the format of the resume file to be processed is consistent with the preset format, the text information of the resume file is read directly. The text information represents characteristic information such as talent basic information, educational experience, and work experience in the resume file, and a plurality of feature information are extracted from it. For example: both the resume file to be processed and the preset format are JSON; information in JSON exists as key-value pairs, so each key in the resume file has a corresponding value. For instance, if the value of the key "name" is "wander", the name feature in the talent basic information is "wander"; if the value of the key "age" is 25, the age feature in the talent basic information is 25. That is, each extracted feature exists as a correspondence between a type and its information.
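A minimal sketch of reading features from a JSON resume, using Python's standard `json` module. The key names `name` and `age` follow the example above; the function name and the choice to skip missing keys are assumptions.

```python
import json

def extract_features(json_text, wanted_keys=("name", "age")):
    """Parse JSON resume text and keep the requested key-value pairs."""
    data = json.loads(json_text)
    # Keys absent from the resume are skipped (an assumption of this sketch).
    return {key: data[key] for key in wanted_keys if key in data}

features = extract_features('{"name": "wander", "age": 25, "residence": "n/a"}')
```

Each entry of `features` is a type-to-information correspondence, as the paragraph describes.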
S01C: If the preset format is not met, read the resume information in the resume file to be processed, generate a conversion file in the preset format, read the conversion text information from the conversion file, and extract the characteristic information from the conversion text information.
When the format of the resume file to be processed is inconsistent with the preset format, the resume information in the file is read first. Like the text information, the resume information represents characteristic information such as talent basic information, educational experience, and work experience, except that it does not itself exist as key-value pairs of type and information. Reading the resume information may consist of recognizing the information contained in the resume file, for example by OCR (Optical Character Recognition), and taking the recognized information as the resume information. In addition, a preset template consistent with the preset format is prepared in advance; the resume information is first formed into key-value pairs, the key-value pairs are entered into the preset template, and a conversion file in the preset format is generated and stored. For example: the information recognized by OCR includes age: 35 and sex: female. "age" is taken as a key with 35 as its value, and "sex" is taken as a key with "female" as its value; each key-value pair is stored in the preset template, and the conversion file is generated so that all information in it exists as key-value pairs. The conversion text information is therefore essentially text information, and the information corresponding to each type of feature can be determined by reading it.
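The conversion step can be illustrated with the sketch below. It assumes the OCR stage has already produced plain text lines of the form `key: value`; the OCR call itself is out of scope, and the function name and line format are hypothetical.

```python
import json

def build_conversion_file(recognized_lines):
    """Turn recognized 'key: value' lines into JSON text (the preset format)."""
    pairs = {}
    for line in recognized_lines:
        if ":" not in line:
            continue  # assumption: lines without a key separator are ignored
        key, value = line.split(":", 1)
        pairs[key.strip()] = value.strip()
    # ensure_ascii=False keeps non-ASCII resume text readable in the output.
    return json.dumps(pairs, ensure_ascii=False)

conversion_text = build_conversion_file(["age: 35", "sex: female", "unlabelled text"])
```

Reading `conversion_text` back with `json.loads` yields the same key-value structure as a resume that was already in the preset format. Values stay as strings here; type conversion is left to the extraction step.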
S02: Input the characteristic information and the requirement information into a preset matching model to perform matching degree identification.
In an exemplary embodiment, a plurality of matching lists are pre-stored in the matching model. The matching lists correspond to different target posts, and each matching list stores the correspondence among each item of requirement information, its scores, and its weight value. The matching list corresponding to the target post is queried from the matching model. For example: each target post is used as a key and its matching list as the value to form key-value pairs, which are stored in the matching model in advance; querying by the key returns the value (the matching list) corresponding to the target post.
S03: Determine the matching score between the characteristic information and the requirement information according to the output of the matching model.
In an exemplary embodiment, the matching score between each item of feature information of the resume file to be processed and the requirement information is obtained from the correspondence stored in the matching list. For example, in the requirement information, the age item scores 10 points for ages 26-30, 8 points for 31-35, and 6 points for 36-40; the education item scores 10 points for a doctorate, 8 points for a master's degree, and 6 points for a bachelor's degree; the working years item scores 10 points for more than 10 years, 8 points for 6 to 10 years, and 6 points for 3 to 5 years. The characteristic information extracted from the resume file to be processed includes: Li Mou, 29 years old, master's degree, 5 working years. The matching score for 29 years old is 10 points, the matching score for a master's degree is 8 points, and the matching score for 5 working years is 6 points. The scores of the requirement information are preset by recruiters according to service requirements; the higher the score of a feature, the better that feature matches the target post.
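A sketch of the score lookup, assuming the matching lists are kept in a dictionary keyed by target post. The post name, the dictionary layout, the education labels (doctorate, master, bachelor), and the score of 0 for values outside every band are all assumptions made for illustration.

```python
# Hypothetical matching list for one target post; the bands follow the example above.
MATCHING_LISTS = {
    "test engineer": {
        "age": [(26, 30, 10), (31, 35, 8), (36, 40, 6)],
        "education": {"doctorate": 10, "master": 8, "bachelor": 6},
        # (low, high, score); high=None means "no upper bound".
        "working_years": [(3, 5, 6), (6, 10, 8), (11, None, 10)],
    }
}

def band_score(value, bands):
    """Return the score of the first band containing value, else 0."""
    for low, high, score in bands:
        if value >= low and (high is None or value <= high):
            return score
    return 0  # assumption: values outside every band score nothing

def match_scores(post, features):
    """Look up the post's matching list and score each feature."""
    matching_list = MATCHING_LISTS[post]
    return {
        "age": band_score(features["age"], matching_list["age"]),
        "education": matching_list["education"].get(features["education"], 0),
        "working_years": band_score(features["working_years"],
                                    matching_list["working_years"]),
    }

scores = match_scores("test engineer",
                      {"age": 29, "education": "master", "working_years": 5})
```

For the example candidate (29 years old, master's degree, 5 working years) this yields 10, 8, and 6 points respectively.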
S04: Obtain the matching degree between the resume file to be processed and the target post according to the matching scores and the weight values corresponding to the requirement information.
In an exemplary embodiment, the matching degree between the resume file to be processed and the target post is obtained from the correspondence stored in the matching list. For example: the weight value of the age item is 0.3, the weight value of the education item is 0.5, and the weight value of the working years item is 0.6; the matching score of the age feature extracted from the resume file to be processed is 10 points, the matching score of the education feature is 8 points, and the matching score of the working years feature is 6 points. The matching degree of the resume file to be processed is then 10 × 0.3 + 8 × 0.5 + 6 × 0.6 = 10.6 points. The higher this value, the better the resume file to be processed matches the target post.
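Reading the example as a weighted sum of the item scores, the calculation can be sketched as follows; the dictionary keys and the function name are hypothetical.

```python
def matching_degree(scores, weights):
    """Weighted sum of per-item matching scores."""
    return sum(scores[item] * weights[item] for item in scores)

scores = {"age": 10, "education": 8, "working_years": 6}
weights = {"age": 0.3, "education": 0.5, "working_years": 0.6}

degree = matching_degree(scores, weights)
print(round(degree, 1))  # 10.6
```

The result is rounded for display because the sum is computed in floating point.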
S002: Determine at least one resume file to be processed as a candidate resume file according to the matching degree.
In an exemplary embodiment, a preset constraint condition is obtained, and the resume files to be processed whose matching degree satisfies the constraint condition are taken as candidate resume files. The constraint condition represents a threshold applied to the matching degree of the resume files to be processed.
The threshold may be the average of the matching degrees of the processed resume files, or may be set by the recruiter according to service requirements. For example: the matching degrees of the processed resume files to be processed are acquired, their average is set as the threshold, and the resume files whose matching degree is larger than the threshold are selected as candidate resume files. As another example: the matching degrees are sorted in descending order, the matching degree at the top 20% position is set as the threshold, and the resume files whose matching degree is larger than the threshold (that is, those ranked in the top 20%) are selected as candidate resume files.
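Both threshold options can be sketched as below. The function names are hypothetical, and keeping at least one file in the top-fraction variant is an assumption rather than something the text requires.

```python
def select_above_mean(degrees):
    """Keep files whose matching degree is strictly above the average."""
    threshold = sum(degrees.values()) / len(degrees)
    return [name for name, degree in degrees.items() if degree > threshold]

def select_top_fraction(degrees, fraction=0.2):
    """Keep the files ranked in the top `fraction` by matching degree."""
    ranked = sorted(degrees, key=degrees.get, reverse=True)
    keep = max(1, int(len(ranked) * fraction))  # assumption: keep at least one
    return ranked[:keep]

degrees = {"a": 10.6, "b": 8.0, "c": 6.0, "d": 9.0, "e": 5.0}
```

With these five illustrative matching degrees the average is 7.72, so the first rule keeps files a, b, and d, while the top-20% rule keeps only file a.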
Specifically, after the matching degree between the resume file to be processed and the target post is determined, the similarity to the history recording resume files of history posts with the same attribute as the target post is further determined, which improves resume screening accuracy. The resume files to be processed are screened before this similarity determination: files with a lower matching degree are filtered out, and those with a higher matching degree are taken as candidate resume files for similarity matching against the history recording resume files, which improves the processing efficiency of the server during similarity determination.
Fig. 3 is a schematic flow chart of a similarity determination according to the present embodiment, which is specifically as follows:
S100: Acquire the history recording resume file corresponding to a history post with the same attribute as the target post.
In an exemplary embodiment, attribute information of the target post is extracted, and historical posts with the same attribute as the target post are queried according to the attribute information, wherein the attribute information is used for representing the type, the function and the level of the target post.
After receiving a recruitment request published by a recruiter, the server classifies the recruitment post in the request by type, function, and level, and attaches the resulting type, function, and level to the recruitment post as labels, which serve as the attribute information of the recruitment post. For example: recruitment posts are classified by work responsibilities into administrative, research and development, sales, and so on; research and development posts are classified by work skills into development, test, operation, and so on; and test posts within research and development are classified by working years into junior, intermediate, senior, and so on. The attribute information of the target post is extracted, and the history posts with the same attributes are obtained by querying the labels.
When a recruiter confirms a recording resume file for a post, the server receives the recruiter's recording request and caches the recording request together with the corresponding resume file. After the history posts with the same attribute as the target post are acquired, the recruitment state of each history post is queried, that is, whether the history post carries a recording request. If it does, the corresponding history recording resume file is extracted from the recording request of the history post; if it does not, the history post is deleted from the acquired list.
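The label query and the recruitment-state filter can be sketched together. The `HistoryPost` structure, its field names, and the sample data are hypothetical; the sketch only assumes that each history post carries a (type, function, level) label tuple and the resume files cached from its recording request, if any.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class HistoryPost:
    title: str
    labels: tuple                 # (type, function, level)
    recorded_resumes: tuple = ()  # empty when no recording request exists

def history_recording_resumes(target_labels, history_posts):
    """Collect recorded resumes from history posts sharing the target's labels."""
    resumes = []
    for post in history_posts:
        if post.labels != tuple(target_labels):
            continue  # different attributes: not a comparable post
        if not post.recorded_resumes:
            continue  # no recording request: drop the post from the list
        resumes.extend(post.recorded_resumes)
    return resumes

posts = [
    HistoryPost("Tester 2020", ("R&D", "test", "junior"), ("r1.json", "r2.json")),
    HistoryPost("Tester 2021", ("R&D", "test", "junior")),
    HistoryPost("Sales 2020", ("sales", "field", "junior"), ("r3.json",)),
]
found = history_recording_resumes(("R&D", "test", "junior"), posts)
```

Only the first post contributes resumes here: the second has matching labels but no recording request, and the third has different labels.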
S200: Input the candidate resume file and the history recording resume file respectively into a preset word segmentation model to obtain a candidate word segmentation text and a history word segmentation text after word segmentation processing.
In an exemplary embodiment, the word segmentation model may include a prefix dictionary and a custom dictionary. The prefix dictionary contains the prefixes of every word in the statistical dictionary; for example, the prefixes of the word for "Beijing University" (北京大学) are "北", "北京", and "北京大", and the prefix of the word for "university" (大学) is "大". The custom dictionary can be understood as a dictionary of words that do not exist in the statistical dictionary, used to characterize domain-specific or proprietary terms such as "resume" and "work experience".
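To illustrate how a prefix dictionary supports segmentation, the sketch below uses greedy longest-match segmentation over a small word list. This is a deliberately simplified stand-in, not the preset word segmentation model of this embodiment, whose internals the text does not specify; the word lists and function names are hypothetical.

```python
def build_prefix_set(words):
    """Collect every prefix of every dictionary word (including the word itself)."""
    prefixes = set()
    for word in words:
        for end in range(1, len(word) + 1):
            prefixes.add(word[:end])
    return prefixes

def segment(text, statistical_words, custom_words=()):
    """Greedy longest-match segmentation guided by the prefix set."""
    words = set(statistical_words) | set(custom_words)
    prefixes = build_prefix_set(words)
    tokens, start = [], 0
    while start < len(text):
        longest = text[start]  # fall back to a single character
        end = start + 1
        # Extend the span only while it is still a prefix of some word.
        while end <= len(text) and text[start:end] in prefixes:
            if text[start:end] in words:
                longest = text[start:end]
            end += 1
        tokens.append(longest)
        start += len(longest)
    return tokens

tokens = segment("我在北京大学工作", {"北京", "北京大学", "大学"}, custom_words={"工作"})
```

The prefix set lets the scan stop as soon as the current span can no longer grow into a dictionary word, and custom words are simply merged into the same lookup.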
S300: constructing co-occurrence matrices from the candidate word segmentation text and the history word segmentation text respectively, and determining first keywords of the candidate word segmentation text and second keywords of the history word segmentation text based on the co-occurrence matrices.
In an exemplary embodiment, the co-occurrence matrix is built by counting, for each word, how many times other words co-occur with it inside a window of a pre-specified size; the co-occurrence counts of the surrounding words serve as the vector of the current word. For example, the candidate resume text contains the following text: "I am good at studying." (containing the word segments "I", "good at", "studying" and "."); and "I am good at programming." (containing the word segments "I", "good at", "programming" and "."). A co-occurrence matrix of size 7×7 is constructed from the text in the candidate resume text.
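A windowed co-occurrence count of this kind can be sketched as follows. The function name `cooccurrence_matrix`, the `window` parameter and the toy token lists are illustrative assumptions rather than part of the embodiment; the matrix has one row and one column per distinct token.

```python
from typing import List, Tuple


def cooccurrence_matrix(sentences: List[List[str]],
                        window: int = 1) -> Tuple[List[str], List[List[int]]]:
    """Count how often each pair of tokens appears within `window` positions."""
    vocab = sorted({tok for sent in sentences for tok in sent})
    index = {tok: i for i, tok in enumerate(vocab)}
    matrix = [[0] * len(vocab) for _ in vocab]
    for sent in sentences:
        for i, tok in enumerate(sent):
            lo = max(0, i - window)
            hi = min(len(sent), i + window + 1)
            for j in range(lo, hi):
                if j != i:  # a token does not co-occur with itself here
                    matrix[index[tok]][index[sent[j]]] += 1
    return vocab, matrix


tokens = [["java", "development", "experience"],
          ["java", "test", "experience"]]
vocab, matrix = cooccurrence_matrix(tokens, window=1)
```

Row `vocab.index(w)` of `matrix` is then the co-occurrence vector of token `w`.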
The co-occurrence matrix is constructed from the number of occurrences of the word segments in the candidate word segmentation text; the word frequency (freq) and the degree (deg) of each word segment are extracted from the co-occurrence matrix; a score is obtained for each word segment from its word frequency and degree; and the word segments are output in descending order of score to obtain the first keywords of the candidate resume text. For example, the word segments are output in descending order of score to obtain n words, and the first one third of the words in that order are used as the first keywords of the candidate resume text.
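The text does not state how the score is computed from freq and deg. The sketch below assumes the ratio deg/freq, a convention used by some keyword-extraction methods, with deg counting the token itself plus its neighbours inside the window. The function names, the `window` parameter and the rounding up of "one third" are likewise assumptions.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List


def keyword_scores(sentences: List[List[str]], window: int = 1) -> Dict[str, float]:
    """Score each token as degree / frequency (assumed scoring rule)."""
    freq = Counter(tok for sent in sentences for tok in sent)
    degree: Dict[str, int] = defaultdict(int)
    for sent in sentences:
        for i, tok in enumerate(sent):
            lo = max(0, i - window)
            hi = min(len(sent), i + window + 1)
            degree[tok] += hi - lo  # neighbours in the window plus the token itself
    return {tok: degree[tok] / freq[tok] for tok in freq}


def top_third(scores: Dict[str, float]) -> List[str]:
    """Return the first third of tokens in descending score order (rounded up)."""
    ranked = sorted(scores, key=lambda tok: (-scores[tok], tok))
    return ranked[:max(1, math.ceil(len(ranked) / 3))]


tokens = [["java", "development", "experience"],
          ["java", "test", "experience"]]
scores = keyword_scores(tokens)
first_keywords = top_third(scores)
```

Ties are broken alphabetically here so that the output is deterministic; the embodiment does not specify a tie-breaking rule.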
Preferably, when the number of times two first keywords appear adjacent to each other in the candidate word segmentation text is greater than a preset value, the two keywords are combined into a new keyword, so that similar keywords are merged and redundant keywords are avoided. The preset value is set in advance by business personnel; for example, it may be 2. This embodiment is described using the candidate word segmentation text as an example; the same principle applies to the history word segmentation text.
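The adjacency-based merge can be illustrated with the sketch below. It assumes that the two constituent keywords are replaced by the combined keyword and that the parts are joined with a space; neither detail is specified in the text, and the function name `merge_adjacent_keywords` is hypothetical.

```python
from collections import Counter
from typing import List


def merge_adjacent_keywords(sentences: List[List[str]], keywords: List[str],
                            threshold: int = 2, joiner: str = " ") -> List[str]:
    """Merge keyword pairs that appear adjacent more than `threshold` times."""
    keyword_set = set(keywords)
    pair_counts: Counter = Counter()
    for sent in sentences:
        for left, right in zip(sent, sent[1:]):
            if left in keyword_set and right in keyword_set:
                pair_counts[(left, right)] += 1
    merged_pairs = [pair for pair, count in pair_counts.items() if count > threshold]
    used = {word for pair in merged_pairs for word in pair}
    # Assumption: constituents of a merged pair are dropped to avoid redundancy.
    result = [kw for kw in keywords if kw not in used]
    result.extend(joiner.join(pair) for pair in merged_pairs)
    return result


tokens = [["machine", "learning", "engineer"]] * 3 + [["learning", "python"]]
merged = merge_adjacent_keywords(tokens, ["machine", "learning", "python"], threshold=2)
```

In this toy input, "machine" and "learning" are adjacent three times, which exceeds the threshold of 2, so they are combined; "learning" and "python" are adjacent only once and are left alone.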
S400: converting the first keywords and the second keywords into a first vector and a second vector respectively, and obtaining a similarity value of the first vector and the second vector.
In an exemplary embodiment, a word sequence of the first keyword is obtained; a vector a containing the word sequence and its context information is generated by reading the word sequence in the forward direction; a vector b containing the word sequence and its context information is generated by reading the word sequence in the reverse direction; and vector a and vector b are concatenated to obtain a vector c of the word sequence and its context information, from which the first vector of the first keyword is obtained. This embodiment is described using the conversion of the first keyword into the first vector as an example; the second keyword is converted into the second vector on the same principle.
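The forward read, reverse read and concatenation can be illustrated with a toy encoder. The text does not name the model, so the untrained element-wise recurrence below is only a stand-in for whatever trained bidirectional encoder an implementation would use; the embeddings, the `decay` parameter and the function names are hypothetical.

```python
import math
from typing import Dict, List


def read_sequence(vectors: List[List[float]], decay: float = 0.5) -> List[float]:
    """Toy recurrent pass: state = tanh(x + decay * state), element-wise."""
    state = [0.0] * len(vectors[0])
    for vec in vectors:
        state = [math.tanh(x + decay * s) for x, s in zip(vec, state)]
    return state


def encode_bidirectional(words: List[str],
                         embeddings: Dict[str, List[float]]) -> List[float]:
    """Concatenate a forward-read state (a) and a reverse-read state (b) into c."""
    vectors = [embeddings[w] for w in words]
    a = read_sequence(vectors)         # forward reading
    b = read_sequence(vectors[::-1])   # reverse reading
    return a + b                       # list concatenation gives vector c


embeddings = {"java": [1.0, 0.0], "test": [0.0, 1.0]}  # made-up 2-d embeddings
c = encode_bidirectional(["java", "test"], embeddings)
```

The resulting vector has twice the embedding dimension, and its two halves differ because each reading direction ends on a different word.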
Further, the cosine distance between the first vector and the second vector is obtained, and the similarity value of the first vector and the second vector is determined from it. Here the cosine distance denotes the cosine of the angle between the two vectors in the vector space, which is used as a measure of the difference between two individuals. The closer the cosine value is to 1, the closer the angle is to 0 degrees, and the higher the similarity value of the two vectors.
Preferably, the cosine distance of this embodiment is obtained by using the following formula:

$$\cos\theta = \frac{\sum_{i} X_i Y_i}{\sqrt{\sum_{i} X_i^{2}}\,\sqrt{\sum_{i} Y_i^{2}}}$$

where X denotes the first vector, X_i denotes the i-th element of the first vector obtained from the candidate resume file, Y denotes the second vector, and Y_i denotes the i-th element of the second vector obtained from the history recording resume file.
The cosine value lies in the range [-1, 1]: -1 indicates that the two vectors point in exactly opposite directions, 1 indicates that they point in exactly the same direction, 0 indicates that they are orthogonal (mutually independent), and values in between indicate intermediate degrees of similarity.
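The cosine computation can be sketched in a few lines. The function name `cosine_similarity` is illustrative, and returning 0.0 for a zero-length vector is an assumption made here to avoid division by zero; the embodiment does not address that case.

```python
import math
from typing import Sequence


def cosine_similarity(x: Sequence[float], y: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0.0 or norm_y == 0.0:
        return 0.0  # assumption: treat a zero vector as having no similarity
    return dot / (norm_x * norm_y)


same = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])   # parallel vectors
opposite = cosine_similarity([1.0, 0.0], [-1.0, 0.0])        # opposite directions
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])       # right angle
```

The three calls exercise the boundary cases described above: parallel, opposite and orthogonal vectors.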
S500: determining the similarity between the candidate resume file and the history recording resume file according to the similarity value.
In an exemplary embodiment, the cosine value is used as the similarity between the candidate resume file and the history recording resume file; the larger the similarity value, the higher the similarity between the two files.
S600: determining at least one candidate resume file as a target resume file according to the similarity.
In an exemplary embodiment, the similarities are sorted in descending order, and the candidate resume file ranked first is pushed to the recruiter as the target resume file; alternatively, the ten highest-ranked candidate resume files may be pushed to the recruiter in order as target resume files.
Preferably, at least one candidate resume file is determined to be a target resume file according to the similarity and the matching degree.
The matching degree and the similarity are added, and the sums are sorted in descending order. The candidate resume file ranked first is pushed to the recruiter as the target resume file; alternatively, the ten highest-ranked candidate resume files may be pushed to the recruiter in order as target resume files.
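The combined ranking can be sketched as follows. The function name `rank_candidates`, the file names and the sample scores are made up for illustration, and candidates missing either score are simply skipped, which is an assumption.

```python
from typing import Dict, List


def rank_candidates(matching: Dict[str, float], similarity: Dict[str, float],
                    top_n: int = 10) -> List[str]:
    """Add matching degree and similarity per candidate and return the top entries."""
    combined = {name: matching[name] + similarity[name]
                for name in matching if name in similarity}
    ranked = sorted(combined, key=lambda name: (-combined[name], name))
    return ranked[:top_n]


matching_degree = {"a.pdf": 0.6, "b.pdf": 0.9, "c.pdf": 0.4}
similarity = {"a.pdf": 0.8, "b.pdf": 0.3, "c.pdf": 0.5}
targets = rank_candidates(matching_degree, similarity, top_n=2)
```

Setting `top_n=1` corresponds to pushing only the first-ranked candidate resume file, and `top_n=10` to pushing the top ten.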
According to the invention, combining the matching degree between the resume and the post with the similarity between the resume and the history recording resumes yields a more accurate resume screening result, realizes intelligent matching of recruitment posts with submitted resumes, makes post matching more accurate, and improves the efficiency with which recruiters screen talent.
Example two
Fig. 4 shows a functional block diagram of the data matching device of the second embodiment.
The data matching device 30 includes an obtaining module 31, a word segmentation module 32, a constructing module 33, a converting module 34, a first determining module 35 and a second determining module 36. A module, as referred to in the present invention, is a series of computer program segments that are stored in a memory, can be executed by a processor, and perform a fixed function. The functions of the respective modules are described in detail below.
The obtaining module 31 is configured to obtain a history recording resume file corresponding to a history post with the same attribute as the target post.
The word segmentation module 32 is configured to input the candidate resume file and the history recording resume file into a preset word segmentation model respectively, so as to obtain a candidate word segmentation text and a history word segmentation text after word segmentation processing.
The construction module 33 is configured to construct co-occurrence matrices according to the candidate segmented text and the history segmented text, and determine a first keyword of the candidate segmented text and a second keyword of the history segmented text based on the co-occurrence matrices.
In an exemplary embodiment, the co-occurrence matrix is respectively constructed according to the occurrence times of each word in the candidate word segmentation text and the historical word segmentation text; extracting word frequency and degree of each word in the candidate word segmentation text and the historical word segmentation text according to the co-occurrence matrix; obtaining the score of each word in the candidate word segmentation text and the historical word segmentation text according to the word frequency and the word degree; and outputting each word in the candidate word segmentation text and the historical word segmentation text in a descending order according to the score to obtain a first keyword of the candidate word segmentation text and a second keyword of the historical word segmentation text.
The conversion module 34 is configured to convert the first keyword and the second keyword into a first vector and a second vector, respectively, and obtain similarity values of the first vector and the second vector.
In an exemplary embodiment, a cosine distance of the first vector from the second vector is calculated; and determining the similarity value of the first vector and the second vector according to the cosine distance.
The first determining module 35 is configured to determine a similarity between the candidate resume file and the history recording resume file according to the similarity value.
The second determining module 36 is configured to determine at least one of the candidate resume files as a target resume file according to the similarity.
Example III
Fig. 5 is a schematic structural diagram of a computer device according to the third embodiment.
In an exemplary embodiment, the computer device 40 includes, but is not limited to, a memory 41, a processor 42, and a computer program, such as a data matching program, stored in the memory 41 and executable on the processor. It will be appreciated by those skilled in the art that the schematic diagram is merely an example of a computer device and is not limiting of the computer device, and may include more or fewer components than shown, or may combine some of the components, or different components, e.g., the computer device may also include input and output devices, network access devices, buses, etc.
The memory 41 includes at least one type of computer-readable storage medium, including flash memory, hard disk, multimedia card, card-type memory (e.g., SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disk, optical disk, etc. In some embodiments, the memory 41 may be an internal storage module of the computer device, such as a hard disk or internal memory of the computer device. In other embodiments, the memory 41 may also be an external storage device of the computer device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card provided on the computer device. Of course, the memory 41 may also include both an internal storage module of the computer device and an external storage device. In this embodiment, the memory 41 is typically used to store the operating system and the various types of application software installed on the computer device. In addition, the memory 41 may also be used to temporarily store various types of data that have been output or are to be output.
The processor 42 may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor or any conventional processor. The processor 42 is the computing core and control center of the computer device; it connects the various parts of the entire computer device using various interfaces and lines, and executes the operating system of the computer device and the installed applications, program code, and so on.
The processor 42 executes the application program to implement the steps in the respective data matching method embodiments described above, such as steps S100, S200, S300, S400, S500 and S600 shown in fig. 3.
Example IV
The present embodiment also provides a computer-readable storage medium such as a flash memory, a hard disk, a multimedia card, a card-type memory (e.g., SD or DX memory, etc.), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a read-only memory (ROM), an electrically erasable programmable read-only memory (EEPROM), a programmable read-only memory (PROM), a magnetic memory, a magnetic disk, an optical disk, a server, an App application store, etc., on which a computer program is stored, which when executed by a processor, performs the corresponding functions. The computer readable storage medium of the present embodiment is used to store a computer program implementing the data matching method, and when executed by the processor 42, implements the data matching method of the first, second or third embodiment.
Further, the computer-readable storage medium may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function, and the like; the storage data area may store data created from the use of blockchain nodes, and the like.
The blockchain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks generated in association with one another using cryptographic methods, each of which contains information on a batch of network transactions and is used to verify the validity (anti-counterfeiting) of that information and to generate the next block. The blockchain may include a blockchain underlying platform, a platform product services layer, an application services layer, and the like.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The application is operational with numerous general purpose or special purpose computer system environments or configurations. For example: personal computers, server computers, hand-held or portable devices, tablet devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like. The application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
From the above description of the embodiments, it will be clear to those skilled in the art that the above-described embodiment method may be implemented by means of software plus a necessary general hardware platform, but of course may also be implemented by means of hardware, but in many cases the former is a preferred embodiment.
The foregoing describes only preferred embodiments of the present invention and is not intended to limit its scope. Any equivalent structure or equivalent process derived from the disclosure herein, whether applied directly or indirectly in other related technical fields, likewise falls within the scope of protection of the invention.
Claims (8)
1. A method of data matching, comprising:
Acquiring a history recording resume file corresponding to a history post with the same attribute as the target post;
Respectively inputting the candidate resume file and the history recording resume file into a preset word segmentation model to obtain a candidate word segmentation text and a history word segmentation text after word segmentation processing;
Respectively constructing co-occurrence matrixes according to the candidate word segmentation texts and the history word segmentation texts, and determining first keywords of the candidate word segmentation texts and second keywords of the history word segmentation texts based on the co-occurrence matrixes;
Converting the first keyword and the second keyword into a first vector and a second vector respectively, and obtaining similarity values of the first vector and the second vector;
according to the similarity value, determining the similarity between the candidate resume file and the history recording resume file;
determining at least one candidate resume file as a target resume file according to the similarity;
before the history recording resume file corresponding to the history post with the same attribute as the target post is obtained, the method further comprises:
acquiring a resume file to be processed, and determining the matching degree between the resume file to be processed and the target post;
determining at least one resume file to be processed as the candidate resume file according to the matching degree;
The obtaining the resume file to be processed and determining the matching degree between the resume file to be processed and the target post comprise the following steps:
extracting characteristic information of the resume file to be processed and requirement information of the target post;
Inputting the characteristic information and the demand information into a preset matching model to perform matching degree identification; a plurality of matching lists are stored in the matching model in advance, the matching lists correspond to different target posts respectively, and the matching lists store the corresponding relations of the requirement information, the scores and the weight values;
determining a matching score between the characteristic information and the demand information according to the matching model output result;
And obtaining the matching degree of the resume file to be processed and the target post according to the matching score and the weight value corresponding to the requirement information.
2. The method for matching data according to claim 1, wherein the extracting the feature information of the resume file to be processed and the requirement information of the target post includes:
judging whether the format of the resume file to be processed meets a preset format or not;
If the preset format is met, reading text information in the resume file to be processed, and extracting the characteristic information from the text information;
If the preset format is not met, reading resume information in the resume file to be processed, generating a conversion file with the preset format, reading conversion text information from the conversion file, and extracting the characteristic information from the conversion text information.
3. The method for matching data according to claim 1, wherein said determining at least one resume file to be processed as the candidate resume file according to the matching degree comprises:
and taking the resume files to be processed corresponding to the threshold value of the matching degree as candidate resume files.
4. The method of claim 1, wherein the constructing co-occurrence matrices according to the candidate segmented text and the history segmented text, and determining the first keyword of the candidate segmented text and the second keyword of the history segmented text based on the co-occurrence matrices, respectively, comprises:
Respectively constructing the co-occurrence matrix according to the occurrence times of each word in the candidate word segmentation text and the historical word segmentation text;
extracting word frequency and degree of each word in the candidate word segmentation text and the historical word segmentation text according to the co-occurrence matrix;
Obtaining the score of each word in the candidate word segmentation text and the historical word segmentation text according to the word frequency and the word degree;
And outputting each word in the candidate word segmentation text and the historical word segmentation text in a descending order according to the score to obtain a first keyword of the candidate word segmentation text and a second keyword of the historical word segmentation text.
5. The method of claim 1, wherein the converting the first keyword and the second keyword into a first vector and a second vector, respectively, and obtaining the similarity value of the first vector and the second vector comprises:
calculating cosine distances between the first vector and the second vector;
And determining the similarity value of the first vector and the second vector according to the cosine distance.
6. A data matching apparatus, comprising:
the acquisition module is used for acquiring a history recording resume file corresponding to a history post with the same attribute as the target post;
the word segmentation module is used for respectively inputting the candidate resume file and the history recording resume file into a preset word segmentation model to obtain a candidate word segmentation text and a history word segmentation text after word segmentation processing;
The construction module is used for respectively constructing co-occurrence matrixes according to the candidate word segmentation texts and the history word segmentation texts, and determining first keywords of the candidate word segmentation texts and second keywords of the history word segmentation texts based on the co-occurrence matrixes;
The conversion module is used for converting the first keyword and the second keyword into a first vector and a second vector respectively and obtaining similarity values of the first vector and the second vector;
the first determining module is used for determining the similarity between the candidate resume file and the history recording resume file according to the similarity value;
the second determining module is used for determining at least one candidate resume file as a target resume file according to the similarity;
The acquisition module is also used for acquiring a resume file to be processed and determining the matching degree between the resume file to be processed and the target post; determining at least one resume file to be processed as the candidate resume file according to the matching degree;
wherein, the acquisition module is further configured to:
extracting characteristic information of the resume file to be processed and requirement information of the target post;
Inputting the characteristic information and the demand information into a preset matching model to perform matching degree identification; a plurality of matching lists are stored in the matching model in advance, the matching lists correspond to different target posts respectively, and the matching lists store the corresponding relations of the requirement information, the scores and the weight values;
determining a matching score between the characteristic information and the demand information according to the matching model output result;
And obtaining the matching degree of the resume file to be processed and the target post according to the matching score and the weight value corresponding to the requirement information.
7. A computer device, comprising:
A memory storing at least a computer program; and
A processor executing a computer program stored in the memory to implement the data matching method as claimed in any one of claims 1 to 5.
8. A computer readable storage medium storing a computer program, wherein the computer program is executed by a processor to implement the data matching method of any one of claims 1 to 5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110912447.6A CN113627182B (en) | 2021-08-10 | 2021-08-10 | Data matching method, device, computer equipment and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110912447.6A CN113627182B (en) | 2021-08-10 | 2021-08-10 | Data matching method, device, computer equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113627182A (en) | 2021-11-09 |
CN113627182B (en) | 2024-07-26 |
Family
ID=78383905
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110912447.6A Active CN113627182B (en) | 2021-08-10 | 2021-08-10 | Data matching method, device, computer equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113627182B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114240227A (en) * | 2021-12-23 | 2022-03-25 | 河北冀联人力资源服务集团有限公司 | Recruitment data matching system, method and storage medium thereof |
CN114819924B (en) * | 2022-06-28 | 2022-09-23 | 杭银消费金融股份有限公司 | Enterprise information push processing method and device based on portrait analysis |
CN117709916A (en) * | 2024-02-01 | 2024-03-15 | 武汉厚溥数字科技有限公司 | Employment information processing method and device, electronic equipment and storage medium |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111428488A (en) * | 2020-03-06 | 2020-07-17 | 平安科技(深圳)有限公司 | Resume data information analyzing and matching method and device, electronic equipment and medium |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112699232A (en) * | 2019-10-17 | 2021-04-23 | 北京京东尚科信息技术有限公司 | Text label extraction method, device, equipment and storage medium |
CN110851598B (en) * | 2019-10-30 | 2023-04-07 | 深圳价值在线信息科技股份有限公司 | Text classification method and device, terminal equipment and storage medium |
CN112988980B (en) * | 2021-05-12 | 2021-07-30 | 太平金融科技服务(上海)有限公司 | Target product query method and device, computer equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN113627182A (en) | 2021-11-09 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||