CN111428488A - Resume data information analyzing and matching method and device, electronic equipment and medium - Google Patents

Resume data information analyzing and matching method and device, electronic equipment and medium

Info

Publication number
CN111428488A
Authority
CN
China
Prior art keywords
resume
word
sequence
label
text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010151399.9A
Other languages
Chinese (zh)
Other versions
CN111428488B (en)
Inventor
侯丽
周慧娟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN202010151399.9A (CN111428488B)
Publication of CN111428488A
Priority to PCT/CN2020/131916 (WO2021174919A1)
Application granted
Publication of CN111428488B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/10 Office automation; Time management
    • G06Q10/105 Human resources
    • G06Q10/1053 Employment or hiring

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Strategic Management (AREA)
  • Entrepreneurship & Innovation (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a resume data information analyzing and matching method, a resume data information analyzing and matching device, electronic equipment, and a medium. The method preprocesses a retrieved resume to obtain a resume to be analyzed, and constructs a word segmentation directed acyclic graph according to a pre-constructed word segmentation dictionary to segment the resume to be analyzed, so that the word segmentation result is obtained quickly as a resume text. A co-occurrence matrix is then constructed according to the resume text, and keywords of the resume text are determined based on the co-occurrence matrix. Word sequences in the keywords are acquired and processed by a word representation model to obtain word representations of the word sequences, which improves the analysis effect. The word representations are input into a resume label analysis model to obtain a resume label sequence, and the similarity between each label in the resume label sequence and the label of each post is calculated to determine the resume matched with each post, realizing quick and accurate intelligent matching of posts and resumes.

Description

Resume data information analyzing and matching method and device, electronic equipment and medium
Technical Field
The invention relates to the technical field of data processing, in particular to a resume data information analyzing and matching method, a resume data information analyzing and matching device, electronic equipment and a resume data information matching medium.
Background
In the prior art, manual screening is usually required when resume matching is performed, and resumes associated with posts are matched, which not only consumes a large amount of labor cost, but also consumes a long time.
However, current intelligent resume screening remains at the primary stage of removing resumes that do not meet the requirements (for example, filtering out resumes that do not meet the education requirement), and cannot automatically match posts with resumes.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a resume data information parsing and matching method, device, electronic device and medium, which can implement fast and accurate intelligent matching between a post and a resume.
A resume data information analyzing and matching method comprises the following steps:
retrieving the resume from the database, and preprocessing the retrieved resume to obtain the resume to be analyzed;
constructing a word segmentation directed acyclic graph according to a pre-constructed word segmentation dictionary, and segmenting the resume to be analyzed according to the constructed word segmentation directed acyclic graph to obtain a resume text after word segmentation;
constructing a co-occurrence matrix according to the resume text subjected to word segmentation processing, and determining keywords of the resume text based on the co-occurrence matrix;
acquiring a word sequence in the keyword, and processing the word sequence by using a word representation model to obtain a word representation of the word sequence;
inputting the word representation into a constructed resume label analysis model to obtain a predicted resume label sequence;
and calculating the similarity between each label in the resume label sequence and the label of each position, and determining the resume matched with each position from the resume to be analyzed according to the calculated similarity.
According to a preferred embodiment of the present invention, the preprocessing the retrieved resume includes:
removing stop words from the retrieved resume by using a stop-word list filtering method.
According to a preferred embodiment of the present invention, the constructing a co-occurrence matrix according to the resume text, and determining the keyword of the resume text based on the co-occurrence matrix includes:
constructing the co-occurrence matrix according to the occurrence frequency of each segmented word in the resume text;
extracting the word frequency and the degree of each segmented word from the co-occurrence matrix;
calculating the score of each segmented word according to its word frequency and degree;
and outputting the segmented words in descending order of score to obtain the keywords of the resume text.
According to the preferred embodiment of the present invention, after obtaining the keywords of the resume text, the method further includes:
and when the number of times two keywords appear adjacent to each other in the same document exceeds a preset value, combining the two keywords into a new keyword.
According to a preferred embodiment of the present invention, the performing word representation processing on the word sequence by using a word representation model to obtain a word representation of the word sequence includes:
inputting a word sequence in the keywords into the word representation model, generating a first vector containing the word sequence and its preceding context by reading the word sequence in the forward direction, and generating a second vector containing the word sequence and its following context by reading the word sequence in the reverse direction;
and concatenating the first vector and the second vector to obtain a word representation comprising the word sequence and the context information of the word sequence.
According to a preferred embodiment of the invention, the method further comprises:
acquiring resume data;
splitting the resume data to obtain a training set and a verification set;
training a CRF (conditional random field) model by using the training set, and predicting a target label sequence by using a conditional log-likelihood function and a maximum score formula;
validating the target tag sequence with the validation set;
and when the target label sequence passes verification, stopping training and obtaining the resume label analysis model.
According to a preferred embodiment of the present invention, the calculating the similarity between each tag in the resume tag sequence and the tag of each position, and determining the resume matching each position from the resume to be parsed according to the calculated similarity includes:
calculating the cosine distance between each label and the label of each post;
when the cosine distance between a target label and a target post is smaller than or equal to a preset distance, retrieving the target resume corresponding to the target label from the resumes to be analyzed;
determining that the target resume matches the target post.
A resume data information parsing and matching device, the device comprising:
the preprocessing unit is used for calling the resume from the database and preprocessing the called resume to obtain the resume to be analyzed;
the construction unit is used for constructing a word segmentation directed acyclic graph according to a pre-constructed word segmentation dictionary and segmenting the resume to be analyzed according to the constructed word segmentation directed acyclic graph to obtain a resume text after word segmentation;
the determining unit is used for constructing a co-occurrence matrix according to the resume texts subjected to word segmentation processing and determining keywords of the resume texts based on the co-occurrence matrix;
the processing unit is used for acquiring a word sequence in the keyword and processing the word sequence by using a word representation model to obtain a word representation of the word sequence;
the prediction unit is used for inputting the word representation into the constructed resume label analysis model to obtain a predicted resume label sequence;
the determining unit is further configured to calculate similarity between each tag in the resume tag sequence and each post tag, and determine a resume matched with each post from the to-be-analyzed resume according to the calculated similarity.
According to a preferred embodiment of the present invention, the preprocessing unit is specifically configured to:
removing stop words from the retrieved resume by using a stop-word list filtering method.
According to a preferred embodiment of the present invention, the determining unit constructs a co-occurrence matrix according to the resume text, and determining the keyword of the resume text based on the co-occurrence matrix includes:
constructing the co-occurrence matrix according to the occurrence frequency of each segmented word in the resume text;
extracting the word frequency and the degree of each segmented word from the co-occurrence matrix;
calculating the score of each segmented word according to its word frequency and degree;
and outputting the segmented words in descending order of score to obtain the keywords of the resume text.
According to a preferred embodiment of the invention, the apparatus further comprises:
a merging unit, used for combining two keywords into a new keyword when, after the keywords of the resume text are obtained, the number of times the two keywords appear adjacent to each other in the same document exceeds a preset value.
According to a preferred embodiment of the present invention, the processing unit is specifically configured to:
inputting a word sequence in the keywords into the word representation model, generating a first vector containing the word sequence and its preceding context by reading the word sequence in the forward direction, and generating a second vector containing the word sequence and its following context by reading the word sequence in the reverse direction;
and concatenating the first vector and the second vector to obtain a word representation comprising the word sequence and the context information of the word sequence.
According to a preferred embodiment of the invention, the apparatus further comprises:
an acquisition unit configured to acquire resume data;
the splitting unit is used for splitting the resume data to obtain a training set and a verification set;
the training unit is used for training a CRF (conditional random field) model by using the training set and predicting a target label sequence by using a conditional log-likelihood function and a maximum score formula;
a verification unit for verifying the target tag sequence with the verification set;
and the training unit is also used for stopping training and obtaining the resume label analysis model when the target label sequence passes verification.
According to a preferred embodiment of the present invention, the determining unit calculates the similarity between each tag in the resume tag sequence and the tag of each position, and determines the resume matching each position from the resume to be parsed according to the calculated similarity includes:
calculating the cosine distance between each label and the label of each post;
when the cosine distance between a target label and a target post is smaller than or equal to a preset distance, retrieving the target resume corresponding to the target label from the resumes to be analyzed;
determining that the target resume matches the target post.
An electronic device, the electronic device comprising:
a memory storing at least one instruction; and
a processor that executes the instructions stored in the memory to implement the resume data information analyzing and matching method.
A computer-readable storage medium having at least one instruction stored therein, the at least one instruction being executable by a processor in an electronic device to implement the resume data information parsing and matching method.
It can be seen from the above technical solutions that the present invention can retrieve resumes from a database and preprocess the retrieved resumes to obtain resumes to be analyzed. It constructs a word segmentation directed acyclic graph according to a pre-constructed word segmentation dictionary and segments the resumes to be analyzed according to that graph, so that the word segmentation results are obtained quickly as resume texts. It then constructs a co-occurrence matrix according to the resume texts, determines keywords of the resume texts based on the co-occurrence matrix, acquires word sequences in the keywords, and processes the word sequences by using a word representation model to obtain word representations of the word sequences, which improves the analysis effect. The word representations are input into the constructed resume label analysis model to obtain predicted resume label sequences. Finally, the similarity between each label in the resume label sequences and the label of each post is calculated, and the resume matched with each post is determined from the resumes to be analyzed according to the calculated similarity, thereby realizing quick and accurate intelligent matching of posts and resumes.
Drawings
FIG. 1 is a flowchart illustrating a method for parsing and matching resume data information according to a preferred embodiment of the present invention.
FIG. 2 is a functional block diagram of a preferred embodiment of the resume data information parsing and matching apparatus of the present invention.
Fig. 3 is a schematic structural diagram of an electronic device implementing a resume data information parsing and matching method according to a preferred embodiment of the invention.
FIG. 4 is a diagram of a co-occurrence matrix in a preferred embodiment of the method for parsing and matching resume data information according to the invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in detail with reference to the accompanying drawings and specific embodiments.
Fig. 1 is a flowchart illustrating a method for parsing and matching resume data information according to a preferred embodiment of the present invention. The order of the steps in the flow chart may be changed and some steps may be omitted according to different needs.
The resume data information analyzing and matching method is applied to one or more electronic devices. An electronic device is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions, and its hardware includes, but is not limited to, a microprocessor, an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), an embedded device, and the like.
The electronic device may be any electronic product capable of performing human-computer interaction with a user, for example, a Personal computer, a tablet computer, a smart phone, a Personal Digital Assistant (PDA), a game machine, an interactive Internet Protocol Television (IPTV), an intelligent wearable device, and the like.
The electronic device may also include a network device and/or a user device. The network device includes, but is not limited to, a single network server, a server group consisting of a plurality of network servers, or a cloud based on cloud computing and consisting of a large number of hosts or network servers.
The Network where the electronic device is located includes, but is not limited to, the internet, a wide area Network, a metropolitan area Network, a local area Network, a Virtual Private Network (VPN), and the like.
S10, retrieving the resume from the database, and preprocessing the retrieved resume to obtain the resume to be analyzed.
In at least one embodiment of the present invention, the database may be a database in communication with the electronic device, or an internal database of the electronic device, and may be configured in a customized manner according to different requirements.
For example, the database may be a talent pool. The electronic device retrieves and organizes resumes from the talent pool to obtain a large number of resumes. A resume can be summarized as a set of field names {name, gender, date of birth, political status, school, education level, major, contact information, native place, education history, skills, ...}, where each field has an expanded description and fields are separated by delimiters. Because job hunting is a specific kind of social behavior and people tend to imitate one another, many job seekers describe their own characteristics in very similar ways. From this large number of resumes with common features, the electronic device extracts the resumes containing the content that the resume screener is interested in, forming a limited and approximately convergent resume set that serves as the retrieved resumes.
In at least one embodiment of the invention, since the same person is likely to send a plurality of resumes in the job hunting process, the electronic device can firstly remove the repeated resumes, so as to realize the deduplication of the resumes.
Further, since the resume contains redundant stop words that adversely affect the analysis, the stop words also need to be removed, that is, the retrieved resume needs to be preprocessed.
Specifically, the electronic device preprocessing the retrieved resume includes:
the electronic device removes stop words from the retrieved resume by using a stop-word list filtering method.
Stop words are function words in the text data that carry no practical meaning. They contribute nothing to text classification yet occur very frequently, and typically include common pronouns, prepositions, and the like. Leaving them in may reduce the accuracy of text classification.
Further, the electronic device may match the words in the retrieved resume one by one against a pre-constructed stop-word list. If a word matches, it is a stop word, and the electronic device deletes it.
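The one-by-one matching against a stop-word list can be sketched as follows. This is a minimal illustration only: the function name `remove_stop_words`, the stop-word list, and the sample tokens are all hypothetical and do not come from the patent.

```python
def remove_stop_words(tokens, stop_words):
    """Drop every token that appears in the stop-word list."""
    stop_set = set(stop_words)  # set lookup keeps each membership test cheap
    return [tok for tok in tokens if tok not in stop_set]


# Hypothetical stop-word list and tokenized resume line.
STOP_WORDS = ["the", "a", "of", "in", "and", "i"]
tokens = ["i", "led", "the", "design", "of", "a", "search", "engine"]

print(remove_stop_words(tokens, STOP_WORDS))
```

The order of the remaining tokens is preserved, which matters because the later co-occurrence and adjacency steps depend on word positions.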
S11, constructing a word segmentation directed acyclic graph according to the pre-constructed word segmentation dictionary, and segmenting the resume to be analyzed according to the constructed word segmentation directed acyclic graph to obtain the resume text after word segmentation.
In at least one embodiment of the present invention, the segmentation dictionary may include a prefix dictionary, a custom dictionary, or the like.
The prefix dictionary contains the prefixes of each word in a statistical dictionary. For example, the prefixes of the dictionary word "北京大学" (Peking University) are "北", "北京", and "北京大", and the prefix of the word "大学" (university) is "大". The custom dictionary, also called a proper-noun dictionary, contains words that do not exist in the statistical dictionary but are specific to a certain field, such as "resume" and "work experience".
Further, the electronic device constructs a word segmentation directed acyclic graph according to the pre-constructed word segmentation dictionary, where each word corresponds to one directed edge in the graph and is assigned a corresponding edge length (weight). The electronic device then obtains the lengths of all paths from the start point to the end point and arranges the length values in strictly ascending order (that is, the values at any two different positions are not equal; the same applies below). The sets of paths with the 1st, 2nd, …, i-th, …, N-th length values form the corresponding coarse segmentation result set. If two or more paths have the same length, they share the same rank i and are all included in the coarse segmentation result set without affecting the rank numbers of the other paths, so the size of the final coarse segmentation result set is greater than or equal to N. The resume text after word segmentation is obtained in this way.
Through the implementation mode, the word segmentation result of the resume text can be quickly obtained by utilizing the word segmentation dictionary and the directed acyclic graph.
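A minimal sketch of dictionary-driven DAG segmentation is given below. It makes several assumptions not stated in the patent: the dictionary `FREQ` and its frequencies are invented, the edge length of a word is taken as the negative log of its relative frequency, and only the single shortest path is returned rather than the full N-shortest coarse segmentation result set.

```python
import math

# Hypothetical statistical dictionary: word -> frequency.
FREQ = {"北京": 50, "北京大学": 30, "大学": 40, "毕业": 20,
        "北": 5, "京": 2, "大": 8, "学": 6, "毕": 1, "业": 1}


def build_prefix_dict(freq):
    """Copy the dictionary and add every proper prefix of every word with frequency 0."""
    prefix = dict(freq)
    for word in freq:
        for end in range(1, len(word)):
            prefix.setdefault(word[:end], 0)
    return prefix


def build_dag(text, prefix):
    """dag[i] lists every end index j such that text[i:j+1] is a dictionary word."""
    dag = {}
    for i in range(len(text)):
        ends, j, frag = [], i, text[i]
        while j < len(text) and frag in prefix:
            if prefix[frag] > 0:          # a real word, not just a prefix marker
                ends.append(j)
            j += 1
            frag = text[i:j + 1]
        dag[i] = ends or [i]              # unknown character: single-character edge
    return dag


def segment(text, freq):
    """Return the words on the shortest path through the segmentation DAG."""
    prefix = build_prefix_dict(freq)
    dag = build_dag(text, prefix)
    total = sum(freq.values())
    n = len(text)
    best = {n: (0.0, n)}                  # best[i] = (cost from i to the end, end of first word)
    for i in range(n - 1, -1, -1):
        best[i] = min(
            (-math.log(max(prefix.get(text[i:j + 1], 0), 1) / total) + best[j + 1][0], j)
            for j in dag[i]
        )
    words, i = [], 0
    while i < n:
        j = best[i][1]
        words.append(text[i:j + 1])
        i = j + 1
    return words


print(segment("北京大学毕业", FREQ))
```

Extending this to the ranked coarse result set described above would mean keeping the N best costs at each node instead of only the minimum.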
S12, constructing a co-occurrence matrix according to the resume text subjected to word segmentation processing, and determining the key words of the resume text based on the co-occurrence matrix.
In at least one embodiment of the present invention, the electronic device constructs a co-occurrence matrix according to the resume text, and determining the keyword of the resume text based on the co-occurrence matrix includes:
the electronic equipment constructs the co-occurrence matrix according to the occurrence frequency of each participle in the resume text, extracts the word frequency (freq) and degree (deg) of each participle from the co-occurrence matrix, calculates the score of each participle according to the word frequency and degree of each participle, and further outputs each participle in a descending order according to the score of each participle to obtain the keywords of the resume text.
For example, the electronic device sorts the segmented words in descending order of score and outputs the top n words; for instance, the top 1/3 of the words by score may be output as the keywords of the resume text.
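The frequency/degree scoring can be sketched as below. The patent does not give the score formula, so this sketch assumes a RAKE-style score of degree divided by frequency, where the degree of a word is the sum of its row in the co-occurrence matrix. The function names, the window size, and the sample sentences are illustrative assumptions.

```python
from collections import Counter, defaultdict


def cooccurrence(sentences, window=1):
    """Count how often each pair of words appears within `window` positions of each other."""
    matrix = defaultdict(Counter)
    for words in sentences:
        for i, w in enumerate(words):
            for j in range(max(0, i - window), min(len(words), i + window + 1)):
                if j != i:
                    matrix[w][words[j]] += 1
    return matrix


def top_keywords(sentences, window=1, keep_one_in=3):
    """Score each word as degree / frequency (assumed) and keep the top 1/keep_one_in."""
    freq = Counter(w for words in sentences for w in words)
    matrix = cooccurrence(sentences, window)
    scores = {w: sum(matrix[w].values()) / freq[w] for w in freq}
    ranked = sorted(scores, key=lambda w: (-scores[w], w))  # descending score, ties by name
    keep = max(1, len(ranked) // keep_one_in)
    return ranked[:keep], scores


sentences = [
    ["I", "good at", "research"],
    ["I", "good at", "programming"],
    ["I", "enjoy", "reading"],
]
keywords, scores = top_keywords(sentences)
print(keywords)
print(scores)
```

With this assumed score, a word that appears often but always next to the same few neighbours (such as "I") ranks below words with a richer neighbourhood per occurrence.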
The co-occurrence matrix is obtained by counting how many times words co-occur within a window of preset size; the counts of the words that co-occur around a word are taken as the vector of that word.
For example, when the resume text has the following corpora:
I am good at research. (This sentence contains the segmented words "I", "good at", "research", and "."; the two sentences below are segmented similarly and are not listed one by one.)
I am good at programming.
I enjoy reading.
The co-occurrence matrix X constructed according to the corpus in the resume text is shown in fig. 4.
In at least one embodiment of the present invention, after obtaining the keywords of the resume text, the method further includes:
when the number of times two keywords appear adjacent to each other in the same document exceeds a preset value, the electronic device combines the two keywords into a new keyword.
The preset value may be, for example, 2.
Through the implementation mode, similar keywords can be further combined, and redundant keywords are avoided.
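The merging rule can be sketched as follows. The patent does not say whether the two original keywords are kept after merging; this sketch assumes they are replaced by the merged keyword, in line with the stated goal of avoiding redundancy. The function name and sample data are hypothetical.

```python
from collections import Counter


def merge_adjacent_keywords(tokens, keywords, threshold=2):
    """Merge two keywords when they are adjacent in `tokens` more than `threshold` times."""
    kw = set(keywords)
    pair_counts = Counter(
        (a, b) for a, b in zip(tokens, tokens[1:]) if a in kw and b in kw
    )
    merged = list(keywords)
    for (a, b), count in pair_counts.items():
        if count > threshold:
            # Assumption: the merged keyword replaces its two parts.
            merged = [k for k in merged if k not in (a, b)]
            merged.append(a + " " + b)
    return merged


tokens = ("machine learning engineer machine learning model "
          "machine learning platform").split()
keywords = ["machine", "learning", "engineer"]

print(merge_adjacent_keywords(tokens, keywords))
```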
S13, acquiring the word sequence in the keywords, and processing the word sequence by using a word representation model to obtain the word representation of the word sequence.
In at least one embodiment of the present invention, the processing, by the electronic device, the word sequence by using a word representation model to obtain a word representation of the word sequence includes:
the electronic device inputs a word sequence in the keywords into the word representation model, generates a first vector containing the word sequence and its preceding context by reading the word sequence in the forward direction, generates a second vector containing the word sequence and its following context by reading the word sequence in the reverse direction, and concatenates the first vector and the second vector to obtain a word representation containing the word sequence and the context information of the word sequence.
For example, for a given unstructured text resume containing n keywords, the word sequence is $Char = (char_1, char_2, \ldots, char_n)$, where $char_n$ is the $n$-th element of the sequence. The unstructured text word sequence is input into the word representation model, which models the sequence. Reading the word sequence in the forward direction generates a vector containing the word sequence and its preceding context, denoted $CharF_i$. Similarly, reading the word sequence in reverse generates a vector containing the word sequence and its following context, denoted $CharB_i$. $CharF_i$ and $CharB_i$ are then concatenated to form a word representation containing the word sequence and its context information:
$$W_d = [CharF_i : CharB_i]$$
Accordingly, the electronic device obtains the word representation of the word sequence.
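The forward/backward reading and concatenation can be illustrated with a toy model. This sketch substitutes a plain tanh recurrence with random, untrained weights for the actual word representation model; the helper names, dimensions, and random embeddings are assumptions made only to show how the two directional vectors are produced and joined.

```python
import math
import random


def run_rnn(vectors, w_in, w_rec):
    """Simple tanh recurrent pass; returns the hidden state after each step."""
    hidden = [0.0] * len(w_rec)
    states = []
    for x in vectors:
        hidden = [
            math.tanh(
                sum(wi * xi for wi, xi in zip(w_in[h], x))
                + sum(wr * hr for wr, hr in zip(w_rec[h], hidden))
            )
            for h in range(len(w_rec))
        ]
        states.append(hidden)
    return states


def bidirectional_repr(vectors, fwd, bwd):
    """Concatenate forward and backward states position by position."""
    forward = run_rnn(vectors, *fwd)               # element i plus its preceding context
    backward = run_rnn(vectors[::-1], *bwd)[::-1]  # element i plus its following context
    return [f + b for f, b in zip(forward, backward)]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
EMB, HID = 4, 3  # hypothetical embedding and hidden sizes
fwd = (random_matrix(HID, EMB, rng), random_matrix(HID, HID, rng))
bwd = (random_matrix(HID, EMB, rng), random_matrix(HID, HID, rng))
embeddings = [[rng.uniform(-1, 1) for _ in range(EMB)] for _ in range(5)]

reps = bidirectional_repr(embeddings, fwd, bwd)
print(len(reps), len(reps[0]))  # one vector per element, each of length 2 * HID
```

Because the backward pass is reversed again before concatenation, position i of the output always pairs the forward state at i with the backward state at i.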
In natural language processing, symbolic information such as "words" can be expressed in a mathematical vector form using various word expression models. The vector representation of the word may be used as input to various machine learning models. Existing word representation models can include two broad categories: one is syntagmatic models and one is paradigmatic models.
Further, for the word expression, the electronic device may further perform formatting processing on the word expression by using regular expression matching, and further analyze, classify and store the word expression in a designated database for subsequent use.
S14, inputting the word representation into the constructed resume label analysis model to obtain a predicted resume label sequence.
In at least one embodiment of the present invention, the resume label analysis model is obtained by training a large amount of resume data as a training sample and performing verification with a verification set. And analyzing the unstructured word representation by using the resume label analysis model, and outputting corresponding labels to form the resume label sequence.
For example, the labels in the resume label sequence may include, but are not limited to: undergraduate, graduate student, proficient in Word, and the like.
In at least one embodiment of the invention, the method further comprises:
the electronic device acquires resume data, splits the resume data to obtain a training set and a verification set, trains a CRF (conditional random field) model by using the training set, predicts a target label sequence by using a conditional log-likelihood function and a maximum score formula, verifies the target label sequence by using the verification set, and, when the target label sequence passes verification, stops training and obtains the resume label analysis model.
The target label sequence refers to the predicted most suitable label sequence.
Specifically, the electronic device performs modeling by using a CRF (conditional random field). Assume that the output target sequence (i.e., the corresponding label sequence) for the keyword information of the unstructured text is $y = (y_1, \ldots, y_n)$. To effectively obtain the target sequence of the unstructured text resume information, the score formula of the model is defined as follows:
Figure BDA0002402557760000111
where $P$ is the output score matrix of the bidirectional LSTM (long short-term memory) algorithm, of size $n \times k$; $k$ is the number of target labels, which are summary ratings for the resume; $n$ is the length of the word sequence; and $A$ is the transition score matrix. When $j = 0$, $y_0$ is the start-of-sequence marker, and when $j = n$, $y_{n+1}$ is the end-of-sequence marker, so $A$ is a square matrix of size $k + 2$.
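The score formula itself survives here only as an image reference. One form consistent with the definitions given (emission scores $P$ of size $n \times k$, a transition matrix $A$ of size $k + 2$, and start and end markers $y_0$ and $y_{n+1}$) is the standard BiLSTM-CRF path score. The following is an assumed reconstruction under those definitions, not a transcription of the figure:

```latex
s(W_d, y) = \sum_{j=0}^{n} A_{y_j,\, y_{j+1}} + \sum_{j=1}^{n} P_{j,\, y_j}
```

Under this reading, the first sum accumulates the transition scores along the padded path $y_0, y_1, \ldots, y_{n+1}$, and the second sum accumulates the score the bidirectional LSTM assigns to label $y_j$ at position $j$.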
The probability that the CRF generates the target sequence $y$, normalized over the label sequences of all resume information, is as follows:
Figure BDA0002402557760000121
where $Y_{W_d}$ denotes all possible label sequences corresponding to the resume information sequence $W_d$. During training, to obtain the correct label sequence of the resume information, the conditional log-likelihood of the correct label sequence is maximized, and the most suitable label sequence is predicted by using a maximum score formula:
Figure BDA0002402557760000122
Through this implementation, combining the conditional log-likelihood function with the maximum score formula can improve the accuracy of the model.
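The three quantities discussed above (path score, conditional log-likelihood, and maximum-score prediction) can be sketched on a toy example. The sketch assumes the standard CRF formulation in which a path score is the sum of transition and emission scores and the probability is a softmax over all candidate label sequences; the label names and all numbers are invented. It enumerates every sequence by brute force, which is only feasible for tiny inputs; practical implementations use the forward algorithm and Viterbi decoding instead.

```python
import itertools
import math

START, END = "<s>", "</s>"        # hypothetical markers for y_0 and y_{n+1}
LABELS = ["EDU", "SKILL"]         # hypothetical target labels (k = 2)

# P[j][c]: score of label LABELS[c] at position j (n = 3 positions).
P = [[2.0, 0.5], [0.3, 1.5], [0.2, 1.0]]
# A[a][b]: transition score from label a to label b, including the two markers.
A = {
    START: {"EDU": 0.5, "SKILL": 0.0},
    "EDU": {"EDU": 0.1, "SKILL": 0.8, END: 0.0},
    "SKILL": {"EDU": -0.5, "SKILL": 0.6, END: 0.3},
}


def path_score(P, A, labels, y):
    """Transition scores along the padded path plus the emission score at each position."""
    padded = [START] + list(y) + [END]
    transitions = sum(A[a][b] for a, b in zip(padded, padded[1:]))
    emissions = sum(P[j][labels.index(tag)] for j, tag in enumerate(y))
    return transitions + emissions


def log_likelihood(P, A, labels, y):
    """log p(y | W_d): score of y minus the log-sum-exp over all label sequences."""
    all_scores = [path_score(P, A, labels, cand)
                  for cand in itertools.product(labels, repeat=len(P))]
    top = max(all_scores)
    log_z = top + math.log(sum(math.exp(s - top) for s in all_scores))
    return path_score(P, A, labels, y) - log_z


def best_sequence(P, A, labels):
    """Label sequence with the maximum path score."""
    return max(itertools.product(labels, repeat=len(P)),
               key=lambda cand: path_score(P, A, labels, cand))


best = best_sequence(P, A, LABELS)
print(best, path_score(P, A, LABELS, best), log_likelihood(P, A, LABELS, best))
```

Training would adjust `P` (through the network) and `A` so that the log-likelihood of the annotated label sequence increases, while prediction only needs the maximum-score sequence.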
S15, calculating the similarity between each label in the resume label sequence and the label of each post, and determining the resume matched with each post from the resumes to be analyzed according to the calculated similarity.
In at least one embodiment of the present invention, the electronic device calculates similarity between each tag in the resume tag sequence and a tag of each position, and determining the resume matching each position from the resume to be parsed according to the calculated similarity includes:
the electronic device calculates the cosine distance between each label and the label of each position, and when the cosine distance between a target label and a target position is smaller than or equal to a preset distance, the electronic device retrieves the target resume corresponding to the target label from the resume to be analyzed and determines that the target resume matches the target position.
Specifically, the cosine distance is a measure for measuring the difference between two individuals by using a cosine value of an included angle between two vectors in a vector space, and the closer the cosine value is to 1, the closer the included angle is to 0 degree, namely the more similar the two vectors are.
For example: the obtained resume label sequence X and the resume label sequence Y required by the position are compared by using the following formula, where $X_i$ denotes the i-th vector in the resume label sequence X and $Y_i$ denotes the i-th vector in the resume label sequence Y required by the position:
Figure BDA0002402557760000131
The resulting similarity ranges from -1 to 1, where -1 means that the two vectors point in exactly opposite directions, 1 means that they point in exactly the same direction, 0 usually means that they are independent, and values in between indicate intermediate similarity or dissimilarity. Using this measure, resumes whose labels are highly similar to the labels of each position can be selected, so that resumes and positions are matched quickly.
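A minimal sketch of the matching rule described above, assuming each label has already been mapped to a numeric vector (the text does not say how). The names `cosine_similarity`, `cosine_distance`, `match_resumes` and the threshold value are hypothetical, and cosine distance is taken here as 1 minus cosine similarity, which is one common convention rather than something the text states.

```python
import math


def cosine_similarity(x, y):
    """Cosine of the angle between two equal-length numeric vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0.0 or norm_y == 0.0:
        return 0.0  # assumption: a zero vector is treated as unrelated
    return dot / (norm_x * norm_y)


def cosine_distance(x, y):
    """Assumed convention: distance = 1 - similarity, so 0 means identical direction."""
    return 1.0 - cosine_similarity(x, y)


def match_resumes(position_vec, resume_vecs, max_distance=0.2):
    """Return ids of resumes whose distance to the position is within the preset distance."""
    return [rid for rid, vec in resume_vecs.items()
            if cosine_distance(vec, position_vec) <= max_distance]
```

A resume is kept when its distance to the position vector does not exceed `max_distance`, mirroring the "smaller than or equal to a preset distance" condition.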
In at least one embodiment of the present invention, the electronic device may further convert the resume label sequence into a score according to the obtained resume label sequence and the correspondingly configured weights (for example, one student label may account for a weight of 0.2 in the resume score and another student label for a weight of 0.1), and then quickly screen out the required personnel according to the score.
It can be seen from the above technical solutions that the present invention retrieves resumes from a database and preprocesses them to obtain the resumes to be analyzed; constructs a word segmentation directed acyclic graph according to a pre-constructed word segmentation dictionary and segments the resumes to be analyzed according to that graph to obtain the segmented resume text, so that word segmentation results are obtained quickly; constructs a co-occurrence matrix according to the resume text and determines the keywords of the resume text based on the co-occurrence matrix; obtains the word sequence in the keywords and processes it with a word representation model to obtain a word representation of the word sequence, which improves the parsing effect; inputs the word representation into the constructed resume label analysis model to obtain a predicted resume label sequence; and calculates the similarity between each label in the resume label sequence and the label of each position, determining the resume matched with each position from the resumes to be analyzed according to the calculated similarity. Fast and accurate intelligent matching of positions and resumes is thereby achieved.
Fig. 2 is a functional block diagram of a preferred embodiment of the resume data information parsing and matching apparatus according to the present invention. The resume data information analyzing and matching device 11 includes a preprocessing unit 110, a constructing unit 111, a determining unit 112, a processing unit 113, a predicting unit 114, a merging unit 115, a training unit 116, an obtaining unit 117, a splitting unit 118, and a verifying unit 119. The module/unit referred to in the present invention refers to a series of computer program segments that can be executed by the processor 13 and that can perform a fixed function, and that are stored in the memory 12. In the present embodiment, the functions of the modules/units will be described in detail in the following embodiments.
The preprocessing unit 110 retrieves the resume from the database and preprocesses the retrieved resume to obtain the resume to be analyzed.
In at least one embodiment of the present invention, the database may be a database in communication with an electronic device, or an internal database of the electronic device, and may be configured in a customized manner according to different requirements.
For example: the database may be a talent bank. The preprocessing unit 110 retrieves and collates resumes from the talent bank to obtain a large number of resumes. A resume can be summarized as a set of nouns {name, gender, date of birth, political affiliation, school, education level, major, contact information, native place, educational background, skills, ...}, where each noun has an expanded description and each noun is followed by a separator. Because of the particular nature of job hunting as a social behavior, and because people imitate one another, many job seekers describe their own characteristics in quite similar ways. From the large number of resumes sharing these commonalities, the preprocessing unit 110 parses out the resumes that contain the content resume screeners are interested in, forming a limited, approximately convergent resume set that serves as the retrieved resumes.
In at least one embodiment of the invention, since the same person is likely to send a plurality of resumes in the job hunting process, repeated resumes can be removed first, thereby implementing the deduplication of resumes.
Further, since the resume contains some redundant stop words that adversely affect the parsing, it is also necessary to eliminate the stop words, i.e., to preprocess the retrieved resume.
Specifically, the preprocessing unit 110 preprocesses the retrieved resume, including:
the preprocessing unit 110 removes stop words from the retrieved resume by using a stop word list filtering method.
Stop words are function words in the text data that carry no practical meaning. They have no influence on the classification of the text but occur with high frequency, and specifically include common pronouns, prepositions and the like. Stop words may reduce the accuracy of text classification.
Further, the preprocessing unit 110 may match the words in the retrieved resume against a pre-constructed stop word list one by one; if a word matches, it is a stop word and the preprocessing unit 110 deletes it.
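The two preprocessing steps above (removing repeated resumes, then filtering words against a stop word list) can be sketched as follows. This is an illustration under stated assumptions: duplicates are detected by exact text equality after whitespace normalisation, the resume is already tokenised, and the function names and the sample stop word list are hypothetical.

```python
def deduplicate(resumes):
    """Drop repeated resume texts while preserving first-seen order."""
    seen = set()
    unique = []
    for text in resumes:
        key = " ".join(text.split())  # normalise whitespace before comparing
        if key not in seen:
            seen.add(key)
            unique.append(text)
    return unique


def remove_stop_words(tokens, stop_words):
    """Match each token against the stop word list and delete the matches."""
    return [token for token in tokens if token.lower() not in stop_words]
```

For instance, `remove_stop_words(["I", "am", "skilled", "in", "Python"], {"i", "am", "in"})` keeps only `"skilled"` and `"Python"`.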
The construction unit 111 constructs a word segmentation directed acyclic graph according to a pre-constructed word segmentation dictionary, and segments the resume to be analyzed according to the constructed word segmentation directed acyclic graph to obtain a resume text after word segmentation.
In at least one embodiment of the present invention, the segmentation dictionary may include a prefix dictionary, a custom dictionary, or the like.
Wherein the prefix dictionary includes the prefixes of each segmented word in a statistical dictionary. For example, the prefixes of the dictionary word "北京大学" (Peking University) are "北", "北京" and "北京大", and the prefix of the word "大学" (university) is "大". The custom dictionary, which may also be called a proper noun dictionary, contains words that do not exist in the statistical dictionary but are specific to a certain field, such as resume, work experience, and the like.
Further, the constructing unit 111 constructs a word segmentation directed acyclic graph according to the pre-constructed word segmentation dictionary, where each word corresponds to one directed edge in the graph and is assigned a corresponding edge length (weight). The constructing unit 111 then determines the length values of all paths from the start point to the end point and arranges the length values in strictly ascending order (i.e., the values at any two different positions are not equal; the same applies below), taking the paths ranked 1st, 2nd, ..., i-th, ..., N-th as the corresponding coarse segmentation result set. If two or more paths have equal length, they share the same rank i and are all included in the coarse segmentation result set without affecting the ranks of the other paths, so the size of the final coarse segmentation result set is greater than or equal to N. The segmented resume text is thereby obtained.
Through the implementation mode, the word segmentation result of the resume text can be quickly obtained by utilizing the word segmentation dictionary and the directed acyclic graph.
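A toy sketch of the graph-based coarse segmentation described above, under several assumptions: the dictionary maps each known word to an edge length, any single character missing from the dictionary gets a fallback edge of length `unknown_weight`, and all paths are enumerated exhaustively (exponential in the worst case, so suitable only for short strings). The function names and weights are hypothetical.

```python
def build_dag(text, dictionary, unknown_weight=10.0):
    """Map each start index to a list of (end index, edge length) pairs."""
    dag = {}
    for i in range(len(text)):
        edges = [(j, dictionary[text[i:j]])
                 for j in range(i + 1, len(text) + 1)
                 if text[i:j] in dictionary]
        if not any(j == i + 1 for j, _ in edges):
            edges.append((i + 1, unknown_weight))  # fallback: single character
        dag[i] = edges
    return dag


def all_paths(text, dag):
    """Enumerate every start-to-end path as (total length, [words])."""
    results = []

    def walk(i, length, words):
        if i == len(text):
            results.append((length, words))
            return
        for j, weight in dag[i]:
            walk(j, length + weight, words + [text[i:j]])

    walk(0, 0.0, [])
    return results


def coarse_results(text, dictionary, n=2):
    """Keep every path whose length is among the n smallest distinct lengths."""
    paths = sorted(all_paths(text, build_dag(text, dictionary)), key=lambda p: p[0])
    distinct = sorted({length for length, _ in paths})[:n]
    return [path for path in paths if path[0] in distinct]
```

Because paths of equal length share one rank, the returned set can contain more than `n` paths, which matches the "greater than or equal to N" remark in the text.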
The determining unit 112 constructs a co-occurrence matrix according to the resume text, and determines a keyword of the resume text based on the co-occurrence matrix.
In at least one embodiment of the present invention, the determining unit 112 constructs a co-occurrence matrix according to the resume text, and determining the keyword of the resume text based on the co-occurrence matrix includes:
the determining unit 112 constructs the co-occurrence matrix according to the occurrence frequency of each segmented word in the resume text and extracts the word frequency (freq) and degree (deg) of each segmented word from the co-occurrence matrix; the determining unit 112 then calculates the score of each segmented word according to its word frequency and degree, and outputs the segmented words in descending order of score to obtain the keywords of the resume text.
For example: the determining unit 112 outputs each word in a descending order according to the score of each word, to obtain the top n words, for example, outputting the top 1/3 words in a descending order according to the score size as the keywords of the resume text.
The co-occurrence matrix is obtained by counting the co-occurrence times of words in a window with a preset size, and taking the times of the co-occurrence words around the words as the vector of the current words.
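The windowed co-occurrence counting and the frequency/degree scoring can be sketched as below. The text does not give the score formula, so the ratio deg / freq (as used in RAKE-style keyword extraction) is an assumption, as are the function names, the window size, and the convention that a word's degree includes its own occurrences.

```python
from collections import Counter, defaultdict


def cooccurrence(sentences, window=1):
    """Count how often two words appear within `window` positions of each other."""
    matrix = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    matrix[word][tokens[j]] += 1
    return matrix


def keyword_scores(sentences, window=1):
    """Score each word as deg / freq (assumed RAKE-style ratio)."""
    matrix = cooccurrence(sentences, window)
    freq = Counter(word for tokens in sentences for word in tokens)
    scores = {}
    for word, count in freq.items():
        deg = sum(matrix[word].values()) + count  # neighbours plus the word itself
        scores[word] = deg / count
    return scores


def top_keywords(sentences, fraction=1 / 3, window=1):
    """Return the top `fraction` of words in descending score order."""
    scores = keyword_scores(sentences, window)
    ranked = sorted(scores, key=lambda w: (-scores[w], w))  # ties broken alphabetically
    keep = max(1, round(len(ranked) * fraction))
    return ranked[:keep]
```

Each row of the returned matrix is the co-occurrence vector of one word; `top_keywords` implements the "top 1/3 by score" selection mentioned above.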
For example, when the resume text has the following corpora:
I am good at research. (This corpus entry is segmented into "I", "good at", "research" and "."; the two corpus entries below are segmented similarly and are not listed one by one.)
I am good at programming.
I enjoy reading.
The co-occurrence matrix X constructed according to the corpus in the resume text is shown in fig. 4.
In at least one embodiment of the present invention, after obtaining the keywords of the resume text, the method further includes:
when the number of times that two keywords are adjacent in the same document is greater than a preset value, the merging unit 115 merges the two keywords into a new keyword.
Wherein the preset value may be 2 times and the like.
Through the implementation mode, similar keywords can be further combined, and redundant keywords are avoided.
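A minimal sketch of the merging rule, assuming each document is a list of tokens and that the merged keyword is added alongside the two original keywords (the text does not say whether the originals are kept). The function name and the space-joined form of the merged keyword are hypothetical.

```python
from collections import Counter


def merge_adjacent_keywords(documents, keywords, preset_value=2):
    """Merge keyword pairs adjacent more than `preset_value` times in one document."""
    merged = set(keywords)
    for tokens in documents:
        pairs = Counter(
            (a, b) for a, b in zip(tokens, tokens[1:])
            if a in keywords and b in keywords
        )
        for (a, b), count in pairs.items():
            if count > preset_value:
                merged.add(a + " " + b)  # the new, combined keyword
    return merged
```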
The processing unit 113 obtains a word sequence in the keyword, and performs word representation processing on the word sequence by using a word representation model to obtain a word representation of the word sequence.
In at least one embodiment of the present invention, the processing unit 113 processes the word sequence by using a word representation model, and obtaining a word representation of the word sequence includes:
the processing unit 113 inputs a word sequence in the keyword into the word representation model, generates a first vector including the word sequence and the context information of the word sequence by reading the word sequence in a forward direction, generates a second vector including the word sequence and the context information of the word sequence by reading the word sequence in a reverse direction, and the processing unit 113 connects the first vector and the second vector to obtain a word representation including the word sequence and the context information of the word sequence.
For example: given the word sequence $Char = (char_1, char_2, \ldots, char_n)$ of an unstructured text resume containing n keywords, where $char_n$ is the n-th element of the sequence, the unstructured text word sequence is input into the word representation model and modeled by it. Reading the word sequence in the forward direction generates a vector containing the word sequence and its preceding-context information, denoted $CharF_i$. Similarly, reading the word sequence in the reverse direction generates a vector containing the word sequence and its following-context information, denoted $CharB_i$. $CharF_i$ and $CharB_i$ are then concatenated to form a word representation containing the word sequence and its context information:
$Wd = [CharF_i : CharB_i]$
Accordingly, the processing unit 113 obtains the word representation of the word sequence.
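The forward/backward reading and concatenation can be illustrated with a toy encoder. Note the substitution: this sketch uses a plain tanh recurrent cell with random, untrained weights instead of an LSTM, and it assumes each element of the word sequence has already been turned into a numeric input vector. All names and dimensions are hypothetical.

```python
import math
import random


def make_rnn(input_dim, hidden_dim, seed):
    """Create random weights for a plain tanh recurrent cell (untrained)."""
    rng = random.Random(seed)
    w_in = [[rng.uniform(-0.5, 0.5) for _ in range(input_dim)] for _ in range(hidden_dim)]
    w_rec = [[rng.uniform(-0.5, 0.5) for _ in range(hidden_dim)] for _ in range(hidden_dim)]
    return w_in, w_rec


def run_rnn(inputs, cell):
    """Return the hidden state after each step of the sequence."""
    w_in, w_rec = cell
    hidden = [0.0] * len(w_in)
    states = []
    for x in inputs:
        hidden = [
            math.tanh(sum(wi * xi for wi, xi in zip(w_in[h], x))
                      + sum(wr * hr for wr, hr in zip(w_rec[h], hidden)))
            for h in range(len(w_in))
        ]
        states.append(hidden)
    return states


def bidirectional_representation(inputs, fwd_cell, bwd_cell):
    """Concatenate forward (preceding context) and backward (following context) states."""
    forward = run_rnn(inputs, fwd_cell)
    backward = run_rnn(inputs[::-1], bwd_cell)[::-1]
    return [f + b for f, b in zip(forward, backward)]
```

At each position the first half of the vector depends only on the elements read so far in the forward pass, and the second half only on the elements from that position to the end, which is the property the concatenation relies on.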
In natural language processing, symbolic information such as "words" can be expressed in a mathematical vector form using various word expression models. The vector representation of the word may be used as input to various machine learning models. Existing word representation models can include two broad categories: one is syntagmatic models and one is paradigmatic models.
Further, the electronic device may format the word representation by using regular expression matching, and then parse, classify and store it in a designated database for subsequent use.
The prediction unit 114 inputs the word representation into the constructed resume label analysis model to obtain a predicted resume label sequence.
In at least one embodiment of the present invention, the resume label analysis model is obtained by training a large amount of resume data as a training sample and performing verification with a verification set. And analyzing the unstructured word representation by using the resume label analysis model, and outputting corresponding labels to form the resume label sequence.
For example: the labels in the resume label sequence may include, but are not limited to: undergraduate, graduate student, proficient in Word, etc.
In at least one embodiment of the invention, training the resume label parsing model comprises:
the obtaining unit 117 obtains resume data, and the splitting unit 118 splits the resume data to obtain a training set and a verification set; further, the training unit 116 trains a CRF model by using the training set and predicts a target label sequence by using a conditional log-likelihood function and a maximum score formula, the verification unit 119 verifies the target label sequence by using the verification set, and when the target label sequence passes verification, the training unit 116 stops training and obtains the resume label analysis model.
Here, the target label sequence refers to the predicted most suitable label sequence.
Specifically, the training unit 116 uses a CRF (conditional random field) for modeling. Assume that the output target sequence (i.e., the corresponding label sequence) for the keyword information of the unstructured text is $y = (y_1, \ldots, y_n)$. In order to effectively obtain the target sequence of the unstructured text resume information, the score formula of the model is defined as follows:
Figure BDA0002402557760000181
where P denotes the output score matrix of the bidirectional LSTM (long short-term memory) algorithm, of size n × k; k denotes the number of target labels, which are summary evaluations of the resume; n denotes the length of the word sequence; and A denotes the transition score matrix. When j = 0, $y_0$ denotes the start-of-sequence marker, and when j = n, $y_{n+1}$ denotes the end-of-sequence marker, so A is a square matrix of size k + 2.
Over the label sequences of all resume information, the probability that the CRF generates the target sequence y is:
Figure BDA0002402557760000182
where $Y_{Wd}$ denotes all possible label sequences corresponding to the resume information sequence Wd. In the training process, in order to obtain the correct label sequence for the resume information, the training unit 116 maximizes the conditional log-likelihood of the correct label sequence, and predicts the most suitable label sequence by using a maximum score formula:
Figure BDA0002402557760000191
Through the above embodiment, the accuracy of the model can be improved by combining the conditional log-likelihood function with the maximum score formula.
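The scoring, likelihood and prediction steps can be sketched in plain Python. This follows the standard linear-chain CRF formulation and is an assumption about the formulas behind the figure references, not a transcription of them: P is an n × k list of emission scores, A is a (k + 2) × (k + 2) list of transition scores, and the start and end markers are assumed to occupy indices k and k + 1. All function names are hypothetical.

```python
import math


def _logsumexp(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def sequence_score(P, A, labels):
    """Emission scores P[j][y_j] plus transition scores, including start and end markers."""
    k = len(P[0])
    path = [k] + list(labels) + [k + 1]
    emission = sum(P[j][y] for j, y in enumerate(labels))
    transition = sum(A[a][b] for a, b in zip(path, path[1:]))
    return emission + transition


def log_partition(P, A):
    """Forward algorithm: log of the summed exp-scores over all label sequences."""
    k = len(P[0])
    alpha = [A[k][y] + P[0][y] for y in range(k)]
    for j in range(1, len(P)):
        alpha = [_logsumexp([alpha[p] + A[p][y] for p in range(k)]) + P[j][y]
                 for y in range(k)]
    return _logsumexp([alpha[y] + A[y][k + 1] for y in range(k)])


def log_likelihood(P, A, labels):
    """Conditional log-likelihood of one label sequence (the training objective)."""
    return sequence_score(P, A, labels) - log_partition(P, A)


def viterbi(P, A):
    """Return the maximum-score label sequence and its score."""
    k = len(P[0])
    best = [A[k][y] + P[0][y] for y in range(k)]
    back = []
    for j in range(1, len(P)):
        pointers, scores = [], []
        for y in range(k):
            prev = max(range(k), key=lambda p: best[p] + A[p][y])
            pointers.append(prev)
            scores.append(best[prev] + A[prev][y] + P[j][y])
        back.append(pointers)
        best = scores
    last = max(range(k), key=lambda y: best[y] + A[y][k + 1])
    labels = [last]
    for pointers in reversed(back):
        labels.append(pointers[labels[-1]])
    return labels[::-1], best[last] + A[last][k + 1]
```

Training would adjust P (via the network producing it) and A to raise `log_likelihood` on correct sequences, while `viterbi` performs the maximum-score prediction without enumerating every sequence.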
The determining unit 112 calculates the similarity between each label in the resume label sequence and the label of each position, and determines the resume matched with each position from the resume to be analyzed according to the calculated similarity.
In at least one embodiment of the present invention, the determining unit 112 calculates a similarity between each tag in the resume tag sequence and a tag of each position, and determining the resume matching each position from the resume to be parsed according to the calculated similarity includes:
the determining unit 112 calculates a cosine distance between each tag and each post tag, and when the cosine distance between a target tag and a target post is less than or equal to a preset distance, the determining unit 112 retrieves a target resume corresponding to the target tag from the resume to be analyzed, and determines that the target resume is matched with the target post.
Specifically, the cosine distance is a measure for measuring the difference between two individuals by using a cosine value of an included angle between two vectors in a vector space, and the closer the cosine value is to 1, the closer the included angle is to 0 degree, namely the more similar the two vectors are.
For example: the obtained resume label sequence X and the resume label sequence Y required by the position are compared by using the following formula, where $X_i$ denotes the i-th vector in the resume label sequence X and $Y_i$ denotes the i-th vector in the resume label sequence Y required by the position:
Figure BDA0002402557760000192
The resulting similarity ranges from -1 to 1, where -1 means that the two vectors point in exactly opposite directions, 1 means that they point in exactly the same direction, 0 usually means that they are independent, and values in between indicate intermediate similarity or dissimilarity. Using this measure, resumes whose labels are highly similar to the labels of each position can be selected, so that resumes and positions are matched quickly.
In at least one embodiment of the present invention, the determining unit 112 may further convert the resume label sequence into a score according to the obtained resume label sequence and the correspondingly configured weights (for example, one student label may account for a weight of 0.2 in the resume score and another student label for a weight of 0.1), and then quickly screen out the required personnel according to the score.
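A minimal sketch of the weighted scoring, assuming the score is the sum of the configured weights of the distinct labels predicted for a resume (the text does not specify how weights are combined). The function names, label names and weight values in the usage note are hypothetical.

```python
def resume_score(labels, weights, default_weight=0.0):
    """Sum the configured weights of the distinct labels predicted for one resume."""
    return sum(weights.get(label, default_weight) for label in set(labels))


def rank_resumes(resume_labels, weights):
    """Return resume ids sorted by descending score (ties broken by id)."""
    scores = {rid: resume_score(labels, weights)
              for rid, labels in resume_labels.items()}
    return sorted(scores, key=lambda rid: (-scores[rid], rid))
```

With illustrative weights such as `{"graduate": 0.2, "undergraduate": 0.1}`, a resume labelled `["graduate"]` would rank ahead of one labelled `["undergraduate"]`.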
It can be seen from the above technical solutions that the present invention retrieves resumes from a database and preprocesses them to obtain the resumes to be analyzed; constructs a word segmentation directed acyclic graph according to a pre-constructed word segmentation dictionary and segments the resumes to be analyzed according to that graph to obtain the segmented resume text, so that word segmentation results are obtained quickly; constructs a co-occurrence matrix according to the resume text and determines the keywords of the resume text based on the co-occurrence matrix; obtains the word sequence in the keywords and processes it with a word representation model to obtain a word representation of the word sequence, which improves the parsing effect; inputs the word representation into the constructed resume label analysis model to obtain a predicted resume label sequence; and calculates the similarity between each label in the resume label sequence and the label of each position, determining the resume matched with each position from the resumes to be analyzed according to the calculated similarity. Fast and accurate intelligent matching of positions and resumes is thereby achieved.
Fig. 3 is a schematic structural diagram of an electronic device implementing a resume data information parsing and matching method according to a preferred embodiment of the present invention.
The electronic device 1 may include a memory 12, a processor 13 and a bus, and may further include a computer program, such as a resume data information parsing and matching program, stored in the memory 12 and executable on the processor 13.
It will be understood by those skilled in the art that the schematic diagram is merely an example of the electronic device 1 and does not constitute a limitation on the electronic device 1. The electronic device 1 may have a bus-type structure or a star-type structure, and may include more or fewer hardware or software components than those shown in the figure, or a different arrangement of components; for example, the electronic device 1 may further include input and output devices, network access devices, and the like.
It should be noted that the electronic device 1 is only an example, and other existing or future electronic products, such as those that can be adapted to the present invention, should also be included in the scope of the present invention, and are included herein by reference.
The memory 12 includes at least one type of readable storage medium, which includes flash memory, removable hard disks, multimedia cards, card-type memory (e.g., SD or DX memory, etc.), magnetic memory, magnetic disks, optical disks, etc. The memory 12 may in some embodiments be an internal storage unit of the electronic device 1, for example a removable hard disk of the electronic device 1. The memory 12 may also be an external storage device of the electronic device 1 in other embodiments, such as a plug-in mobile hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like provided on the electronic device 1. Further, the memory 12 may also include both an internal storage unit and an external storage device of the electronic device 1. The memory 12 may be used not only to store application software installed in the electronic device 1 and various types of data, such as codes of resume data information analysis and matching programs, but also to temporarily store data that has been output or is to be output.
The processor 13 may be composed of an integrated circuit in some embodiments, for example, a single packaged integrated circuit, or may be composed of a plurality of integrated circuits packaged with the same or different functions, including one or more Central Processing Units (CPUs), microprocessors, digital Processing chips, graphics processors, and combinations of various control chips. The processor 13 is a Control Unit (Control Unit) of the electronic device 1, connects various components of the electronic device 1 by using various interfaces and lines, and executes various functions and processes data of the electronic device 1 by running or executing programs or modules (for example, executing a resume data information parsing and matching program and the like) stored in the memory 12 and calling data stored in the memory 12.
The processor 13 executes an operating system of the electronic device 1 and various installed application programs. The processor 13 executes the application program to implement the steps in the above embodiments of resume data information parsing and matching method, such as steps S10, S11, S12, S13, S14, and S15 shown in fig. 1.
Alternatively, the processor 13, when executing the computer program, implements the functions of the modules/units in the above device embodiments, for example:
retrieving the resume from the database, and preprocessing the retrieved resume to obtain the resume to be analyzed;
constructing a word segmentation directed acyclic graph according to a pre-constructed word segmentation dictionary, and segmenting the resume to be analyzed according to the constructed word segmentation directed acyclic graph to obtain a resume text after word segmentation;
constructing a co-occurrence matrix according to the resume text after word segmentation processing, and determining keywords of the resume text based on the co-occurrence matrix;
acquiring a word sequence in the keyword, and processing the word sequence by using a word representation model to obtain a word representation of the word sequence;
inputting the word representation into a constructed resume label analysis model to obtain a predicted resume label sequence;
and calculating the similarity between each label in the resume label sequence and the label of each position, and determining the resume matched with each position from the resume to be analyzed according to the calculated similarity.
Illustratively, the computer program may be divided into one or more modules/units, which are stored in the memory 12 and executed by the processor 13 to accomplish the present invention. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, which are used to describe the execution process of the computer program in the electronic device 1. For example, the computer program may be partitioned into a preprocessing unit 110, a construction unit 111, a determination unit 112, a processing unit 113, a prediction unit 114, a merging unit 115, a training unit 116, an acquisition unit 117, a splitting unit 118, a verification unit 119.
The integrated unit implemented in the form of a software functional module may be stored in a computer-readable storage medium. The software functional module is stored in a storage medium and includes several instructions to enable a computer device (which may be a personal computer, a computer device, or a network device) or a processor (processor) to execute parts of the methods according to the embodiments of the present invention.
The integrated modules/units of the electronic device 1 may be stored in a computer-readable storage medium if they are implemented in the form of software functional units and sold or used as separate products. Based on such understanding, all or part of the flow of the method according to the embodiments of the present invention may be implemented by a computer program, which may be stored in a computer-readable storage medium, and when the computer program is executed by a processor, the steps of the method embodiments may be implemented.
Wherein the computer program comprises computer program code, which may be in the form of source code, object code, an executable file or some intermediate form, etc. The computer-readable medium may include: any entity or device capable of carrying said computer program code, recording medium, U-disk, removable hard disk, magnetic disk, optical disk, computer Memory, Read-Only Memory (ROM).
The bus may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one arrow is shown in FIG. 3, but this does not indicate only one bus or one type of bus. The bus is arranged to enable connection communication between the memory 12 and at least one processor 13 or the like.
Although not shown, the electronic device 1 may further include a power supply (such as a battery) for supplying power to each component, and preferably, the power supply may be logically connected to the at least one processor 13 through a power management device, so as to implement functions of charge management, discharge management, power consumption management, and the like through the power management device. The power supply may also include any component of one or more dc or ac power sources, recharging devices, power failure detection circuitry, power converters or inverters, power status indicators, and the like. The electronic device 1 may further include various sensors, a bluetooth module, a Wi-Fi module, and the like, which are not described herein again.
Further, the electronic device 1 may further include a network interface. Optionally, the network interface may include a wired interface and/or a wireless interface (such as a Wi-Fi interface, a Bluetooth interface, etc.), which are generally used for establishing a communication connection between the electronic device 1 and other electronic devices.
Optionally, the electronic device 1 may further comprise a user interface, which may be a display or an input unit (such as a keyboard), and optionally a standard wired interface or a wireless interface. Optionally, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (organic light-emitting diode) touch device, etc.
It is to be understood that the described embodiments are for purposes of illustration only and that the scope of the appended claims is not limited to such structures.
Fig. 3 only shows the electronic device 1 with components 12-13, and it will be understood by a person skilled in the art that the structure shown in fig. 3 does not constitute a limitation of the electronic device 1, and may comprise fewer or more components than shown, or a combination of certain components, or a different arrangement of components.
Referring to fig. 1, the memory 12 in the electronic device 1 stores a plurality of instructions to implement a resume data information parsing and matching method, and the processor 13 can execute the plurality of instructions to implement:
retrieving the resume from the database, and preprocessing the retrieved resume to obtain the resume to be analyzed;
constructing a word segmentation directed acyclic graph according to a pre-constructed word segmentation dictionary, and segmenting the resume to be analyzed according to the constructed word segmentation directed acyclic graph to obtain a resume text after word segmentation;
constructing a co-occurrence matrix according to the resume text after word segmentation processing, and determining keywords of the resume text based on the co-occurrence matrix;
acquiring a word sequence in the keyword, and processing the word sequence by using a word representation model to obtain a word representation of the word sequence;
inputting the word representation into a constructed resume label analysis model to obtain a predicted resume label sequence;
and calculating the similarity between each label in the resume label sequence and the label of each position, and determining the resume matched with each position from the resume to be analyzed according to the calculated similarity.
Specifically, for the specific implementation of the above instructions by the processor 13, reference may be made to the description of the relevant steps in the embodiment corresponding to fig. 1, which is not repeated here.
In the embodiments provided in the present invention, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is only one logical functional division, and other divisions may be realized in practice.
The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
In addition, functional modules in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional module.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof.
The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference signs in the claims shall not be construed as limiting the claim concerned.
Furthermore, the word "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. A plurality of units or means recited in the system claims may also be implemented by one unit or means in software or hardware. Terms such as "first" and "second" are used to denote names and do not indicate any particular order.
Finally, it should be noted that the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting, and although the present invention is described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention.

Claims (10)

1. A resume data information analyzing and matching method is characterized by comprising the following steps:
retrieving the resume from the database, and preprocessing the retrieved resume to obtain the resume to be analyzed;
constructing a word segmentation directed acyclic graph according to a pre-constructed word segmentation dictionary, and segmenting the resume to be analyzed according to the constructed word segmentation directed acyclic graph to obtain a resume text after word segmentation;
constructing a co-occurrence matrix according to the resume text subjected to word segmentation processing, and determining keywords of the resume text based on the co-occurrence matrix;
acquiring a word sequence in the keyword, and performing word representation processing on the word sequence by using a word representation model to obtain a word representation of the word sequence;
inputting the word representation into a constructed resume label analysis model to obtain a predicted resume label sequence;
and calculating the similarity between each label in the resume label sequence and the label of each post, and determining the resume matched with each post from the resume to be analyzed according to the calculated similarity.
2. The resume data information parsing and matching method of claim 1, wherein the preprocessing the retrieved resume comprises:
removing stop words from the retrieved resume by using a stop-word list filtering method.
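As a minimal sketch of stop-word list filtering, the snippet below drops every token that appears in a stop-word list. It assumes the resume text is already available as a list of tokens; the list `STOP_WORDS` and the function name `remove_stop_words` are hypothetical and only serve the illustration.

```python
# Hypothetical stop-word list; a real list would be loaded from a file.
STOP_WORDS = {"the", "a", "of", "and"}

def remove_stop_words(tokens, stop_words):
    """Keep only tokens whose lowercase form is not in the stop-word list."""
    return [token for token in tokens if token.lower() not in stop_words]

print(remove_stop_words(["Design", "of", "the", "API"], STOP_WORDS))
```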
3. The resume data information parsing and matching method of claim 1, wherein the constructing a co-occurrence matrix according to the resume text and determining keywords of the resume text based on the co-occurrence matrix comprises:
constructing the co-occurrence matrix according to the occurrence frequency of each segmented word in the resume text;
extracting the word frequency and the degree of each segmented word from the co-occurrence matrix;
calculating the score of each segmented word according to the word frequency and the degree of that segmented word;
and outputting the segmented words in descending order of score to obtain the keywords of the resume text.
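The claim does not state the scoring formula, so the following is only a sketch under stated assumptions. The second per-word quantity is read here as the word's degree, i.e., its row sum in the co-occurrence matrix, and the score is taken as degree divided by frequency, as in the RAKE keyword extraction algorithm. The function names and the sample phrases are hypothetical.

```python
from collections import defaultdict

def build_cooccurrence(phrases):
    """Count, for every pair of words, how often they occur in the same phrase."""
    matrix = defaultdict(lambda: defaultdict(int))
    for phrase in phrases:
        for word in phrase:
            for other in phrase:
                matrix[word][other] += 1
    return matrix

def rank_keywords(phrases):
    """Score each word as degree / frequency and return words in descending score order."""
    matrix = build_cooccurrence(phrases)
    scores = {}
    for word, row in matrix.items():
        frequency = row[word]        # diagonal entry of the matrix
        degree = sum(row.values())   # row sum, including the diagonal
        scores[word] = degree / frequency
    # Ties are broken alphabetically so the output is deterministic.
    return sorted(scores, key=lambda w: (-scores[w], w))

phrases = [["python", "developer"], ["python"], ["machine", "learning", "engineer"]]
print(rank_keywords(phrases))
```

Words that mostly appear inside longer phrases rank above words that often appear alone, which is the usual motivation for a degree-to-frequency ratio.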
4. The resume data information parsing and matching method of claim 3, wherein after obtaining the keywords of the resume text, the method further comprises:
when the number of times two keywords appear adjacent to each other in the same document is greater than a preset value, combining the two keywords into a new keyword.
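A minimal sketch of this merging rule: count how often two keywords occur next to each other in a tokenized document, and emit a combined keyword when the count exceeds the preset value. Joining the pair with a space, the function name, and the sample data are assumptions made for the illustration (Chinese keywords would more likely be concatenated directly).

```python
from collections import Counter

def merge_adjacent_keywords(tokens, keywords, preset_value):
    """Return new keywords built from keyword pairs adjacent more than preset_value times."""
    pair_counts = Counter(
        (left, right)
        for left, right in zip(tokens, tokens[1:])
        if left in keywords and right in keywords
    )
    return {f"{left} {right}" for (left, right), count in pair_counts.items()
            if count > preset_value}

tokens = "machine learning engineer with machine learning and deep learning".split()
print(merge_adjacent_keywords(tokens, {"machine", "learning", "deep"}, 1))
```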
5. The resume data information parsing and matching method of claim 1, wherein the performing word representation processing on the word sequence by using a word representation model to obtain word representations of the word sequence comprises:
inputting a word sequence in the keywords into the word representation model, generating a first vector containing the word sequence and its preceding context by reading the word sequence in the forward direction, and generating a second vector containing the word sequence and its following context by reading the word sequence in the reverse direction;
and concatenating the first vector and the second vector to obtain a word representation comprising the word sequence and the context information of the word sequence.
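The forward and reverse readings can be sketched with a toy recurrent pass. The elementwise update `tanh(x + weight * h)` below stands in for a trained recurrent cell such as an LSTM and is purely an assumption; the function names and the fixed `weight` are hypothetical. The point of the sketch is only the data flow: one pass in each direction, then concatenation per word.

```python
import math

def run_recurrent(embeddings, weight):
    """Toy recurrent pass: h_t = tanh(x_t + weight * h_{t-1}), applied elementwise."""
    state = [0.0] * len(embeddings[0])
    states = []
    for x in embeddings:
        state = [math.tanh(xi + weight * hi) for xi, hi in zip(x, state)]
        states.append(state)
    return states

def bidirectional_representation(embeddings, weight=0.5):
    """Concatenate, per word, the forward state and the backward state."""
    forward = run_recurrent(embeddings, weight)
    # Read the sequence in reverse, then restore the original word order.
    backward = run_recurrent(embeddings[::-1], weight)[::-1]
    return [f + b for f, b in zip(forward, backward)]

# Three words with two-dimensional input embeddings (made-up values).
reps = bidirectional_representation([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
print(len(reps), len(reps[0]))
```

Each output vector is twice as wide as the input: its first half depends on the word and everything before it, its second half on the word and everything after it.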
6. The resume data information parsing and matching method of claim 1, wherein the method further comprises:
acquiring resume data;
splitting the resume data to obtain a training set and a verification set;
training a CRF (conditional random field) model by using the training set, and predicting a target label sequence by using a conditional log-likelihood function and a maximum score formula;
verifying the target label sequence with the verification set;
and when the target label sequence passes verification, stopping training and obtaining the resume label analysis model.
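The claim names a conditional log-likelihood function and a maximum score formula without spelling them out. The sketch below assumes the common linear-chain formulation, in which a label sequence is scored by summing per-position emission scores and label-to-label transition scores; the log-likelihood is that score minus the log of the partition function, and the maximum-score sequence is found with Viterbi decoding. All names and numbers are illustrative, and the brute-force partition function is only suitable for toy inputs.

```python
import math
from itertools import product

def sequence_score(emissions, transitions, tags):
    """Sum of emission scores plus transition scores along a label sequence."""
    score = sum(emissions[i][t] for i, t in enumerate(tags))
    score += sum(transitions[a][b] for a, b in zip(tags, tags[1:]))
    return score

def log_likelihood(emissions, transitions, tags):
    """Conditional log-likelihood: score of tags minus log-sum-exp over all sequences."""
    n_tags = len(transitions)
    all_scores = [sequence_score(emissions, transitions, y)
                  for y in product(range(n_tags), repeat=len(emissions))]
    top = max(all_scores)
    log_z = top + math.log(sum(math.exp(s - top) for s in all_scores))
    return sequence_score(emissions, transitions, tags) - log_z

def viterbi(emissions, transitions):
    """Return the label sequence with the maximum score."""
    n_tags = len(transitions)
    scores = list(emissions[0])
    back = []
    for emit in emissions[1:]:
        prev = scores
        pointers, scores = [], []
        for t in range(n_tags):
            best = max(range(n_tags), key=lambda p: prev[p] + transitions[p][t])
            pointers.append(best)
            scores.append(prev[best] + transitions[best][t] + emit[t])
        back.append(pointers)
    tag = max(range(n_tags), key=lambda t: scores[t])
    path = [tag]
    for pointers in reversed(back):
        tag = pointers[tag]
        path.append(tag)
    return path[::-1]

# Made-up scores for a 3-word sequence and 2 labels.
emissions = [[2.0, 0.0], [0.0, 1.0], [0.5, 0.4]]
transitions = [[0.5, -1.0], [-1.0, 0.5]]
print(viterbi(emissions, transitions))
```

During training, parameters would be adjusted to maximize `log_likelihood` on labeled sequences from the training set; at prediction time only `viterbi` is needed.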
7. The resume data information parsing and matching method of claim 1, wherein the calculating the similarity between each label in the resume label sequence and the label of each post, and determining the resume matched with each post from the resume to be analyzed according to the calculated similarity comprises:
calculating the cosine distance between each label and the label of each post;
when the cosine distance between a target label and the label of a target post is smaller than or equal to a preset distance, retrieving a target resume corresponding to the target label from the resume to be analyzed;
determining that the target resume matches the target post.
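A minimal sketch of this matching step follows. It assumes that labels are already represented as numeric vectors and that cosine distance means one minus cosine similarity; neither the vector encoding nor this definition is specified in the claim. The identifiers `cosine_distance`, `match_resumes`, and the sample vectors are hypothetical.

```python
import math

def cosine_distance(u, v):
    """Cosine distance, taken here as 1 - cosine similarity."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0.0:
        return 1.0  # assumption: a zero vector matches nothing
    return 1.0 - dot / norm

def match_resumes(resume_vectors, post_vectors, preset_distance):
    """For each post, list the resumes whose label vector is within the preset distance."""
    return {
        post: [resume for resume, rv in resume_vectors.items()
               if cosine_distance(rv, pv) <= preset_distance]
        for post, pv in post_vectors.items()
    }

# Made-up label vectors for three resumes and one post.
resumes = {"r1": [1.0, 0.0], "r2": [0.0, 1.0], "r3": [1.0, 1.0]}
posts = {"backend": [1.0, 0.0]}
print(match_resumes(resumes, posts, 0.3))
```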
8. A resume data information parsing and matching device, characterized in that the device comprises:
the preprocessing unit is used for retrieving the resume from the database and preprocessing the retrieved resume to obtain the resume to be analyzed;
the construction unit is used for constructing a word segmentation directed acyclic graph according to a pre-constructed word segmentation dictionary and segmenting the resume to be analyzed according to the constructed word segmentation directed acyclic graph to obtain a resume text after word segmentation;
the determining unit is used for constructing a co-occurrence matrix according to the resume texts subjected to word segmentation processing and determining keywords of the resume texts based on the co-occurrence matrix;
the processing unit is used for acquiring a word sequence in the keyword, and performing word representation processing on the word sequence by using a word representation model to obtain a word representation of the word sequence;
the prediction unit is used for inputting the word representation into the constructed resume label analysis model to obtain a predicted resume label sequence;
the determining unit is further configured to calculate the similarity between each label in the resume label sequence and the label of each post, and determine the resume matched with each post from the resume to be analyzed according to the calculated similarity.
9. An electronic device, characterized in that the electronic device comprises:
a memory storing at least one instruction; and
a processor executing instructions stored in the memory to implement the resume data information parsing and matching method of any of claims 1 to 7.
10. A computer-readable storage medium characterized by: the computer-readable storage medium stores at least one instruction, and the at least one instruction is executed by a processor in an electronic device to implement the resume data information parsing and matching method according to any one of claims 1 to 7.
CN202010151399.9A 2020-03-06 2020-03-06 Resume data information analysis and matching method and device, electronic equipment and medium Active CN111428488B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202010151399.9A CN111428488B (en) 2020-03-06 2020-03-06 Resume data information analysis and matching method and device, electronic equipment and medium
PCT/CN2020/131916 WO2021174919A1 (en) 2020-03-06 2020-11-26 Method and apparatus for analysis and matching of resume data information, electronic device, and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010151399.9A CN111428488B (en) 2020-03-06 2020-03-06 Resume data information analysis and matching method and device, electronic equipment and medium

Publications (2)

Publication Number Publication Date
CN111428488A (en) 2020-07-17
CN111428488B (en) 2024-10-22

Family

ID=71546173

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010151399.9A Active CN111428488B (en) 2020-03-06 2020-03-06 Resume data information analysis and matching method and device, electronic equipment and medium

Country Status (2)

Country Link
CN (1) CN111428488B (en)
WO (1) WO2021174919A1 (en)

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111737969A (en) * 2020-07-27 2020-10-02 北森云计算有限公司 Resume parsing method and system based on deep learning
CN111782772A (en) * 2020-07-24 2020-10-16 平安银行股份有限公司 Text automatic generation method, device, equipment and medium based on OCR technology
CN112052670A (en) * 2020-08-28 2020-12-08 丰图科技(深圳)有限公司 Address text word segmentation method and device, computer equipment and storage medium
CN112115705A (en) * 2020-09-23 2020-12-22 普信恒业科技发展(北京)有限公司 Method and device for screening electronic resume
CN112380344A (en) * 2020-11-19 2021-02-19 平安科技(深圳)有限公司 Text classification method, topic generation method, device, equipment and medium
CN112395408A (en) * 2020-11-19 2021-02-23 平安科技(深圳)有限公司 Stop word list generation method and device, electronic equipment and storage medium
CN112632227A (en) * 2020-12-30 2021-04-09 北京百度网讯科技有限公司 Resume matching method, resume matching device, electronic equipment, storage medium and program product
CN113011155A (en) * 2021-03-16 2021-06-22 北京百度网讯科技有限公司 Method, apparatus, device, storage medium and program product for text matching
CN113297845A (en) * 2021-06-21 2021-08-24 南京航空航天大学 Resume block classification method based on multi-level recurrent neural network
WO2021174919A1 (en) * 2020-03-06 2021-09-10 平安科技(深圳)有限公司 Method and apparatus for analysis and matching of resume data information, electronic device, and medium
CN113609850A (en) * 2021-07-02 2021-11-05 北京达佳互联信息技术有限公司 Word segmentation processing method and device, electronic equipment and storage medium
CN113627182A (en) * 2021-08-10 2021-11-09 深圳平安智汇企业信息管理有限公司 Data matching method and device, computer equipment and storage medium
CN113850049A (en) * 2021-09-26 2021-12-28 北京瑞友科技股份有限公司 Automatic resume editing system and method based on artificial intelligence
CN114168819A (en) * 2022-02-14 2022-03-11 北京大学 Post matching method and device based on graph neural network
CN116994270A (en) * 2023-08-28 2023-11-03 乐麦信息技术(杭州)有限公司 Resume analysis method, device, equipment and readable storage medium

Families Citing this family (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113905095B (en) * 2021-12-09 2022-04-05 深圳佑驾创新科技有限公司 Data generation method and device based on CAN communication matrix
CN114254951A (en) * 2021-12-27 2022-03-29 南方电网物资有限公司 Power grid equipment arrival sampling inspection method based on digitization technology
CN114637839B (en) * 2022-03-15 2024-10-29 平安国际智慧城市科技股份有限公司 Text highlighting method, device, equipment and storage medium
CN114637836B (en) * 2022-03-15 2024-11-05 平安国际智慧城市科技股份有限公司 Text processing method, device, equipment and storage medium
CN115293131B (en) * 2022-09-29 2023-01-06 广州万维视景科技有限公司 Data matching method, device, equipment and storage medium
CN115879901B (en) * 2023-02-22 2023-07-28 陕西湘秦衡兴科技集团股份有限公司 Intelligent personnel self-service platform
CN116562837A (en) * 2023-07-12 2023-08-08 深圳须弥云图空间科技有限公司 Person post matching method, device, electronic equipment and computer readable storage medium
CN116843155B (en) * 2023-07-27 2024-04-30 深圳市贝福数据服务有限公司 SAAS-based person post bidirectional matching method and system
CN116680590B (en) * 2023-07-28 2023-10-20 中国人民解放军国防科技大学 Post portrait label extraction method and device based on work instruction analysis
CN117236647B (en) * 2023-11-10 2024-02-02 贵州优特云科技有限公司 Post recruitment analysis method and system based on artificial intelligence
CN117670273A (en) * 2023-12-11 2024-03-08 南京道尔医药研究院有限公司 Staff service system based on human resource intelligent terminal
CN117875921B (en) * 2024-03-13 2024-05-24 北京金诚久安人力资源服务有限公司 Human resource management method and system based on artificial intelligence
CN118035561A (en) * 2024-03-29 2024-05-14 上海云生未来技术集团有限公司 Post recommendation method and system based on big data
CN118333591A (en) * 2024-05-07 2024-07-12 中国人民解放军91977部队 Dynamic optimization-based human resource scheduling method and device
CN118195562B (en) * 2024-05-16 2024-09-20 乐麦信息技术(杭州)有限公司 Job entering willingness assessment method and system based on natural semantic analysis

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080222133A1 (en) * 2007-03-08 2008-09-11 Anthony Au System that automatically identifies key words & key texts from a source document, such as a job description, and apply both (key words & text) as context in the automatic matching with another document, such as a resume, to produce a numerically scored result.
CN108874928A (en) * 2018-05-31 2018-11-23 平安科技(深圳)有限公司 Resume data information analyzing and processing method, device, equipment and storage medium
CN109710930A (en) * 2018-12-20 2019-05-03 重庆邮电大学 A kind of Chinese Resume analytic method based on deep neural network

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107766318B (en) * 2016-08-17 2021-03-16 北京金山安全软件有限公司 Keyword extraction method and device and electronic equipment
CN110399475A (en) * 2019-06-18 2019-11-01 平安科技(深圳)有限公司 Resume matching process, device, equipment and storage medium based on artificial intelligence
CN110750993A (en) * 2019-10-15 2020-02-04 成都数联铭品科技有限公司 Word segmentation method, word segmentation device, named entity identification method and system
CN111428488B (en) * 2020-03-06 2024-10-22 平安科技(深圳)有限公司 Resume data information analysis and matching method and device, electronic equipment and medium

Cited By (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021174919A1 (en) * 2020-03-06 2021-09-10 平安科技(深圳)有限公司 Method and apparatus for analysis and matching of resume data information, electronic device, and medium
CN111782772A (en) * 2020-07-24 2020-10-16 平安银行股份有限公司 Text automatic generation method, device, equipment and medium based on OCR technology
CN111737969B (en) * 2020-07-27 2020-12-08 北森云计算有限公司 Resume parsing method and system based on deep learning
CN111737969A (en) * 2020-07-27 2020-10-02 北森云计算有限公司 Resume parsing method and system based on deep learning
CN112052670B (en) * 2020-08-28 2024-04-02 丰图科技(深圳)有限公司 Address text word segmentation method, device, computer equipment and storage medium
CN112052670A (en) * 2020-08-28 2020-12-08 丰图科技(深圳)有限公司 Address text word segmentation method and device, computer equipment and storage medium
CN112115705A (en) * 2020-09-23 2020-12-22 普信恒业科技发展(北京)有限公司 Method and device for screening electronic resume
CN112395408A (en) * 2020-11-19 2021-02-23 平安科技(深圳)有限公司 Stop word list generation method and device, electronic equipment and storage medium
CN112395408B (en) * 2020-11-19 2023-11-07 平安科技(深圳)有限公司 Stop word list generation method and device, electronic equipment and storage medium
CN112380344A (en) * 2020-11-19 2021-02-19 平安科技(深圳)有限公司 Text classification method, topic generation method, device, equipment and medium
CN112380344B (en) * 2020-11-19 2023-08-22 平安科技(深圳)有限公司 Text classification method, topic generation method, device, equipment and medium
CN112632227A (en) * 2020-12-30 2021-04-09 北京百度网讯科技有限公司 Resume matching method, resume matching device, electronic equipment, storage medium and program product
CN112632227B (en) * 2020-12-30 2023-06-23 北京百度网讯科技有限公司 Resume matching method, device, electronic equipment, storage medium and program product
CN113011155B (en) * 2021-03-16 2023-09-05 北京百度网讯科技有限公司 Method, apparatus, device and storage medium for text matching
CN113011155A (en) * 2021-03-16 2021-06-22 北京百度网讯科技有限公司 Method, apparatus, device, storage medium and program product for text matching
US11989962B2 (en) 2021-03-16 2024-05-21 Beijing Baidu Netcom Science Technology Co., Ltd. Method, apparatus, device, storage medium and program product of performing text matching
CN113297845A (en) * 2021-06-21 2021-08-24 南京航空航天大学 Resume block classification method based on multi-level recurrent neural network
CN113609850A (en) * 2021-07-02 2021-11-05 北京达佳互联信息技术有限公司 Word segmentation processing method and device, electronic equipment and storage medium
CN113609850B (en) * 2021-07-02 2024-05-17 北京达佳互联信息技术有限公司 Word segmentation processing method and device, electronic equipment and storage medium
CN113627182A (en) * 2021-08-10 2021-11-09 深圳平安智汇企业信息管理有限公司 Data matching method and device, computer equipment and storage medium
CN113627182B (en) * 2021-08-10 2024-07-26 深圳平安智汇企业信息管理有限公司 Data matching method, device, computer equipment and storage medium
CN113850049A (en) * 2021-09-26 2021-12-28 北京瑞友科技股份有限公司 Automatic resume editing system and method based on artificial intelligence
CN114168819A (en) * 2022-02-14 2022-03-11 北京大学 Post matching method and device based on graph neural network
CN114168819B (en) * 2022-02-14 2022-07-12 北京大学 Post matching method and device based on graph neural network
CN116994270A (en) * 2023-08-28 2023-11-03 乐麦信息技术(杭州)有限公司 Resume analysis method, device, equipment and readable storage medium

Also Published As

Publication number Publication date
WO2021174919A1 (en) 2021-09-10
CN111428488B (en) 2024-10-22

Similar Documents

Publication Publication Date Title
CN111428488B (en) Resume data information analysis and matching method and device, electronic equipment and medium
CN108717406B (en) Text emotion analysis method and device and storage medium
US20180181544A1 (en) Systems for Automatically Extracting Job Skills from an Electronic Document
CN102043843A (en) Method and obtaining device for obtaining target entry based on target application
CN113095076A (en) Sensitive word recognition method and device, electronic equipment and storage medium
CN113378970B (en) Sentence similarity detection method and device, electronic equipment and storage medium
CN110688854A (en) Named entity recognition method, device and computer readable storage medium
CN112860848B (en) Information retrieval method, device, equipment and medium
CN113360654B (en) Text classification method, apparatus, electronic device and readable storage medium
CN113722483A (en) Topic classification method, device, equipment and storage medium
CN112988962B (en) Text error correction method and device, electronic equipment and storage medium
CN111858834B (en) Case dispute focus determining method, device, equipment and medium based on AI
CN113704410A (en) Emotion fluctuation detection method and device, electronic equipment and storage medium
CN115392237B (en) Emotion analysis model training method, device, equipment and storage medium
CN116450829A (en) Medical text classification method, device, equipment and medium
CN113344125B (en) Long text matching recognition method and device, electronic equipment and storage medium
CN115510188A (en) Text keyword association method, device, equipment and storage medium
CN115714002A (en) Depression risk detection model training method, depression state early warning method and related equipment
CN110610003A (en) Method and system for assisting text annotation
CN113254814A (en) Network course video labeling method and device, electronic equipment and medium
CN112364068A (en) Course label generation method, device, equipment and medium
CN112559711A (en) Synonymous text prompting method and device and electronic equipment
CN112069322B (en) Text multi-label analysis method and device, electronic equipment and storage medium
CN115346095A (en) Visual question answering method, device, equipment and storage medium
CN113935328A (en) Text abstract generation method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
REG Reference to a national code (Ref country code: HK; Ref legal event code: DE; Ref document number: 40030827; Country of ref document: HK)
SE01 Entry into force of request for substantive examination
GR01 Patent grant