
WO2021174783A1 - Near-synonym pushing method and apparatus, electronic device, and medium - Google Patents

Near-synonym pushing method and apparatus, electronic device, and medium

Info

Publication number
WO2021174783A1
WO2021174783A1, PCT/CN2020/111915, CN2020111915W
Authority
WO
WIPO (PCT)
Prior art keywords
word vector
word
target
preset number
synonyms
Prior art date
Application number
PCT/CN2020/111915
Other languages
French (fr)
Chinese (zh)
Inventor
陈林
金戈
徐亮
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2021174783A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/10 Office automation; Time management
    • G06Q10/105 Human resources
    • G06Q10/1053 Employment or hiring

Definitions

  • This application relates to the field of artificial intelligence technology, and in particular to a method, device, electronic device, and storage medium for pushing synonyms.
  • The project at hand is an artificial intelligence (AI) interview rule configuration system, in which users at some companies can update the answer keywords in the expert rules in real time.
  • However, the inventor realized that when filling in answer keywords, the user must enter a large amount of information manually, and the system offers no assistance during input, such as recommending synonyms. This reduces the user's writing efficiency, makes the result heavily dependent on the user's personal understanding of the answer keywords, and cannot guarantee that the entered keywords are reasonably complete and objective.
  • The first aspect of the present application provides a method for pushing synonyms, and the method includes: obtaining interview questions; configuring a first preset number of keywords of the answers corresponding to the interview questions; pre-training a target word vector model based on a super-large word vector model; constructing a word vector matrix according to the target word vector model to obtain a word-index file, where the word-index file includes the correspondence between word vectors and indexes; constructing a binary tree based on all word vectors in the target word vector model; traversing the binary tree, querying the binary tree for first candidate word vectors whose distance to the keyword is greater than a preset distance threshold, and constructing a priority queue based on the first candidate word vectors; deduplicating the first candidate word vectors in the priority queue; obtaining the top second preset number of target word vectors in the deduplicated priority queue; and pushing a second preset number of synonyms, based on the second preset number of target word vectors and the word-index file, for the user to select.
  • The second aspect of the present application provides a device for pushing synonyms, and the device includes:
  • an acquisition module, used to acquire interview questions;
  • the configuration module is used to configure the first preset number of keywords corresponding to the answers of the interview questions
  • the training module is used to pre-train the target word vector model based on the super-large word vector model
  • the construction module is configured to construct a word vector matrix according to the target word vector model to obtain a word-index file, wherein the word-index file includes the correspondence between the word vector and the index;
  • the construction module is also used to construct a binary tree based on all word vectors in the target word vector model
  • a traversal module configured to traverse the binary tree, query the binary tree for a first candidate word vector whose distance to the keyword is greater than a preset distance threshold, and construct a priority queue based on the first candidate word vector;
  • a deduplication module configured to deduplicate the first candidate word vector in the priority queue
  • the acquiring module is also used to acquire the target word vectors of the second preset number in the deduplicated priority queue.
  • the push module is configured to push the second preset number of synonyms based on the second preset number of target word vectors and word-index files for selection by the user.
  • The third aspect of the present application provides an electronic device, where the electronic device includes a processor, and the processor is configured to execute computer-readable instructions stored in a memory to implement the steps of the above synonym pushing method, from obtaining the interview questions through pushing a second preset number of synonyms for the user to select.
  • The fourth aspect of the present application provides a computer-readable storage medium having computer-readable instructions stored thereon, where the computer-readable instructions, when executed by a processor, implement the steps of the above synonym pushing method, from obtaining the interview questions through pushing a second preset number of synonyms for the user to select.
  • With the synonym pushing method, device, electronic equipment, and storage medium described in this application, a first preset number of keywords is configured for the answer corresponding to an interview question, a second preset number of synonyms corresponding to each keyword is looked up in the pre-trained word vector model, and the second preset number of synonyms is pushed for the user to choose from. More synonyms of the keywords of the answers corresponding to the interview questions can therefore be configured during the robot interview process, making it convenient for HR staff to configure more comprehensive answers for interview questions when interviewing job applicants. As a result, when a job applicant's answer to an interview question is received, the answer can be analyzed more accurately, and it is easier for HR to give a more comprehensive assessment of the applicant.
  • FIG. 1 is a flowchart of a method for pushing synonyms provided in Embodiment 1 of the present application.
  • Fig. 2 is a functional module diagram of the push device provided in the second embodiment of the present application.
  • Fig. 3 is a schematic diagram of an electronic device provided in a third embodiment of the present application.
  • The synonym pushing method in the embodiments of this application is applied in an electronic device. For an electronic device that needs synonym pushing, the synonym pushing function provided by the method of this application can be integrated directly on the device, or a client implementing the method of this application can be installed on it.
  • The method provided in this application can also run on a server or other device in the form of a software development kit (SDK): the synonym pushing function is provided as an SDK interface, and an electronic device or other device can realize the synonym pushing function through the provided interface.
  • FIG. 1 is a flowchart of a method for pushing synonyms provided in Embodiment 1 of the present application. According to different requirements, the execution sequence in the flowchart can be changed, and some steps can be omitted.
  • In a robot interview, so that the robot can better judge whether a job applicant has answered the interview questions correctly and score the applicant according to the answers, keywords need to be configured according to the answers corresponding to the interview questions. After the applicant's answer is received, keywords are extracted from it, the extracted keywords are matched against the configured keywords, and the applicant is scored according to the matching result.
  • To avoid the configured keywords being insufficiently comprehensive, this application provides a way to expand the keywords entered by the interviewer during keyword configuration and to push synonyms and near-synonyms for them.
  • the method includes:
  • Step S1: Obtain interview questions.
  • In a robot interview, different interview questions are configured for different positions.
  • For example, interview questions configured for an R&D position include "Which programming languages are you familiar with", "How to break out of the current multiple nested loops in Java", "Is there a memory leak in Java? Please describe briefly", and so on.
  • In this embodiment, the robot interview requires interview questions and answers to be configured in advance.
  • However, different job applicants give different answers to the same interview questions, so in order to evaluate applicants comprehensively, the answers configured for the interview questions need to be detailed, complete, and comprehensive.
  • Step S2: Configure a first preset number of keywords of the answer corresponding to the interview question.
  • In one embodiment, the step of configuring the first preset number of keywords of the answer corresponding to the interview question includes: querying the answer corresponding to the interview question in a pre-established question-answer correspondence table to obtain a query result; and extracting keywords from the query result, where the keywords number the first preset number.
  • It is understandable that the keywords may also be keywords related to the query result, obtained by performing semantic analysis on the query result.
  • In another embodiment, the step of configuring the first preset number of keywords of the answer corresponding to the interview question includes: (1) analyzing the interview question with a pre-built question analysis model to obtain the corresponding question intent; (2) determining the answer corresponding to the interview question according to the question intent and a pre-established knowledge base; and (3) extracting the first preset number of keywords from that answer.
  • The question analysis model can analyze the question features of the interview question. The question features may include the question-stem intent and key information. For example, when the interview question is "What programming languages are you good at?", the intent of the question stem is the programming languages the applicant is good at, and the key information can be the programming language.
  • For this example, the pre-established knowledge base may include C/C++, Java, C#, SQL, and so on.
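  • The patent gives no code for this step; the following is a minimal Python sketch of the table-lookup variant, in which the question-answer table, the delimiter-based splitting, and the first_preset_n value are illustrative assumptions rather than part of the disclosure.

```python
# Hypothetical sketch of step S2: look the answer up in a pre-established
# Q&A table and take the first few tokens as the configured keywords.
# The table contents and the splitting rule are assumptions for illustration.
import re

QA_TABLE = {
    "你所擅长的编程语言有哪些": "C/C++、Java、C#、SQL",
}

def configure_keywords(question: str, first_preset_n: int = 5) -> list[str]:
    """Return up to `first_preset_n` keywords for the answer of `question`."""
    answer = QA_TABLE.get(question, "")
    # Delimiter splitting stands in for the semantic analysis mentioned above.
    tokens = [t for t in re.split(r"[、,，\s]+", answer) if t]
    return tokens[:first_preset_n]

print(configure_keywords("你所擅长的编程语言有哪些"))
# -> ['C/C++', 'Java', 'C#', 'SQL']
```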
  • Step S3: Pre-train a target word vector model based on the super-large word vector model.
  • In this embodiment, pre-training is performed on the basis of the super-large word vector model to obtain a suitable target word vector model. Specifically, this includes expanding the super-large word vector model with robot-interview-scene corpus, which involves segmenting that corpus, removing stop words, and incrementally training the word vectors in CBOW mode; the super-large word vector model with the expanded corpus is then trained to obtain the target word vector model.
  • The training corpus of the super-large word vector model covers a large amount of corpus of different kinds, such as news, web pages, novels, Baidu Baike, and Wikipedia. For the robot interview scenario, however, the scene-specific corpus in the super-large word vector model is insufficient. Therefore, robot-interview corpus is integrated on the basis of the super-large word vector model, expanding it with question-and-answer text, similar-question text, and other corpus from robot interviews.
  • The target word vector model is a word vector model that incorporates the robot interview corpus.
  • The final trained target word vector model covers more than 8 million words, each represented by a roughly 200-dimensional vector. The model's corpus is therefore extensive, and each word vector reflects the semantics of its word well. At the same time, a vocabulary on the order of 8 million words can completely replace the traditional hand-built synonym dictionary and solves the problem of words that cannot be found.
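  • As a hedged illustration of the incremental CBOW training described above, the sketch below uses the open-source gensim library (an assumption; the patent names no specific toolkit). The tiny base corpus, the stop-word list, and the file name are placeholders standing in for the super-large model and the real interview-scene corpus.

```python
from gensim.models import Word2Vec

# Tiny stand-in for the pre-trained super-large CBOW model (sg=0 selects CBOW).
base_corpus = [["java", "内存", "泄漏"], ["编程", "语言", "java"]]
base = Word2Vec(base_corpus, vector_size=200, sg=0, min_count=1)

# Robot-interview-scene corpus after segmentation and stop-word removal
# (whitespace splitting and the stop-word set are assumptions).
STOP_WORDS = {"的", "了", "请", "吗"}
raw = [
    "java 中 会 存在 内存 泄漏 吗 请 简单 描述",
    "如何 跳出 当前 的 多重 嵌套 循环",
]
interview_corpus = [[t for t in s.split() if t not in STOP_WORDS] for s in raw]

# Incremental training: extend the vocabulary, then continue CBOW training.
base.build_vocab(interview_corpus, update=True)
base.train(interview_corpus, total_examples=len(interview_corpus), epochs=base.epochs)
base.save("target_word_vector.model")  # assumed file name for the target model
```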
  • Step S4: Construct a word vector matrix according to the target word vector model to obtain a word-index file, where the word-index file includes the correspondence between word vectors and indexes.
  • In this embodiment, constructing a word vector matrix according to the target word vector model to obtain a word-index file may include the following.
  • The word vector matrix is a matrix whose number of rows is the total number of words and whose number of columns is the dimension of each word. For example, if the dimension of each word is 200 and the target word vector model includes 8 million words, a word vector matrix with 8 million rows and 200 columns is obtained.
  • Each row in the word vector matrix has an index, so the index corresponding to each word can be obtained.
  • The word-index file is then output according to the word vector matrix; from it, the correspondence between each index and each word vector can also be obtained.
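  • A minimal sketch of this step, continuing from the training sketch above, is shown below. The NumPy/JSON formats and the file names are assumptions; the patent only requires that the word-index file record the mapping between words and indexes.

```python
import json

import numpy as np
from gensim.models import Word2Vec

model = Word2Vec.load("target_word_vector.model")  # saved in the previous sketch

# Rows correspond to words, columns to vector dimensions (200 in the text).
words = model.wv.index_to_key
matrix = np.asarray(model.wv.vectors)               # shape: (n_words, 200)

# Word-index file: each word is mapped to its row index in the matrix.
word_index = {word: idx for idx, word in enumerate(words)}

np.save("word_vector_matrix.npy", matrix)
with open("word_index.json", "w", encoding="utf-8") as f:
    json.dump(word_index, f, ensure_ascii=False)
```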
  • Step S5: Construct a binary tree based on all word vectors in the target word vector model.
  • In this embodiment, a binary tree structure is constructed from all word vectors in the target word vector model.
  • Each word vector is a 200-dimensional vector, so the word vectors form a 200-dimensional high-dimensional data space in which each word vector represents a point. The data space corresponding to all word vectors in the target word vector model can thus be expressed as 8 million points. A binary tree is constructed from the target word vector model as follows.
  • Two points are randomly selected as initial nodes, and the two initial nodes are connected to form an equidistant (perpendicular-bisector) hyperplane that splits the space. By repeating this process, the data space is divided into multiple subspaces, and a binary tree structure is constructed according to these subspaces. When a subspace contains no more than k word vectors, it is no longer divided, where k is greater than or equal to 8 and less than or equal to 10; in this embodiment, the value of k is 10.
  • The splitting condition at each node of the binary tree is one of these equidistant perpendicular hyperplanes, and the word vectors end up as the leaf nodes of the tree. That is, the binary tree includes a root node, multiple intermediate nodes, and a final layer of leaf nodes, where each leaf node represents a word vector.
  • There is no need to store the word vector itself at a leaf node; only the index corresponding to the word vector needs to be saved. In this way, similar word vectors end up closer together in the binary tree, which speeds up the subsequent synonym search.
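  • The sketch below illustrates this splitting scheme on a small random matrix that stands in for the real 8-million-by-200 one. The node layout (plain dictionaries) and the variable names are assumptions for illustration, not a structure mandated by the patent.

```python
import numpy as np

K_LEAF = 10  # "k" from the text: stop splitting at 10 or fewer vectors

def build_tree(indices, matrix, rng):
    """Recursively split the point set with perpendicular-bisector hyperplanes."""
    if len(indices) <= K_LEAF:
        return {"leaf": list(indices)}          # leaves keep only the indexes

    # Two random points define the split: the hyperplane's normal is the line
    # joining them and it passes through their midpoint (equidistant to both).
    a, b = rng.choice(indices, size=2, replace=False)
    normal = matrix[a] - matrix[b]
    offset = normal @ ((matrix[a] + matrix[b]) / 2.0)

    side = matrix[indices] @ normal > offset
    left, right = indices[side], indices[~side]
    if len(left) == 0 or len(right) == 0:       # degenerate split: stop here
        return {"leaf": list(indices)}

    return {"normal": normal, "offset": offset,
            "left": build_tree(left, matrix, rng),
            "right": build_tree(right, matrix, rng)}

rng = np.random.default_rng(0)
demo_matrix = rng.normal(size=(1000, 200)).astype(np.float32)
tree = build_tree(np.arange(1000), demo_matrix, rng)
```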
  • Step S6: Traverse the binary tree, query the binary tree for first candidate word vectors whose distance to the keyword is greater than a preset distance threshold, and construct a priority queue based on the first candidate word vectors.
  • In this embodiment, the priority queue is constructed as follows: take the keyword as the root node of the binary tree; traverse all intermediate nodes under the root node; calculate the distance between the root node and each intermediate node; determine the intermediate nodes whose distance is greater than the preset distance threshold as first-level target nodes; traverse all intermediate nodes under the first-level target nodes down to the last-level leaf nodes; take the word vectors in all of those leaf nodes as the first candidate word vectors; calculate the similarity between each first candidate word vector and the keyword; and insert the first candidate word vectors into the priority queue in order of similarity.
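  • The sketch below, which reuses build_tree, demo_matrix, and tree from the previous sketch, is one hedged reading of this step: the "distance" checked at each intermediate node is taken to be the query's margin against that node's splitting hyperplane, and cosine similarity orders the priority queue. Both choices are assumptions; the patent fixes only the greater-than-threshold condition and the similarity ordering.

```python
import heapq

import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

def collect_candidates(node, query, threshold, out):
    """Gather leaf indexes reachable under the greater-than-threshold rule."""
    if "leaf" in node:
        out.extend(node["leaf"])
        return
    # Signed margin of the query against this node's splitting hyperplane.
    margin = float(query @ node["normal"] - node["offset"])
    near, far = (node["left"], node["right"]) if margin > 0 else (node["right"], node["left"])
    collect_candidates(near, query, threshold, out)
    if abs(margin) > threshold:                 # also explore the far side
        collect_candidates(far, query, threshold, out)

def build_priority_queue(query, tree, matrix, threshold):
    candidates = []
    collect_candidates(tree, query, threshold, candidates)
    # Max-heap on similarity, implemented with negated keys for heapq.
    heap = [(-cosine(matrix[i], query), i) for i in candidates]
    heapq.heapify(heap)
    return heap

query_vec = demo_matrix[0]                      # pretend this is the keyword's vector
queue = build_priority_queue(query_vec, tree, demo_matrix, threshold=5.0)
```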
  • Step S7: Deduplicate the first candidate word vectors in the priority queue.
  • Step S8: Obtain the top second preset number of target word vectors in the deduplicated priority queue.
  • Step S9: Push a second preset number of synonyms, based on the second preset number of target word vectors and the word-index file, for the user to select.
  • In this embodiment, pushing the second preset number of synonyms for the user to select, based on the second preset number of target word vectors and the word-index file, includes: obtaining the target indexes corresponding to the second preset number of target word vectors; looking up the words corresponding to the target indexes in the word-index file; and pushing those words as synonyms for the user to select.
  • The binary tree structure file and the word-index file are stored together; when the top-N neighboring vocabulary of a keyword needs to be queried, only these two files are needed for the lookup.
  • The synonym search function supported by this application is innovative and convenient: it can generate synonyms for 5 keywords at a time and push 8 synonyms at a time, and it lets users click "change batch" to replace them with another round of 8 synonyms, which is convenient to view and use. For example, a "change batch" button is displayed on the push interface; after the user clicks it, the original synonyms are refreshed and more synonyms are pushed.
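  • A hedged sketch covering steps S7 to S9 and the "change batch" behaviour follows, continuing from the earlier sketches. The generator-based paging and the toy index-to-word mapping are illustrative assumptions; the 8-synonym batch size comes from the text.

```python
import heapq

SECOND_PRESET_N = 8   # number of synonyms pushed per batch, as described above

def synonym_batches(queue, index_to_word):
    """Yield successive batches of synonyms; advancing the generator models
    the user pressing the "change batch" button."""
    seen, batch = set(), []
    while queue:
        _, idx = heapq.heappop(queue)           # highest similarity first
        word = index_to_word[idx]
        if word in seen:                        # step S7: deduplication
            continue
        seen.add(word)
        batch.append(word)
        if len(batch) == SECOND_PRESET_N:
            yield batch                         # step S9: push 8 synonyms
            batch = []
    if batch:
        yield batch

# Usage with a toy mapping; the real mapping comes from the word-index file.
index_to_word = {i: f"word_{i}" for i in range(1000)}
batches = synonym_batches(queue, index_to_word)
first_push = next(batches)      # the 8 synonyms shown initially
second_push = next(batches)     # what "change batch" would show next
```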
  • In this embodiment, a preset rule is added to filter the queried vocabulary, where the preset rule includes at least one of the following rules.
  • Word types include Chinese, English, and numbers; vocabulary whose type is consistent with the keyword's type is returned preferentially.
  • If a Chinese keyword returns an English word, or an English keyword returns a Chinese word, the result is returned normally.
  • If a Chinese or English keyword returns a number, that synonym is deleted directly. It should be noted that a single English letter or a single Chinese character counts as one character.
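  • A minimal sketch of such a type-consistency filter is given below. The regular-expression type detection and the exact ordering of mixed-type results are assumptions; the text fixes only the Chinese/English/number behaviour described above.

```python
import re

def word_type(word: str) -> str:
    """Classify a word as "number", "chinese", or "english" (assumed heuristic)."""
    if re.fullmatch(r"[0-9]+(\.[0-9]+)?", word):
        return "number"
    if re.search(r"[\u4e00-\u9fff]", word):
        return "chinese"
    return "english"

def filter_synonyms(keyword: str, candidates: list[str]) -> list[str]:
    k_type = word_type(keyword)
    kept = []
    for w in candidates:
        if word_type(w) == "number" and k_type in ("chinese", "english"):
            continue                      # numeric results are dropped outright
        kept.append(w)
    # Same-type results come first; cross-language results are kept afterwards.
    return sorted(kept, key=lambda w: word_type(w) != k_type)

print(filter_synonyms("编程语言", ["程序语言", "programming language", "2020"]))
# -> ['程序语言', 'programming language']
```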
  • In summary, the synonym pushing method provided in this application includes obtaining interview questions; configuring a first preset number of keywords of the answers corresponding to the interview questions; searching the pre-trained word vector model for a second preset number of synonyms corresponding to each keyword; and pushing the second preset number of synonyms for the user to choose from.
  • The word vectors used in this application have wide coverage, and the vector characterizing each word has 200 dimensions, so each word's vector reflects its actual semantics well. The word vector model of this application includes 8 million words, which largely solves the traditional out-of-vocabulary problem.
  • The word vector model used in this application greatly reduces memory usage through the word-index file and greatly increases system stability.
  • Moreover, this application makes it possible to configure more synonyms of the keywords of the answers corresponding to the interview questions during the robot interview process. Therefore, when a job applicant's answer to an interview question is received, the answer can be analyzed more accurately, and it is easier for HR to give a more comprehensive assessment of the applicant.
  • Fig. 2 is a diagram of functional modules in a preferred embodiment of a device for pushing synonyms of this application.
  • the synonym pushing device 20 (referred to as “pushing device” for ease of description) runs in an electronic device.
  • the pushing device 20 may include multiple functional modules composed of program code segments.
  • the program code of each program segment in the pushing device 20 can be stored in a memory and executed by at least one processor to perform the function of pushing synonyms.
  • the functional modules of the pushing device 20 may include: an acquisition module 201, a configuration module 202, a training module 203, a construction module 204, a traversal module 205, a deduplication module 206, and a pushing module 207.
  • the function of each module will be detailed in the subsequent embodiments.
  • the module referred to in this application refers to a series of computer program segments that can be executed by at least one processor and can complete fixed functions, and are stored in a memory.
  • the obtaining module 201 is used to obtain interview questions.
  • interview questions will be configured according to different positions.
  • For example, interview questions configured for an R&D position include "Which programming languages are you familiar with", "How to break out of the current multiple nested loops in Java", "Is there a memory leak in Java? Please describe briefly", and so on.
  • the robot interview needs to be pre-configured with interview questions and answers.
  • different job applicants give different answers when facing the same interview questions.
  • the configuration module 202 is configured to configure a first preset number of keywords of answers corresponding to the interview questions.
  • the keywords for configuring the first preset number of answers corresponding to the interview questions include:
  • the keyword may also be a keyword related to the query result obtained by performing semantic analysis according to the query result.
  • the keywords for configuring the first preset number of answers corresponding to the interview questions include:
  • the topic analysis model can analyze the topic characteristics of the interview topic.
  • the topic features may include topic intentions and key information. For example, when the interview question is "What programming languages are you good at?", the intent of the question stem is the programming language you are good at, and the key information can be the programming language.
  • the pre-established knowledge base may include C/C++, Java, C#, SQL, etc.
  • the training module 203 is used for pre-training to obtain the target word vector model based on the super large word vector model.
  • pre-training is performed based on the super large word vector model to obtain a suitable target word vector model. Specifically, it includes: expanding the robot interview scene corpus in the super-large word vector model, which includes segmenting the robot interview scene corpus, removing stop words, and incrementally training word vector operations based on the CBOW mode; according to the expanded corpus The super-large word vector model is trained to obtain the target word vector model.
  • the training corpus of the super large word vector model covers a large number of corpora of different dimensions, such as news, web pages, novels, Baidu Baike, and Wikipedia.
  • the corpus of the specific scene in the super-large word vector model is insufficient. Therefore, the corpus of the robot interview scene is integrated on the basis of the super-large word vector model, and the corpus of question and answer text and similar question text in the robot interview is expanded.
  • The target word vector model is a word vector model that incorporates the robot interview corpus.
  • the final trained target word vector model covers more than 8 million words, and the dimension of each word is about 200 dimensions. Therefore, the target word vector model corpus is extensive, and each word vector therein can well reflect the semantics of each word. At the same time, the order of magnitude of 8 million words can completely replace the traditional way of constructing a dictionary of synonyms, and solve the problem of not being able to find words.
  • the construction module 204 is configured to construct a word vector matrix according to the target word vector model to obtain a word-index file, where the word-index file includes the correspondence between the word vector and the index.
  • the construction of a word vector matrix according to the target word vector model to obtain a word-index file may include:
  • the word vector matrix is a matrix composed of the dimension of each word as the number of rows and the total number of all words as the number of columns.
  • the dimension of each word is 200
  • the target word vector model includes 8 million words. Then, a word vector matrix with 200 columns and 8 million rows can be obtained.
  • each row in the word vector matrix has an index
  • the index corresponding to each word can be obtained.
  • the word-index file is output according to the word vector matrix.
  • the corresponding relationship between each index and each word vector can also be obtained.
  • the construction module 204 is also used to construct a binary tree based on all word vectors in the target word vector model.
  • In this embodiment, a binary tree structure is constructed from all word vectors in the target word vector model.
  • Each word vector is a 200-dimensional vector, so the word vectors form a 200-dimensional high-dimensional data space in which each word vector represents a point. The data space corresponding to all word vectors in the target word vector model can thus be expressed as 8 million points. A binary tree is constructed from the target word vector model as follows.
  • Two points are randomly selected as initial nodes, and the two initial nodes are connected to form an equidistant (perpendicular-bisector) hyperplane that splits the space. By repeating this process, the data space is divided into multiple subspaces, and a binary tree structure is constructed according to these subspaces. When a subspace contains no more than k word vectors, it is no longer divided, where k is greater than or equal to 8 and less than or equal to 10; in this embodiment, the value of k is 10.
  • The splitting condition at each node of the binary tree is one of these equidistant perpendicular hyperplanes, and the word vectors end up as the leaf nodes of the tree.
  • the traversal module 205 is configured to traverse the binary tree, query the binary tree for the first candidate word vector whose distance to the keyword is greater than a preset distance threshold, and construct a priority queue based on the first candidate word vector.
  • In this embodiment, the priority queue is constructed as follows: take the keyword as the root node of the binary tree; traverse all intermediate nodes under the root node; calculate the distance between the root node and each intermediate node; determine the intermediate nodes whose distance is greater than the preset distance threshold as first-level target nodes; traverse all intermediate nodes under the first-level target nodes down to the last-level leaf nodes; take the word vectors in all of those leaf nodes as the first candidate word vectors; calculate the similarity between each first candidate word vector and the keyword; and insert the first candidate word vectors into the priority queue in order of similarity.
  • the deduplication module 206 is configured to deduplicate the first candidate word vector in the priority queue.
  • the acquiring module 201 is also used for acquiring the target word vectors of the second preset number in the prioritized queue after deduplication.
  • the pushing module 207 is configured to push a second preset number of synonyms based on the second preset number of target word vectors and word-index files for selection by the user.
  • In this embodiment, pushing the second preset number of synonyms for the user to select, based on the second preset number of target word vectors and the word-index file, includes: obtaining the target indexes corresponding to the second preset number of target word vectors; looking up the words corresponding to the target indexes in the word-index file; and pushing those words as synonyms for the user to select.
  • the binary tree structure file and the word-index file are stored together, and when it is necessary to query the vocabulary of the neighbor Top N of a certain keyword, only these two files need to be used for indexing.
  • The synonym search function supported by this application is innovative and convenient: it can generate synonyms for 5 keywords at a time and push 8 synonyms at a time, and it lets users click "change batch" to replace them with another round of 8 synonyms, which is convenient to view and use. For example, a "change batch" button is displayed on the push interface; after the user clicks it, the original synonyms are refreshed and more synonyms are pushed.
  • In this embodiment, a preset rule is added to filter the queried vocabulary, where the preset rule includes at least one of the following rules.
  • Word types include Chinese, English, and numbers; vocabulary whose type is consistent with the keyword's type is returned preferentially.
  • If a Chinese keyword returns an English word, or an English keyword returns a Chinese word, the result is returned normally.
  • If a Chinese or English keyword returns a number, that synonym is deleted directly. It should be noted that a single English letter or a single Chinese character counts as one character.
  • the aforementioned pushing device 20 can also be used to push synonyms.
  • the push device 20 described in this application includes an acquisition module 201, a configuration module 202, a training module 203, a construction module 204, a traversal module 205, a deduplication module 206, and a push module 207.
  • The acquisition module 201 is used to obtain interview questions; the configuration module 202 is used to configure a first preset number of keywords of the answers corresponding to the interview questions; and the training module 203 is used to pre-train a target word vector model based on the super-large word vector model.
  • The construction module 204 is configured to construct a word vector matrix according to the target word vector model to obtain a word-index file, where the word-index file includes the correspondence between word vectors and indexes; the construction module 204 is also used to construct a binary tree based on all word vectors in the target word vector model.
  • The traversal module 205 is used to traverse the binary tree, query the binary tree for first candidate word vectors whose distance to the keyword is greater than a preset distance threshold, and construct a priority queue based on the first candidate word vectors.
  • The deduplication module 206 is configured to deduplicate the first candidate word vectors in the priority queue; the acquisition module 201 is also configured to acquire the top second preset number of target word vectors in the deduplicated priority queue; and the push module 207 is configured to push a second preset number of synonyms, based on the second preset number of target word vectors and the word-index file, for the user to select.
  • The word vectors used in this application have wide coverage, and the vector characterizing each word has 200 dimensions, so each word's vector reflects its actual semantics well. The word vector model of this application includes 8 million words, which largely solves the traditional out-of-vocabulary problem.
  • The word vector model used in this application greatly reduces memory usage through the word-index file and greatly increases system stability.
  • The query return speed of this application is also greatly increased: querying a word used to take around ten seconds and is now reduced to less than 0.01 s.
  • In addition, this application can configure more synonyms of the keywords of the answers corresponding to the interview questions during the robot interview process.
  • the above-mentioned integrated unit implemented in the form of a software function module may be stored in a computer readable storage medium.
  • The above software function module is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a dual-screen device, a network device, or the like) or a processor to execute part of the methods of the various embodiments of this application.
  • FIG. 3 is a schematic diagram of the electronic device provided in the third embodiment of the application.
  • the electronic device 3 includes a memory 31, at least one processor 32, a computer program 33 stored in the memory 31 and running on the at least one processor 32, at least one communication bus 34 and a database 35.
  • The computer program 33 may be divided into one or more modules/units, and the one or more modules/units are stored in the memory 31 and executed by the at least one processor 32 to complete this application.
  • the one or more modules/units may be a series of computer-readable instruction segments capable of completing specific functions, and the computer-readable instruction segments are used to describe the execution process of the computer program 33 in the electronic device 3.
  • the electronic device 3 may be a mobile phone, a tablet computer, a personal digital assistant (Personal Digital Assistant, PDA) and other devices installed with applications.
  • PDA Personal Digital Assistant
  • The schematic diagram in FIG. 3 is only an example of the electronic device 3 and does not constitute a limitation on the electronic device 3.
  • the electronic device 3 may also include input and output devices, network access devices, buses, and so on.
  • The at least one processor 32 may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like.
  • the processor 32 may be a microprocessor, or the processor 32 may also be any conventional processor, etc.
  • The processor 32 is the control center of the electronic device 3 and uses various interfaces and lines to connect the various parts of the entire electronic device 3.
  • the memory 31 may be used to store the computer program 33 and/or modules/units.
  • The processor 32 runs or executes the computer programs and/or modules/units stored in the memory 31 and calls the data stored in the memory 31 to realize the various functions of the electronic device 3.
  • the memory 31 may mainly include a storage program area and a storage data area.
  • The storage program area may store an operating system, an application program required by at least one function (such as a sound playback function or an image playback function), and the like; the storage data area may store data created according to the use of the electronic device 3 (such as audio data), and the like.
  • The memory 31 may include volatile memory, and may also include non-volatile memory, such as a hard disk, a memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a Flash Card, at least one magnetic disk storage device, a flash memory device, high-speed random access memory, or another storage device.
  • the memory 31 stores program codes, and the at least one processor 32 can call the program codes stored in the memory 31 to perform related functions.
  • The modules described in FIG. 2 (the acquisition module 201, configuration module 202, training module 203, construction module 204, traversal module 205, deduplication module 206, and push module 207) are program code stored in the memory 31 and executed by the at least one processor 32, so as to realize the functions of these modules for the purpose of pushing synonyms.
  • the obtaining module 201 is used to obtain interview questions
  • the configuration module 202 is used to configure the first preset number of keywords corresponding to the answers of the interview questions;
  • the training module 203 is used for pre-training to obtain a target word vector model based on the super-large word vector model
  • the construction module 204 is configured to construct a word vector matrix according to the target word vector model to obtain a word-index file, where the word-index file includes the correspondence between the word vector and the index;
  • the construction module 204 is further configured to construct a binary tree based on all word vectors in the target word vector model;
  • the traversal module 205 is configured to traverse the binary tree, query the binary tree for the first candidate word vector whose distance to the keyword is greater than a preset distance threshold, and construct a priority queue based on the first candidate word vector;
  • the deduplication module 206 is configured to deduplicate the first candidate word vector in the priority queue
  • the acquiring module 201 is also used to acquire the target word vectors of the second preset number in the prioritized queue after deduplication;
  • the pushing module 207 is configured to push a second preset number of synonyms, based on the second preset number of target word vectors and the word-index file, for selection by the user.
  • the database (Database) 35 is a warehouse built on the electronic device 3 for organizing, storing and managing data according to a data structure. Databases are usually divided into three types: hierarchical database, network database and relational database. In this embodiment, the database 35 is used to store information such as interview questions.
  • If the integrated module/unit of the electronic device 3 is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium.
  • Based on this understanding, all or part of the processes in the methods of the above embodiments of this application can also be completed by instructing the relevant hardware through a computer program.
  • the computer program can be stored in a computer-readable storage medium.
  • the computer program includes computer-readable instruction code
  • the computer-readable instruction code may be in the form of source code, object code, executable file, or some intermediate form.
  • The computer-readable medium may include any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory, and the like.
  • the functional units in the various embodiments of the present application may be integrated in the same processing unit, or each unit may exist alone physically, or two or more units may be integrated in the same unit.
  • the above-mentioned integrated unit may be implemented in the form of hardware, or may be implemented in the form of hardware plus software functional modules.

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • Human Resources & Organizations (AREA)
  • Theoretical Computer Science (AREA)
  • Strategic Management (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • Artificial Intelligence (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A near-synonym pushing method, comprising: obtaining an interview question (S1); configuring a first preset number of keywords of an answer corresponding to the interview question (S2); performing pre-training on the basis of a super large word vector model to obtain a target word vector model (S3); constructing a word vector matrix according to the target word vector model to obtain a word-index file (S4); constructing a binary tree on the basis of all word vectors (S5); traversing the binary tree, querying, from the binary tree, first candidate word vectors having distances to the keywords greater than a preset distance threshold, and constructing a priority queue (S6); performing deduplication on the first candidate word vectors in the priority queue (S7); obtaining a second preset number of target word vectors at top positions in the deduplicated priority queue (S8); and pushing, on the basis of the target word vectors and the word-index file, a second preset number of near-synonyms to allow a user to select (S9). Also provided are a near-synonym pushing apparatus, an electronic device, and a storage medium. The present invention can achieve quick near-synonym pushing to users.

Description

Synonym pushing method, apparatus, electronic device and medium
This application claims priority to the Chinese patent application filed with the Chinese Patent Office on March 2, 2020, with application number 202010136905.7 and invention title "Synonym pushing method, apparatus, electronic device and medium", the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the field of artificial intelligence technology, and in particular to a synonym pushing method, apparatus, electronic device, and storage medium.
Background
The project at hand is an AI interview rule configuration system, in which users at some companies can update the answer keywords in the expert rules in real time. However, the inventor realized that when filling in answer keywords, the user must enter a large amount of information manually, and the system offers no assistance during input, such as recommending synonyms. This reduces the user's writing efficiency, makes the result heavily dependent on the user's personal understanding of the answer keywords, and cannot guarantee that the entered keywords are reasonably complete and objective.
Summary of the Invention
In view of the above, it is necessary to provide a synonym pushing method, apparatus, electronic device, and storage medium that can quickly push synonyms to users during AI interviews.
The first aspect of the present application provides a synonym pushing method, and the method includes:
obtaining interview questions;
configuring a first preset number of keywords of the answers corresponding to the interview questions;
pre-training a target word vector model based on a super-large word vector model;
constructing a word vector matrix according to the target word vector model to obtain a word-index file, where the word-index file includes the correspondence between word vectors and indexes;
constructing a binary tree based on all word vectors in the target word vector model;
traversing the binary tree, querying the binary tree for first candidate word vectors whose distance to the keyword is greater than a preset distance threshold, and constructing a priority queue based on the first candidate word vectors;
deduplicating the first candidate word vectors in the priority queue;
obtaining the top second preset number of target word vectors in the deduplicated priority queue; and
pushing a second preset number of synonyms, based on the second preset number of target word vectors and the word-index file, for the user to select.
The second aspect of the present application provides a synonym pushing device, and the device includes:
an acquisition module, used to acquire interview questions;
a configuration module, used to configure a first preset number of keywords of the answers corresponding to the interview questions;
a training module, used to pre-train a target word vector model based on a super-large word vector model;
a construction module, configured to construct a word vector matrix according to the target word vector model to obtain a word-index file, where the word-index file includes the correspondence between word vectors and indexes;
the construction module being also used to construct a binary tree based on all word vectors in the target word vector model;
a traversal module, configured to traverse the binary tree, query the binary tree for first candidate word vectors whose distance to the keyword is greater than a preset distance threshold, and construct a priority queue based on the first candidate word vectors;
a deduplication module, configured to deduplicate the first candidate word vectors in the priority queue;
the acquisition module being also used to acquire the top second preset number of target word vectors in the deduplicated priority queue; and
a push module, configured to push a second preset number of synonyms, based on the second preset number of target word vectors and the word-index file, for selection by the user.
The third aspect of the present application provides an electronic device, where the electronic device includes a processor, and the processor is configured to execute computer-readable instructions stored in a memory to implement the following steps:
obtaining interview questions;
configuring a first preset number of keywords of the answers corresponding to the interview questions;
pre-training a target word vector model based on a super-large word vector model;
constructing a word vector matrix according to the target word vector model to obtain a word-index file, where the word-index file includes the correspondence between word vectors and indexes;
constructing a binary tree based on all word vectors in the target word vector model;
traversing the binary tree, querying the binary tree for first candidate word vectors whose distance to the keyword is greater than a preset distance threshold, and constructing a priority queue based on the first candidate word vectors;
deduplicating the first candidate word vectors in the priority queue;
obtaining the top second preset number of target word vectors in the deduplicated priority queue; and
pushing a second preset number of synonyms, based on the second preset number of target word vectors and the word-index file, for the user to select.
The fourth aspect of the present application provides a computer-readable storage medium having computer-readable instructions stored thereon, where the computer-readable instructions, when executed by a processor, implement the following steps:
obtaining interview questions;
configuring a first preset number of keywords of the answers corresponding to the interview questions;
pre-training a target word vector model based on a super-large word vector model;
constructing a word vector matrix according to the target word vector model to obtain a word-index file, where the word-index file includes the correspondence between word vectors and indexes;
constructing a binary tree based on all word vectors in the target word vector model;
traversing the binary tree, querying the binary tree for first candidate word vectors whose distance to the keyword is greater than a preset distance threshold, and constructing a priority queue based on the first candidate word vectors;
deduplicating the first candidate word vectors in the priority queue;
obtaining the top second preset number of target word vectors in the deduplicated priority queue; and
pushing a second preset number of synonyms, based on the second preset number of target word vectors and the word-index file, for the user to select.
With the synonym pushing method, device, electronic equipment, and storage medium described in this application, a first preset number of keywords is configured for the answer corresponding to an interview question, a second preset number of synonyms corresponding to each keyword is looked up in the pre-trained word vector model, and the second preset number of synonyms is pushed for the user to choose from. More synonyms of the keywords of the answers corresponding to the interview questions can therefore be configured during the robot interview process, making it convenient for HR staff to configure more comprehensive answers for interview questions when interviewing job applicants. As a result, when a job applicant's answer to an interview question is received, the answer can be analyzed more accurately, and it is easier for HR to give a more comprehensive assessment of the applicant.
Description of the Drawings
In order to describe the technical solutions in the embodiments of the present application or the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only embodiments of the present application; for those of ordinary skill in the art, other drawings can be obtained from the provided drawings without creative work.
FIG. 1 is a flowchart of a method for pushing synonyms provided in Embodiment 1 of the present application.
FIG. 2 is a functional module diagram of the push device provided in Embodiment 2 of the present application.
FIG. 3 is a schematic diagram of an electronic device provided in Embodiment 3 of the present application.
The following specific embodiments further describe this application in conjunction with the above drawings.
Detailed Description
In order to understand the above objectives, features, and advantages of the application more clearly, the application is described in detail below with reference to the accompanying drawings and specific embodiments. It should be noted that, where there is no conflict, the embodiments of the application and the features in the embodiments can be combined with one another.
Many specific details are set forth in the following description to facilitate a full understanding of the present application. The described embodiments are only some of the embodiments of the present application, rather than all of them. Based on the embodiments in this application, all other embodiments obtained by those of ordinary skill in the art without creative work fall within the protection scope of this application.
Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by those skilled in the technical field of this application. The terms used in the specification of the application are only for the purpose of describing specific embodiments and are not intended to limit the application.
The terms "first", "second", and "third" in the specification, claims, and drawings of the present application are used to distinguish different objects, not to describe a specific sequence. In addition, the term "including" and any variations of it are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that includes a series of steps or units is not limited to the listed steps or units, but optionally also includes steps or units that are not listed, or optionally also includes other steps or units inherent to such a process, method, product, or device.
The synonym pushing method in the embodiments of this application is applied in an electronic device. For an electronic device that needs synonym pushing, the synonym pushing function provided by the method of this application can be integrated directly on the device, or a client implementing the method of this application can be installed. The method provided in this application can also run on a server or other device in the form of a software development kit (SDK): the synonym pushing function is provided as an SDK interface, and an electronic device or other device can realize the synonym pushing function through the provided interface.
实施例一Example one
图1是本申请实施例一提供的近义词推送方法的流程图。根据不同的需求,所述流程图中的执行顺序可以改变,某些步骤可以省略。FIG. 1 is a flowchart of a method for pushing synonyms provided in Embodiment 1 of the present application. According to different requirements, the execution sequence in the flowchart can be changed, and some steps can be omitted.
为了在机器人面试过程中,通过机器人更好的判定求职者在回答面试过程中的面试题目是否正确,并根据回答结果给求职者评分时。需要根据所述面试题目对应的答案配置关键词,并在接收到求职者输入的答案后,根据输入的答案提取关键词。将提取的关键词与配置的关键词进行匹配,得到匹配结果,根据匹配结果对所述求职者进行评分。而在根据所述面试题目对应的答案配置关键词时,为了避免关键词不够全面的情况出现,本申请提供了一种在配置关键词时,对面试官输入的关键词进行拓展,推送同近/义词的方法。所述方法包括:In the robot interview process, the robot can better determine whether the job applicant is correct in answering the interview questions in the interview process, and when the job applicant is graded according to the answer result. It is necessary to configure keywords according to the answers corresponding to the interview questions, and after receiving the answers input by the job applicant, extract the keywords according to the input answers. The extracted keywords are matched with the configured keywords to obtain a matching result, and the job applicant is scored according to the matching result. When configuring keywords according to the answers corresponding to the interview questions, in order to avoid the situation where the keywords are not comprehensive enough, this application provides a way to expand the keywords input by the interviewer when configuring keywords, and push the same. /Method of meaning words. The method includes:
步骤S1,获取面试题目。Step S1: Obtain interview questions.
在机器人面试过程中,会根据不同的岗位配置不同的面试题目。例如,根据研发岗位配置的面试题目包括“你熟悉哪些编程语言”、“在Java中,如何跳出当前的多重嵌套循环”和“Java中会存在内存泄漏吗,请简单描述”等等。During the robot interview process, different interview questions will be configured according to different positions. For example, interview questions configured according to R&D positions include "Which programming languages are you familiar with", "How to break out of the current multiple nested loops in Java" and "Is there a memory leak in Java, please describe briefly" and so on.
在本实施方式中,机器人面试需要预先配置好面试题目和答案。然而,不同的求职者面对同样的面试题目时给出的答案也不相同。为了全面的评判求职者的能力,在配置面试题目和答案时需要根据所述面试题目配置详尽完整且全面的答案。In this embodiment, the robot interview needs to be pre-configured with interview questions and answers. However, different job applicants give different answers when facing the same interview questions. In order to comprehensively evaluate the ability of job applicants, when configuring interview questions and answers, it is necessary to configure detailed, complete and comprehensive answers according to the interview questions.
步骤S2,配置第一预设个数与所述面试题目对应的答案的关键词。Step S2, configuring the first preset number of keywords of answers corresponding to the interview questions.
在一实施方式中,所述配置第一预设个数与所述面试题目对应的答案的关键词的步骤包括:In one embodiment, the step of configuring the first preset number of keywords for answers corresponding to the interview questions includes:
在预先建立的面试题目与答案对应表中查询所述面试题目配置对应的答案,得到查询结果;Query the answer corresponding to the interview question configuration in the pre-established interview question and answer correspondence table, and obtain the query result;
提取所述查询结果中的关键词,其中,所述关键词为第一预设个数。Extract keywords in the query result, where the keywords are the first preset number.
可以理解的是,所述关键词还可以是根据所述查询结果进行语义分析得到的与所述查询结果相关的关键词。It is understandable that the keyword may also be a keyword related to the query result obtained by performing semantic analysis according to the query result.
在另一实施方式中,所述配置第一预设个数与所述面试题目对应的答案的关键词的步骤包括:In another embodiment, the step of configuring the first preset number of keywords of answers corresponding to the interview questions includes:
(1)根据预先构建的题目解析模型分析所述面试题目得到对应的题目意图。(1) Analyze the interview questions according to the pre-built question analysis model to obtain the corresponding question intentions.
在本实施方式中,所述题目解析模型可以对所述面试题目的题目特征进行分析。所述题目特征可以包括题干意图和关键信息。例如,当面试题目为“你所擅长的编程语言有哪些”,那么题干意图是擅长的编程语言,关键信息可以是编程语言。In this embodiment, the topic analysis model can analyze the topic characteristics of the interview topic. The topic features may include topic intentions and key information. For example, when the interview question is "What programming languages are you good at?", the intent of the question stem is the programming language you are good at, and the key information can be the programming language.
(2)根据所述题目意图和预先建立的知识库,确定所述面试题目对应的答案。(2) Determine the answer corresponding to the interview question according to the purpose of the question and a pre-established knowledge base.
例如,当面试题目为“你所擅长的编程语言有哪些”,那么所述预先建立的知识库中可能包括C/C++、Java、C#和SQL等。For example, when the interview topic is "What programming languages are you good at?", the pre-established knowledge base may include C/C++, Java, C#, SQL, etc.
(3)根据所述对应的答案提取第一预设个数关键词。(3) Extract the first preset number of keywords according to the corresponding answer.
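As an illustration only, the keyword-configuration flow of this embodiment might be sketched in Python as follows; the question-parsing step is reduced to a trivial rule and the knowledge base to an in-memory dictionary, and the names KNOWLEDGE_BASE, parse_intent, and configure_keywords are assumptions made for this sketch rather than part of the disclosure.
# Hypothetical sketch: configure the first preset number of answer keywords for an interview question.
KNOWLEDGE_BASE = {
    # assumed toy knowledge base: question intent -> candidate answer terms
    "programming language": ["C/C++", "Java", "C#", "SQL", "Python"],
}

def parse_intent(question: str) -> str:
    # Stand-in for the pre-built question-parsing model: a trivial keyword rule.
    if "编程语言" in question or "programming language" in question.lower():
        return "programming language"
    return "unknown"

def configure_keywords(question: str, first_preset_number: int) -> list:
    # Determine the answer from the knowledge base, then take the first preset number of keywords.
    answer_terms = KNOWLEDGE_BASE.get(parse_intent(question), [])
    return answer_terms[:first_preset_number]

print(configure_keywords("你所擅长的编程语言有哪些", 3))  # -> ['C/C++', 'Java', 'C#']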
步骤S3,基于超大词向量模型预先训练得到目标词向量模型。Step S3, pre-training based on the super large word vector model to obtain the target word vector model.
In this embodiment, pre-training is performed based on a super-large word vector model to obtain a suitable target word vector model. Specifically, this includes: expanding the robot interview scene corpus of the super-large word vector model, which includes performing word segmentation and stop-word removal on the robot interview scene corpus and incrementally training the word vectors based on the CBOW mode; and training the corpus-expanded super-large word vector model to obtain the target word vector model.
Specifically, the training corpus of the super-large word vector model covers a large amount of corpora of different types, such as news, web pages, novels, Baidu Baike, and Wikipedia. For the robot interview scenario, however, the scenario-specific corpus in the super-large word vector model is insufficient. Therefore, corpora of the robot interview scenario, such as question-answer texts and similar-question texts from robot interviews, are merged into the super-large word vector model to expand it. The target word vector model is a word vector model that contains the robot interview corpus.
The robot interview scene corpus is first segmented into words, stop words are removed, and the word vectors are incrementally trained based on the CBOW mode, so as to improve the model's performance in the robot interview scenario. The final trained target word vector model covers more than 8 million words, and each word vector has about 200 dimensions. The target word vector model therefore has broad corpus coverage, and each word vector in it reflects the semantics of its word well. Meanwhile, a vocabulary on the order of 8 million words can completely replace the traditional way of building a synonym dictionary and largely solves the problem of words that cannot be found.
需要说明的是,基于超大词向量模型预先训练得到目标词向量模型的方法为现有技术,在此不再赘述。It should be noted that the method of pre-training the target word vector model based on the super-large word vector model is the prior art, and will not be repeated here.
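Although this pre-training step is existing technology, a minimal sketch of incremental CBOW training is given below for orientation; it assumes the third-party gensim and jieba libraries, a toy stop-word list, and a tiny corpus, and only mirrors the 200-dimension setting mentioned above, so it should be read as an illustration rather than the actual training pipeline of this application.
# Hypothetical sketch: expand a base CBOW word2vec model with robot-interview corpus.
import jieba
from gensim.models import Word2Vec

STOP_WORDS = {"的", "了", "吗", "请"}  # toy stop-word list for illustration

def preprocess(texts):
    # Word segmentation followed by stop-word removal.
    return [[w for w in jieba.lcut(t) if w.strip() and w not in STOP_WORDS] for t in texts]

base_corpus = preprocess(["新闻语料示例", "百科语料示例"])
interview_corpus = preprocess(["你熟悉哪些编程语言", "Java中会存在内存泄漏吗"])

# Base model trained in CBOW mode (sg=0) with 200-dimensional word vectors.
model = Word2Vec(sentences=base_corpus, vector_size=200, sg=0, min_count=1)

# Incrementally extend the vocabulary and continue training on the interview-scene corpus.
model.build_vocab(interview_corpus, update=True)
model.train(interview_corpus, total_examples=len(interview_corpus), epochs=model.epochs)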
步骤S4,根据所述目标词向量模型构建词向量矩阵得到词-索引文件,其中,所述词-索引文件包括词向量与索引之间的对应关系。Step S4, constructing a word vector matrix according to the target word vector model to obtain a word-index file, wherein the word-index file includes the correspondence between the word vector and the index.
在本实施方式中,所述根据所述目标词向量模型构建词向量矩阵得到词-索引文件可以包括:In this embodiment, the construction of a word vector matrix according to the target word vector model to obtain a word-index file may include:
(a1) Construct a word vector matrix in which the number of rows is the total number of all words in the target word vector model and the number of columns is the dimension of each word;
(a2)所述词向量矩阵中的每一行对应一个索引;(a2) Each row in the word vector matrix corresponds to an index;
(a3)根据所述词向量矩阵构建词-索引文件,并输出所述词-索引文件。(a3) Construct a word-index file according to the word vector matrix, and output the word-index file.
Specifically, the word vector matrix contains one row per word, and the dimension of each word gives the number of columns. In this embodiment, the dimension of each word is 200 and the target word vector model includes 8 million words, so a word vector matrix with 8 million rows and 200 columns is obtained.
而所述词向量矩阵中的每一行都有一个索引,那么,可以得到每个词对应的索引。从而根据所述词向量矩阵输出词-索引文件。同时,也可以得到每个索引与每个词向量之间的对应关系。And each row in the word vector matrix has an index, then the index corresponding to each word can be obtained. Thus, the word-index file is output according to the word vector matrix. At the same time, the corresponding relationship between each index and each word vector can also be obtained.
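For illustration, building the word vector matrix and the word-index file from a trained model might look like the sketch below; it assumes NumPy, an ordered vocabulary with 200-dimensional vectors, and a JSON file as the word-index format, all of which are assumptions of this sketch rather than requirements of the application.
# Hypothetical sketch: build the word vector matrix and the word-index file.
import json
import numpy as np

def build_word_index(words, vectors, index_path="word_index.json", matrix_path="vectors.npy"):
    # One row per word vector; the row position serves as that word's index.
    matrix = np.asarray(vectors, dtype=np.float32)          # shape: (number_of_words, 200)
    word_to_index = {word: i for i, word in enumerate(words)}
    np.save(matrix_path, matrix)                            # persist the word vector matrix
    with open(index_path, "w", encoding="utf-8") as f:
        json.dump(word_to_index, f, ensure_ascii=False)     # persist the word-index correspondence
    return word_to_index, matrix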
步骤S5,基于所述目标词向量模型中的所有词向量构建二叉树。In step S5, a binary tree is constructed based on all word vectors in the target word vector model.
在本实施方式中,根据所述目词向量模型中的所有词向量构建二叉树结构。In this embodiment, a binary tree structure is constructed according to all word vectors in the target word vector model.
Each word vector is a 200-dimensional vector, that is, a point in a 200-dimensional high-dimensional data space; each word vector represents one point in this space, so the data space corresponding to all word vectors in the target word vector model can be represented as 8 million points. A binary tree is constructed from the target word vector model by the following method:
(1)随机选择两个点为初始节点,连接两个初始节点形成一个等距超平面;(1) Two points are randomly selected as initial nodes, and the two initial nodes are connected to form an equidistant hyperplane;
(2) Construct an equidistant perpendicular hyperplane through the midpoint of the line connecting the two initial nodes, dividing the data space corresponding to all word vectors in the target word vector model into two parts and obtaining two subspaces;
(3) For each subspace, compute the dot product of each point with the normal vector of the equidistant hyperplane, and use the sign of the result (that is, the sign of the angle between the point and the normal vector) to decide whether the point belongs to the left subtree or the right subtree of the binary tree;
(4)依此类推,分别在所述两个子空间内重复上述步骤(1)至(3),可以将所述数据空间切分为多个子空间,并根据所述多个子空间构建二叉树结构。(4) By analogy, repeating the above steps (1) to (3) in the two subspaces respectively, the data space can be divided into multiple subspaces, and a binary tree structure can be constructed according to the multiple subspaces.
优选地,当每个子空间最多只剩下k个点时,不再对所述子空间进行切分。优选地,所述k大于等于8且小于等于10。在本实施方式中,所述k的取值为10。Preferably, when there are at most k points left in each subspace, the subspace is no longer divided. Preferably, the k is greater than or equal to 8 and less than or equal to 10. In this embodiment, the value of k is 10.
The split condition at each node of the above binary tree structure is one of these equidistant perpendicular hyperplanes, and finally the word vectors are the leaf nodes of the binary tree. That is, the binary tree includes a root node, multiple layers of intermediate nodes, and a last layer of leaf nodes, where each leaf node represents a word vector. In this application, the word vectors themselves do not need to be saved on the leaf nodes; only the indexes corresponding to the word vectors need to be saved. In this way, similar word vectors are located closer together in the binary tree, which makes the subsequent lookup of synonyms faster.
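The following is a minimal, self-contained sketch of such a random-hyperplane binary tree; it stores only word indexes at the leaves, stops splitting once at most k points remain, and is written in plain NumPy as an illustration of the idea (similar in spirit to approximate-nearest-neighbor indexes such as Annoy) rather than as the application's actual implementation.
# Hypothetical sketch: build a binary tree over word vectors by random hyperplane splits.
import numpy as np

class Node:
    def __init__(self, normal=None, offset=None, left=None, right=None, indexes=None):
        self.normal, self.offset = normal, offset   # splitting hyperplane parameters
        self.left, self.right = left, right         # child subtrees
        self.indexes = indexes                      # word indexes stored at a leaf node

def build_tree(vectors, indexes=None, k=10, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    if indexes is None:
        indexes = np.arange(len(vectors))
    if len(indexes) <= k:                           # stop splitting: at most k points remain
        return Node(indexes=indexes)
    # Pick two random points; the equidistant hyperplane passes through their midpoint,
    # with the line connecting them as its normal vector.
    a, b = rng.choice(indexes, size=2, replace=False)
    normal = vectors[a] - vectors[b]
    offset = float(normal.dot((vectors[a] + vectors[b]) / 2.0))
    # The sign of the dot product decides the left or right subtree for each point.
    side = vectors[indexes].dot(normal) - offset
    left_idx, right_idx = indexes[side <= 0], indexes[side > 0]
    if len(left_idx) == 0 or len(right_idx) == 0:   # degenerate split: keep the points as a leaf
        return Node(indexes=indexes)
    return Node(normal=normal, offset=offset,
                left=build_tree(vectors, left_idx, k, rng),
                right=build_tree(vectors, right_idx, k, rng))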
步骤S6,遍历所述二叉树,从所述二叉树中查询出与所述关键词的距离大于预设距离阈值的第一候选词向量并基于所述第一候选词向量构建优先队列。Step S6: Traverse the binary tree, query the binary tree to find a first candidate word vector whose distance to the keyword is greater than a preset distance threshold, and construct a priority queue based on the first candidate word vector.
The specific method for constructing the priority queue is: taking the keyword as the root node of the binary tree; traversing all intermediate nodes under the root node; calculating the distance between the root node and each intermediate node; determining the intermediate nodes whose target distance is greater than the preset distance threshold as first-layer target nodes; traversing all intermediate nodes under the first-layer target nodes down to the last layer of leaf nodes; taking the word vectors in all of these leaf nodes as first candidate word vectors; calculating the similarity between each first candidate word vector and the keyword; and inserting the first candidate word vectors into the priority queue in order of similarity.
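A simplified sketch of this search is shown below, reusing the Node tree from the preceding sketch; it walks down the splits, collects the word indexes stored at the reached leaves as first candidate word vectors, and orders them in a priority queue by cosine similarity. The margin test and the heap-based queue are simplifications assumed here, not the exact procedure of the application.
# Hypothetical sketch: query the binary tree and build a similarity-ordered priority queue.
import heapq
import numpy as np

def cosine(u, v):
    return float(u.dot(v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

def search_tree(node, query, threshold=0.0):
    # Collect candidate word indexes from the leaves reachable from this node.
    if node.indexes is not None:                    # leaf node: return the stored indexes
        return list(node.indexes)
    margin = float(query.dot(node.normal) - node.offset)
    candidates = []
    if margin <= threshold:                         # on, or close enough to, the left side
        candidates += search_tree(node.left, query, threshold)
    if margin >= -threshold:                        # on, or close enough to, the right side
        candidates += search_tree(node.right, query, threshold)
    return candidates

def build_priority_queue(tree, vectors, query, threshold=0.1):
    heap = []
    for idx in search_tree(tree, query, threshold):
        heapq.heappush(heap, (-cosine(query, vectors[idx]), int(idx)))  # max-heap by similarity
    return heap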
步骤S7,对所述优先队列中的所述第一候选词向量进行去重。Step S7: De-duplicate the first candidate word vector in the priority queue.
步骤S8,获取去重后的优先队列中排序在前第二预设个数的目标词向量。Step S8: Obtain the target word vectors of the second preset number in the prioritized queue after deduplication.
步骤S9,基于所述第二预设个数的目标词向量和词-索引文件推送第二预设个数近义词供用户选择。Step S9: Push a second preset number of synonyms based on the second preset number of target word vectors and word-index files for selection by the user.
In this embodiment, the method of pushing the second preset number of synonyms for the user to select based on the second preset number of target word vectors and the word-index file includes: obtaining the target indexes corresponding to the second preset number of target word vectors; querying, according to the word-index file, the word vectors corresponding to the target indexes; and pushing the synonyms corresponding to these word vectors for the user to select.
In this embodiment, the binary tree structure file and the word-index file are saved together; when the Top-N nearest-neighbor words of a certain keyword need to be queried, only these two files need to be used for the index lookup.
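Continuing the previous sketches, pushing the top results might look as follows; the inverse lookup simply reverses the hypothetical word_to_index mapping built earlier, and the default of 8 pushed synonyms only mirrors the example described in the next paragraph.
# Hypothetical sketch: take the top second-preset-number candidates and map them back to words.
import heapq

def push_synonyms(heap, word_to_index, second_preset_number=8):
    index_to_word = {i: w for w, i in word_to_index.items()}   # inverse of the word-index file
    top = heapq.nsmallest(second_preset_number, heap)          # highest similarity first (scores are negated)
    return [index_to_word[idx] for _, idx in top if idx in index_to_word]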
In this embodiment, by pushing the second preset number of synonyms for the user to screen, the user can configure the keywords of the answers corresponding to the interview questions more comprehensively, so that when a job applicant answers a question, the applicant is not scored one-sidedly on the basis of the answer alone. The synonym lookup function supported by this application is more innovative and convenient: synonyms can be generated for 5 keywords at a time, 8 synonyms are pushed each time, and the user can click "change batch" to switch to another round of 8 synonyms, which makes them easy to view and use. For example, a "change batch" button is displayed on the push interface; after the user clicks the button, the original synonyms can be updated and more synonyms pushed.
优选地,由于很多词语并不是所述面试题目的答案,所以增加了预设规则筛选查询到的词汇,其中,所述预设规则包括以下规则中的至少一种:Preferably, since many words are not the answers to the interview questions, a preset rule is added to filter the queried vocabulary, wherein the preset rule includes at least one of the following rules:
(1) Adjust the order of the queried words according to their number of characters. For example, words with the same number of characters as the keyword are returned preferentially; for words whose number of characters differs from that of the keyword, a preset distance (for example, 0.1) is added per character of difference when ranking the queried words.
(2) Screen the queried words by type, where the types include Chinese, English, and numbers. For example, words of the same type as the keyword are returned preferentially. Cases where Chinese is input and English is returned, or English is input and Chinese is returned, are returned normally; but where Chinese or English is input and a number is returned, that synonym is deleted directly. It should be noted that a single English letter or a single Chinese character counts as one character.
(3) Remove words whose number of characters exceeds that of the keyword by more than a preset number, for example, words that are more than 5 characters longer than the keyword.
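A compact sketch of these screening rules is given below; the 0.1 ranking penalty, the type check, and the 5-character cap follow the examples in the text, while the function names and the (word, distance) input format are assumptions of this sketch.
# Hypothetical sketch: screen and re-rank the queried words against the original keyword.
import re

def word_type(w):
    if re.fullmatch(r"[0-9]+", w):
        return "number"
    if re.fullmatch(r"[A-Za-z]+", w):
        return "english"
    return "chinese"

def screen_candidates(keyword, scored_words, penalty=0.1, max_extra_chars=5):
    kept = []
    for word, distance in scored_words:
        if word_type(word) == "number" and word_type(keyword) != "number":
            continue                                  # Chinese or English input must not return numbers
        if len(word) - len(keyword) > max_extra_chars:
            continue                                  # drop words far longer than the keyword
        distance += penalty * abs(len(word) - len(keyword))   # rank penalty per character difference
        kept.append((word, distance))
    return [w for w, _ in sorted(kept, key=lambda item: item[1])]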
可以理解的是,上述方法同样可以用于推送同义词。It is understandable that the above method can also be used to push synonyms.
In summary, the method for pushing synonyms provided in this application includes: obtaining an interview question; configuring a first preset number of keywords of the answer corresponding to the interview question; looking up, in a pre-trained word vector model, a second preset number of synonyms corresponding to each keyword; and pushing the second preset number of synonyms for the user to select. The word vectors used in this application have broad coverage, each word is represented by a 200-dimensional vector, and each word's vector reflects the actual semantics of the word well. The word vector model of this application includes 8 million words, which largely solves the traditional out-of-vocabulary problem of words that cannot be matched. The memory footprint of the word vector model used in this application is greatly reduced; using the word-index file greatly lowers the memory usage and greatly increases system stability. In addition, the query return speed of this application is greatly increased: a query for one word used to take more than ten seconds and now returns within 0.01 s. Finally, this application can configure, for the robot interview process, more synonyms of the keywords of the answers corresponding to the interview questions, so that when a job applicant's answer to an interview question is received, the answer can be analyzed more accurately, which helps human resources give a more comprehensive analysis of the applicant.
以上所述,仅是本申请的具体实施方式,但本申请的保护范围并不局限于此,对于本领域的普通技术人员来说,在不脱离本申请创造构思的前提下,还可以做出改进,但这些均属于本申请的保护范围。The above are only specific implementations of this application, but the scope of protection of this application is not limited to this. For those of ordinary skill in the art, without departing from the creative concept of this application, they can also make Improvements, but these all belong to the scope of protection of this application.
下面结合图2和图3,分别对实现上述近义词推送方法的电子设备的功能模块及硬件结构进行介绍。The functional modules and hardware structure of the electronic device implementing the above-mentioned synonym pushing method are respectively introduced below in conjunction with FIG. 2 and FIG. 3.
实施例二Example two
图2为本申请近义词推送装置较佳实施例中的功能模块图。Fig. 2 is a diagram of functional modules in a preferred embodiment of a device for pushing synonyms of this application.
在一些实施例中,所述近义词推送装置20(为便于描述,简称为“推送装置”)运行于电子设备中。所述推送装置20可以包括多个由程序代码段所组成的功能模块。所述推送装置20中的各个程序段的程序代码可以存储于存储器中,并由至少一个处理器所执行,以执行近义词推送功能。In some embodiments, the synonym pushing device 20 (referred to as "pushing device" for ease of description) runs in an electronic device. The pushing device 20 may include multiple functional modules composed of program code segments. The program code of each program segment in the pushing device 20 can be stored in a memory and executed by at least one processor to perform the function of pushing synonyms.
In order for the robot to better judge, during a robot interview, whether a job applicant has answered the interview questions correctly and to score the applicant according to the answers, keywords need to be configured according to the answers corresponding to the interview questions. After the answer input by the applicant is received, keywords are extracted from the input answer, the extracted keywords are matched against the configured keywords to obtain a matching result, and the applicant is scored according to the matching result. When configuring keywords according to the answers corresponding to the interview questions, in order to avoid the configured keywords being insufficiently comprehensive, this application provides the pushing device 20, which, during keyword configuration, expands the keywords input by the interviewer and pushes synonyms and near-synonyms. The functional modules of the pushing device 20 may include: an acquisition module 201, a configuration module 202, a training module 203, a construction module 204, a traversal module 205, a deduplication module 206, and a pushing module 207. The function of each module will be described in detail in the subsequent embodiments. A module referred to in this application is a series of computer program segments that can be executed by at least one processor, that can complete a fixed function, and that are stored in a memory.
所述获取模块201用于获取面试题目。The obtaining module 201 is used to obtain interview questions.
在机器人面试过程中,会根据不同的岗位配置不同的面试题目。例如,根据研发岗位配置的面试题目包括“你熟悉哪些编程语言”、“在Java中,如何跳出当前的多重嵌套循环”和“Java中会存在内存泄漏吗,请简单描述”等等。During the robot interview process, different interview questions will be configured according to different positions. For example, interview questions configured according to R&D positions include "Which programming languages are you familiar with", "How to break out of the current multiple nested loops in Java" and "Is there a memory leak in Java, please describe briefly" and so on.
在本实施方式中,机器人面试需要预先配置好面试题目和答案。然而,不同的求职者面对同样的面试题目时给出的答案也不相同。为了全面的评判求职者的能力,在配置面试题目和答案时需要根据所述面试题目配置详尽完整且全面的答案。In this embodiment, the robot interview needs to be pre-configured with interview questions and answers. However, different job applicants give different answers when facing the same interview questions. In order to comprehensively evaluate the ability of job applicants, when configuring interview questions and answers, it is necessary to configure detailed, complete and comprehensive answers according to the interview questions.
所述配置模块202用于配置第一预设个数与所述面试题目对应的答案的关键词。The configuration module 202 is configured to configure a first preset number of keywords of answers corresponding to the interview questions.
在一实施方式中,所述配置第一预设个数与所述面试题目对应的答案的关键词包括:In one embodiment, the keywords for configuring the first preset number of answers corresponding to the interview questions include:
在预先建立的面试题目与答案对应表中查询所述面试题目配置对应的答案,得到查询结果;Query the answer corresponding to the interview question configuration in the pre-established interview question and answer correspondence table, and obtain the query result;
提取所述查询结果中的关键词,其中,所述关键词为第一预设个数。Extract keywords in the query result, where the keywords are the first preset number.
可以理解的是,所述关键词还可以是根据所述查询结果进行语义分析得到的与所述查询结果相关的关键词。It is understandable that the keyword may also be a keyword related to the query result obtained by performing semantic analysis according to the query result.
在另一实施方式中,所述配置第一预设个数与所述面试题目对应的答案的关键词包括:In another embodiment, the keywords for configuring the first preset number of answers corresponding to the interview questions include:
(1)根据预先构建的题目解析模型分析所述面试题目得到对应的题目意图。(1) Analyze the interview questions according to the pre-built question analysis model to obtain the corresponding question intentions.
在本实施方式中,所述题目解析模型可以对所述面试题目的题目特征进行分析。所述题目特征可以包括题干意图和关键信息。例如,当面试题目为“你所擅长的编程语言有哪些”,那么题干意图是擅长的编程语言,关键信息可以是编程语言。In this embodiment, the topic analysis model can analyze the topic characteristics of the interview topic. The topic features may include topic intentions and key information. For example, when the interview question is "What programming languages are you good at?", the intent of the question stem is the programming language you are good at, and the key information can be the programming language.
(2)根据所述题目意图和预先建立的知识库,确定所述面试题目对应的答案。(2) Determine the answer corresponding to the interview question according to the purpose of the question and a pre-established knowledge base.
例如,当面试题目为“你所擅长的编程语言有哪些”,那么所述预先建立的知识库中可能包括C/C++、Java、C#和SQL等。For example, when the interview topic is "What programming languages are you good at?", the pre-established knowledge base may include C/C++, Java, C#, SQL, etc.
(3)根据所述对应的答案提取第一预设个数关键词。(3) Extract the first preset number of keywords according to the corresponding answer.
所述训练模块203用于基于超大词向量模型预先训练得到目标词向量模型。The training module 203 is used for pre-training to obtain the target word vector model based on the super large word vector model.
In this embodiment, pre-training is performed based on a super-large word vector model to obtain a suitable target word vector model. Specifically, this includes: expanding the robot interview scene corpus of the super-large word vector model, which includes performing word segmentation and stop-word removal on the robot interview scene corpus and incrementally training the word vectors based on the CBOW mode; and training the corpus-expanded super-large word vector model to obtain the target word vector model.
Specifically, the training corpus of the super-large word vector model covers a large amount of corpora of different types, such as news, web pages, novels, Baidu Baike, and Wikipedia. For the robot interview scenario, however, the scenario-specific corpus in the super-large word vector model is insufficient. Therefore, corpora of the robot interview scenario, such as question-answer texts and similar-question texts from robot interviews, are merged into the super-large word vector model to expand it. The target word vector model is a word vector model that contains the robot interview corpus.
The robot interview scene corpus is first segmented into words, stop words are removed, and the word vectors are incrementally trained based on the CBOW mode, so as to improve the model's performance in the robot interview scenario. The final trained target word vector model covers more than 8 million words, and each word vector has about 200 dimensions. The target word vector model therefore has broad corpus coverage, and each word vector in it reflects the semantics of its word well. Meanwhile, a vocabulary on the order of 8 million words can completely replace the traditional way of building a synonym dictionary and largely solves the problem of words that cannot be found.
需要说明的是,基于超大词向量模型预先训练得到目标词向量模型的方法为现有技术,在此不再赘述。It should be noted that the method of pre-training the target word vector model based on the super-large word vector model is the prior art, and will not be repeated here.
所述构建模块204用于根据所述目标词向量模型构建词向量矩阵得到词-索引文件,其中,所述词-索引文件包括词向量与索引之间的对应关系。The construction module 204 is configured to construct a word vector matrix according to the target word vector model to obtain a word-index file, where the word-index file includes the correspondence between the word vector and the index.
在本实施方式中,所述根据所述目标词向量模型构建词向量矩阵得到词-索引文件可以包括:In this embodiment, the construction of a word vector matrix according to the target word vector model to obtain a word-index file may include:
(a1) Construct a word vector matrix in which the number of rows is the total number of all words in the target word vector model and the number of columns is the dimension of each word;
(a2)所述词向量矩阵中的每一行对应一个索引;(a2) Each row in the word vector matrix corresponds to an index;
(a3)根据所述词向量矩阵构建词-索引文件,并输出所述词-索引文件。(a3) Construct a word-index file according to the word vector matrix, and output the word-index file.
Specifically, the word vector matrix contains one row per word, and the dimension of each word gives the number of columns. In this embodiment, the dimension of each word is 200 and the target word vector model includes 8 million words, so a word vector matrix with 8 million rows and 200 columns is obtained.
而所述词向量矩阵中的每一行都有一个索引,那么,可以得到每个词对应的索引。从而根据所述词向量矩阵输出词-索引文件。同时,也可以得到每个索引与每个词向量之间的对应关系。And each row in the word vector matrix has an index, then the index corresponding to each word can be obtained. Thus, the word-index file is output according to the word vector matrix. At the same time, the corresponding relationship between each index and each word vector can also be obtained.
所述构建模块204还用于基于所述目标词向量模型中的所有词向量构建二叉树。The construction module 204 is also used to construct a binary tree based on all word vectors in the target word vector model.
在本实施方式中,将所述目标词向量模型中的所有词向量构建二叉树结构。In this embodiment, a binary tree structure is constructed for all word vectors in the target word vector model.
Each word vector is a 200-dimensional vector, that is, a point in a 200-dimensional high-dimensional data space; each word vector represents one point in this space, so the data space corresponding to all word vectors in the target word vector model can be represented as 8 million points. A binary tree is constructed from the target word vector model by the following method:
(1)随机选择两个点为初始节点,连接两个初始节点形成一个等距超平面。(1) Two points are randomly selected as initial nodes, and the two initial nodes are connected to form an equidistant hyperplane.
(2) Construct an equidistant perpendicular hyperplane through the midpoint of the line connecting the two initial nodes, dividing the data space corresponding to all word vectors in the target word vector model into two parts and obtaining two subspaces.
(3) For each subspace, compute the dot product of each point with the normal vector of the equidistant hyperplane, and use the sign of the result (that is, the sign of the angle between the point and the normal vector) to decide whether the point belongs to the left subtree or the right subtree of the binary tree.
(4)依此类推,分别在所述两个子空间内重复上述步骤(1)至(3),可以将所述数据空间切分为多个子空间,并根据所述多个子空间构建二叉树结构。(4) By analogy, repeating the above steps (1) to (3) in the two subspaces respectively, the data space can be divided into multiple subspaces, and a binary tree structure can be constructed according to the multiple subspaces.
优选地,当每个子空间最多只剩下k个点时,不再对所述子空间进行切分。优选地,所述k大于等于8且小于等于10。在本实施方式中,所述k的取值为10。Preferably, when there are at most k points left in each subspace, the subspace is no longer divided. Preferably, the k is greater than or equal to 8 and less than or equal to 10. In this embodiment, the value of k is 10.
The split condition at each node of the above binary tree structure is one of these equidistant perpendicular hyperplanes, and finally the word vectors are the leaf nodes of the binary tree. In this application, the word vectors themselves do not need to be saved on the leaf nodes; only the indexes corresponding to the word vectors need to be saved. In this way, similar word vectors are located closer together in the binary tree, which makes the subsequent lookup of synonyms faster.
所述遍历模块205用于遍历所述二叉树,从所述二叉树中查询出与所述关键词的距离大于预设距离阈值的第一候选词向量并基于所述第一候选词向量构建优先队列。The traversal module 205 is configured to traverse the binary tree, query the binary tree for the first candidate word vector whose distance to the keyword is greater than a preset distance threshold, and construct a priority queue based on the first candidate word vector.
The specific method for constructing the priority queue is: taking the keyword as the root node of the binary tree; traversing all intermediate nodes under the root node; calculating the distance between the root node and each intermediate node; determining the intermediate nodes whose target distance is greater than the preset distance threshold as first-layer target nodes; traversing all intermediate nodes under the first-layer target nodes down to the last layer of leaf nodes; taking the word vectors in all of these leaf nodes as first candidate word vectors; calculating the similarity between each first candidate word vector and the keyword; and inserting the first candidate word vectors into the priority queue in order of similarity.
所述去重模块206用于对所述优先队列中的所述第一候选词向量进行去重。The deduplication module 206 is configured to deduplicate the first candidate word vector in the priority queue.
所述获取模块201还用于获取去重后的优先队列中排序在前第二预设个数的目标词向量。The acquiring module 201 is also used for acquiring the target word vectors of the second preset number in the prioritized queue after deduplication.
所述推送模块207用于基于所述第二预设个数的目标词向量和词-索引文件推送第二预设个数近义词供用户选择。The pushing module 207 is configured to push a second preset number of synonyms based on the second preset number of target word vectors and word-index files for selection by the user.
In this embodiment, the method of pushing the second preset number of synonyms for the user to select based on the second preset number of target word vectors and the word-index file includes: obtaining the target indexes corresponding to the second preset number of target word vectors; querying, according to the word-index file, the word vectors corresponding to the target indexes; and pushing the synonyms corresponding to these word vectors for the user to select.
In this embodiment, the binary tree structure file and the word-index file are saved together; when the Top-N nearest-neighbor words of a certain keyword need to be queried, only these two files need to be used for the index lookup.
In this embodiment, by pushing the second preset number of synonyms for the user to screen, the user can configure the keywords of the answers corresponding to the interview questions more comprehensively, so that when a job applicant answers a question, the applicant is not scored one-sidedly on the basis of the answer alone. The synonym lookup function supported by this application is more innovative and convenient: synonyms can be generated for 5 keywords at a time, 8 synonyms are pushed each time, and the user can click "change batch" to switch to another round of 8 synonyms, which makes them easy to view and use. For example, a "change batch" button is displayed on the push interface; after the user clicks the button, the original synonyms can be updated and more synonyms pushed.
优选地,由于很多词语并不是所述面试题目的答案,所以增加了预设规则筛选查询到的词汇,其中,所述预设规则包括以下规则中的至少一种:Preferably, since many words are not the answers to the interview questions, a preset rule is added to filter the queried vocabulary, wherein the preset rule includes at least one of the following rules:
(1)根据词语字数调整查询到的词汇的顺序。例如,优先返回与所述关键词字数一致的词汇。而对于与所述关键词字数不一致的词汇,每增加/减少1个字,则在将查询到的词汇进行排序时增加预设距离(如0.1)。(1) Adjust the order of the searched vocabulary according to the number of words. For example, priority is given to returning vocabulary consistent with the number of words in the keyword. For words that are inconsistent with the number of words of the keyword, for each increase/decrease of 1 word, the preset distance (for example, 0.1) is increased when sorting the queried words.
(2) Screen the queried words by type, where the types include Chinese, English, and numbers. For example, words of the same type as the keyword are returned preferentially. Cases where Chinese is input and English is returned, or English is input and Chinese is returned, are returned normally; but where Chinese or English is input and a number is returned, that synonym is deleted directly. It should be noted that a single English letter or a single Chinese character counts as one character.
(3) Remove words whose number of characters exceeds that of the keyword by more than a preset number, for example, words that are more than 5 characters longer than the keyword.
可以理解的是,上述推送装置20同样可以用于推送同义词。It is understandable that the aforementioned pushing device 20 can also be used to push synonyms.
In summary, the pushing device 20 described in this application includes an acquisition module 201, a configuration module 202, a training module 203, a construction module 204, a traversal module 205, a deduplication module 206, and a pushing module 207. The acquisition module 201 is used to obtain interview questions; the configuration module 202 is used to configure a first preset number of keywords of the answer corresponding to the interview question; the training module 203 is used to pre-train a target word vector model based on a super-large word vector model; the construction module 204 is used to construct a word vector matrix according to the target word vector model to obtain a word-index file, where the word-index file includes the correspondence between word vectors and indexes; the construction module 204 is also used to construct a binary tree based on all word vectors in the target word vector model; the traversal module 205 is used to traverse the binary tree, query from the binary tree the first candidate word vectors whose distance to the keyword is greater than a preset distance threshold, and construct a priority queue based on the first candidate word vectors; the deduplication module 206 is used to deduplicate the first candidate word vectors in the priority queue; the acquisition module 201 is also used to obtain the target word vectors ranked in the top second preset number in the deduplicated priority queue; and the pushing module 207 is used to push a second preset number of synonyms, based on the second preset number of target word vectors and the word-index file, for the user to select.
The word vectors used in this application have broad coverage, each word is represented by a 200-dimensional vector, and each word's vector reflects the actual semantics of the word well. The word vector model of this application includes 8 million words, which largely solves the traditional out-of-vocabulary problem of words that cannot be matched. The memory footprint of the word vector model used in this application is greatly reduced; using the word-index file greatly lowers the memory usage and greatly increases system stability. In addition, the query return speed of this application is greatly increased: a query for one word used to take more than ten seconds and now returns within 0.01 s. Finally, this application can configure, for the robot interview process, more synonyms of the keywords of the answers corresponding to the interview questions, which helps HR configure more comprehensive answers for the interview questions when interviewing job applicants, so that when a job applicant's answer to an interview question is received, the answer can be analyzed more accurately and human resources can give a more comprehensive analysis of the applicant.
The integrated unit implemented in the form of a software functional module described above may be stored in a computer-readable storage medium. The software functional module is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a dual-screen device, a network device, or the like) or a processor to execute parts of the methods described in the embodiments of this application.
图3为本申请实施例三提供的电子设备的示意图。FIG. 3 is a schematic diagram of the electronic device provided in the third embodiment of the application.
所述电子设备3包括:存储器31、至少一个处理器32、存储在所述存储器31中并可在所述至少一个处理器32上运行的计算机程序33、至少一条通讯总线34及数据库35。The electronic device 3 includes a memory 31, at least one processor 32, a computer program 33 stored in the memory 31 and running on the at least one processor 32, at least one communication bus 34 and a database 35.
所述至少一个处理器32执行所述计算机程序33时实现上述近义词推送方法实施例中的步骤。When the at least one processor 32 executes the computer program 33, the steps in the above-mentioned synonym push method embodiment are implemented.
Exemplarily, the computer program 33 may be divided into one or more modules/units, and the one or more modules/units are stored in the memory 31 and executed by the at least one processor 32 to complete this application. The one or more modules/units may be a series of computer-readable instruction segments capable of completing specific functions, and the computer-readable instruction segments are used to describe the execution process of the computer program 33 in the electronic device 3.
The electronic device 3 may be a mobile phone, a tablet computer, a personal digital assistant (PDA), or another device on which an application is installed. Those skilled in the art can understand that the schematic diagram in FIG. 3 is only an example of the electronic device 3 and does not constitute a limitation on the electronic device 3; it may include more or fewer components than shown, combine certain components, or have different components. For example, the electronic device 3 may also include input/output devices, network access devices, a bus, and so on.
The at least one processor 32 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The processor 32 may be a microprocessor, or the processor 32 may be any conventional processor. The processor 32 is the control center of the electronic device 3 and uses various interfaces and lines to connect the various parts of the entire electronic device 3.
The memory 31 may be used to store the computer program 33 and/or the modules/units. The processor 32 implements the various functions of the electronic device 3 by running or executing the computer programs and/or modules/units stored in the memory 31 and by invoking the data stored in the memory 31. The memory 31 may mainly include a program storage area and a data storage area, where the program storage area may store an operating system and an application program required by at least one function (such as a sound playback function, an image playback function, etc.), and the data storage area may store data created according to the use of the electronic device 3 (such as audio data), etc. In addition, the memory 31 may include a volatile memory, and may also include a non-volatile memory, such as a hard disk, a memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash card, at least one magnetic disk storage device, a flash memory device, a high-speed random access memory, or another storage device.
The memory 31 stores program codes, and the at least one processor 32 can call the program codes stored in the memory 31 to perform related functions. For example, the modules described in FIG. 2 (the acquisition module 201, the configuration module 202, the training module 203, the construction module 204, the traversal module 205, the deduplication module 206, and the pushing module 207) are program codes stored in the memory 31 and executed by the at least one processor 32, so as to realize the functions of the modules and achieve the purpose of synonym pushing.
所述获取模块201用于获取面试题目;The obtaining module 201 is used to obtain interview questions;
所述配置模块202用于配置第一预设个数与所述面试题目对应的答案的关键词;The configuration module 202 is used to configure the first preset number of keywords corresponding to the answers of the interview questions;
所述训练模块203用于基于超大词向量模型预先训练得到目标词向量模型;The training module 203 is used for pre-training to obtain a target word vector model based on the super-large word vector model;
所述构建模块204用于根据所述目标词向量模型构建词向量矩阵得到词-索引文件,其中,所述词-索引文件包括词向量与索引之间的对应关系;The construction module 204 is configured to construct a word vector matrix according to the target word vector model to obtain a word-index file, where the word-index file includes the correspondence between the word vector and the index;
所述构建模块204还用于基于所述目标词向量模型中的所有词向量构建二叉树;The construction module 204 is further configured to construct a binary tree based on all word vectors in the target word vector model;
所述遍历模块205用于遍历所述二叉树,从所述二叉树中查询出与所述关键词的距离大于预设距离阈值的第一候选词向量并基于所述第一候选词向量构建优先队列;The traversal module 205 is configured to traverse the binary tree, query the binary tree for the first candidate word vector whose distance to the keyword is greater than a preset distance threshold, and construct a priority queue based on the first candidate word vector;
所述去重模块206用于对所述优先队列中的所述第一候选词向量进行去重;The deduplication module 206 is configured to deduplicate the first candidate word vector in the priority queue;
所述获取模块201还用于获取去重后的优先队列中排序在前第二预设个数的目标词向量;及The acquiring module 201 is also used to acquire the target word vectors of the second preset number in the prioritized queue after deduplication; and
The pushing module 207 is configured to push a second preset number of synonyms, based on the second preset number of target word vectors and the word-index file, for the user to select.
所述数据库(Database)35是按照数据结构来组织、存储和管理数据的建立在所述电子设备3上的仓库。数据库通常分为层次式数据库、网络式数据库和关系式数据库三种。在本实施方式中,所述数据库35用于存储面试题目等信息。The database (Database) 35 is a warehouse built on the electronic device 3 for organizing, storing and managing data according to a data structure. Databases are usually divided into three types: hierarchical database, network database and relational database. In this embodiment, the database 35 is used to store information such as interview questions.
所述电子设备3集成的模块/单元如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储介质中。基于这样的理解,本申请实现上述实施例方法中的全部或部分流程,也可以通过计算机程序来指令相关的硬件来完成,所述的计算机程序可存储于一计算机可读存储介质中,所述计算机程序在被处理器执行时,可实现上述各个方法实施例的步骤。其中,所述计算机程序包括计算机可读指令代码,所述计算机可读指令代码可以为源代码形式、对象代码形式、可执行文件或某些中间形式等。所述计算机可读介质可以包括:能够携带所述计算机程序代码的任何实体或装置、记录介质、U盘、移动硬盘、磁碟、光盘、计算机存储器、只读存储器(ROM,Read-Only Memory)、随机存取存储器等。If the integrated module/unit of the electronic device 3 is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer readable storage medium. Based on this understanding, this application implements all or part of the processes in the above-mentioned embodiments and methods, and can also be completed by instructing relevant hardware through a computer program. The computer program can be stored in a computer-readable storage medium. When the computer program is executed by the processor, it can implement the steps of the foregoing method embodiments. Wherein, the computer program includes computer-readable instruction code, and the computer-readable instruction code may be in the form of source code, object code, executable file, or some intermediate form. The computer-readable medium may include: any entity or device capable of carrying the computer program code, recording medium, U disk, mobile hard disk, magnetic disk, optical disk, computer memory, read-only memory (ROM, Read-Only Memory) , Random access memory, etc.
在本申请所提供的几个实施例中,应所述理解到,所揭露的电子设备和方法,可以通过其它的方式实现。例如,以上所描述的电子设备实施例仅仅是示意性的,例如,所述单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式。In the several embodiments provided in this application, it should be understood that the disclosed electronic device and method can be implemented in other ways. For example, the electronic device embodiments described above are merely illustrative. For example, the division of the units is only a logical function division, and there may be other division methods in actual implementation.
另外,在本申请各个实施例中的各功能单元可以集成在相同处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在相同单元中。上述集成的单元既可以采用硬件的形式实现,也可以采用硬件加软件功能模块的形式实现。In addition, the functional units in the various embodiments of the present application may be integrated in the same processing unit, or each unit may exist alone physically, or two or more units may be integrated in the same unit. The above-mentioned integrated unit may be implemented in the form of hardware, or may be implemented in the form of hardware plus software functional modules.
对于本领域技术人员而言,显然本申请不限于上述示范性实施例的细节,而且在不背离本申请的精神或基本特征的情况下,能够以其他的具体形式实现本申请。因此,无论从哪一 点来看,均应将实施例看作是示范性的,而且是非限制性的,本申请的范围由所附权利要求而不是上述说明限定,因此旨在将落在权利要求的等同要件的含义和范围内的所有变化涵括在本申请内。不应将权利要求中的任何附图标记视为限制所涉及的权利要求。此外,显然“包括”一词不排除其他单元或,单数不排除复数。系统权利要求中陈述的多个单元或装置也可以由一个单元或装置通过软件或者硬件来实现。第一,第二等词语用来表示名称,而并不表示任何特定的顺序。For those skilled in the art, it is obvious that the present application is not limited to the details of the foregoing exemplary embodiments, and the present application can be implemented in other specific forms without departing from the spirit or basic characteristics of the application. Therefore, no matter from which point of view, the embodiments should be regarded as exemplary and non-limiting. The scope of this application is defined by the appended claims rather than the above description, and therefore it is intended to fall into the claims. All changes in the meaning and scope of the equivalent elements of are included in this application. Any reference signs in the claims should not be regarded as limiting the claims involved. In addition, it is obvious that the word "including" does not exclude other elements or the singular number does not exclude the plural number. Multiple units or devices stated in the system claims can also be implemented by one unit or device through software or hardware. Words such as first and second are used to denote names, but do not denote any specific order.
最后应说明的是,以上实施例仅用以说明本申请的技术方案而非限制,尽管参照较佳实施例对本申请进行了详细说明,本领域的普通技术人员应当理解,可以对本申请的技术方案进行修改或等同替换,而不脱离本申请技术方案的精神范围。Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the application and not to limit them. Although the application has been described in detail with reference to the preferred embodiments, those of ordinary skill in the art should understand that the technical solutions of the application can be Modifications or equivalent replacements are made without departing from the spirit and scope of the technical solution of the present application.

Claims (20)

  1. 一种近义词推送方法,其中,所述方法包括:A method for pushing synonyms, wherein the method includes:
    获取面试题目;Get interview questions;
    配置第一预设个数与所述面试题目对应的答案的关键词;Configure the first preset number of keywords for answers corresponding to the interview questions;
    基于超大词向量模型预先训练得到目标词向量模型;Pre-trained based on the super-large word vector model to obtain the target word vector model;
    根据所述目标词向量模型构建词向量矩阵得到词-索引文件,其中,所述词-索引文件包括词向量与索引之间的对应关系;Constructing a word vector matrix according to the target word vector model to obtain a word-index file, where the word-index file includes the correspondence between the word vector and the index;
    基于所述目标词向量模型中的所有词向量构建二叉树;Constructing a binary tree based on all word vectors in the target word vector model;
    遍历所述二叉树,从所述二叉树中查询出与所述关键词的距离大于预设距离阈值的第一候选词向量并基于所述第一候选词向量构建优先队列;Traversing the binary tree, querying the binary tree for a first candidate word vector whose distance to the keyword is greater than a preset distance threshold, and constructing a priority queue based on the first candidate word vector;
    对所述优先队列中的所述第一候选词向量进行去重;De-duplicate the first candidate word vector in the priority queue;
    获取去重后的优先队列中排序在前第二预设个数的目标词向量;Obtain the target word vectors of the second preset number in the prioritized queue after deduplication;
    基于所述第二预设个数的目标词向量和词-索引文件推送第二预设个数近义词供用户选择。Based on the second preset number of target word vectors and word-index files, a second preset number of synonyms are pushed for the user to select.
  2. 如权利要求1所述的近义词推送方法,其中,所述配置第一预设个数与所述面试题目对应的答案的关键词的步骤包括:8. The method for pushing synonyms according to claim 1, wherein the step of configuring the first preset number of keywords of answers corresponding to the interview questions comprises:
    根据预先构建的题目解析模型分析所述面试题目得到对应的题目意图;Analyze the interview questions according to the pre-built question analysis model to obtain the corresponding question intentions;
    根据所述题目意图和预先建立的知识库,确定所述面试题目对应的答案;及According to the purpose of the question and the pre-established knowledge base, determine the answer corresponding to the interview question; and
    根据所述对应的答案提取第一预设个数关键词。Extract the first preset number of keywords according to the corresponding answer.
  3. 如权利要求1所述的近义词推送方法,其中,所述基于超大词向量模型预先训练得到目标词向量模型的步骤包括:8. The method for pushing synonyms according to claim 1, wherein the step of obtaining a target word vector model based on the super-large word vector model pre-training comprises:
    扩充所述超大词向量模型中的机器人面试场景语料,其中,包括对所述机器人面试场景语料进行分词、去停用词及基于CBOW模式增量训练词向量操作;Expanding the robot interview scene corpus in the super-large word vector model, which includes segmenting the robot interview scene corpus, removing stop words, and incrementally training word vector operations based on the CBOW mode;
    根据扩充语料后的超大词向量模型训练得到目标词向量模型。The target word vector model is obtained by training the super-large word vector model after the expanded corpus.
  4. 如权利要求3所述的近义词推送方法,其中,根据所述目标词向量模型构建词向量矩阵得到词-索引文件的步骤包括:The method for pushing synonyms according to claim 3, wherein the step of constructing a word vector matrix according to the target word vector model to obtain a word-index file comprises:
    以每个词的维度为行数,以所述目标词向量模型中所有词的总数为列数构建词向量矩阵;Constructing a word vector matrix with the dimension of each word as the number of rows, and the total number of all words in the target word vector model as the number of columns;
    所述词向量矩阵中的每一行对应一个索引;Each row in the word vector matrix corresponds to an index;
    根据所述词向量矩阵构建词-索引文件,并输出所述词-索引文件。Construct a word-index file according to the word vector matrix, and output the word-index file.
5. The method for pushing synonyms according to claim 3, wherein pushing a second preset number of synonyms for the user to select based on the second preset number of target word vectors and the word-index file comprises:
    Obtaining target indexes corresponding to the second preset number of target word vectors;
    Querying the word vectors corresponding to the target indexes according to the word-index file; and
    Pushing the synonyms corresponding to the word vectors for the user to select.
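Claim 5 resolves the selected target word vectors back to displayable words through the word-index file; a plain dictionary lookup is enough for a sketch (the JSON format is assumed to match the hypothetical file written in the previous sketch).

```python
import json

with open("word_index.json", encoding="utf-8") as f:
    index_to_word = json.load(f)          # JSON keys arrive as strings, e.g. {"3": "沟通"}

def push_synonyms(target_indexes):
    """Map the indexes of the selected target word vectors to the synonyms
    shown to the user (illustrative only)."""
    return [index_to_word[str(i)] for i in target_indexes]

# e.g. push_synonyms([3, 7]) -> the candidate synonyms offered for selection
```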
6. The method for pushing synonyms according to claim 1, wherein querying from the binary tree the first candidate word vectors whose distance to the keyword is greater than a preset distance threshold and constructing a priority queue based on the first candidate word vectors comprises:
    Using the keyword as the root node of the binary tree;
    Traversing all intermediate nodes under the root node;
    Calculating the distance between the root node and each intermediate node;
    Determining the intermediate nodes corresponding to target distances greater than the preset distance threshold as first-layer target nodes;
    Traversing all intermediate nodes under the first-layer target nodes down to the last layer of leaf nodes;
    Using the word vectors in all of the leaf nodes as the first candidate word vectors;
    Calculating the similarity between each first candidate word vector and the keyword; and
    Inserting the first candidate word vectors into the priority queue in order of similarity.
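Claim 6 walks a binary tree built over the word vectors, keeps the sub-trees whose intermediate nodes lie beyond the distance threshold from the keyword, and ranks their leaf vectors into a priority queue. The plain-Python sketch below mirrors those steps; the dictionary-based node layout, the Euclidean node distance, and the cosine similarity used for ranking are assumptions of this sketch rather than requirements of the claim.

```python
import heapq
import numpy as np

def cosine(a, b):
    """Similarity used for ranking in this sketch (the claim only says 'similarity')."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def collect_leaf_vectors(node, leaves):
    """Depth-first descent from an intermediate node down to its leaf word vectors."""
    if node.get("children"):
        for child in node["children"]:
            collect_leaf_vectors(child, leaves)
    else:
        leaves.extend(node["items"])                      # leaves hold (index, vector) pairs

def build_candidate_queue(keyword_vec, root, distance_threshold):
    """Select sub-trees whose intermediate node is farther than the threshold
    from the keyword, gather their leaf vectors, and heap-order them by similarity."""
    queue = []
    for mid in root["children"]:                          # intermediate nodes under the root
        if np.linalg.norm(keyword_vec - mid["vector"]) > distance_threshold:
            leaves = []
            collect_leaf_vectors(mid, leaves)
            for idx, vec in leaves:                       # first candidate word vectors
                heapq.heappush(queue, (-cosine(keyword_vec, vec), idx))
    return queue                                          # most similar popped first
```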
7. The method for pushing synonyms according to claim 3, wherein the method further comprises: filtering the found second preset number of synonyms according to preset rules, wherein the preset rules comprise at least one of the following:
    Adjusting the order of the found second preset number of synonyms according to the number of characters in each word;
    Filtering the found second preset number of synonyms according to the type of word; and
    Removing, from the second preset number of synonyms, words whose number of characters exceeds that of the keyword by a preset number.
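The optional post-filtering rules of claim 7 translate into a few list operations; in the sketch below, `word_types`, `allowed_types`, and `max_extra_chars` are hypothetical parameters standing in for the "preset" values of the claim.

```python
def filter_synonyms(synonyms, keyword, word_types=None, allowed_types=None, max_extra_chars=2):
    """Apply illustrative versions of the three optional rules of claim 7."""
    # Remove words whose character count exceeds the keyword's by more than a preset number.
    kept = [w for w in synonyms if len(w) <= len(keyword) + max_extra_chars]
    # Keep only words whose (externally supplied) type is among the allowed types.
    if word_types and allowed_types:
        kept = [w for w in kept if word_types.get(w) in allowed_types]
    # Adjust the ordering by word length (shorter candidates listed first here).
    return sorted(kept, key=len)

# e.g. filter_synonyms(["交流", "沟通协调", "表达"], "沟通", max_extra_chars=1)
# keeps the two-character candidates and drops the four-character one.
```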
8. A device for pushing synonyms, wherein the device comprises:
    An acquisition module, used to acquire interview questions;
    A configuration module, used to configure a first preset number of keywords of answers corresponding to the interview questions;
    A training module, used to pre-train a target word vector model based on a super-large word vector model;
    A construction module, used to construct a word vector matrix according to the target word vector model to obtain a word-index file, wherein the word-index file comprises a correspondence between word vectors and indexes;
    The construction module is further used to construct a binary tree based on all word vectors in the target word vector model;
    A traversal module, used to traverse the binary tree, query from the binary tree first candidate word vectors whose distance to the keyword is greater than a preset distance threshold, and construct a priority queue based on the first candidate word vectors;
    A de-duplication module, used to de-duplicate the first candidate word vectors in the priority queue;
    The acquisition module is further used to obtain a second preset number of target word vectors ranked first in the de-duplicated priority queue; and
    A push module, used to push a second preset number of synonyms for the user to select based on the second preset number of target word vectors and the word-index file.
9. An electronic device, wherein the electronic device comprises a processor, and the processor is configured to execute computer-readable instructions stored in a memory to implement the following steps:
    Acquiring interview questions;
    Configuring a first preset number of keywords of answers corresponding to the interview questions;
    Pre-training a target word vector model based on a super-large word vector model;
    Constructing a word vector matrix according to the target word vector model to obtain a word-index file, wherein the word-index file comprises a correspondence between word vectors and indexes;
    Constructing a binary tree based on all word vectors in the target word vector model;
    Traversing the binary tree, querying from the binary tree first candidate word vectors whose distance to the keyword is greater than a preset distance threshold, and constructing a priority queue based on the first candidate word vectors;
    De-duplicating the first candidate word vectors in the priority queue;
    Obtaining a second preset number of target word vectors ranked first in the de-duplicated priority queue; and
    Pushing a second preset number of synonyms for the user to select based on the second preset number of target word vectors and the word-index file.
10. The electronic device according to claim 9, wherein when the processor executes the computer-readable instructions to implement the configuring of a first preset number of keywords of answers corresponding to the interview questions, the processor specifically implements:
    Analyzing the interview questions according to a pre-built question analysis model to obtain the corresponding question intent;
    Determining the answers corresponding to the interview questions according to the question intent and a pre-established knowledge base; and
    Extracting a first preset number of keywords from the corresponding answers.
11. The electronic device according to claim 9, wherein when the processor executes the computer-readable instructions to implement the step of pre-training a target word vector model based on a super-large word vector model, the processor specifically implements:
    Expanding the robot interview scene corpus in the super-large word vector model, including performing word segmentation and stop-word removal on the robot interview scene corpus and incrementally training word vectors based on the CBOW mode; and
    Training the super-large word vector model with the expanded corpus to obtain the target word vector model.
12. The electronic device according to claim 11, wherein when the processor executes the computer-readable instructions to implement the constructing of a word vector matrix according to the target word vector model to obtain a word-index file, the processor specifically implements:
    Constructing a word vector matrix with the dimension of each word as the number of rows and the total number of words in the target word vector model as the number of columns;
    wherein each row in the word vector matrix corresponds to one index; and
    Constructing a word-index file according to the word vector matrix, and outputting the word-index file.
13. The electronic device according to claim 11, wherein when the processor executes the computer-readable instructions to implement the pushing of a second preset number of synonyms for the user to select based on the second preset number of target word vectors and the word-index file, the processor specifically implements:
    Obtaining target indexes corresponding to the second preset number of target word vectors;
    Querying the word vectors corresponding to the target indexes according to the word-index file; and
    Pushing the synonyms corresponding to the word vectors for the user to select.
14. The electronic device according to claim 9, wherein when the processor executes the computer-readable instructions to implement the querying from the binary tree of the first candidate word vectors whose distance to the keyword is greater than a preset distance threshold and the constructing of a priority queue based on the first candidate word vectors, the processor specifically implements:
    Using the keyword as the root node of the binary tree;
    Traversing all intermediate nodes under the root node;
    Calculating the distance between the root node and each intermediate node;
    Determining the intermediate nodes corresponding to target distances greater than the preset distance threshold as first-layer target nodes;
    Traversing all intermediate nodes under the first-layer target nodes down to the last layer of leaf nodes;
    Using the word vectors in all of the leaf nodes as the first candidate word vectors;
    Calculating the similarity between each first candidate word vector and the keyword; and
    Inserting the first candidate word vectors into the priority queue in order of similarity.
15. The electronic device according to claim 11, wherein the processor executes the computer-readable instructions to further implement the following steps:
    Filtering the found second preset number of synonyms according to preset rules, wherein the preset rules comprise at least one of the following:
    Adjusting the order of the found second preset number of synonyms according to the number of characters in each word;
    Filtering the found second preset number of synonyms according to the type of word; and
    Removing, from the second preset number of synonyms, words whose number of characters exceeds that of the keyword by a preset number.
16. A computer-readable storage medium having computer-readable instructions stored thereon, wherein when the computer-readable instructions are executed by a processor, the following steps are implemented:
    Acquiring interview questions;
    Configuring a first preset number of keywords of answers corresponding to the interview questions;
    Pre-training a target word vector model based on a super-large word vector model;
    Constructing a word vector matrix according to the target word vector model to obtain a word-index file, wherein the word-index file comprises a correspondence between word vectors and indexes;
    Constructing a binary tree based on all word vectors in the target word vector model;
    Traversing the binary tree, querying from the binary tree first candidate word vectors whose distance to the keyword is greater than a preset distance threshold, and constructing a priority queue based on the first candidate word vectors;
    De-duplicating the first candidate word vectors in the priority queue;
    Obtaining a second preset number of target word vectors ranked first in the de-duplicated priority queue; and
    Pushing a second preset number of synonyms for the user to select based on the second preset number of target word vectors and the word-index file.
17. The computer-readable storage medium according to claim 16, wherein when the computer-readable instructions are executed by the processor to implement the configuring of a first preset number of keywords of answers corresponding to the interview questions, the following is specifically implemented:
    Analyzing the interview questions according to a pre-built question analysis model to obtain the corresponding question intent;
    Determining the answers corresponding to the interview questions according to the question intent and a pre-established knowledge base; and
    Extracting a first preset number of keywords from the corresponding answers.
18. The computer-readable storage medium according to claim 16, wherein when the computer-readable instructions are executed by the processor to implement the pre-training of a target word vector model based on a super-large word vector model, the following is specifically implemented:
    Expanding the robot interview scene corpus in the super-large word vector model, including performing word segmentation and stop-word removal on the robot interview scene corpus and incrementally training word vectors based on the CBOW mode; and
    Training the super-large word vector model with the expanded corpus to obtain the target word vector model.
19. The computer-readable storage medium according to claim 18, wherein when the computer-readable instructions are executed by the processor to implement the constructing of a word vector matrix according to the target word vector model to obtain a word-index file, the following is specifically implemented:
    Constructing a word vector matrix with the dimension of each word as the number of rows and the total number of words in the target word vector model as the number of columns;
    wherein each row in the word vector matrix corresponds to one index; and
    Constructing a word-index file according to the word vector matrix, and outputting the word-index file.
20. The computer-readable storage medium according to claim 18, wherein when the computer-readable instructions are executed by the processor to implement the pushing of a second preset number of synonyms for the user to select based on the second preset number of target word vectors and the word-index file, the following is specifically implemented:
    Obtaining target indexes corresponding to the second preset number of target word vectors;
    Querying the word vectors corresponding to the target indexes according to the word-index file; and
    Pushing the synonyms corresponding to the word vectors for the user to select.
PCT/CN2020/111915 2020-03-02 2020-08-27 Near-synonym pushing method and apparatus, electronic device, and medium WO2021174783A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010136905.7A CN111460798B (en) 2020-03-02 2020-03-02 Method, device, electronic equipment and medium for pushing paraphrasing
CN202010136905.7 2020-03-02

Publications (1)

Publication Number Publication Date
WO2021174783A1

Family

ID=71684962

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/111915 WO2021174783A1 (en) 2020-03-02 2020-08-27 Near-synonym pushing method and apparatus, electronic device, and medium

Country Status (2)

Country Link
CN (1) CN111460798B (en)
WO (1) WO2021174783A1 (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113792133A (en) * 2021-11-11 2021-12-14 北京世纪好未来教育科技有限公司 Question judging method and device, electronic equipment and medium
CN113806311A (en) * 2021-09-17 2021-12-17 平安普惠企业管理有限公司 Deep learning-based file classification method and device, electronic equipment and medium
CN114742042A (en) * 2022-03-22 2022-07-12 杭州未名信科科技有限公司 Text duplicate removal method and device, electronic equipment and storage medium
CN115168661A (en) * 2022-08-31 2022-10-11 深圳市一号互联科技有限公司 Native graph data processing method, device, equipment and storage medium
CN115630613A (en) * 2022-12-19 2023-01-20 长沙冉星信息科技有限公司 Automatic coding system and method for evaluation problems in questionnaire survey
CN118134609A (en) * 2024-05-06 2024-06-04 浙江开心果数智科技有限公司 Commodity retrieval ordering system and method based on artificial intelligence
CN118332011A (en) * 2024-06-13 2024-07-12 苏州元脑智能科技有限公司 Database data compression method, electronic device, storage medium, and program product

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111460798B (en) * 2020-03-02 2024-10-18 平安科技(深圳)有限公司 Method, device, electronic equipment and medium for pushing paraphrasing
CN112434188B (en) * 2020-10-23 2023-09-05 杭州未名信科科技有限公司 Data integration method, device and storage medium of heterogeneous database
CN112232065B (en) * 2020-10-29 2024-05-14 腾讯科技(深圳)有限公司 Method and device for mining synonyms
CN114911895A (en) * 2021-02-08 2022-08-16 华为技术有限公司 Text generation method, device and storage medium
CN112906895B (en) * 2021-02-09 2022-12-06 柳州智视科技有限公司 Method for imitating question object
CN113095165A (en) * 2021-03-23 2021-07-09 北京理工大学深圳研究院 Simulation interview method and device for perfecting interview performance
CN113722452B (en) * 2021-07-16 2024-01-19 上海通办信息服务有限公司 Semantic-based rapid knowledge hit method and device in question-answering system
CN117112736B (en) * 2023-10-24 2024-01-05 云南瀚文科技有限公司 Information retrieval analysis method and system based on semantic analysis model

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101593206A (en) * 2009-06-25 2009-12-02 腾讯科技(深圳)有限公司 Searching method and device based on answer in the question and answer interaction platform
WO2018149326A1 (en) * 2017-02-16 2018-08-23 阿里巴巴集团控股有限公司 Natural language question answering method and apparatus, and server
CN109635094A (en) * 2018-12-17 2019-04-16 北京百度网讯科技有限公司 Method and apparatus for generating answer
CN109947922A (en) * 2019-03-20 2019-06-28 浪潮商用机器有限公司 A kind of question and answer processing method, device and question answering system
CN111460798A (en) * 2020-03-02 2020-07-28 平安科技(深圳)有限公司 Method and device for pushing similar meaning words, electronic equipment and medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107220380A (en) * 2017-06-27 2017-09-29 北京百度网讯科技有限公司 Question and answer based on artificial intelligence recommend method, device and computer equipment
CN109902283B (en) * 2018-05-03 2023-06-06 华为技术有限公司 Information output method and device
CN109597988B (en) * 2018-10-31 2020-04-28 清华大学 Cross-language vocabulary semantic prediction method and device and electronic equipment

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101593206A (en) * 2009-06-25 2009-12-02 腾讯科技(深圳)有限公司 Searching method and device based on answer in the question and answer interaction platform
WO2018149326A1 (en) * 2017-02-16 2018-08-23 阿里巴巴集团控股有限公司 Natural language question answering method and apparatus, and server
CN109635094A (en) * 2018-12-17 2019-04-16 北京百度网讯科技有限公司 Method and apparatus for generating answer
CN109947922A (en) * 2019-03-20 2019-06-28 浪潮商用机器有限公司 A kind of question and answer processing method, device and question answering system
CN111460798A (en) * 2020-03-02 2020-07-28 平安科技(深圳)有限公司 Method and device for pushing similar meaning words, electronic equipment and medium

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113806311A (en) * 2021-09-17 2021-12-17 平安普惠企业管理有限公司 Deep learning-based file classification method and device, electronic equipment and medium
CN113806311B (en) * 2021-09-17 2023-08-29 深圳市深可信科学技术有限公司 File classification method and device based on deep learning, electronic equipment and medium
CN113792133A (en) * 2021-11-11 2021-12-14 北京世纪好未来教育科技有限公司 Question judging method and device, electronic equipment and medium
CN113792133B (en) * 2021-11-11 2022-04-29 北京世纪好未来教育科技有限公司 Question judging method and device, electronic equipment and medium
CN114742042A (en) * 2022-03-22 2022-07-12 杭州未名信科科技有限公司 Text duplicate removal method and device, electronic equipment and storage medium
CN115168661A (en) * 2022-08-31 2022-10-11 深圳市一号互联科技有限公司 Native graph data processing method, device, equipment and storage medium
CN115630613A (en) * 2022-12-19 2023-01-20 长沙冉星信息科技有限公司 Automatic coding system and method for evaluation problems in questionnaire survey
CN115630613B (en) * 2022-12-19 2023-04-07 长沙冉星信息科技有限公司 Automatic coding system and method for evaluation problems in questionnaire survey
CN118134609A (en) * 2024-05-06 2024-06-04 浙江开心果数智科技有限公司 Commodity retrieval ordering system and method based on artificial intelligence
CN118332011A (en) * 2024-06-13 2024-07-12 苏州元脑智能科技有限公司 Database data compression method, electronic device, storage medium, and program product

Also Published As

Publication number Publication date
CN111460798A (en) 2020-07-28
CN111460798B (en) 2024-10-18

Similar Documents

Publication Publication Date Title
WO2021174783A1 (en) Near-synonym pushing method and apparatus, electronic device, and medium
CN109670163B (en) Information identification method, information recommendation method, template construction method and computing device
US10146862B2 (en) Context-based metadata generation and automatic annotation of electronic media in a computer network
CN117235226A (en) Question response method and device based on large language model
CN111581354A (en) FAQ question similarity calculation method and system
US20160078047A1 (en) Method for obtaining search suggestions from fuzzy score matching and population frequencies
CN111339277A (en) Question-answer interaction method and device based on machine learning
CN112559709A (en) Knowledge graph-based question and answer method, device, terminal and storage medium
US10073890B1 (en) Systems and methods for patent reference comparison in a combined semantical-probabilistic algorithm
CN114547253A (en) Semantic search method based on knowledge base application
CN108875743B (en) Text recognition method and device
TW202123026A (en) Data archiving method, device, computer device and storage medium
US11360953B2 (en) Techniques for database entries de-duplication
CN113641833A (en) Service requirement matching method and device
CN117076636A (en) Information query method, system and equipment for intelligent customer service
US20170124090A1 (en) Method of discovering and exploring feature knowledge
CN115982346A (en) Question-answer library construction method, terminal device and storage medium
CN113127617A (en) Knowledge question answering method of general domain knowledge graph, terminal equipment and storage medium
CN113076740A (en) Synonym mining method and device in government affair service field
CN109684357B (en) Information processing method and device, storage medium and terminal
CN117609468A (en) Method and device for generating search statement
CN117708270A (en) Enterprise data query method, device, equipment and storage medium
CN112989011B (en) Data query method, data query device and electronic equipment
CN113761213B (en) Knowledge graph-based data query system, method and terminal equipment
US9910890B2 (en) Synthetic events to chain queries against structured data

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20923243

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20923243

Country of ref document: EP

Kind code of ref document: A1