
WO2021174783A1 - Near-synonym pushing method and apparatus, electronic device, and medium - Google Patents

Near-synonym pushing method and apparatus, electronic device, and medium

Info

Publication number
WO2021174783A1
WO2021174783A1, PCT/CN2020/111915, CN2020111915W
Authority
WO
WIPO (PCT)
Prior art keywords
word vector
word
target
preset number
synonyms
Prior art date
Application number
PCT/CN2020/111915
Other languages
French (fr)
Chinese (zh)
Inventor
陈林
金戈
徐亮
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2021174783A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/10 Office automation; Time management
    • G06Q10/105 Human resources
    • G06Q10/1053 Employment or hiring

Definitions

  • This application relates to the field of artificial intelligence technology, and in particular to a method, device, electronic device, and storage medium for pushing synonyms.
  • The project at hand is an artificial intelligence (AI) interview rule configuration system, in which users at some companies can update the answer keywords in the expert rules in real time.
  • However, the inventor realized that when filling in answer keywords, the user must enter a large amount of information manually, and the system offers no assistance during input, such as recommending synonyms. This reduces the user's writing efficiency, makes the result heavily dependent on the user's personal understanding of the answer keywords, and cannot guarantee that the entered keywords are reasonably complete and objective.
  • The first aspect of the present application provides a method for pushing synonyms, and the method includes: obtaining interview questions; configuring a first preset number of keywords of the answers corresponding to the interview questions; pre-training a target word vector model based on a super-large word vector model; constructing a word vector matrix according to the target word vector model to obtain a word-index file, where the word-index file includes the correspondence between word vectors and indexes; constructing a binary tree based on all word vectors in the target word vector model; traversing the binary tree, querying the binary tree for first candidate word vectors whose distance to the keyword is greater than a preset distance threshold, and constructing a priority queue based on the first candidate word vectors; deduplicating the first candidate word vectors in the priority queue; obtaining the top second preset number of target word vectors in the deduplicated priority queue; and pushing a second preset number of synonyms, based on the second preset number of target word vectors and the word-index file, for the user to select.
  • The second aspect of the present application provides a device for pushing synonyms, and the device includes:
  • an acquisition module, used to acquire interview questions;
  • the configuration module is used to configure the first preset number of keywords corresponding to the answers of the interview questions
  • the training module is used to pre-train the target word vector model based on the super-large word vector model
  • the construction module is configured to construct a word vector matrix according to the target word vector model to obtain a word-index file, wherein the word-index file includes the correspondence between the word vector and the index;
  • the construction module is also used to construct a binary tree based on all word vectors in the target word vector model
  • a traversal module configured to traverse the binary tree, query the binary tree for a first candidate word vector whose distance to the keyword is greater than a preset distance threshold, and construct a priority queue based on the first candidate word vector;
  • a deduplication module configured to deduplicate the first candidate word vector in the priority queue
  • the acquiring module is also used to acquire the target word vectors of the second preset number in the deduplicated priority queue.
  • the push module is configured to push the second preset number of synonyms based on the second preset number of target word vectors and word-index files for selection by the user.
  • The third aspect of the present application provides an electronic device, where the electronic device includes a processor, and the processor is configured to execute computer-readable instructions stored in a memory to implement the steps of the above synonym pushing method, from obtaining the interview questions through pushing a second preset number of synonyms for the user to select.
  • The fourth aspect of the present application provides a computer-readable storage medium having computer-readable instructions stored thereon, where the computer-readable instructions, when executed by a processor, implement the steps of the above synonym pushing method, from obtaining the interview questions through pushing a second preset number of synonyms for the user to select.
  • With the synonym pushing method, device, electronic equipment, and storage medium described in this application, a first preset number of keywords is configured for the answer corresponding to an interview question, a second preset number of synonyms corresponding to each keyword is looked up in the pre-trained word vector model, and the second preset number of synonyms is pushed for the user to choose from. More synonyms of the keywords of the answers corresponding to the interview questions can therefore be configured during the robot interview process, making it convenient for HR staff to configure more comprehensive answers for interview questions when interviewing job applicants. As a result, when a job applicant's answer to an interview question is received, the answer can be analyzed more accurately, and it is easier for HR to give a more comprehensive assessment of the applicant.
  • FIG. 1 is a flowchart of a method for pushing synonyms provided in Embodiment 1 of the present application.
  • Fig. 2 is a functional module diagram of the push device provided in the second embodiment of the present application.
  • Fig. 3 is a schematic diagram of an electronic device provided in a third embodiment of the present application.
  • The synonym pushing method in the embodiments of this application is applied in an electronic device. For an electronic device that needs synonym pushing, the synonym pushing function provided by the method of this application can be integrated directly on the device, or a client implementing the method of this application can be installed on it.
  • The method provided in this application can also run on a server or other device in the form of a software development kit (SDK): the synonym pushing function is provided as an SDK interface, and an electronic device or other device can realize the synonym pushing function through the provided interface.
  • FIG. 1 is a flowchart of a method for pushing synonyms provided in Embodiment 1 of the present application. According to different requirements, the execution sequence in the flowchart can be changed, and some steps can be omitted.
  • In a robot interview, so that the robot can better judge whether a job applicant has answered the interview questions correctly and score the applicant according to the answers, keywords need to be configured according to the answers corresponding to the interview questions. After the applicant's answer is received, keywords are extracted from it, the extracted keywords are matched against the configured keywords, and the applicant is scored according to the matching result.
  • To avoid the configured keywords being insufficiently comprehensive, this application provides a way to expand the keywords entered by the interviewer during keyword configuration and to push synonyms and near-synonyms for them.
  • the method includes:
  • Step S1: Obtain interview questions.
  • In a robot interview, different interview questions are configured for different positions.
  • For example, interview questions configured for an R&D position include "Which programming languages are you familiar with", "How to break out of the current multiple nested loops in Java", "Is there a memory leak in Java? Please describe briefly", and so on.
  • In this embodiment, the robot interview requires interview questions and answers to be configured in advance.
  • However, different job applicants give different answers to the same interview questions, so in order to evaluate applicants comprehensively, the answers configured for the interview questions need to be detailed, complete, and comprehensive.
  • Step S2: Configure a first preset number of keywords of the answer corresponding to the interview question.
  • In one embodiment, the step of configuring the first preset number of keywords of the answer corresponding to the interview question includes: querying the answer corresponding to the interview question in a pre-established question-answer correspondence table to obtain a query result; and extracting keywords from the query result, where the keywords number the first preset number.
  • It is understandable that the keywords may also be keywords related to the query result, obtained by performing semantic analysis on the query result.
  • In another embodiment, the step of configuring the first preset number of keywords of the answer corresponding to the interview question includes: (1) analyzing the interview question with a pre-built question analysis model to obtain the corresponding question intent; (2) determining the answer corresponding to the interview question according to the question intent and a pre-established knowledge base; and (3) extracting the first preset number of keywords from that answer.
  • The question analysis model can analyze the question features of the interview question. The question features may include the question-stem intent and key information. For example, when the interview question is "What programming languages are you good at?", the intent of the question stem is the programming languages the applicant is good at, and the key information can be the programming language.
  • For this example, the pre-established knowledge base may include C/C++, Java, C#, SQL, and so on.
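  • The patent gives no code for this step; the following is a minimal Python sketch of the table-lookup variant, in which the question-answer table, the delimiter-based splitting, and the first_preset_n value are illustrative assumptions rather than part of the disclosure.

```python
# Hypothetical sketch of step S2: look the answer up in a pre-established
# Q&A table and take the first few tokens as the configured keywords.
# The table contents and the splitting rule are assumptions for illustration.
import re

QA_TABLE = {
    "你所擅长的编程语言有哪些": "C/C++、Java、C#、SQL",
}

def configure_keywords(question: str, first_preset_n: int = 5) -> list[str]:
    """Return up to `first_preset_n` keywords for the answer of `question`."""
    answer = QA_TABLE.get(question, "")
    # Delimiter splitting stands in for the semantic analysis mentioned above.
    tokens = [t for t in re.split(r"[、,，\s]+", answer) if t]
    return tokens[:first_preset_n]

print(configure_keywords("你所擅长的编程语言有哪些"))
# -> ['C/C++', 'Java', 'C#', 'SQL']
```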
  • Step S3: Pre-train a target word vector model based on the super-large word vector model.
  • In this embodiment, pre-training is performed on the basis of the super-large word vector model to obtain a suitable target word vector model. Specifically, this includes expanding the super-large word vector model with robot-interview-scene corpus, which involves segmenting that corpus, removing stop words, and incrementally training the word vectors in CBOW mode; the super-large word vector model with the expanded corpus is then trained to obtain the target word vector model.
  • The training corpus of the super-large word vector model covers a large amount of corpus of different kinds, such as news, web pages, novels, Baidu Baike, and Wikipedia. For the robot interview scenario, however, the scene-specific corpus in the super-large word vector model is insufficient. Therefore, robot-interview corpus is integrated on the basis of the super-large word vector model, expanding it with question-and-answer text, similar-question text, and other corpus from robot interviews.
  • The target word vector model is a word vector model that incorporates the robot interview corpus.
  • The final trained target word vector model covers more than 8 million words, each represented by a roughly 200-dimensional vector. The model's corpus is therefore extensive, and each word vector reflects the semantics of its word well. At the same time, a vocabulary on the order of 8 million words can completely replace the traditional hand-built synonym dictionary and solves the problem of words that cannot be found.
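  • As a hedged illustration of the incremental CBOW training described above, the sketch below uses the open-source gensim library (an assumption; the patent names no specific toolkit). The tiny base corpus, the stop-word list, and the file name are placeholders standing in for the super-large model and the real interview-scene corpus.

```python
from gensim.models import Word2Vec

# Tiny stand-in for the pre-trained super-large CBOW model (sg=0 selects CBOW).
base_corpus = [["java", "内存", "泄漏"], ["编程", "语言", "java"]]
base = Word2Vec(base_corpus, vector_size=200, sg=0, min_count=1)

# Robot-interview-scene corpus after segmentation and stop-word removal
# (whitespace splitting and the stop-word set are assumptions).
STOP_WORDS = {"的", "了", "请", "吗"}
raw = [
    "java 中 会 存在 内存 泄漏 吗 请 简单 描述",
    "如何 跳出 当前 的 多重 嵌套 循环",
]
interview_corpus = [[t for t in s.split() if t not in STOP_WORDS] for s in raw]

# Incremental training: extend the vocabulary, then continue CBOW training.
base.build_vocab(interview_corpus, update=True)
base.train(interview_corpus, total_examples=len(interview_corpus), epochs=base.epochs)
base.save("target_word_vector.model")  # assumed file name for the target model
```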
  • Step S4: Construct a word vector matrix according to the target word vector model to obtain a word-index file, where the word-index file includes the correspondence between word vectors and indexes.
  • In this embodiment, constructing a word vector matrix according to the target word vector model to obtain a word-index file may include the following.
  • The word vector matrix is a matrix whose number of rows is the total number of words and whose number of columns is the dimension of each word. For example, if the dimension of each word is 200 and the target word vector model includes 8 million words, a word vector matrix with 8 million rows and 200 columns is obtained.
  • Each row in the word vector matrix has an index, so the index corresponding to each word can be obtained.
  • The word-index file is then output according to the word vector matrix; from it, the correspondence between each index and each word vector can also be obtained.
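  • A minimal sketch of this step, continuing from the training sketch above, is shown below. The NumPy/JSON formats and the file names are assumptions; the patent only requires that the word-index file record the mapping between words and indexes.

```python
import json

import numpy as np
from gensim.models import Word2Vec

model = Word2Vec.load("target_word_vector.model")  # saved in the previous sketch

# Rows correspond to words, columns to vector dimensions (200 in the text).
words = model.wv.index_to_key
matrix = np.asarray(model.wv.vectors)               # shape: (n_words, 200)

# Word-index file: each word is mapped to its row index in the matrix.
word_index = {word: idx for idx, word in enumerate(words)}

np.save("word_vector_matrix.npy", matrix)
with open("word_index.json", "w", encoding="utf-8") as f:
    json.dump(word_index, f, ensure_ascii=False)
```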
  • Step S5: Construct a binary tree based on all word vectors in the target word vector model.
  • In this embodiment, a binary tree structure is constructed from all word vectors in the target word vector model.
  • Each word vector is a 200-dimensional vector, so the word vectors form a 200-dimensional high-dimensional data space in which each word vector represents a point. The data space corresponding to all word vectors in the target word vector model can thus be expressed as 8 million points. A binary tree is constructed from the target word vector model as follows.
  • Two points are randomly selected as initial nodes, and the two initial nodes are connected to form an equidistant (perpendicular-bisector) hyperplane that splits the space. By repeating this process, the data space is divided into multiple subspaces, and a binary tree structure is constructed according to these subspaces. When a subspace contains no more than k word vectors, it is no longer divided, where k is greater than or equal to 8 and less than or equal to 10; in this embodiment, the value of k is 10.
  • The splitting condition at each node of the binary tree is one of these equidistant perpendicular hyperplanes, and the word vectors end up as the leaf nodes of the tree. That is, the binary tree includes a root node, multiple intermediate nodes, and a final layer of leaf nodes, where each leaf node represents a word vector.
  • There is no need to store the word vector itself at a leaf node; only the index corresponding to the word vector needs to be saved. In this way, similar word vectors end up closer together in the binary tree, which speeds up the subsequent synonym search.
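  • The sketch below illustrates this splitting scheme on a small random matrix that stands in for the real 8-million-by-200 one. The node layout (plain dictionaries) and the variable names are assumptions for illustration, not a structure mandated by the patent.

```python
import numpy as np

K_LEAF = 10  # "k" from the text: stop splitting at 10 or fewer vectors

def build_tree(indices, matrix, rng):
    """Recursively split the point set with perpendicular-bisector hyperplanes."""
    if len(indices) <= K_LEAF:
        return {"leaf": list(indices)}          # leaves keep only the indexes

    # Two random points define the split: the hyperplane's normal is the line
    # joining them and it passes through their midpoint (equidistant to both).
    a, b = rng.choice(indices, size=2, replace=False)
    normal = matrix[a] - matrix[b]
    offset = normal @ ((matrix[a] + matrix[b]) / 2.0)

    side = matrix[indices] @ normal > offset
    left, right = indices[side], indices[~side]
    if len(left) == 0 or len(right) == 0:       # degenerate split: stop here
        return {"leaf": list(indices)}

    return {"normal": normal, "offset": offset,
            "left": build_tree(left, matrix, rng),
            "right": build_tree(right, matrix, rng)}

rng = np.random.default_rng(0)
demo_matrix = rng.normal(size=(1000, 200)).astype(np.float32)
tree = build_tree(np.arange(1000), demo_matrix, rng)
```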
  • Step S6: Traverse the binary tree, query the binary tree for first candidate word vectors whose distance to the keyword is greater than a preset distance threshold, and construct a priority queue based on the first candidate word vectors.
  • In this embodiment, the priority queue is constructed as follows: take the keyword as the root node of the binary tree; traverse all intermediate nodes under the root node; calculate the distance between the root node and each intermediate node; determine the intermediate nodes whose distance is greater than the preset distance threshold as first-level target nodes; traverse all intermediate nodes under the first-level target nodes down to the last-level leaf nodes; take the word vectors in all of those leaf nodes as the first candidate word vectors; calculate the similarity between each first candidate word vector and the keyword; and insert the first candidate word vectors into the priority queue in order of similarity.
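  • The sketch below, which reuses build_tree, demo_matrix, and tree from the previous sketch, is one hedged reading of this step: the "distance" checked at each intermediate node is taken to be the query's margin against that node's splitting hyperplane, and cosine similarity orders the priority queue. Both choices are assumptions; the patent fixes only the greater-than-threshold condition and the similarity ordering.

```python
import heapq

import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

def collect_candidates(node, query, threshold, out):
    """Gather leaf indexes reachable under the greater-than-threshold rule."""
    if "leaf" in node:
        out.extend(node["leaf"])
        return
    # Signed margin of the query against this node's splitting hyperplane.
    margin = float(query @ node["normal"] - node["offset"])
    near, far = (node["left"], node["right"]) if margin > 0 else (node["right"], node["left"])
    collect_candidates(near, query, threshold, out)
    if abs(margin) > threshold:                 # also explore the far side
        collect_candidates(far, query, threshold, out)

def build_priority_queue(query, tree, matrix, threshold):
    candidates = []
    collect_candidates(tree, query, threshold, candidates)
    # Max-heap on similarity, implemented with negated keys for heapq.
    heap = [(-cosine(matrix[i], query), i) for i in candidates]
    heapq.heapify(heap)
    return heap

query_vec = demo_matrix[0]                      # pretend this is the keyword's vector
queue = build_priority_queue(query_vec, tree, demo_matrix, threshold=5.0)
```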
  • Step S7: Deduplicate the first candidate word vectors in the priority queue.
  • Step S8: Obtain the top second preset number of target word vectors in the deduplicated priority queue.
  • Step S9: Push a second preset number of synonyms, based on the second preset number of target word vectors and the word-index file, for the user to select.
  • In this embodiment, pushing the second preset number of synonyms for the user to select, based on the second preset number of target word vectors and the word-index file, includes: obtaining the target indexes corresponding to the second preset number of target word vectors; looking up the words corresponding to the target indexes in the word-index file; and pushing those words as synonyms for the user to select.
  • The binary tree structure file and the word-index file are stored together; when the top-N neighboring vocabulary of a keyword needs to be queried, only these two files are needed for the lookup.
  • The synonym search function supported by this application is innovative and convenient: it can generate synonyms for 5 keywords at a time and push 8 synonyms at a time, and it lets users click "change batch" to replace them with another round of 8 synonyms, which is convenient to view and use. For example, a "change batch" button is displayed on the push interface; after the user clicks it, the original synonyms are refreshed and more synonyms are pushed.
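  • A hedged sketch covering steps S7 to S9 and the "change batch" behaviour follows, continuing from the earlier sketches. The generator-based paging and the toy index-to-word mapping are illustrative assumptions; the 8-synonym batch size comes from the text.

```python
import heapq

SECOND_PRESET_N = 8   # number of synonyms pushed per batch, as described above

def synonym_batches(queue, index_to_word):
    """Yield successive batches of synonyms; advancing the generator models
    the user pressing the "change batch" button."""
    seen, batch = set(), []
    while queue:
        _, idx = heapq.heappop(queue)           # highest similarity first
        word = index_to_word[idx]
        if word in seen:                        # step S7: deduplication
            continue
        seen.add(word)
        batch.append(word)
        if len(batch) == SECOND_PRESET_N:
            yield batch                         # step S9: push 8 synonyms
            batch = []
    if batch:
        yield batch

# Usage with a toy mapping; the real mapping comes from the word-index file.
index_to_word = {i: f"word_{i}" for i in range(1000)}
batches = synonym_batches(queue, index_to_word)
first_push = next(batches)      # the 8 synonyms shown initially
second_push = next(batches)     # what "change batch" would show next
```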
  • In this embodiment, a preset rule is added to filter the queried vocabulary, where the preset rule includes at least one of the following rules.
  • Word types include Chinese, English, and numbers; vocabulary whose type is consistent with the keyword's type is returned preferentially.
  • If a Chinese keyword returns an English word, or an English keyword returns a Chinese word, the result is returned normally.
  • If a Chinese or English keyword returns a number, that synonym is deleted directly. It should be noted that a single English letter or a single Chinese character counts as one character.
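  • A minimal sketch of such a type-consistency filter is given below. The regular-expression type detection and the exact ordering of mixed-type results are assumptions; the text fixes only the Chinese/English/number behaviour described above.

```python
import re

def word_type(word: str) -> str:
    """Classify a word as "number", "chinese", or "english" (assumed heuristic)."""
    if re.fullmatch(r"[0-9]+(\.[0-9]+)?", word):
        return "number"
    if re.search(r"[\u4e00-\u9fff]", word):
        return "chinese"
    return "english"

def filter_synonyms(keyword: str, candidates: list[str]) -> list[str]:
    k_type = word_type(keyword)
    kept = []
    for w in candidates:
        if word_type(w) == "number" and k_type in ("chinese", "english"):
            continue                      # numeric results are dropped outright
        kept.append(w)
    # Same-type results come first; cross-language results are kept afterwards.
    return sorted(kept, key=lambda w: word_type(w) != k_type)

print(filter_synonyms("编程语言", ["程序语言", "programming language", "2020"]))
# -> ['程序语言', 'programming language']
```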
  • In summary, the synonym pushing method provided in this application includes obtaining interview questions; configuring a first preset number of keywords of the answers corresponding to the interview questions; searching the pre-trained word vector model for a second preset number of synonyms corresponding to each keyword; and pushing the second preset number of synonyms for the user to choose from.
  • The word vectors used in this application have wide coverage, and the vector characterizing each word has 200 dimensions, so each word's vector reflects its actual semantics well. The word vector model of this application includes 8 million words, which largely solves the traditional out-of-vocabulary problem.
  • The word vector model used in this application greatly reduces memory usage through the word-index file and greatly increases system stability.
  • Moreover, this application makes it possible to configure more synonyms of the keywords of the answers corresponding to the interview questions during the robot interview process. Therefore, when a job applicant's answer to an interview question is received, the answer can be analyzed more accurately, and it is easier for HR to give a more comprehensive assessment of the applicant.
  • Fig. 2 is a diagram of functional modules in a preferred embodiment of a device for pushing synonyms of this application.
  • the synonym pushing device 20 (referred to as “pushing device” for ease of description) runs in an electronic device.
  • the pushing device 20 may include multiple functional modules composed of program code segments.
  • the program code of each program segment in the pushing device 20 can be stored in a memory and executed by at least one processor to perform the function of pushing synonyms.
  • the functional modules of the pushing device 20 may include: an acquisition module 201, a configuration module 202, a training module 203, a construction module 204, a traversal module 205, a deduplication module 206, and a pushing module 207.
  • the function of each module will be detailed in the subsequent embodiments.
  • the module referred to in this application refers to a series of computer program segments that can be executed by at least one processor and can complete fixed functions, and are stored in a memory.
  • the obtaining module 201 is used to obtain interview questions.
  • interview questions will be configured according to different positions.
  • For example, interview questions configured for an R&D position include "Which programming languages are you familiar with", "How to break out of the current multiple nested loops in Java", "Is there a memory leak in Java? Please describe briefly", and so on.
  • the robot interview needs to be pre-configured with interview questions and answers.
  • different job applicants give different answers when facing the same interview questions.
  • the configuration module 202 is configured to configure a first preset number of keywords of answers corresponding to the interview questions.
  • the keywords for configuring the first preset number of answers corresponding to the interview questions include:
  • the keyword may also be a keyword related to the query result obtained by performing semantic analysis according to the query result.
  • the keywords for configuring the first preset number of answers corresponding to the interview questions include:
  • the topic analysis model can analyze the topic characteristics of the interview topic.
  • the topic features may include topic intentions and key information. For example, when the interview question is "What programming languages are you good at?", the intent of the question stem is the programming language you are good at, and the key information can be the programming language.
  • the pre-established knowledge base may include C/C++, Java, C#, SQL, etc.
  • the training module 203 is used for pre-training to obtain the target word vector model based on the super large word vector model.
  • pre-training is performed based on the super large word vector model to obtain a suitable target word vector model. Specifically, it includes: expanding the robot interview scene corpus in the super-large word vector model, which includes segmenting the robot interview scene corpus, removing stop words, and incrementally training word vector operations based on the CBOW mode; according to the expanded corpus The super-large word vector model is trained to obtain the target word vector model.
  • the training corpus of the super large word vector model covers a large number of corpora of different dimensions, such as news, web pages, novels, Baidu Baike, and Wikipedia.
  • the corpus of the specific scene in the super-large word vector model is insufficient. Therefore, the corpus of the robot interview scene is integrated on the basis of the super-large word vector model, and the corpus of question and answer text and similar question text in the robot interview is expanded.
  • The target word vector model is a word vector model that incorporates the robot interview corpus.
  • the final trained target word vector model covers more than 8 million words, and the dimension of each word is about 200 dimensions. Therefore, the target word vector model corpus is extensive, and each word vector therein can well reflect the semantics of each word. At the same time, the order of magnitude of 8 million words can completely replace the traditional way of constructing a dictionary of synonyms, and solve the problem of not being able to find words.
  • the construction module 204 is configured to construct a word vector matrix according to the target word vector model to obtain a word-index file, where the word-index file includes the correspondence between the word vector and the index.
  • the construction of a word vector matrix according to the target word vector model to obtain a word-index file may include:
  • the word vector matrix is a matrix composed of the dimension of each word as the number of rows and the total number of all words as the number of columns.
  • the dimension of each word is 200
  • the target word vector model includes 8 million words. Then, a word vector matrix with 200 columns and 8 million rows can be obtained.
  • each row in the word vector matrix has an index
  • the index corresponding to each word can be obtained.
  • the word-index file is output according to the word vector matrix.
  • the corresponding relationship between each index and each word vector can also be obtained.
  • the construction module 204 is also used to construct a binary tree based on all word vectors in the target word vector model.
  • In this embodiment, a binary tree structure is constructed from all word vectors in the target word vector model.
  • Each word vector is a 200-dimensional vector, so the word vectors form a 200-dimensional high-dimensional data space in which each word vector represents a point. The data space corresponding to all word vectors in the target word vector model can thus be expressed as 8 million points. A binary tree is constructed from the target word vector model as follows.
  • Two points are randomly selected as initial nodes, and the two initial nodes are connected to form an equidistant (perpendicular-bisector) hyperplane that splits the space. By repeating this process, the data space is divided into multiple subspaces, and a binary tree structure is constructed according to these subspaces. When a subspace contains no more than k word vectors, it is no longer divided, where k is greater than or equal to 8 and less than or equal to 10; in this embodiment, the value of k is 10.
  • The splitting condition at each node of the binary tree is one of these equidistant perpendicular hyperplanes, and the word vectors end up as the leaf nodes of the tree.
  • the traversal module 205 is configured to traverse the binary tree, query the binary tree for the first candidate word vector whose distance to the keyword is greater than a preset distance threshold, and construct a priority queue based on the first candidate word vector.
  • In this embodiment, the priority queue is constructed as follows: take the keyword as the root node of the binary tree; traverse all intermediate nodes under the root node; calculate the distance between the root node and each intermediate node; determine the intermediate nodes whose distance is greater than the preset distance threshold as first-level target nodes; traverse all intermediate nodes under the first-level target nodes down to the last-level leaf nodes; take the word vectors in all of those leaf nodes as the first candidate word vectors; calculate the similarity between each first candidate word vector and the keyword; and insert the first candidate word vectors into the priority queue in order of similarity.
  • the deduplication module 206 is configured to deduplicate the first candidate word vector in the priority queue.
  • the acquiring module 201 is also used for acquiring the target word vectors of the second preset number in the prioritized queue after deduplication.
  • the pushing module 207 is configured to push a second preset number of synonyms based on the second preset number of target word vectors and word-index files for selection by the user.
  • In this embodiment, pushing the second preset number of synonyms for the user to select, based on the second preset number of target word vectors and the word-index file, includes: obtaining the target indexes corresponding to the second preset number of target word vectors; looking up the words corresponding to the target indexes in the word-index file; and pushing those words as synonyms for the user to select.
  • the binary tree structure file and the word-index file are stored together, and when it is necessary to query the vocabulary of the neighbor Top N of a certain keyword, only these two files need to be used for indexing.
  • The synonym search function supported by this application is innovative and convenient: it can generate synonyms for 5 keywords at a time and push 8 synonyms at a time, and it lets users click "change batch" to replace them with another round of 8 synonyms, which is convenient to view and use. For example, a "change batch" button is displayed on the push interface; after the user clicks it, the original synonyms are refreshed and more synonyms are pushed.
  • In this embodiment, a preset rule is added to filter the queried vocabulary, where the preset rule includes at least one of the following rules.
  • Word types include Chinese, English, and numbers; vocabulary whose type is consistent with the keyword's type is returned preferentially.
  • If a Chinese keyword returns an English word, or an English keyword returns a Chinese word, the result is returned normally.
  • If a Chinese or English keyword returns a number, that synonym is deleted directly. It should be noted that a single English letter or a single Chinese character counts as one character.
  • the aforementioned pushing device 20 can also be used to push synonyms.
  • the push device 20 described in this application includes an acquisition module 201, a configuration module 202, a training module 203, a construction module 204, a traversal module 205, a deduplication module 206, and a push module 207.
  • The acquisition module 201 is used to obtain interview questions; the configuration module 202 is used to configure a first preset number of keywords of the answers corresponding to the interview questions; and the training module 203 is used to pre-train a target word vector model based on the super-large word vector model.
  • The construction module 204 is configured to construct a word vector matrix according to the target word vector model to obtain a word-index file, where the word-index file includes the correspondence between word vectors and indexes; the construction module 204 is also used to construct a binary tree based on all word vectors in the target word vector model.
  • The traversal module 205 is used to traverse the binary tree, query the binary tree for first candidate word vectors whose distance to the keyword is greater than a preset distance threshold, and construct a priority queue based on the first candidate word vectors.
  • The deduplication module 206 is configured to deduplicate the first candidate word vectors in the priority queue; the acquisition module 201 is also configured to acquire the top second preset number of target word vectors in the deduplicated priority queue; and the push module 207 is configured to push a second preset number of synonyms, based on the second preset number of target word vectors and the word-index file, for the user to select.
  • The word vectors used in this application have wide coverage, and the vector characterizing each word has 200 dimensions, so each word's vector reflects its actual semantics well. The word vector model of this application includes 8 million words, which largely solves the traditional out-of-vocabulary problem.
  • The word vector model used in this application greatly reduces memory usage through the word-index file and greatly increases system stability.
  • The query return speed of this application is also greatly increased: querying a word used to take around ten seconds and is now reduced to less than 0.01 s.
  • In addition, this application can configure more synonyms of the keywords of the answers corresponding to the interview questions during the robot interview process.
  • the above-mentioned integrated unit implemented in the form of a software function module may be stored in a computer readable storage medium.
  • The above software function module is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a dual-screen device, a network device, or the like) or a processor to execute part of the methods of the various embodiments of this application.
  • FIG. 3 is a schematic diagram of the electronic device provided in the third embodiment of the application.
  • the electronic device 3 includes a memory 31, at least one processor 32, a computer program 33 stored in the memory 31 and running on the at least one processor 32, at least one communication bus 34 and a database 35.
  • The computer program 33 may be divided into one or more modules/units, and the one or more modules/units are stored in the memory 31 and executed by the at least one processor 32 to complete this application.
  • the one or more modules/units may be a series of computer-readable instruction segments capable of completing specific functions, and the computer-readable instruction segments are used to describe the execution process of the computer program 33 in the electronic device 3.
  • the electronic device 3 may be a mobile phone, a tablet computer, a personal digital assistant (Personal Digital Assistant, PDA) and other devices installed with applications.
  • PDA Personal Digital Assistant
  • The schematic diagram in FIG. 3 is only an example of the electronic device 3 and does not constitute a limitation on the electronic device 3.
  • the electronic device 3 may also include input and output devices, network access devices, buses, and so on.
  • The at least one processor 32 may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like.
  • the processor 32 may be a microprocessor, or the processor 32 may also be any conventional processor, etc.
  • The processor 32 is the control center of the electronic device 3 and uses various interfaces and lines to connect the various parts of the entire electronic device 3.
  • the memory 31 may be used to store the computer program 33 and/or modules/units.
  • The processor 32 runs or executes the computer programs and/or modules/units stored in the memory 31 and calls the data stored in the memory 31 to realize the various functions of the electronic device 3.
  • the memory 31 may mainly include a storage program area and a storage data area.
  • The storage program area may store an operating system, an application program required by at least one function (such as a sound playback function or an image playback function), and the like; the storage data area may store data created according to the use of the electronic device 3 (such as audio data), and the like.
  • The memory 31 may include volatile memory, and may also include non-volatile memory, such as a hard disk, a memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a Flash Card, at least one magnetic disk storage device, a flash memory device, high-speed random access memory, or another storage device.
  • the memory 31 stores program codes, and the at least one processor 32 can call the program codes stored in the memory 31 to perform related functions.
  • The modules described in FIG. 2 (the acquisition module 201, configuration module 202, training module 203, construction module 204, traversal module 205, deduplication module 206, and push module 207) are program code stored in the memory 31 and executed by the at least one processor 32, so as to realize the functions of these modules for the purpose of pushing synonyms.
  • the obtaining module 201 is used to obtain interview questions
  • the configuration module 202 is used to configure the first preset number of keywords corresponding to the answers of the interview questions;
  • the training module 203 is used for pre-training to obtain a target word vector model based on the super-large word vector model
  • the construction module 204 is configured to construct a word vector matrix according to the target word vector model to obtain a word-index file, where the word-index file includes the correspondence between the word vector and the index;
  • the construction module 204 is further configured to construct a binary tree based on all word vectors in the target word vector model;
  • the traversal module 205 is configured to traverse the binary tree, query the binary tree for the first candidate word vector whose distance to the keyword is greater than a preset distance threshold, and construct a priority queue based on the first candidate word vector;
  • the deduplication module 206 is configured to deduplicate the first candidate word vector in the priority queue
  • the acquiring module 201 is also used to acquire the target word vectors of the second preset number in the prioritized queue after deduplication;
  • the pushing module 207 is configured to push a second preset number of synonyms, based on the second preset number of target word vectors and the word-index file, for selection by the user.
  • the database (Database) 35 is a warehouse built on the electronic device 3 for organizing, storing and managing data according to a data structure. Databases are usually divided into three types: hierarchical database, network database and relational database. In this embodiment, the database 35 is used to store information such as interview questions.
  • If the integrated module/unit of the electronic device 3 is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium.
  • Based on this understanding, all or part of the processes in the methods of the above embodiments of this application can also be completed by instructing the relevant hardware through a computer program.
  • the computer program can be stored in a computer-readable storage medium.
  • the computer program includes computer-readable instruction code
  • the computer-readable instruction code may be in the form of source code, object code, executable file, or some intermediate form.
  • The computer-readable medium may include any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory, and the like.
  • the functional units in the various embodiments of the present application may be integrated in the same processing unit, or each unit may exist alone physically, or two or more units may be integrated in the same unit.
  • the above-mentioned integrated unit may be implemented in the form of hardware, or may be implemented in the form of hardware plus software functional modules.

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • Human Resources & Organizations (AREA)
  • Theoretical Computer Science (AREA)
  • Strategic Management (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • Artificial Intelligence (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A near-synonym pushing method, comprising: obtaining an interview question (S1); configuring a first preset number of keywords of an answer corresponding to the interview question (S2); performing pre-training on the basis of a super large word vector model to obtain a target word vector model (S3); constructing a word vector matrix according to the target word vector model to obtain a word-index file (S4); constructing a binary tree on the basis of all word vectors (S5); traversing the binary tree, querying, from the binary tree, first candidate word vectors having distances to the keywords greater than a preset distance threshold, and constructing a priority queue (S6); performing deduplication on the first candidate word vectors in the priority queue (S7); obtaining a second preset number of target word vectors at top positions in the deduplicated priority queue (S8); and pushing, on the basis of the target word vectors and the word-index file, a second preset number of near-synonyms to allow a user to select (S9). Also provided are a near-synonym pushing apparatus, an electronic device, and a storage medium. The present invention can achieve quick near-synonym pushing to users.

Description

Synonym pushing method, apparatus, electronic device and medium
This application claims priority to the Chinese patent application filed with the Chinese Patent Office on March 2, 2020, with application number 202010136905.7 and invention title "Synonym pushing method, apparatus, electronic device and medium", the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the field of artificial intelligence technology, and in particular to a synonym pushing method, apparatus, electronic device, and storage medium.
Background
The project at hand is an AI interview rule configuration system, in which users at some companies can update the answer keywords in the expert rules in real time. However, the inventor realized that when filling in answer keywords, the user must enter a large amount of information manually, and the system offers no assistance during input, such as recommending synonyms. This reduces the user's writing efficiency, makes the result heavily dependent on the user's personal understanding of the answer keywords, and cannot guarantee that the entered keywords are reasonably complete and objective.
Summary of the Invention
In view of the above, it is necessary to provide a synonym pushing method, apparatus, electronic device, and storage medium that can quickly push synonyms to users during AI interviews.
The first aspect of the present application provides a synonym pushing method, and the method includes:
obtaining interview questions;
configuring a first preset number of keywords of the answers corresponding to the interview questions;
pre-training a target word vector model based on a super-large word vector model;
constructing a word vector matrix according to the target word vector model to obtain a word-index file, where the word-index file includes the correspondence between word vectors and indexes;
constructing a binary tree based on all word vectors in the target word vector model;
traversing the binary tree, querying the binary tree for first candidate word vectors whose distance to the keyword is greater than a preset distance threshold, and constructing a priority queue based on the first candidate word vectors;
deduplicating the first candidate word vectors in the priority queue;
obtaining the top second preset number of target word vectors in the deduplicated priority queue; and
pushing a second preset number of synonyms, based on the second preset number of target word vectors and the word-index file, for the user to select.
The second aspect of the present application provides a synonym pushing device, and the device includes:
an acquisition module, used to acquire interview questions;
a configuration module, used to configure a first preset number of keywords of the answers corresponding to the interview questions;
a training module, used to pre-train a target word vector model based on a super-large word vector model;
a construction module, configured to construct a word vector matrix according to the target word vector model to obtain a word-index file, where the word-index file includes the correspondence between word vectors and indexes;
the construction module being also used to construct a binary tree based on all word vectors in the target word vector model;
a traversal module, configured to traverse the binary tree, query the binary tree for first candidate word vectors whose distance to the keyword is greater than a preset distance threshold, and construct a priority queue based on the first candidate word vectors;
a deduplication module, configured to deduplicate the first candidate word vectors in the priority queue;
the acquisition module being also used to acquire the top second preset number of target word vectors in the deduplicated priority queue; and
a push module, configured to push a second preset number of synonyms, based on the second preset number of target word vectors and the word-index file, for selection by the user.
The third aspect of the present application provides an electronic device, where the electronic device includes a processor, and the processor is configured to execute computer-readable instructions stored in a memory to implement the following steps:
obtaining interview questions;
configuring a first preset number of keywords of the answers corresponding to the interview questions;
pre-training a target word vector model based on a super-large word vector model;
constructing a word vector matrix according to the target word vector model to obtain a word-index file, where the word-index file includes the correspondence between word vectors and indexes;
constructing a binary tree based on all word vectors in the target word vector model;
traversing the binary tree, querying the binary tree for first candidate word vectors whose distance to the keyword is greater than a preset distance threshold, and constructing a priority queue based on the first candidate word vectors;
deduplicating the first candidate word vectors in the priority queue;
obtaining the top second preset number of target word vectors in the deduplicated priority queue; and
pushing a second preset number of synonyms, based on the second preset number of target word vectors and the word-index file, for the user to select.
The fourth aspect of the present application provides a computer-readable storage medium having computer-readable instructions stored thereon, where the computer-readable instructions, when executed by a processor, implement the following steps:
obtaining interview questions;
configuring a first preset number of keywords of the answers corresponding to the interview questions;
pre-training a target word vector model based on a super-large word vector model;
constructing a word vector matrix according to the target word vector model to obtain a word-index file, where the word-index file includes the correspondence between word vectors and indexes;
constructing a binary tree based on all word vectors in the target word vector model;
traversing the binary tree, querying the binary tree for first candidate word vectors whose distance to the keyword is greater than a preset distance threshold, and constructing a priority queue based on the first candidate word vectors;
deduplicating the first candidate word vectors in the priority queue;
obtaining the top second preset number of target word vectors in the deduplicated priority queue; and
pushing a second preset number of synonyms, based on the second preset number of target word vectors and the word-index file, for the user to select.
With the synonym pushing method, device, electronic equipment, and storage medium described in this application, a first preset number of keywords is configured for the answer corresponding to an interview question, a second preset number of synonyms corresponding to each keyword is looked up in the pre-trained word vector model, and the second preset number of synonyms is pushed for the user to choose from. More synonyms of the keywords of the answers corresponding to the interview questions can therefore be configured during the robot interview process, making it convenient for HR staff to configure more comprehensive answers for interview questions when interviewing job applicants. As a result, when a job applicant's answer to an interview question is received, the answer can be analyzed more accurately, and it is easier for HR to give a more comprehensive assessment of the applicant.
Description of the Drawings
In order to describe the technical solutions in the embodiments of the present application or the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only embodiments of the present application; for those of ordinary skill in the art, other drawings can be obtained from the provided drawings without creative work.
FIG. 1 is a flowchart of a method for pushing synonyms provided in Embodiment 1 of the present application.
FIG. 2 is a functional module diagram of the push device provided in Embodiment 2 of the present application.
FIG. 3 is a schematic diagram of an electronic device provided in Embodiment 3 of the present application.
The following specific embodiments further describe this application in conjunction with the above drawings.
Detailed Description
In order to understand the above objectives, features, and advantages of the application more clearly, the application is described in detail below with reference to the accompanying drawings and specific embodiments. It should be noted that, where there is no conflict, the embodiments of the application and the features in the embodiments can be combined with one another.
Many specific details are set forth in the following description to facilitate a full understanding of the present application. The described embodiments are only some of the embodiments of the present application, rather than all of them. Based on the embodiments in this application, all other embodiments obtained by those of ordinary skill in the art without creative work fall within the protection scope of this application.
Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by those skilled in the technical field of this application. The terms used in the specification of the application are only for the purpose of describing specific embodiments and are not intended to limit the application.
The terms "first", "second", and "third" in the specification, claims, and drawings of the present application are used to distinguish different objects, not to describe a specific sequence. In addition, the term "including" and any variations of it are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that includes a series of steps or units is not limited to the listed steps or units, but optionally also includes steps or units that are not listed, or optionally also includes other steps or units inherent to such a process, method, product, or device.
The synonym pushing method in the embodiments of this application is applied in an electronic device. For an electronic device that needs synonym pushing, the synonym pushing function provided by the method of this application can be integrated directly on the device, or a client implementing the method of this application can be installed. The method provided in this application can also run on a server or other device in the form of a software development kit (SDK): the synonym pushing function is provided as an SDK interface, and an electronic device or other device can realize the synonym pushing function through the provided interface.
实施例一Example one
图1是本申请实施例一提供的近义词推送方法的流程图。根据不同的需求,所述流程图中的执行顺序可以改变,某些步骤可以省略。FIG. 1 is a flowchart of a method for pushing synonyms provided in Embodiment 1 of the present application. According to different requirements, the execution sequence in the flowchart can be changed, and some steps can be omitted.
为了在机器人面试过程中,通过机器人更好的判定求职者在回答面试过程中的面试题目是否正确,并根据回答结果给求职者评分时。需要根据所述面试题目对应的答案配置关键词,并在接收到求职者输入的答案后,根据输入的答案提取关键词。将提取的关键词与配置的关键词进行匹配,得到匹配结果,根据匹配结果对所述求职者进行评分。而在根据所述面试题目对应的答案配置关键词时,为了避免关键词不够全面的情况出现,本申请提供了一种在配置关键词时,对面试官输入的关键词进行拓展,推送同近/义词的方法。所述方法包括:In the robot interview process, the robot can better determine whether the job applicant is correct in answering the interview questions in the interview process, and when the job applicant is graded according to the answer result. It is necessary to configure keywords according to the answers corresponding to the interview questions, and after receiving the answers input by the job applicant, extract the keywords according to the input answers. The extracted keywords are matched with the configured keywords to obtain a matching result, and the job applicant is scored according to the matching result. When configuring keywords according to the answers corresponding to the interview questions, in order to avoid the situation where the keywords are not comprehensive enough, this application provides a way to expand the keywords input by the interviewer when configuring keywords, and push the same. /Method of meaning words. The method includes:
步骤S1,获取面试题目。Step S1: Obtain interview questions.
在机器人面试过程中,会根据不同的岗位配置不同的面试题目。例如,根据研发岗位配置的面试题目包括“你熟悉哪些编程语言”、“在Java中,如何跳出当前的多重嵌套循环”和“Java中会存在内存泄漏吗,请简单描述”等等。During the robot interview process, different interview questions will be configured according to different positions. For example, interview questions configured according to R&D positions include "Which programming languages are you familiar with", "How to break out of the current multiple nested loops in Java" and "Is there a memory leak in Java, please describe briefly" and so on.
在本实施方式中,机器人面试需要预先配置好面试题目和答案。然而,不同的求职者面对同样的面试题目时给出的答案也不相同。为了全面的评判求职者的能力,在配置面试题目和答案时需要根据所述面试题目配置详尽完整且全面的答案。In this embodiment, the robot interview needs to be pre-configured with interview questions and answers. However, different job applicants give different answers when facing the same interview questions. In order to comprehensively evaluate the ability of job applicants, when configuring interview questions and answers, it is necessary to configure detailed, complete and comprehensive answers according to the interview questions.
步骤S2,配置第一预设个数与所述面试题目对应的答案的关键词。Step S2, configuring the first preset number of keywords of answers corresponding to the interview questions.
在一实施方式中,所述配置第一预设个数与所述面试题目对应的答案的关键词的步骤包括:In one embodiment, the step of configuring the first preset number of keywords for answers corresponding to the interview questions includes:
在预先建立的面试题目与答案对应表中查询所述面试题目配置对应的答案,得到查询结果;Query the answer corresponding to the interview question configuration in the pre-established interview question and answer correspondence table, and obtain the query result;
提取所述查询结果中的关键词,其中,所述关键词为第一预设个数。Extract keywords in the query result, where the keywords are the first preset number.
可以理解的是,所述关键词还可以是根据所述查询结果进行语义分析得到的与所述查询结果相关的关键词。It is understandable that the keyword may also be a keyword related to the query result obtained by performing semantic analysis according to the query result.
在另一实施方式中,所述配置第一预设个数与所述面试题目对应的答案的关键词的步骤包括:In another embodiment, the step of configuring the first preset number of keywords of answers corresponding to the interview questions includes:
(1)根据预先构建的题目解析模型分析所述面试题目得到对应的题目意图。(1) Analyze the interview questions according to the pre-built question analysis model to obtain the corresponding question intentions.
在本实施方式中,所述题目解析模型可以对所述面试题目的题目特征进行分析。所述题目特征可以包括题干意图和关键信息。例如,当面试题目为“你所擅长的编程语言有哪些”,那么题干意图是擅长的编程语言,关键信息可以是编程语言。In this embodiment, the topic analysis model can analyze the topic characteristics of the interview topic. The topic features may include topic intentions and key information. For example, when the interview question is "What programming languages are you good at?", the intent of the question stem is the programming language you are good at, and the key information can be the programming language.
(2)根据所述题目意图和预先建立的知识库,确定所述面试题目对应的答案。(2) Determine the answer corresponding to the interview question according to the purpose of the question and a pre-established knowledge base.
例如,当面试题目为“你所擅长的编程语言有哪些”,那么所述预先建立的知识库中可能包括C/C++、Java、C#和SQL等。For example, when the interview topic is "What programming languages are you good at?", the pre-established knowledge base may include C/C++, Java, C#, SQL, etc.
(3)根据所述对应的答案提取第一预设个数关键词。(3) Extract the first preset number of keywords according to the corresponding answer.
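As an illustration only, the keyword-configuration flow of this embodiment might be sketched in Python as follows; the question-parsing step is reduced to a trivial rule and the knowledge base to an in-memory dictionary, and the names KNOWLEDGE_BASE, parse_intent, and configure_keywords are assumptions made for this sketch rather than part of the disclosure.
# Hypothetical sketch: configure the first preset number of answer keywords for an interview question.
KNOWLEDGE_BASE = {
    # assumed toy knowledge base: question intent -> candidate answer terms
    "programming language": ["C/C++", "Java", "C#", "SQL", "Python"],
}

def parse_intent(question: str) -> str:
    # Stand-in for the pre-built question-parsing model: a trivial keyword rule.
    if "编程语言" in question or "programming language" in question.lower():
        return "programming language"
    return "unknown"

def configure_keywords(question: str, first_preset_number: int) -> list:
    # Determine the answer from the knowledge base, then take the first preset number of keywords.
    answer_terms = KNOWLEDGE_BASE.get(parse_intent(question), [])
    return answer_terms[:first_preset_number]

print(configure_keywords("你所擅长的编程语言有哪些", 3))  # -> ['C/C++', 'Java', 'C#']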
步骤S3,基于超大词向量模型预先训练得到目标词向量模型。Step S3, pre-training based on the super large word vector model to obtain the target word vector model.
In this embodiment, pre-training is performed based on a super-large word vector model to obtain a suitable target word vector model. Specifically, this includes: expanding the robot interview scene corpus of the super-large word vector model, which includes performing word segmentation and stop-word removal on the robot interview scene corpus and incrementally training the word vectors based on the CBOW mode; and training the corpus-expanded super-large word vector model to obtain the target word vector model.
Specifically, the training corpus of the super-large word vector model covers a large amount of corpora of different types, such as news, web pages, novels, Baidu Baike, and Wikipedia. For the robot interview scenario, however, the scenario-specific corpus in the super-large word vector model is insufficient. Therefore, corpora of the robot interview scenario, such as question-answer texts and similar-question texts from robot interviews, are merged into the super-large word vector model to expand it. The target word vector model is a word vector model that contains the robot interview corpus.
The robot interview scene corpus is first segmented into words, stop words are removed, and the word vectors are incrementally trained based on the CBOW mode, so as to improve the model's performance in the robot interview scenario. The final trained target word vector model covers more than 8 million words, and each word vector has about 200 dimensions. The target word vector model therefore has broad corpus coverage, and each word vector in it reflects the semantics of its word well. Meanwhile, a vocabulary on the order of 8 million words can completely replace the traditional way of building a synonym dictionary and largely solves the problem of words that cannot be found.
需要说明的是,基于超大词向量模型预先训练得到目标词向量模型的方法为现有技术,在此不再赘述。It should be noted that the method of pre-training the target word vector model based on the super-large word vector model is the prior art, and will not be repeated here.
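Although this pre-training step is existing technology, a minimal sketch of incremental CBOW training is given below for orientation; it assumes the third-party gensim and jieba libraries, a toy stop-word list, and a tiny corpus, and only mirrors the 200-dimension setting mentioned above, so it should be read as an illustration rather than the actual training pipeline of this application.
# Hypothetical sketch: expand a base CBOW word2vec model with robot-interview corpus.
import jieba
from gensim.models import Word2Vec

STOP_WORDS = {"的", "了", "吗", "请"}  # toy stop-word list for illustration

def preprocess(texts):
    # Word segmentation followed by stop-word removal.
    return [[w for w in jieba.lcut(t) if w.strip() and w not in STOP_WORDS] for t in texts]

base_corpus = preprocess(["新闻语料示例", "百科语料示例"])
interview_corpus = preprocess(["你熟悉哪些编程语言", "Java中会存在内存泄漏吗"])

# Base model trained in CBOW mode (sg=0) with 200-dimensional word vectors.
model = Word2Vec(sentences=base_corpus, vector_size=200, sg=0, min_count=1)

# Incrementally extend the vocabulary and continue training on the interview-scene corpus.
model.build_vocab(interview_corpus, update=True)
model.train(interview_corpus, total_examples=len(interview_corpus), epochs=model.epochs)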
步骤S4,根据所述目标词向量模型构建词向量矩阵得到词-索引文件,其中,所述词-索引文件包括词向量与索引之间的对应关系。Step S4, constructing a word vector matrix according to the target word vector model to obtain a word-index file, wherein the word-index file includes the correspondence between the word vector and the index.
在本实施方式中,所述根据所述目标词向量模型构建词向量矩阵得到词-索引文件可以包括:In this embodiment, the construction of a word vector matrix according to the target word vector model to obtain a word-index file may include:
(a1) Construct a word vector matrix in which the number of rows is the total number of all words in the target word vector model and the number of columns is the dimension of each word;
(a2)所述词向量矩阵中的每一行对应一个索引;(a2) Each row in the word vector matrix corresponds to an index;
(a3)根据所述词向量矩阵构建词-索引文件,并输出所述词-索引文件。(a3) Construct a word-index file according to the word vector matrix, and output the word-index file.
Specifically, the word vector matrix contains one row per word, and the dimension of each word gives the number of columns. In this embodiment, the dimension of each word is 200 and the target word vector model includes 8 million words, so a word vector matrix with 8 million rows and 200 columns is obtained.
而所述词向量矩阵中的每一行都有一个索引,那么,可以得到每个词对应的索引。从而根据所述词向量矩阵输出词-索引文件。同时,也可以得到每个索引与每个词向量之间的对应关系。And each row in the word vector matrix has an index, then the index corresponding to each word can be obtained. Thus, the word-index file is output according to the word vector matrix. At the same time, the corresponding relationship between each index and each word vector can also be obtained.
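For illustration, building the word vector matrix and the word-index file from a trained model might look like the sketch below; it assumes NumPy, an ordered vocabulary with 200-dimensional vectors, and a JSON file as the word-index format, all of which are assumptions of this sketch rather than requirements of the application.
# Hypothetical sketch: build the word vector matrix and the word-index file.
import json
import numpy as np

def build_word_index(words, vectors, index_path="word_index.json", matrix_path="vectors.npy"):
    # One row per word vector; the row position serves as that word's index.
    matrix = np.asarray(vectors, dtype=np.float32)          # shape: (number_of_words, 200)
    word_to_index = {word: i for i, word in enumerate(words)}
    np.save(matrix_path, matrix)                            # persist the word vector matrix
    with open(index_path, "w", encoding="utf-8") as f:
        json.dump(word_to_index, f, ensure_ascii=False)     # persist the word-index correspondence
    return word_to_index, matrix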
步骤S5,基于所述目标词向量模型中的所有词向量构建二叉树。In step S5, a binary tree is constructed based on all word vectors in the target word vector model.
在本实施方式中,根据所述目词向量模型中的所有词向量构建二叉树结构。In this embodiment, a binary tree structure is constructed according to all word vectors in the target word vector model.
Each word vector is a 200-dimensional vector, that is, a point in a 200-dimensional high-dimensional data space; each word vector represents one point in this space, so the data space corresponding to all word vectors in the target word vector model can be represented as 8 million points. A binary tree is constructed from the target word vector model by the following method:
(1)随机选择两个点为初始节点,连接两个初始节点形成一个等距超平面;(1) Two points are randomly selected as initial nodes, and the two initial nodes are connected to form an equidistant hyperplane;
(2) Construct an equidistant perpendicular hyperplane through the midpoint of the line connecting the two initial nodes, dividing the data space corresponding to all word vectors in the target word vector model into two parts and obtaining two subspaces;
(3) For each subspace, compute the dot product of each point with the normal vector of the equidistant hyperplane, and use the sign of the result (that is, the sign of the angle between the point and the normal vector) to decide whether the point belongs to the left subtree or the right subtree of the binary tree;
(4)依此类推,分别在所述两个子空间内重复上述步骤(1)至(3),可以将所述数据空间切分为多个子空间,并根据所述多个子空间构建二叉树结构。(4) By analogy, repeating the above steps (1) to (3) in the two subspaces respectively, the data space can be divided into multiple subspaces, and a binary tree structure can be constructed according to the multiple subspaces.
优选地,当每个子空间最多只剩下k个点时,不再对所述子空间进行切分。优选地,所述k大于等于8且小于等于10。在本实施方式中,所述k的取值为10。Preferably, when there are at most k points left in each subspace, the subspace is no longer divided. Preferably, the k is greater than or equal to 8 and less than or equal to 10. In this embodiment, the value of k is 10.
The split condition at each node of the above binary tree structure is one of these equidistant perpendicular hyperplanes, and finally the word vectors are the leaf nodes of the binary tree. That is, the binary tree includes a root node, multiple layers of intermediate nodes, and a last layer of leaf nodes, where each leaf node represents a word vector. In this application, the word vectors themselves do not need to be saved on the leaf nodes; only the indexes corresponding to the word vectors need to be saved. In this way, similar word vectors are located closer together in the binary tree, which makes the subsequent lookup of synonyms faster.
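The following is a minimal, self-contained sketch of such a random-hyperplane binary tree; it stores only word indexes at the leaves, stops splitting once at most k points remain, and is written in plain NumPy as an illustration of the idea (similar in spirit to approximate-nearest-neighbor indexes such as Annoy) rather than as the application's actual implementation.
# Hypothetical sketch: build a binary tree over word vectors by random hyperplane splits.
import numpy as np

class Node:
    def __init__(self, normal=None, offset=None, left=None, right=None, indexes=None):
        self.normal, self.offset = normal, offset   # splitting hyperplane parameters
        self.left, self.right = left, right         # child subtrees
        self.indexes = indexes                      # word indexes stored at a leaf node

def build_tree(vectors, indexes=None, k=10, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    if indexes is None:
        indexes = np.arange(len(vectors))
    if len(indexes) <= k:                           # stop splitting: at most k points remain
        return Node(indexes=indexes)
    # Pick two random points; the equidistant hyperplane passes through their midpoint,
    # with the line connecting them as its normal vector.
    a, b = rng.choice(indexes, size=2, replace=False)
    normal = vectors[a] - vectors[b]
    offset = float(normal.dot((vectors[a] + vectors[b]) / 2.0))
    # The sign of the dot product decides the left or right subtree for each point.
    side = vectors[indexes].dot(normal) - offset
    left_idx, right_idx = indexes[side <= 0], indexes[side > 0]
    if len(left_idx) == 0 or len(right_idx) == 0:   # degenerate split: keep the points as a leaf
        return Node(indexes=indexes)
    return Node(normal=normal, offset=offset,
                left=build_tree(vectors, left_idx, k, rng),
                right=build_tree(vectors, right_idx, k, rng))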
步骤S6,遍历所述二叉树,从所述二叉树中查询出与所述关键词的距离大于预设距离阈值的第一候选词向量并基于所述第一候选词向量构建优先队列。Step S6: Traverse the binary tree, query the binary tree to find a first candidate word vector whose distance to the keyword is greater than a preset distance threshold, and construct a priority queue based on the first candidate word vector.
The specific method for constructing the priority queue is: taking the keyword as the root node of the binary tree; traversing all intermediate nodes under the root node; calculating the distance between the root node and each intermediate node; determining the intermediate nodes whose target distance is greater than the preset distance threshold as first-layer target nodes; traversing all intermediate nodes under the first-layer target nodes down to the last layer of leaf nodes; taking the word vectors in all of these leaf nodes as first candidate word vectors; calculating the similarity between each first candidate word vector and the keyword; and inserting the first candidate word vectors into the priority queue in order of similarity.
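A simplified sketch of this search is shown below, reusing the Node tree from the preceding sketch; it walks down the splits, collects the word indexes stored at the reached leaves as first candidate word vectors, and orders them in a priority queue by cosine similarity. The margin test and the heap-based queue are simplifications assumed here, not the exact procedure of the application.
# Hypothetical sketch: query the binary tree and build a similarity-ordered priority queue.
import heapq
import numpy as np

def cosine(u, v):
    return float(u.dot(v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

def search_tree(node, query, threshold=0.0):
    # Collect candidate word indexes from the leaves reachable from this node.
    if node.indexes is not None:                    # leaf node: return the stored indexes
        return list(node.indexes)
    margin = float(query.dot(node.normal) - node.offset)
    candidates = []
    if margin <= threshold:                         # on, or close enough to, the left side
        candidates += search_tree(node.left, query, threshold)
    if margin >= -threshold:                        # on, or close enough to, the right side
        candidates += search_tree(node.right, query, threshold)
    return candidates

def build_priority_queue(tree, vectors, query, threshold=0.1):
    heap = []
    for idx in search_tree(tree, query, threshold):
        heapq.heappush(heap, (-cosine(query, vectors[idx]), int(idx)))  # max-heap by similarity
    return heap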
步骤S7,对所述优先队列中的所述第一候选词向量进行去重。Step S7: De-duplicate the first candidate word vector in the priority queue.
步骤S8,获取去重后的优先队列中排序在前第二预设个数的目标词向量。Step S8: Obtain the target word vectors of the second preset number in the prioritized queue after deduplication.
步骤S9,基于所述第二预设个数的目标词向量和词-索引文件推送第二预设个数近义词供用户选择。Step S9: Push a second preset number of synonyms based on the second preset number of target word vectors and word-index files for selection by the user.
In this embodiment, the method of pushing the second preset number of synonyms for the user to select based on the second preset number of target word vectors and the word-index file includes: obtaining the target indexes corresponding to the second preset number of target word vectors; querying, according to the word-index file, the word vectors corresponding to the target indexes; and pushing the synonyms corresponding to these word vectors for the user to select.
In this embodiment, the binary tree structure file and the word-index file are saved together; when the Top-N nearest-neighbor words of a certain keyword need to be queried, only these two files need to be used for the index lookup.
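Continuing the previous sketches, pushing the top results might look as follows; the inverse lookup simply reverses the hypothetical word_to_index mapping built earlier, and the default of 8 pushed synonyms only mirrors the example described in the next paragraph.
# Hypothetical sketch: take the top second-preset-number candidates and map them back to words.
import heapq

def push_synonyms(heap, word_to_index, second_preset_number=8):
    index_to_word = {i: w for w, i in word_to_index.items()}   # inverse of the word-index file
    top = heapq.nsmallest(second_preset_number, heap)          # highest similarity first (scores are negated)
    return [index_to_word[idx] for _, idx in top if idx in index_to_word]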
In this embodiment, by pushing the second preset number of synonyms for the user to screen, the user can configure the keywords of the answers corresponding to the interview questions more comprehensively, so that when a job applicant answers a question, the applicant is not scored one-sidedly on the basis of the answer alone. The synonym lookup function supported by this application is more innovative and convenient: synonyms can be generated for 5 keywords at a time, 8 synonyms are pushed each time, and the user can click "change batch" to switch to another round of 8 synonyms, which makes them easy to view and use. For example, a "change batch" button is displayed on the push interface; after the user clicks the button, the original synonyms can be updated and more synonyms pushed.
优选地,由于很多词语并不是所述面试题目的答案,所以增加了预设规则筛选查询到的词汇,其中,所述预设规则包括以下规则中的至少一种:Preferably, since many words are not the answers to the interview questions, a preset rule is added to filter the queried vocabulary, wherein the preset rule includes at least one of the following rules:
(1) Adjust the order of the queried words according to their number of characters. For example, words with the same number of characters as the keyword are returned preferentially; for words whose number of characters differs from that of the keyword, a preset distance (for example, 0.1) is added per character of difference when ranking the queried words.
(2) Screen the queried words by type, where the types include Chinese, English, and numbers. For example, words of the same type as the keyword are returned preferentially. Cases where Chinese is input and English is returned, or English is input and Chinese is returned, are returned normally; but where Chinese or English is input and a number is returned, that synonym is deleted directly. It should be noted that a single English letter or a single Chinese character counts as one character.
(3) Remove words whose number of characters exceeds that of the keyword by more than a preset number, for example, words that are more than 5 characters longer than the keyword.
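A compact sketch of these screening rules is given below; the 0.1 ranking penalty, the type check, and the 5-character cap follow the examples in the text, while the function names and the (word, distance) input format are assumptions of this sketch.
# Hypothetical sketch: screen and re-rank the queried words against the original keyword.
import re

def word_type(w):
    if re.fullmatch(r"[0-9]+", w):
        return "number"
    if re.fullmatch(r"[A-Za-z]+", w):
        return "english"
    return "chinese"

def screen_candidates(keyword, scored_words, penalty=0.1, max_extra_chars=5):
    kept = []
    for word, distance in scored_words:
        if word_type(word) == "number" and word_type(keyword) != "number":
            continue                                  # Chinese or English input must not return numbers
        if len(word) - len(keyword) > max_extra_chars:
            continue                                  # drop words far longer than the keyword
        distance += penalty * abs(len(word) - len(keyword))   # rank penalty per character difference
        kept.append((word, distance))
    return [w for w, _ in sorted(kept, key=lambda item: item[1])]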
可以理解的是,上述方法同样可以用于推送同义词。It is understandable that the above method can also be used to push synonyms.
In summary, the method for pushing synonyms provided in this application includes: obtaining an interview question; configuring a first preset number of keywords of the answer corresponding to the interview question; looking up, in a pre-trained word vector model, a second preset number of synonyms corresponding to each keyword; and pushing the second preset number of synonyms for the user to select. The word vectors used in this application have broad coverage, each word is represented by a 200-dimensional vector, and each word's vector reflects the actual semantics of the word well. The word vector model of this application includes 8 million words, which largely solves the traditional out-of-vocabulary problem of words that cannot be matched. The memory footprint of the word vector model used in this application is greatly reduced; using the word-index file greatly lowers the memory usage and greatly increases system stability. In addition, the query return speed of this application is greatly increased: a query for one word used to take more than ten seconds and now returns within 0.01 s. Finally, this application can configure, for the robot interview process, more synonyms of the keywords of the answers corresponding to the interview questions, so that when a job applicant's answer to an interview question is received, the answer can be analyzed more accurately, which helps human resources give a more comprehensive analysis of the applicant.
以上所述,仅是本申请的具体实施方式,但本申请的保护范围并不局限于此,对于本领域的普通技术人员来说,在不脱离本申请创造构思的前提下,还可以做出改进,但这些均属于本申请的保护范围。The above are only specific implementations of this application, but the scope of protection of this application is not limited to this. For those of ordinary skill in the art, without departing from the creative concept of this application, they can also make Improvements, but these all belong to the scope of protection of this application.
下面结合图2和图3,分别对实现上述近义词推送方法的电子设备的功能模块及硬件结构进行介绍。The functional modules and hardware structure of the electronic device implementing the above-mentioned synonym pushing method are respectively introduced below in conjunction with FIG. 2 and FIG. 3.
实施例二Example two
图2为本申请近义词推送装置较佳实施例中的功能模块图。Fig. 2 is a diagram of functional modules in a preferred embodiment of a device for pushing synonyms of this application.
在一些实施例中,所述近义词推送装置20(为便于描述,简称为“推送装置”)运行于电子设备中。所述推送装置20可以包括多个由程序代码段所组成的功能模块。所述推送装置20中的各个程序段的程序代码可以存储于存储器中,并由至少一个处理器所执行,以执行近义词推送功能。In some embodiments, the synonym pushing device 20 (referred to as "pushing device" for ease of description) runs in an electronic device. The pushing device 20 may include multiple functional modules composed of program code segments. The program code of each program segment in the pushing device 20 can be stored in a memory and executed by at least one processor to perform the function of pushing synonyms.
In order for the robot to better judge, during a robot interview, whether a job applicant has answered the interview questions correctly and to score the applicant according to the answers, keywords need to be configured according to the answers corresponding to the interview questions. After the answer input by the applicant is received, keywords are extracted from the input answer, the extracted keywords are matched against the configured keywords to obtain a matching result, and the applicant is scored according to the matching result. When configuring keywords according to the answers corresponding to the interview questions, in order to avoid the configured keywords being insufficiently comprehensive, this application provides the pushing device 20, which, during keyword configuration, expands the keywords input by the interviewer and pushes synonyms and near-synonyms. The functional modules of the pushing device 20 may include: an acquisition module 201, a configuration module 202, a training module 203, a construction module 204, a traversal module 205, a deduplication module 206, and a pushing module 207. The function of each module will be described in detail in the subsequent embodiments. A module referred to in this application is a series of computer program segments that can be executed by at least one processor, that can complete a fixed function, and that are stored in a memory.
所述获取模块201用于获取面试题目。The obtaining module 201 is used to obtain interview questions.
在机器人面试过程中,会根据不同的岗位配置不同的面试题目。例如,根据研发岗位配置的面试题目包括“你熟悉哪些编程语言”、“在Java中,如何跳出当前的多重嵌套循环”和“Java中会存在内存泄漏吗,请简单描述”等等。During the robot interview process, different interview questions will be configured according to different positions. For example, interview questions configured according to R&D positions include "Which programming languages are you familiar with", "How to break out of the current multiple nested loops in Java" and "Is there a memory leak in Java, please describe briefly" and so on.
在本实施方式中,机器人面试需要预先配置好面试题目和答案。然而,不同的求职者面对同样的面试题目时给出的答案也不相同。为了全面的评判求职者的能力,在配置面试题目和答案时需要根据所述面试题目配置详尽完整且全面的答案。In this embodiment, the robot interview needs to be pre-configured with interview questions and answers. However, different job applicants give different answers when facing the same interview questions. In order to comprehensively evaluate the ability of job applicants, when configuring interview questions and answers, it is necessary to configure detailed, complete and comprehensive answers according to the interview questions.
所述配置模块202用于配置第一预设个数与所述面试题目对应的答案的关键词。The configuration module 202 is configured to configure a first preset number of keywords of answers corresponding to the interview questions.
在一实施方式中,所述配置第一预设个数与所述面试题目对应的答案的关键词包括:In one embodiment, the keywords for configuring the first preset number of answers corresponding to the interview questions include:
在预先建立的面试题目与答案对应表中查询所述面试题目配置对应的答案,得到查询结果;Query the answer corresponding to the interview question configuration in the pre-established interview question and answer correspondence table, and obtain the query result;
提取所述查询结果中的关键词,其中,所述关键词为第一预设个数。Extract keywords in the query result, where the keywords are the first preset number.
可以理解的是,所述关键词还可以是根据所述查询结果进行语义分析得到的与所述查询结果相关的关键词。It is understandable that the keyword may also be a keyword related to the query result obtained by performing semantic analysis according to the query result.
在另一实施方式中,所述配置第一预设个数与所述面试题目对应的答案的关键词包括:In another embodiment, the keywords for configuring the first preset number of answers corresponding to the interview questions include:
(1)根据预先构建的题目解析模型分析所述面试题目得到对应的题目意图。(1) Analyze the interview questions according to the pre-built question analysis model to obtain the corresponding question intentions.
在本实施方式中,所述题目解析模型可以对所述面试题目的题目特征进行分析。所述题目特征可以包括题干意图和关键信息。例如,当面试题目为“你所擅长的编程语言有哪些”,那么题干意图是擅长的编程语言,关键信息可以是编程语言。In this embodiment, the topic analysis model can analyze the topic characteristics of the interview topic. The topic features may include topic intentions and key information. For example, when the interview question is "What programming languages are you good at?", the intent of the question stem is the programming language you are good at, and the key information can be the programming language.
(2)根据所述题目意图和预先建立的知识库,确定所述面试题目对应的答案。(2) Determine the answer corresponding to the interview question according to the purpose of the question and a pre-established knowledge base.
例如,当面试题目为“你所擅长的编程语言有哪些”,那么所述预先建立的知识库中可能包括C/C++、Java、C#和SQL等。For example, when the interview topic is "What programming languages are you good at?", the pre-established knowledge base may include C/C++, Java, C#, SQL, etc.
(3)根据所述对应的答案提取第一预设个数关键词。(3) Extract the first preset number of keywords according to the corresponding answer.
所述训练模块203用于基于超大词向量模型预先训练得到目标词向量模型。The training module 203 is used for pre-training to obtain the target word vector model based on the super large word vector model.
In this embodiment, pre-training is performed based on a super-large word vector model to obtain a suitable target word vector model. Specifically, this includes: expanding the robot interview scene corpus of the super-large word vector model, which includes performing word segmentation and stop-word removal on the robot interview scene corpus and incrementally training the word vectors based on the CBOW mode; and training the corpus-expanded super-large word vector model to obtain the target word vector model.
Specifically, the training corpus of the super-large word vector model covers a large amount of corpora of different types, such as news, web pages, novels, Baidu Baike, and Wikipedia. For the robot interview scenario, however, the scenario-specific corpus in the super-large word vector model is insufficient. Therefore, corpora of the robot interview scenario, such as question-answer texts and similar-question texts from robot interviews, are merged into the super-large word vector model to expand it. The target word vector model is a word vector model that contains the robot interview corpus.
The robot interview scene corpus is first segmented into words, stop words are removed, and the word vectors are incrementally trained based on the CBOW mode, so as to improve the model's performance in the robot interview scenario. The final trained target word vector model covers more than 8 million words, and each word vector has about 200 dimensions. The target word vector model therefore has broad corpus coverage, and each word vector in it reflects the semantics of its word well. Meanwhile, a vocabulary on the order of 8 million words can completely replace the traditional way of building a synonym dictionary and largely solves the problem of words that cannot be found.
需要说明的是,基于超大词向量模型预先训练得到目标词向量模型的方法为现有技术,在此不再赘述。It should be noted that the method of pre-training the target word vector model based on the super-large word vector model is the prior art, and will not be repeated here.
所述构建模块204用于根据所述目标词向量模型构建词向量矩阵得到词-索引文件,其中,所述词-索引文件包括词向量与索引之间的对应关系。The construction module 204 is configured to construct a word vector matrix according to the target word vector model to obtain a word-index file, where the word-index file includes the correspondence between the word vector and the index.
在本实施方式中,所述根据所述目标词向量模型构建词向量矩阵得到词-索引文件可以包括:In this embodiment, the construction of a word vector matrix according to the target word vector model to obtain a word-index file may include:
(a1) Construct a word vector matrix in which the number of rows is the total number of all words in the target word vector model and the number of columns is the dimension of each word;
(a2)所述词向量矩阵中的每一行对应一个索引;(a2) Each row in the word vector matrix corresponds to an index;
(a3)根据所述词向量矩阵构建词-索引文件,并输出所述词-索引文件。(a3) Construct a word-index file according to the word vector matrix, and output the word-index file.
Specifically, the word vector matrix contains one row per word, and the dimension of each word gives the number of columns. In this embodiment, the dimension of each word is 200 and the target word vector model includes 8 million words, so a word vector matrix with 8 million rows and 200 columns is obtained.
而所述词向量矩阵中的每一行都有一个索引,那么,可以得到每个词对应的索引。从而根据所述词向量矩阵输出词-索引文件。同时,也可以得到每个索引与每个词向量之间的对应关系。And each row in the word vector matrix has an index, then the index corresponding to each word can be obtained. Thus, the word-index file is output according to the word vector matrix. At the same time, the corresponding relationship between each index and each word vector can also be obtained.
所述构建模块204还用于基于所述目标词向量模型中的所有词向量构建二叉树。The construction module 204 is also used to construct a binary tree based on all word vectors in the target word vector model.
在本实施方式中,将所述目标词向量模型中的所有词向量构建二叉树结构。In this embodiment, a binary tree structure is constructed for all word vectors in the target word vector model.
Each word vector is a 200-dimensional vector, that is, a point in a 200-dimensional high-dimensional data space; each word vector represents one point in this space, so the data space corresponding to all word vectors in the target word vector model can be represented as 8 million points. A binary tree is constructed from the target word vector model by the following method:
(1)随机选择两个点为初始节点,连接两个初始节点形成一个等距超平面。(1) Two points are randomly selected as initial nodes, and the two initial nodes are connected to form an equidistant hyperplane.
(2) Construct an equidistant perpendicular hyperplane through the midpoint of the line connecting the two initial nodes, dividing the data space corresponding to all word vectors in the target word vector model into two parts and obtaining two subspaces.
(3) For each subspace, compute the dot product of each point with the normal vector of the equidistant hyperplane, and use the sign of the result (that is, the sign of the angle between the point and the normal vector) to decide whether the point belongs to the left subtree or the right subtree of the binary tree.
(4)依此类推,分别在所述两个子空间内重复上述步骤(1)至(3),可以将所述数据空间切分为多个子空间,并根据所述多个子空间构建二叉树结构。(4) By analogy, repeating the above steps (1) to (3) in the two subspaces respectively, the data space can be divided into multiple subspaces, and a binary tree structure can be constructed according to the multiple subspaces.
优选地,当每个子空间最多只剩下k个点时,不再对所述子空间进行切分。优选地,所述k大于等于8且小于等于10。在本实施方式中,所述k的取值为10。Preferably, when there are at most k points left in each subspace, the subspace is no longer divided. Preferably, the k is greater than or equal to 8 and less than or equal to 10. In this embodiment, the value of k is 10.
The split condition at each node of the above binary tree structure is one of these equidistant perpendicular hyperplanes, and finally the word vectors are the leaf nodes of the binary tree. In this application, the word vectors themselves do not need to be saved on the leaf nodes; only the indexes corresponding to the word vectors need to be saved. In this way, similar word vectors are located closer together in the binary tree, which makes the subsequent lookup of synonyms faster.
所述遍历模块205用于遍历所述二叉树,从所述二叉树中查询出与所述关键词的距离大于预设距离阈值的第一候选词向量并基于所述第一候选词向量构建优先队列。The traversal module 205 is configured to traverse the binary tree, query the binary tree for the first candidate word vector whose distance to the keyword is greater than a preset distance threshold, and construct a priority queue based on the first candidate word vector.
The specific method for constructing the priority queue is: taking the keyword as the root node of the binary tree; traversing all intermediate nodes under the root node; calculating the distance between the root node and each intermediate node; determining the intermediate nodes whose target distance is greater than the preset distance threshold as first-layer target nodes; traversing all intermediate nodes under the first-layer target nodes down to the last layer of leaf nodes; taking the word vectors in all of these leaf nodes as first candidate word vectors; calculating the similarity between each first candidate word vector and the keyword; and inserting the first candidate word vectors into the priority queue in order of similarity.
所述去重模块206用于对所述优先队列中的所述第一候选词向量进行去重。The deduplication module 206 is configured to deduplicate the first candidate word vector in the priority queue.
所述获取模块201还用于获取去重后的优先队列中排序在前第二预设个数的目标词向量。The acquiring module 201 is also used for acquiring the target word vectors of the second preset number in the prioritized queue after deduplication.
所述推送模块207用于基于所述第二预设个数的目标词向量和词-索引文件推送第二预设个数近义词供用户选择。The pushing module 207 is configured to push a second preset number of synonyms based on the second preset number of target word vectors and word-index files for selection by the user.
In this embodiment, the method of pushing the second preset number of synonyms for the user to select based on the second preset number of target word vectors and the word-index file includes: obtaining the target indexes corresponding to the second preset number of target word vectors; querying, according to the word-index file, the word vectors corresponding to the target indexes; and pushing the synonyms corresponding to these word vectors for the user to select.
In this embodiment, the binary tree structure file and the word-index file are saved together; when the Top-N nearest-neighbor words of a certain keyword need to be queried, only these two files need to be used for the index lookup.
In this embodiment, by pushing the second preset number of synonyms for the user to screen, the user can configure the keywords of the answers corresponding to the interview questions more comprehensively, so that when a job applicant answers a question, the applicant is not scored one-sidedly on the basis of the answer alone. The synonym lookup function supported by this application is more innovative and convenient: synonyms can be generated for 5 keywords at a time, 8 synonyms are pushed each time, and the user can click "change batch" to switch to another round of 8 synonyms, which makes them easy to view and use. For example, a "change batch" button is displayed on the push interface; after the user clicks the button, the original synonyms can be updated and more synonyms pushed.
优选地,由于很多词语并不是所述面试题目的答案,所以增加了预设规则筛选查询到的词汇,其中,所述预设规则包括以下规则中的至少一种:Preferably, since many words are not the answers to the interview questions, a preset rule is added to filter the queried vocabulary, wherein the preset rule includes at least one of the following rules:
(1)根据词语字数调整查询到的词汇的顺序。例如,优先返回与所述关键词字数一致的词汇。而对于与所述关键词字数不一致的词汇,每增加/减少1个字,则在将查询到的词汇进行排序时增加预设距离(如0.1)。(1) Adjust the order of the searched vocabulary according to the number of words. For example, priority is given to returning vocabulary consistent with the number of words in the keyword. For words that are inconsistent with the number of words of the keyword, for each increase/decrease of 1 word, the preset distance (for example, 0.1) is increased when sorting the queried words.
(2) Screen the queried words by type, where the types include Chinese, English, and numbers. For example, words of the same type as the keyword are returned preferentially. Cases where Chinese is input and English is returned, or English is input and Chinese is returned, are returned normally; but where Chinese or English is input and a number is returned, that synonym is deleted directly. It should be noted that a single English letter or a single Chinese character counts as one character.
(3) Remove words whose number of characters exceeds that of the keyword by more than a preset number, for example, words that are more than 5 characters longer than the keyword.
可以理解的是,上述推送装置20同样可以用于推送同义词。It is understandable that the aforementioned pushing device 20 can also be used to push synonyms.
In summary, the pushing device 20 described in this application includes an acquisition module 201, a configuration module 202, a training module 203, a construction module 204, a traversal module 205, a deduplication module 206, and a pushing module 207. The acquisition module 201 is used to obtain interview questions; the configuration module 202 is used to configure a first preset number of keywords of the answer corresponding to the interview question; the training module 203 is used to pre-train a target word vector model based on a super-large word vector model; the construction module 204 is used to construct a word vector matrix according to the target word vector model to obtain a word-index file, where the word-index file includes the correspondence between word vectors and indexes; the construction module 204 is also used to construct a binary tree based on all word vectors in the target word vector model; the traversal module 205 is used to traverse the binary tree, query from the binary tree the first candidate word vectors whose distance to the keyword is greater than a preset distance threshold, and construct a priority queue based on the first candidate word vectors; the deduplication module 206 is used to deduplicate the first candidate word vectors in the priority queue; the acquisition module 201 is also used to obtain the target word vectors ranked in the top second preset number in the deduplicated priority queue; and the pushing module 207 is used to push a second preset number of synonyms, based on the second preset number of target word vectors and the word-index file, for the user to select.
The word vectors used in this application have broad coverage, each word is represented by a 200-dimensional vector, and each word's vector reflects the actual semantics of the word well. The word vector model of this application includes 8 million words, which largely solves the traditional out-of-vocabulary problem of words that cannot be matched. The memory footprint of the word vector model used in this application is greatly reduced; using the word-index file greatly lowers the memory usage and greatly increases system stability. In addition, the query return speed of this application is greatly increased: a query for one word used to take more than ten seconds and now returns within 0.01 s. Finally, this application can configure, for the robot interview process, more synonyms of the keywords of the answers corresponding to the interview questions, which helps HR configure more comprehensive answers for the interview questions when interviewing job applicants, so that when a job applicant's answer to an interview question is received, the answer can be analyzed more accurately and human resources can give a more comprehensive analysis of the applicant.
The integrated unit implemented in the form of a software functional module described above may be stored in a computer-readable storage medium. The software functional module is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a dual-screen device, a network device, or the like) or a processor to execute parts of the methods described in the embodiments of this application.
图3为本申请实施例三提供的电子设备的示意图。FIG. 3 is a schematic diagram of the electronic device provided in the third embodiment of the application.
所述电子设备3包括:存储器31、至少一个处理器32、存储在所述存储器31中并可在所述至少一个处理器32上运行的计算机程序33、至少一条通讯总线34及数据库35。The electronic device 3 includes a memory 31, at least one processor 32, a computer program 33 stored in the memory 31 and running on the at least one processor 32, at least one communication bus 34 and a database 35.
所述至少一个处理器32执行所述计算机程序33时实现上述近义词推送方法实施例中的步骤。When the at least one processor 32 executes the computer program 33, the steps in the above-mentioned synonym push method embodiment are implemented.
Exemplarily, the computer program 33 may be divided into one or more modules/units, and the one or more modules/units are stored in the memory 31 and executed by the at least one processor 32 to complete this application. The one or more modules/units may be a series of computer-readable instruction segments capable of completing specific functions, and the computer-readable instruction segments are used to describe the execution process of the computer program 33 in the electronic device 3.
The electronic device 3 may be a mobile phone, a tablet computer, a personal digital assistant (PDA), or another device on which an application is installed. Those skilled in the art can understand that the schematic diagram in FIG. 3 is only an example of the electronic device 3 and does not constitute a limitation on the electronic device 3; it may include more or fewer components than shown, combine certain components, or have different components. For example, the electronic device 3 may also include input/output devices, network access devices, a bus, and so on.
The at least one processor 32 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The processor 32 may be a microprocessor, or the processor 32 may be any conventional processor. The processor 32 is the control center of the electronic device 3 and uses various interfaces and lines to connect the various parts of the entire electronic device 3.
The memory 31 may be used to store the computer program 33 and/or the modules/units. The processor 32 implements the various functions of the electronic device 3 by running or executing the computer programs and/or modules/units stored in the memory 31 and by invoking the data stored in the memory 31. The memory 31 may mainly include a program storage area and a data storage area, where the program storage area may store an operating system and an application program required by at least one function (such as a sound playback function, an image playback function, etc.), and the data storage area may store data created according to the use of the electronic device 3 (such as audio data), etc. In addition, the memory 31 may include a volatile memory, and may also include a non-volatile memory, such as a hard disk, a memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash card, at least one magnetic disk storage device, a flash memory device, a high-speed random access memory, or another storage device.
The memory 31 stores program codes, and the at least one processor 32 can call the program codes stored in the memory 31 to perform related functions. For example, the modules described in FIG. 2 (the acquisition module 201, the configuration module 202, the training module 203, the construction module 204, the traversal module 205, the deduplication module 206, and the pushing module 207) are program codes stored in the memory 31 and executed by the at least one processor 32, so as to realize the functions of the modules and achieve the purpose of synonym pushing.
所述获取模块201用于获取面试题目;The obtaining module 201 is used to obtain interview questions;
所述配置模块202用于配置第一预设个数与所述面试题目对应的答案的关键词;The configuration module 202 is used to configure the first preset number of keywords corresponding to the answers of the interview questions;
所述训练模块203用于基于超大词向量模型预先训练得到目标词向量模型;The training module 203 is used for pre-training to obtain a target word vector model based on the super-large word vector model;
所述构建模块204用于根据所述目标词向量模型构建词向量矩阵得到词-索引文件,其中,所述词-索引文件包括词向量与索引之间的对应关系;The construction module 204 is configured to construct a word vector matrix according to the target word vector model to obtain a word-index file, where the word-index file includes the correspondence between the word vector and the index;
所述构建模块204还用于基于所述目标词向量模型中的所有词向量构建二叉树;The construction module 204 is further configured to construct a binary tree based on all word vectors in the target word vector model;
所述遍历模块205用于遍历所述二叉树,从所述二叉树中查询出与所述关键词的距离大于预设距离阈值的第一候选词向量并基于所述第一候选词向量构建优先队列;The traversal module 205 is configured to traverse the binary tree, query the binary tree for the first candidate word vector whose distance to the keyword is greater than a preset distance threshold, and construct a priority queue based on the first candidate word vector;
所述去重模块206用于对所述优先队列中的所述第一候选词向量进行去重;The deduplication module 206 is configured to deduplicate the first candidate word vector in the priority queue;
所述获取模块201还用于获取去重后的优先队列中排序在前第二预设个数的目标词向量;及The acquiring module 201 is also used to acquire the target word vectors of the second preset number in the prioritized queue after deduplication; and
The pushing module 207 is configured to push a second preset number of synonyms, based on the second preset number of target word vectors and the word-index file, for the user to select.
所述数据库(Database)35是按照数据结构来组织、存储和管理数据的建立在所述电子设备3上的仓库。数据库通常分为层次式数据库、网络式数据库和关系式数据库三种。在本实施方式中,所述数据库35用于存储面试题目等信息。The database (Database) 35 is a warehouse built on the electronic device 3 for organizing, storing and managing data according to a data structure. Databases are usually divided into three types: hierarchical database, network database and relational database. In this embodiment, the database 35 is used to store information such as interview questions.
所述电子设备3集成的模块/单元如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储介质中。基于这样的理解,本申请实现上述实施例方法中的全部或部分流程,也可以通过计算机程序来指令相关的硬件来完成,所述的计算机程序可存储于一计算机可读存储介质中,所述计算机程序在被处理器执行时,可实现上述各个方法实施例的步骤。其中,所述计算机程序包括计算机可读指令代码,所述计算机可读指令代码可以为源代码形式、对象代码形式、可执行文件或某些中间形式等。所述计算机可读介质可以包括:能够携带所述计算机程序代码的任何实体或装置、记录介质、U盘、移动硬盘、磁碟、光盘、计算机存储器、只读存储器(ROM,Read-Only Memory)、随机存取存储器等。If the integrated module/unit of the electronic device 3 is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer readable storage medium. Based on this understanding, this application implements all or part of the processes in the above-mentioned embodiments and methods, and can also be completed by instructing relevant hardware through a computer program. The computer program can be stored in a computer-readable storage medium. When the computer program is executed by the processor, it can implement the steps of the foregoing method embodiments. Wherein, the computer program includes computer-readable instruction code, and the computer-readable instruction code may be in the form of source code, object code, executable file, or some intermediate form. The computer-readable medium may include: any entity or device capable of carrying the computer program code, recording medium, U disk, mobile hard disk, magnetic disk, optical disk, computer memory, read-only memory (ROM, Read-Only Memory) , Random access memory, etc.
在本申请所提供的几个实施例中,应所述理解到,所揭露的电子设备和方法,可以通过其它的方式实现。例如,以上所描述的电子设备实施例仅仅是示意性的,例如,所述单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式。In the several embodiments provided in this application, it should be understood that the disclosed electronic device and method can be implemented in other ways. For example, the electronic device embodiments described above are merely illustrative. For example, the division of the units is only a logical function division, and there may be other division methods in actual implementation.
另外,在本申请各个实施例中的各功能单元可以集成在相同处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在相同单元中。上述集成的单元既可以采用硬件的形式实现,也可以采用硬件加软件功能模块的形式实现。In addition, the functional units in the various embodiments of the present application may be integrated in the same processing unit, or each unit may exist alone physically, or two or more units may be integrated in the same unit. The above-mentioned integrated unit may be implemented in the form of hardware, or may be implemented in the form of hardware plus software functional modules.
对于本领域技术人员而言,显然本申请不限于上述示范性实施例的细节,而且在不背离本申请的精神或基本特征的情况下,能够以其他的具体形式实现本申请。因此,无论从哪一 点来看,均应将实施例看作是示范性的,而且是非限制性的,本申请的范围由所附权利要求而不是上述说明限定,因此旨在将落在权利要求的等同要件的含义和范围内的所有变化涵括在本申请内。不应将权利要求中的任何附图标记视为限制所涉及的权利要求。此外,显然“包括”一词不排除其他单元或,单数不排除复数。系统权利要求中陈述的多个单元或装置也可以由一个单元或装置通过软件或者硬件来实现。第一,第二等词语用来表示名称,而并不表示任何特定的顺序。For those skilled in the art, it is obvious that the present application is not limited to the details of the foregoing exemplary embodiments, and the present application can be implemented in other specific forms without departing from the spirit or basic characteristics of the application. Therefore, no matter from which point of view, the embodiments should be regarded as exemplary and non-limiting. The scope of this application is defined by the appended claims rather than the above description, and therefore it is intended to fall into the claims. All changes in the meaning and scope of the equivalent elements of are included in this application. Any reference signs in the claims should not be regarded as limiting the claims involved. In addition, it is obvious that the word "including" does not exclude other elements or the singular number does not exclude the plural number. Multiple units or devices stated in the system claims can also be implemented by one unit or device through software or hardware. Words such as first and second are used to denote names, but do not denote any specific order.
最后应说明的是,以上实施例仅用以说明本申请的技术方案而非限制,尽管参照较佳实施例对本申请进行了详细说明,本领域的普通技术人员应当理解,可以对本申请的技术方案进行修改或等同替换,而不脱离本申请技术方案的精神范围。Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the application and not to limit them. Although the application has been described in detail with reference to the preferred embodiments, those of ordinary skill in the art should understand that the technical solutions of the application can be Modifications or equivalent replacements are made without departing from the spirit and scope of the technical solution of the present application.

Claims (20)

  1. 一种近义词推送方法,其中,所述方法包括:A method for pushing synonyms, wherein the method includes:
    获取面试题目;Get interview questions;
    配置第一预设个数与所述面试题目对应的答案的关键词;Configure the first preset number of keywords for answers corresponding to the interview questions;
    基于超大词向量模型预先训练得到目标词向量模型;Pre-trained based on the super-large word vector model to obtain the target word vector model;
    根据所述目标词向量模型构建词向量矩阵得到词-索引文件,其中,所述词-索引文件包括词向量与索引之间的对应关系;Constructing a word vector matrix according to the target word vector model to obtain a word-index file, where the word-index file includes the correspondence between the word vector and the index;
    基于所述目标词向量模型中的所有词向量构建二叉树;Constructing a binary tree based on all word vectors in the target word vector model;
    遍历所述二叉树,从所述二叉树中查询出与所述关键词的距离大于预设距离阈值的第一候选词向量并基于所述第一候选词向量构建优先队列;Traversing the binary tree, querying the binary tree for a first candidate word vector whose distance to the keyword is greater than a preset distance threshold, and constructing a priority queue based on the first candidate word vector;
    对所述优先队列中的所述第一候选词向量进行去重;De-duplicate the first candidate word vector in the priority queue;
    获取去重后的优先队列中排序在前第二预设个数的目标词向量;Obtain the target word vectors of the second preset number in the prioritized queue after deduplication;
    基于所述第二预设个数的目标词向量和词-索引文件推送第二预设个数近义词供用户选择。Based on the second preset number of target word vectors and word-index files, a second preset number of synonyms are pushed for the user to select.
  2. 如权利要求1所述的近义词推送方法,其中,所述配置第一预设个数与所述面试题目对应的答案的关键词的步骤包括:8. The method for pushing synonyms according to claim 1, wherein the step of configuring the first preset number of keywords of answers corresponding to the interview questions comprises:
    根据预先构建的题目解析模型分析所述面试题目得到对应的题目意图;Analyze the interview questions according to the pre-built question analysis model to obtain the corresponding question intentions;
    根据所述题目意图和预先建立的知识库,确定所述面试题目对应的答案;及According to the purpose of the question and the pre-established knowledge base, determine the answer corresponding to the interview question; and
    根据所述对应的答案提取第一预设个数关键词。Extract the first preset number of keywords according to the corresponding answer.
  3. 如权利要求1所述的近义词推送方法,其中,所述基于超大词向量模型预先训练得到目标词向量模型的步骤包括:8. The method for pushing synonyms according to claim 1, wherein the step of obtaining a target word vector model based on the super-large word vector model pre-training comprises:
    扩充所述超大词向量模型中的机器人面试场景语料,其中,包括对所述机器人面试场景语料进行分词、去停用词及基于CBOW模式增量训练词向量操作;Expanding the robot interview scene corpus in the super-large word vector model, which includes segmenting the robot interview scene corpus, removing stop words, and incrementally training word vector operations based on the CBOW mode;
    根据扩充语料后的超大词向量模型训练得到目标词向量模型。The target word vector model is obtained by training the super-large word vector model after the expanded corpus.
  4. 如权利要求3所述的近义词推送方法,其中,根据所述目标词向量模型构建词向量矩阵得到词-索引文件的步骤包括:The method for pushing synonyms according to claim 3, wherein the step of constructing a word vector matrix according to the target word vector model to obtain a word-index file comprises:
    以每个词的维度为行数,以所述目标词向量模型中所有词的总数为列数构建词向量矩阵;Constructing a word vector matrix with the dimension of each word as the number of rows, and the total number of all words in the target word vector model as the number of columns;
    所述词向量矩阵中的每一行对应一个索引;Each row in the word vector matrix corresponds to an index;
    根据所述词向量矩阵构建词-索引文件,并输出所述词-索引文件。Construct a word-index file according to the word vector matrix, and output the word-index file.
5. The method for pushing synonyms according to claim 3, wherein pushing a second preset number of synonyms for the user to select based on the second preset number of target word vectors and the word-index file comprises:
    Obtaining target indexes corresponding to the second preset number of target word vectors;
    Querying the word vectors corresponding to the target indexes according to the word-index file; and
    Pushing the synonyms corresponding to the word vectors for the user to select.
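Claim 5 resolves the selected target word vectors back to displayable words through the word-index file; a plain dictionary lookup is enough for a sketch (the JSON format is assumed to match the hypothetical file written in the previous sketch).

```python
import json

with open("word_index.json", encoding="utf-8") as f:
    index_to_word = json.load(f)          # JSON keys arrive as strings, e.g. {"3": "沟通"}

def push_synonyms(target_indexes):
    """Map the indexes of the selected target word vectors to the synonyms
    shown to the user (illustrative only)."""
    return [index_to_word[str(i)] for i in target_indexes]

# e.g. push_synonyms([3, 7]) -> the candidate synonyms offered for selection
```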
6. The method for pushing synonyms according to claim 1, wherein querying from the binary tree the first candidate word vectors whose distance to the keyword is greater than a preset distance threshold and constructing a priority queue based on the first candidate word vectors comprises:
    Using the keyword as the root node of the binary tree;
    Traversing all intermediate nodes under the root node;
    Calculating the distance between the root node and each intermediate node;
    Determining the intermediate nodes corresponding to target distances greater than the preset distance threshold as first-layer target nodes;
    Traversing all intermediate nodes under the first-layer target nodes down to the last layer of leaf nodes;
    Using the word vectors in all of the leaf nodes as the first candidate word vectors;
    Calculating the similarity between each first candidate word vector and the keyword; and
    Inserting the first candidate word vectors into the priority queue in order of similarity.
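Claim 6 walks a binary tree built over the word vectors, keeps the sub-trees whose intermediate nodes lie beyond the distance threshold from the keyword, and ranks their leaf vectors into a priority queue. The plain-Python sketch below mirrors those steps; the dictionary-based node layout, the Euclidean node distance, and the cosine similarity used for ranking are assumptions of this sketch rather than requirements of the claim.

```python
import heapq
import numpy as np

def cosine(a, b):
    """Similarity used for ranking in this sketch (the claim only says 'similarity')."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def collect_leaf_vectors(node, leaves):
    """Depth-first descent from an intermediate node down to its leaf word vectors."""
    if node.get("children"):
        for child in node["children"]:
            collect_leaf_vectors(child, leaves)
    else:
        leaves.extend(node["items"])                      # leaves hold (index, vector) pairs

def build_candidate_queue(keyword_vec, root, distance_threshold):
    """Select sub-trees whose intermediate node is farther than the threshold
    from the keyword, gather their leaf vectors, and heap-order them by similarity."""
    queue = []
    for mid in root["children"]:                          # intermediate nodes under the root
        if np.linalg.norm(keyword_vec - mid["vector"]) > distance_threshold:
            leaves = []
            collect_leaf_vectors(mid, leaves)
            for idx, vec in leaves:                       # first candidate word vectors
                heapq.heappush(queue, (-cosine(keyword_vec, vec), idx))
    return queue                                          # most similar popped first
```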
7. The method for pushing synonyms according to claim 3, wherein the method further comprises: filtering the found second preset number of synonyms according to preset rules, wherein the preset rules comprise at least one of the following:
    Adjusting the order of the found second preset number of synonyms according to the number of characters in each word;
    Filtering the found second preset number of synonyms according to the type of word; and
    Removing, from the second preset number of synonyms, words whose number of characters exceeds that of the keyword by a preset number.
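The optional post-filtering rules of claim 7 translate into a few list operations; in the sketch below, `word_types`, `allowed_types`, and `max_extra_chars` are hypothetical parameters standing in for the "preset" values of the claim.

```python
def filter_synonyms(synonyms, keyword, word_types=None, allowed_types=None, max_extra_chars=2):
    """Apply illustrative versions of the three optional rules of claim 7."""
    # Remove words whose character count exceeds the keyword's by more than a preset number.
    kept = [w for w in synonyms if len(w) <= len(keyword) + max_extra_chars]
    # Keep only words whose (externally supplied) type is among the allowed types.
    if word_types and allowed_types:
        kept = [w for w in kept if word_types.get(w) in allowed_types]
    # Adjust the ordering by word length (shorter candidates listed first here).
    return sorted(kept, key=len)

# e.g. filter_synonyms(["交流", "沟通协调", "表达"], "沟通", max_extra_chars=1)
# keeps the two-character candidates and drops the four-character one.
```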
8. A device for pushing synonyms, wherein the device comprises:
    An acquisition module, used to acquire interview questions;
    A configuration module, used to configure a first preset number of keywords of answers corresponding to the interview questions;
    A training module, used to pre-train a target word vector model based on a super-large word vector model;
    A construction module, used to construct a word vector matrix according to the target word vector model to obtain a word-index file, wherein the word-index file comprises a correspondence between word vectors and indexes;
    The construction module is further used to construct a binary tree based on all word vectors in the target word vector model;
    A traversal module, used to traverse the binary tree, query from the binary tree first candidate word vectors whose distance to the keyword is greater than a preset distance threshold, and construct a priority queue based on the first candidate word vectors;
    A de-duplication module, used to de-duplicate the first candidate word vectors in the priority queue;
    The acquisition module is further used to obtain a second preset number of target word vectors ranked first in the de-duplicated priority queue; and
    A push module, used to push a second preset number of synonyms for the user to select based on the second preset number of target word vectors and the word-index file.
9. An electronic device, wherein the electronic device comprises a processor, and the processor is configured to execute computer-readable instructions stored in a memory to implement the following steps:
    Acquiring interview questions;
    Configuring a first preset number of keywords of answers corresponding to the interview questions;
    Pre-training a target word vector model based on a super-large word vector model;
    Constructing a word vector matrix according to the target word vector model to obtain a word-index file, wherein the word-index file comprises a correspondence between word vectors and indexes;
    Constructing a binary tree based on all word vectors in the target word vector model;
    Traversing the binary tree, querying from the binary tree first candidate word vectors whose distance to the keyword is greater than a preset distance threshold, and constructing a priority queue based on the first candidate word vectors;
    De-duplicating the first candidate word vectors in the priority queue;
    Obtaining a second preset number of target word vectors ranked first in the de-duplicated priority queue; and
    Pushing a second preset number of synonyms for the user to select based on the second preset number of target word vectors and the word-index file.
10. The electronic device according to claim 9, wherein when the processor executes the computer-readable instructions to implement the configuring of a first preset number of keywords of answers corresponding to the interview questions, the processor specifically implements:
    Analyzing the interview questions according to a pre-built question analysis model to obtain the corresponding question intent;
    Determining the answers corresponding to the interview questions according to the question intent and a pre-established knowledge base; and
    Extracting a first preset number of keywords from the corresponding answers.
11. The electronic device according to claim 9, wherein when the processor executes the computer-readable instructions to implement the step of pre-training a target word vector model based on a super-large word vector model, the processor specifically implements:
    Expanding the robot interview scene corpus in the super-large word vector model, including performing word segmentation and stop-word removal on the robot interview scene corpus and incrementally training word vectors based on the CBOW mode; and
    Training the super-large word vector model with the expanded corpus to obtain the target word vector model.
12. The electronic device according to claim 11, wherein when the processor executes the computer-readable instructions to implement the constructing of a word vector matrix according to the target word vector model to obtain a word-index file, the processor specifically implements:
    Constructing a word vector matrix with the dimension of each word as the number of rows and the total number of words in the target word vector model as the number of columns;
    wherein each row in the word vector matrix corresponds to one index; and
    Constructing a word-index file according to the word vector matrix, and outputting the word-index file.
13. The electronic device according to claim 11, wherein when the processor executes the computer-readable instructions to implement the pushing of a second preset number of synonyms for the user to select based on the second preset number of target word vectors and the word-index file, the processor specifically implements:
    Obtaining target indexes corresponding to the second preset number of target word vectors;
    Querying the word vectors corresponding to the target indexes according to the word-index file; and
    Pushing the synonyms corresponding to the word vectors for the user to select.
14. The electronic device according to claim 9, wherein when the processor executes the computer-readable instructions to implement the querying from the binary tree of the first candidate word vectors whose distance to the keyword is greater than a preset distance threshold and the constructing of a priority queue based on the first candidate word vectors, the processor specifically implements:
    Using the keyword as the root node of the binary tree;
    Traversing all intermediate nodes under the root node;
    Calculating the distance between the root node and each intermediate node;
    Determining the intermediate nodes corresponding to target distances greater than the preset distance threshold as first-layer target nodes;
    Traversing all intermediate nodes under the first-layer target nodes down to the last layer of leaf nodes;
    Using the word vectors in all of the leaf nodes as the first candidate word vectors;
    Calculating the similarity between each first candidate word vector and the keyword; and
    Inserting the first candidate word vectors into the priority queue in order of similarity.
15. The electronic device according to claim 11, wherein the processor executes the computer-readable instructions to further implement the following steps:
    Filtering the found second preset number of synonyms according to preset rules, wherein the preset rules comprise at least one of the following:
    Adjusting the order of the found second preset number of synonyms according to the number of characters in each word;
    Filtering the found second preset number of synonyms according to the type of word; and
    Removing, from the second preset number of synonyms, words whose number of characters exceeds that of the keyword by a preset number.
16. A computer-readable storage medium having computer-readable instructions stored thereon, wherein when the computer-readable instructions are executed by a processor, the following steps are implemented:
    Acquiring interview questions;
    Configuring a first preset number of keywords of answers corresponding to the interview questions;
    Pre-training a target word vector model based on a super-large word vector model;
    Constructing a word vector matrix according to the target word vector model to obtain a word-index file, wherein the word-index file comprises a correspondence between word vectors and indexes;
    Constructing a binary tree based on all word vectors in the target word vector model;
    Traversing the binary tree, querying from the binary tree first candidate word vectors whose distance to the keyword is greater than a preset distance threshold, and constructing a priority queue based on the first candidate word vectors;
    De-duplicating the first candidate word vectors in the priority queue;
    Obtaining a second preset number of target word vectors ranked first in the de-duplicated priority queue; and
    Pushing a second preset number of synonyms for the user to select based on the second preset number of target word vectors and the word-index file.
17. The computer-readable storage medium according to claim 16, wherein when the computer-readable instructions are executed by the processor to implement the configuring of a first preset number of keywords of answers corresponding to the interview questions, the following is specifically implemented:
    Analyzing the interview questions according to a pre-built question analysis model to obtain the corresponding question intent;
    Determining the answers corresponding to the interview questions according to the question intent and a pre-established knowledge base; and
    Extracting a first preset number of keywords from the corresponding answers.
18. The computer-readable storage medium according to claim 16, wherein when the computer-readable instructions are executed by the processor to implement the pre-training of a target word vector model based on a super-large word vector model, the following is specifically implemented:
    Expanding the robot interview scene corpus in the super-large word vector model, including performing word segmentation and stop-word removal on the robot interview scene corpus and incrementally training word vectors based on the CBOW mode; and
    Training the super-large word vector model with the expanded corpus to obtain the target word vector model.
19. The computer-readable storage medium according to claim 18, wherein when the computer-readable instructions are executed by the processor to implement the constructing of a word vector matrix according to the target word vector model to obtain a word-index file, the following is specifically implemented:
    Constructing a word vector matrix with the dimension of each word as the number of rows and the total number of words in the target word vector model as the number of columns;
    wherein each row in the word vector matrix corresponds to one index; and
    Constructing a word-index file according to the word vector matrix, and outputting the word-index file.
20. The computer-readable storage medium according to claim 18, wherein when the computer-readable instructions are executed by the processor to implement the pushing of a second preset number of synonyms for the user to select based on the second preset number of target word vectors and the word-index file, the following is specifically implemented:
    Obtaining target indexes corresponding to the second preset number of target word vectors;
    Querying the word vectors corresponding to the target indexes according to the word-index file; and
    Pushing the synonyms corresponding to the word vectors for the user to select.
PCT/CN2020/111915 2020-03-02 2020-08-27 Near-synonym pushing method and apparatus, electronic device, and medium WO2021174783A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010136905.7A CN111460798B (en) 2020-03-02 2020-03-02 Method, device, electronic equipment and medium for pushing paraphrasing
CN202010136905.7 2020-03-02

Publications (1)

Publication Number Publication Date
WO2021174783A1

Family

ID=71684962

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/111915 WO2021174783A1 (en) 2020-03-02 2020-08-27 Near-synonym pushing method and apparatus, electronic device, and medium

Country Status (2)

Country Link
CN (1) CN111460798B (en)
WO (1) WO2021174783A1 (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113792133A (en) * 2021-11-11 2021-12-14 北京世纪好未来教育科技有限公司 Question judging method and device, electronic equipment and medium
CN113806311A (en) * 2021-09-17 2021-12-17 平安普惠企业管理有限公司 Deep learning-based file classification method and device, electronic equipment and medium
CN114742042A (en) * 2022-03-22 2022-07-12 杭州未名信科科技有限公司 Text duplicate removal method and device, electronic equipment and storage medium
CN115168661A (en) * 2022-08-31 2022-10-11 深圳市一号互联科技有限公司 Native graph data processing method, device, equipment and storage medium
CN115630613A (en) * 2022-12-19 2023-01-20 长沙冉星信息科技有限公司 Automatic coding system and method for evaluation problems in questionnaire survey
CN118134609A (en) * 2024-05-06 2024-06-04 浙江开心果数智科技有限公司 Commodity retrieval ordering system and method based on artificial intelligence
CN118332011A (en) * 2024-06-13 2024-07-12 苏州元脑智能科技有限公司 Database data compression method, electronic device, storage medium, and program product

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111460798B (en) * 2020-03-02 2024-10-18 平安科技(深圳)有限公司 Method, device, electronic equipment and medium for pushing paraphrasing
CN112434188B (en) * 2020-10-23 2023-09-05 杭州未名信科科技有限公司 Data integration method, device and storage medium of heterogeneous database
CN112232065B (en) * 2020-10-29 2024-05-14 腾讯科技(深圳)有限公司 Method and device for mining synonyms
CN114911895A (en) * 2021-02-08 2022-08-16 华为技术有限公司 Text generation method, device and storage medium
CN112906895B (en) * 2021-02-09 2022-12-06 柳州智视科技有限公司 Method for imitating question object
CN113095165A (en) * 2021-03-23 2021-07-09 北京理工大学深圳研究院 Simulation interview method and device for perfecting interview performance
CN113722452B (en) * 2021-07-16 2024-01-19 上海通办信息服务有限公司 Semantic-based rapid knowledge hit method and device in question-answering system
CN117112736B (en) * 2023-10-24 2024-01-05 云南瀚文科技有限公司 Information retrieval analysis method and system based on semantic analysis model

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101593206A (en) * 2009-06-25 2009-12-02 腾讯科技(深圳)有限公司 Searching method and device based on answer in the question and answer interaction platform
WO2018149326A1 (en) * 2017-02-16 2018-08-23 阿里巴巴集团控股有限公司 Natural language question answering method and apparatus, and server
CN109635094A (en) * 2018-12-17 2019-04-16 北京百度网讯科技有限公司 Method and apparatus for generating answer
CN109947922A (en) * 2019-03-20 2019-06-28 浪潮商用机器有限公司 A kind of question and answer processing method, device and question answering system
CN111460798A (en) * 2020-03-02 2020-07-28 平安科技(深圳)有限公司 Method and device for pushing similar meaning words, electronic equipment and medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107220380A (en) * 2017-06-27 2017-09-29 北京百度网讯科技有限公司 Question and answer based on artificial intelligence recommend method, device and computer equipment
CN109902283B (en) * 2018-05-03 2023-06-06 华为技术有限公司 Information output method and device
CN109597988B (en) * 2018-10-31 2020-04-28 清华大学 Cross-language vocabulary semantic prediction method and device and electronic equipment

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101593206A (en) * 2009-06-25 2009-12-02 腾讯科技(深圳)有限公司 Searching method and device based on answer in the question and answer interaction platform
WO2018149326A1 (en) * 2017-02-16 2018-08-23 阿里巴巴集团控股有限公司 Natural language question answering method and apparatus, and server
CN109635094A (en) * 2018-12-17 2019-04-16 北京百度网讯科技有限公司 Method and apparatus for generating answer
CN109947922A (en) * 2019-03-20 2019-06-28 浪潮商用机器有限公司 A kind of question and answer processing method, device and question answering system
CN111460798A (en) * 2020-03-02 2020-07-28 平安科技(深圳)有限公司 Method and device for pushing similar meaning words, electronic equipment and medium

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113806311A (en) * 2021-09-17 2021-12-17 平安普惠企业管理有限公司 Deep learning-based file classification method and device, electronic equipment and medium
CN113806311B (en) * 2021-09-17 2023-08-29 深圳市深可信科学技术有限公司 File classification method and device based on deep learning, electronic equipment and medium
CN113792133A (en) * 2021-11-11 2021-12-14 北京世纪好未来教育科技有限公司 Question judging method and device, electronic equipment and medium
CN113792133B (en) * 2021-11-11 2022-04-29 北京世纪好未来教育科技有限公司 Question judging method and device, electronic equipment and medium
CN114742042A (en) * 2022-03-22 2022-07-12 杭州未名信科科技有限公司 Text duplicate removal method and device, electronic equipment and storage medium
CN115168661A (en) * 2022-08-31 2022-10-11 深圳市一号互联科技有限公司 Native graph data processing method, device, equipment and storage medium
CN115630613A (en) * 2022-12-19 2023-01-20 长沙冉星信息科技有限公司 Automatic coding system and method for evaluation problems in questionnaire survey
CN115630613B (en) * 2022-12-19 2023-04-07 长沙冉星信息科技有限公司 Automatic coding system and method for evaluation problems in questionnaire survey
CN118134609A (en) * 2024-05-06 2024-06-04 浙江开心果数智科技有限公司 Commodity retrieval ordering system and method based on artificial intelligence
CN118332011A (en) * 2024-06-13 2024-07-12 苏州元脑智能科技有限公司 Database data compression method, electronic device, storage medium, and program product

Also Published As

Publication number Publication date
CN111460798A (en) 2020-07-28
CN111460798B (en) 2024-10-18

Similar Documents

Publication Publication Date Title
WO2021174783A1 (en) Near-synonym pushing method and apparatus, electronic device, and medium
CN109670163B (en) Information identification method, information recommendation method, template construction method and computing device
US10146862B2 (en) Context-based metadata generation and automatic annotation of electronic media in a computer network
CN117235226A (en) Question response method and device based on large language model
CN111581354A (en) FAQ question similarity calculation method and system
US20160078047A1 (en) Method for obtaining search suggestions from fuzzy score matching and population frequencies
CN111339277A (en) Question-answer interaction method and device based on machine learning
CN112559709A (en) Knowledge graph-based question and answer method, device, terminal and storage medium
US10073890B1 (en) Systems and methods for patent reference comparison in a combined semantical-probabilistic algorithm
CN114547253A (en) Semantic search method based on knowledge base application
CN108875743B (en) Text recognition method and device
TW202123026A (en) Data archiving method, device, computer device and storage medium
US11360953B2 (en) Techniques for database entries de-duplication
CN113641833A (en) Service requirement matching method and device
CN117076636A (en) Information query method, system and equipment for intelligent customer service
US20170124090A1 (en) Method of discovering and exploring feature knowledge
CN115982346A (en) Question-answer library construction method, terminal device and storage medium
CN113127617A (en) Knowledge question answering method of general domain knowledge graph, terminal equipment and storage medium
CN113076740A (en) Synonym mining method and device in government affair service field
CN109684357B (en) Information processing method and device, storage medium and terminal
CN117609468A (en) Method and device for generating search statement
CN117708270A (en) Enterprise data query method, device, equipment and storage medium
CN112989011B (en) Data query method, data query device and electronic equipment
CN113761213B (en) Knowledge graph-based data query system, method and terminal equipment
US9910890B2 (en) Synthetic events to chain queries against structured data

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20923243

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20923243

Country of ref document: EP

Kind code of ref document: A1