CN110349568B - Voice retrieval method, device, computer equipment and storage medium - Google Patents
- Publication number: CN110349568B (application number CN201910492599.8A)
- Authority
- CN
- China
- Prior art keywords
- corpus
- gram model
- model
- word segmentation
- training
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G06F16/63 — Querying (information retrieval of audio data)
- G06F16/68 — Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F40/253 — Grammatical analysis; Style critique
- G10L15/063 — Training (creation of reference templates; training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice)
- G10L15/19 — Grammatical context, e.g. disambiguation of the recognition hypotheses based on word sequence rules
- G10L15/22 — Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/225 — Feedback of the input speech
Abstract
The invention discloses a voice retrieval method and apparatus, a computer device, and a storage medium. The method comprises the following steps: receiving a training set corpus and inputting it into an initial N-gram model for training to obtain the N-gram model; receiving speech to be recognized and recognizing it through the N-gram model to obtain a recognition result; performing word segmentation on the recognition result to obtain a sentence word segmentation result corresponding to the recognition result; performing lexical analysis on the sentence word segmentation result to obtain the noun part-of-speech keywords corresponding to it; and searching a pre-stored recommended corpus for entries whose similarity to the noun keywords exceeds a preset similarity threshold, so as to obtain a retrieval result. The method adopts speech recognition technology and obtains noun part-of-speech keywords by lexical analysis of the speech recognition result, so that the retrieval result is obtained from the recommended corpus more accurately.
Description
Technical Field
The present invention relates to the field of speech recognition technologies, and in particular, to a speech retrieval method, a speech retrieval device, a computer device, and a storage medium.
Background
At present, smart supermarkets search for goods through voice recognition, generally matching goods by fuzzy query. The voice recognition result therefore needs to be analyzed so that the name of the goods the user wants to purchase can be obtained automatically. In practice, users often speak a whole sentence, for example "I want to buy XXX" or "I want to eat XXX", and current speech recognition systems cannot accurately judge the purchase intent expressed in such sentences.
Disclosure of Invention
The embodiments of the invention provide a voice retrieval method and apparatus, a computer device, and a storage medium, aiming to solve the problem in the prior art that a speech recognition system has low recognition accuracy in the supermarket scenario, which leads to inaccurate recognition results.
In a first aspect, an embodiment of the present invention provides a voice retrieval method, including:
receiving a training set corpus, and inputting the training set corpus into an initial N-gram model for training to obtain the N-gram model; wherein the N-gram model is an N-element language model;
Receiving voice to be recognized, and recognizing the voice to be recognized through the N-gram model to obtain a recognition result;
word segmentation is carried out on the recognition result, and a sentence word segmentation result corresponding to the recognition result is obtained;
Performing lexical analysis according to the sentence segmentation result to obtain noun part-of-speech keywords corresponding to the sentence segmentation result; and
Searching a pre-stored recommended corpus for entries whose similarity to the noun keywords exceeds a preset similarity threshold, so as to obtain a retrieval result.
In a second aspect, an embodiment of the present invention provides a voice retrieval apparatus, including:
The model training unit is used for receiving a training set corpus and inputting the training set corpus into an initial N-gram model for training to obtain an N-gram model; wherein the N-gram model is an N-element language model;
The voice recognition unit is used for receiving the voice to be recognized, and recognizing the voice to be recognized through the N-gram model to obtain a recognition result;
the word segmentation unit is used for segmenting the recognition result to obtain a sentence word segmentation result corresponding to the recognition result;
the part-of-speech analysis unit is used for performing lexical analysis according to the sentence segmentation result to obtain noun part-of-speech keywords corresponding to the sentence segmentation result; and
And the retrieval unit is used for searching a pre-stored recommended corpus for entries whose similarity to the noun keywords exceeds a preset similarity threshold, so as to obtain a retrieval result.
In a third aspect, an embodiment of the present invention further provides a computer device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the voice retrieval method according to the first aspect.
In a fourth aspect, an embodiment of the present invention further provides a computer readable storage medium, where the computer readable storage medium stores a computer program, where the computer program when executed by a processor causes the processor to perform the speech retrieval method according to the first aspect.
The embodiments of the invention provide a voice retrieval method and apparatus, a computer device, and a storage medium. The method comprises: receiving a training set corpus and inputting it into an initial N-gram model for training to obtain the N-gram model, the N-gram model being an N-element language model; receiving speech to be recognized and recognizing it through the N-gram model to obtain a recognition result; performing word segmentation on the recognition result to obtain a sentence word segmentation result corresponding to the recognition result; performing lexical analysis on the sentence word segmentation result to obtain the noun part-of-speech keywords corresponding to it; and searching a pre-stored recommended corpus for entries whose similarity to the noun keywords exceeds a preset similarity threshold, so as to obtain a retrieval result. The method adopts speech recognition technology and accurately captures user requirements by performing lexical analysis on the speech recognition result.
Drawings
In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings required for describing the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and a person skilled in the art can obtain other drawings from them without inventive effort.
Fig. 1 is a schematic diagram of an application scenario of a voice retrieval method according to an embodiment of the present invention;
FIG. 2 is a flow chart of a voice retrieval method according to an embodiment of the present invention;
FIG. 3 is a schematic sub-flowchart of a voice retrieval method according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of another sub-flowchart of a voice retrieval method according to an embodiment of the present invention;
FIG. 5 is a schematic block diagram of a voice retrieval apparatus provided by an embodiment of the present invention;
FIG. 6 is a schematic block diagram of a subunit of a speech retrieval apparatus according to an embodiment of the present invention;
FIG. 7 is a schematic block diagram of another subunit of a speech retrieval apparatus according to an embodiment of the present invention;
Fig. 8 is a schematic block diagram of a computer device according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
It should be understood that the terms "comprises" and "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in this specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in the present specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.
Referring to fig. 1 and fig. 2, fig. 1 is a schematic diagram of an application scenario of a voice retrieval method according to an embodiment of the present invention, and fig. 2 is a schematic flow diagram of the voice retrieval method. The voice retrieval method is applied to a server and is executed by application software installed in the server.
As shown in fig. 2, the method includes steps S110 to S150.
S110, receiving a training set corpus, and inputting the training set corpus into an initial N-gram model for training to obtain the N-gram model; wherein the N-gram model is an N-element language model.
In this embodiment, the technical solution is described from the perspective of the server. The server can receive the training set corpus and train it to obtain an N-gram model, and the N-gram model is used to recognize the speech to be recognized that a front-end voice acquisition terminal deployed in the smart supermarket uploads to the server.
In this embodiment, the training set corpus is a mixture of a general corpus and a consumer-product corpus. The consumer-product corpus is a corpus containing a large number of commodity names (such as commodity brands, commodity names, etc.); the general corpus differs from the consumer-product corpus in that its vocabulary is not biased toward a particular domain. The training set corpus is input into the initial N-gram model for training, and the N-gram model used for speech recognition is obtained.
In one embodiment, as shown in fig. 3, step S110 includes:
s111, obtaining consumer product corpus, and inputting the consumer product corpus into a first initial N-gram model for training to obtain a first N-gram model;
S112, acquiring a general corpus, and inputting the general corpus into a second initial N-gram model for training to obtain a second N-gram model;
s113, fusing the first N-gram model and the second N-gram model according to the set model fusion proportion to obtain an N-gram model.
In this embodiment, the consumer-product corpus is a corpus containing a large number of commodity names, and the general corpus differs from the consumer-product corpus in that its vocabulary is not biased toward a specific domain but covers vocabulary from every domain.
The N-gram model is a language model (Language Model, LM): a probability-based model whose input is a sentence (an ordered sequence of words) and whose output is the probability of that sentence, i.e., the joint probability of its words (joint probability).
Assuming that sentence T is composed of the word sequence w1, w2, w3, ..., wn, the N-Gram language model is formulated as follows:
P(T)=P(w1w2w3...wn)
=p(w1)*p(w2|w1)*p(w3|w1w2)*…*p(wn|w1w2w3...wn-1)
The commonly used N-Gram models are Bi-Gram and Tri-Gram. Their formulas are respectively:
Bi-Gram:
P(T)=p(w1|begin)*p(w2|w1)*p(w3|w2)*…*p(wn|wn-1)
Tri-Gram:
P(T)=p(w1|begin1,begin2)*p(w2|w1,begin1)*p(w3|w2,w1)*…*p(wn|wn-1,wn-2)
It can be seen that the conditional probability of each word occurring in sentence T can be derived by counting in the corpus. For an n-gram:
p(wi|wi-n+1,…,wi-1)=C(wi-n+1,…,wi)/C(wi-n+1,…,wi-1)
where C(wi-n+1,…,wi) denotes the number of times the word string wi-n+1,…,wi occurs in the corpus.
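As an illustrative sketch of the counting formula above, Bi-Gram conditional probabilities can be estimated directly from unigram and bigram counts. The toy corpus here is an assumption for demonstration, not one from the invention:

```python
from collections import Counter

# Toy corpus (hypothetical): two short "purchase intent" sentences.
corpus = [
    ["i", "want", "to", "buy", "noodles"],
    ["i", "want", "to", "eat", "noodles"],
]

# Count unigrams and adjacent word pairs.
unigrams = Counter(w for sent in corpus for w in sent)
bigrams = Counter(
    (sent[i], sent[i + 1]) for sent in corpus for i in range(len(sent) - 1)
)

def p_bigram(prev, word):
    """P(word | prev) = C(prev, word) / C(prev), as in the formula above."""
    return bigrams[(prev, word)] / unigrams[prev]

def p_sentence(sent):
    """P(T) under the Bi-Gram factorisation (begin symbol omitted)."""
    p = 1.0
    for prev, word in zip(sent, sent[1:]):
        p *= p_bigram(prev, word)
    return p

print(p_bigram("want", "to"))  # 1.0: "want" is always followed by "to"
print(p_bigram("to", "buy"))   # 0.5: "to" is followed by "buy" half the time
```

A real training run would add begin/end symbols and smoothing for unseen n-grams; this sketch only illustrates the counting relation itself.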
The first N-gram model and the second N-gram model are fused according to the set model fusion ratio. For example, if the ratio of the consumer-product corpus to the general corpus is set to 2:8, the fusion ratio of the first N-gram model to the second N-gram model is likewise set to 2:8, and the two models are fused to finally obtain the N-gram model used for speech recognition. Because the ratio of the consumer-product corpus to the general corpus is set in advance, the accuracy of speech recognition by the fused N-gram model in the smart-supermarket scenario is effectively improved.
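One common way to realise such a fusion is linear interpolation of the two models' probabilities at the set 2:8 ratio. The sketch below is a hedged illustration with made-up probabilities, standing in for the consumer-product and general models; the patent does not specify the exact fusion mechanism:

```python
def fuse(p_consumer, p_general, ratio=(2, 8)):
    """Interpolate two models' conditional probabilities at the given ratio."""
    a, b = ratio
    lam = a / (a + b)  # weight of the consumer-product model, here 0.2
    return lam * p_consumer + (1 - lam) * p_general

# A word common in the goods corpus but rare in the general corpus
# keeps a usable probability after fusion:
print(fuse(0.30, 0.01))  # 0.2*0.30 + 0.8*0.01 = 0.068
```

Interpolation preserves the property that probabilities over a vocabulary still sum to one, which is why it is a standard choice for mixing language models.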
In one embodiment, as shown in fig. 4, step S111 includes:
s1111, performing word segmentation on the consumer product corpus based on a probability statistical word segmentation model to obtain a first word segmentation result corresponding to the consumer product corpus;
s1112, inputting the first word segmentation result into a first initial N-gram model for training to obtain a first N-gram model.
In this embodiment, each sentence in the consumer-product corpus is segmented by the probability-based statistical word segmentation model as follows:
For example, let C = c1c2...cm be the Chinese character string to be segmented, let W = w1w2...wn be a candidate segmentation result, and let Wa, Wb, ..., Wk be all possible segmentation schemes of C. The probability-based statistical word segmentation model finds the target word string W satisfying P(W|C) = max(P(Wa|C), P(Wb|C), ..., P(Wk|C)); that is, the word string W produced by the model is the one with the maximum estimated probability. Concretely:
For the substring S to be segmented, all candidate words w1, w2, ..., wi, ..., wn are taken out from left to right; the probability value P(wi) of each candidate word is looked up in the dictionary, and all left-neighbour words of each candidate word are recorded; the cumulative probability of each candidate word is calculated, and the best left-neighbour word of each candidate word is obtained by comparison; if the current word wn is the tail word of the string S and its cumulative probability P(wn) is the maximum, then wn is the end word of S; starting from wn, the best left-neighbour word of each word is output in turn from right to left, which gives the segmentation result of S. The first word segmentation result is input into the first initial N-gram model for training to obtain the first N-gram model, which achieves higher sentence recognition accuracy in the smart-supermarket scenario.
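The cumulative-probability procedure above can be sketched as a dynamic program over candidate words: each position records its best score and best left neighbour, and the segmentation is traced back from the tail. The dictionary probabilities below are invented for illustration and `segment` is a hypothetical helper, not the patent's implementation:

```python
import math

def segment(s, word_prob):
    """Maximum-probability segmentation: best[i] holds (log-probability,
    start index of the best left-neighbour word) for the prefix s[:i]."""
    best = {0: (0.0, None)}
    for end in range(1, len(s) + 1):
        for start in range(end):
            word = s[start:end]
            if word in word_prob and start in best:
                score = best[start][0] + math.log(word_prob[word])
                if end not in best or score > best[end][0]:
                    best[end] = (score, start)
    # Trace the best left neighbours back from the end of the string.
    out, i = [], len(s)
    while i > 0:
        start = best[i][1]
        out.append(s[start:i])
        i = start
    return out[::-1]

# Hypothetical dictionary probabilities for "I want to buy instant noodles".
word_prob = {"我": 0.1, "想": 0.1, "买": 0.1, "方便": 0.02, "面": 0.05,
             "方便面": 0.04, "我想": 0.005}
print(segment("我想买方便面", word_prob))
```

With these numbers the whole-word reading "方便面" (instant noodles) outscores the split "方便/面", matching the intuition behind maximum-probability segmentation.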
Similarly, the general corpus is segmented by the probability-based statistical word segmentation model to obtain a second word segmentation result corresponding to the general corpus, and the second word segmentation result is input into the second initial N-gram model for training to obtain the second N-gram model; the second N-gram model achieves higher sentence recognition accuracy in ordinary daily-life scenarios (that is, a higher recognition rate for sentences not tied to a particular domain).
S120, receiving the voice to be recognized, and recognizing the voice to be recognized through the N-gram model to obtain a recognition result.
When the speech to be recognized is recognized through the N-gram model, a whole sentence, such as "I want to buy XX-brand instant noodles", can be recognized effectively, and the sentence with the largest recognition probability is taken as the recognition result.
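The choice of the sentence with the largest recognition probability can be sketched as follows; `best_hypothesis` and the toy scores are hypothetical stand-ins for candidate transcriptions scored by the trained N-gram model:

```python
def best_hypothesis(candidates, p_sentence):
    """Return the candidate sentence the language model scores highest."""
    return max(candidates, key=p_sentence)

# Hypothetical scores: two acoustically similar hypotheses, one of which
# the language model finds far more probable.
toy_scores = {
    "i want to buy xx brand instant noodles": 3e-7,
    "i want to buy xx bland instant noodles": 4e-9,
}
candidates = list(toy_scores)
print(best_hypothesis(candidates, toy_scores.get))
```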
S130, word segmentation is carried out on the recognition result, and a sentence word segmentation result corresponding to the recognition result is obtained.
In one embodiment, step S130 includes:
And word segmentation is carried out on the recognition result based on a probability statistics word segmentation model, so that a sentence word segmentation result corresponding to the recognition result is obtained.
In this embodiment, for the specific process of segmenting the recognition result with the probability-based statistical word segmentation model in step S130, refer to step S1111. After the recognition result has been segmented, part-of-speech analysis can be performed.
And S140, performing lexical analysis according to the sentence segmentation result to obtain noun part-of-speech keywords corresponding to the sentence segmentation result.
In one embodiment, step S140 includes:
And taking the sentence segmentation result as input of a pre-trained joint lexical analysis model to obtain noun part-of-speech keywords in the sentence segmentation result.
In this embodiment, the process of lexical analysis by the joint lexical analysis model is as follows:
The input of the lexical analysis task is a string (hereinafter referred to as a "sentence"), and the output is the word boundaries, parts of speech, and entity classes in the sentence. Sequence tagging is a classical modelling approach to lexical analysis. A joint lexical analysis model (i.e., a LAC model) is constructed: a network structure based on GRUs (gated recurrent units) learns the features, and the learned features are fed into a CRF (conditional random field) decoding layer to complete the sequence labelling. The CRF decoding layer essentially replaces the linear model of the traditional CRF with a nonlinear neural network, and its sentence-level likelihood probability better alleviates the label-bias problem.
The input of the joint lexical analysis model is represented in one-hot form: each word is mapped through a vocabulary to an id, and the one-hot sequence is converted into a word vector sequence of real-valued vectors. The word vector sequence serves as the input of a bidirectional GRU, which learns a feature representation of the input sequence and produces a new feature sequence; two layers of bidirectional GRUs are stacked to increase the learning capacity. The CRF then takes the features learned by the GRU as input and the labelled sequence as the supervision signal, realising part-of-speech tagging of each word in the sentence word segmentation result. In the smart-supermarket scenario, a noun part-of-speech keyword is most likely to be a commodity brand or commodity name, so the noun part-of-speech keywords corresponding to the sentence word segmentation result are selected as the screening result for further commodity retrieval.
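A minimal sketch of that final screening step, assuming the lexical analysis model returns (word, tag) pairs; the tag set (following common Chinese POS conventions, 'n' noun, 'nz' other proper noun, 'v' verb, 'r' pronoun) and the sample tagging are assumptions, not taken from the patent:

```python
# Noun-class tags to keep as retrieval keywords (assumed tag set).
NOUN_TAGS = {"n", "nz", "ns", "nt", "nw"}

def noun_keywords(tagged):
    """Keep only noun-class words from (word, tag) pairs."""
    return [word for word, tag in tagged if tag in NOUN_TAGS]

# Hypothetical tagging of "I / want / to buy / XX-brand / instant noodles".
tagged = [("我", "r"), ("想", "v"), ("买", "v"),
          ("XX牌", "nz"), ("方便面", "n")]
print(noun_keywords(tagged))  # ['XX牌', '方便面']
```

Only the brand and product nouns survive the filter, which is exactly the screening result the retrieval step consumes.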
And S150, searching a pre-stored recommended corpus for entries whose similarity to the noun keywords exceeds a preset similarity threshold, so as to obtain a retrieval result.
In this embodiment, when the noun part-of-speech keywords are obtained, each noun part-of-speech keyword is searched for in the preset recommended corpus, and the words with higher similarity to it are taken as the retrieval result. Specifically, the word vector corresponding to each noun part-of-speech keyword is obtained from a Word2Vec model (Word2Vec is an efficient tool that represents a word as a real-valued vector), and its similarity to the word vector of each entry in the pre-stored recommended corpus is then calculated, the similarity between two vectors being computed from their Euclidean distance. If entries whose similarity exceeds the preset similarity threshold exist in the pre-stored recommended corpus, each such entry is taken as one of the retrieval results; that is, all the entries whose similarity exceeds the preset similarity threshold together form the retrieval result.
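The retrieval step can be sketched as follows, with toy stand-in vectors in place of real Word2Vec embeddings and a hypothetical mapping from Euclidean distance to a similarity score in (0, 1]; the entry names and threshold are likewise illustrative:

```python
import math

# Toy recommended corpus: entry name -> stand-in embedding vector.
recommended = {
    "instant noodles A": [0.9, 0.1, 0.0],
    "instant noodles B": [0.8, 0.2, 0.1],
    "shampoo":           [0.0, 0.1, 0.9],
}

def similarity(u, v):
    """Turn Euclidean distance into a similarity score in (0, 1]."""
    return 1.0 / (1.0 + math.dist(u, v))

def retrieve(keyword_vec, corpus, threshold=0.7):
    """Keep every entry whose similarity to the keyword exceeds the threshold."""
    return [name for name, vec in corpus.items()
            if similarity(keyword_vec, vec) > threshold]

query = [0.85, 0.15, 0.05]  # stand-in vector for a noun keyword
print(retrieve(query, recommended))
```

Both noodle entries clear the threshold and together form the retrieval result, while the unrelated entry is filtered out.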
The method adopts speech recognition technology and obtains noun part-of-speech keywords through lexical analysis of the speech recognition result, so that the retrieval result is obtained from the recommended corpus more accurately according to the noun part-of-speech keywords.
The embodiment of the invention also provides a voice retrieval device which is used for executing any embodiment of the voice retrieval method. Specifically, referring to fig. 5, fig. 5 is a schematic block diagram of a voice retrieval device according to an embodiment of the present invention. The voice retrieval apparatus 100 may be configured in a server.
As shown in fig. 5, the speech search device 100 includes a model training unit 110, a speech recognition unit 120, a word segmentation unit 130, a part-of-speech analysis unit 140, and a search unit 150.
The model training unit 110 is configured to receive a training set corpus, input the training set corpus into an initial N-gram model for training, and obtain an N-gram model; wherein the N-gram model is an N-element language model.
In this embodiment, the technical solution is described from the perspective of the server. The server can receive the training set corpus and train it to obtain an N-gram model, and the N-gram model is used to recognize the speech to be recognized that a front-end voice acquisition terminal deployed in the smart supermarket uploads to the server.
In this embodiment, the training set corpus is a mixture of a general corpus and a consumer-product corpus. The consumer-product corpus is a corpus containing a large number of commodity names (such as commodity brands, commodity names, etc.); the general corpus differs from the consumer-product corpus in that its vocabulary is not biased toward a particular domain. The training set corpus is input into the initial N-gram model for training, and the N-gram model used for speech recognition is obtained.
In one embodiment, as shown in fig. 6, the model training unit 110 includes:
The first training unit 111 is configured to obtain a consumer product corpus, input the consumer product corpus to a first initial N-gram model, and perform training to obtain a first N-gram model;
The second training unit 112 is configured to obtain a generic corpus, input the generic corpus to a second initial N-gram model, and perform training to obtain a second N-gram model;
And the model fusion unit 113 is configured to fuse the first N-gram model and the second N-gram model according to the set model fusion ratio, so as to obtain an N-gram model.
In this embodiment, the consumer product corpus is a corpus including a large number of commodity names, and the general corpus is different from the consumer product corpus in that the vocabulary in the general corpus is not biased to a specific field.
The N-gram model is a language model (Language Model, LM): a probability-based model whose input is a sentence (an ordered sequence of words) and whose output is the probability of that sentence, i.e., the joint probability of its words (joint probability).
Assuming that sentence T is composed of the word sequence w1, w2, w3, ..., wn, the N-Gram language model is formulated as follows:
P(T)=P(w1w2w3...wn)
=p(w1)*p(w2|w1)*p(w3|w1w2)*…*p(wn|w1w2w3...wn-1)
The commonly used N-Gram models are Bi-Gram and Tri-Gram. Their formulas are respectively:
Bi-Gram:
P(T)=p(w1|begin)*p(w2|w1)*p(w3|w2)*…*p(wn|wn-1)
Tri-Gram:
P(T)=p(w1|begin1,begin2)*p(w2|w1,begin1)*p(w3|w2,w1)*…*p(wn|wn-1,wn-2)
It can be seen that the conditional probability of each word occurring in sentence T can be derived by counting in the corpus. For an n-gram:
p(wi|wi-n+1,…,wi-1)=C(wi-n+1,…,wi)/C(wi-n+1,…,wi-1)
where C(wi-n+1,…,wi) denotes the number of times the word string wi-n+1,…,wi occurs in the corpus.
The first N-gram model and the second N-gram model are fused according to the set model fusion ratio. For example, if the ratio of the consumer-product corpus to the general corpus is set to 2:8, the fusion ratio of the first N-gram model to the second N-gram model is likewise set to 2:8, and the two models are fused to finally obtain the N-gram model used for speech recognition. Because the ratio of the consumer-product corpus to the general corpus is set in advance, the accuracy of speech recognition by the fused N-gram model in the smart-supermarket scenario is effectively improved.
In one embodiment, as shown in fig. 7, the first training unit 111 includes:
the word segmentation unit 1111 is configured to segment the consumer product corpus based on a probability statistical word segmentation model to obtain a first word segmentation result corresponding to the consumer product corpus;
The word segmentation training unit 1112 is configured to input the first word segmentation result to a first initial N-gram model for training, so as to obtain a first N-gram model.
In this embodiment, each sentence in the consumer product corpus is segmented by the probability-based statistical word segmentation model as follows:
For example, let C = c1c2...cm be the Chinese character string to be segmented, let W = w1w2...wn be a segmentation result, and let Wa, Wb, ..., Wk be all possible segmentation schemes of C. The probability-based statistical word segmentation model then finds the target word string W such that W satisfies: P(W|C) = max(P(Wa|C), P(Wb|C), ..., P(Wk|C)); that is, the word string W output by the segmentation model is the one with the maximum estimated probability. Concretely:
For a substring S to be segmented, all candidate words w1, w2, ..., wi, ..., wn are taken out in left-to-right order. The probability value P(wi) of each candidate word is looked up in the dictionary, and all left-neighbor words of each candidate word are recorded. The cumulative probability of each candidate word is then computed, and comparison yields the best left-neighbor word of each candidate word. If the current word wn is the tail word of the string S and its cumulative probability P(wn) is the largest, wn is taken as the end word of S. Starting from wn, the best left-neighbor word of each word is output in turn from right to left, which gives the word segmentation result of S. The first word segmentation result is then input into the first initial N-gram model for training to obtain the first N-gram model, which has a higher sentence recognition accuracy in the smart supermarket scenario.
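The best-left-neighbor procedure above can be sketched as a dynamic program over cumulative log-probabilities. The toy dictionary and its probability values are hypothetical, chosen only to mirror the instant-noodles example in this description.

```python
import math

# Toy dictionary of word probabilities (hypothetical values for illustration).
DICT = {"我": 0.05, "想": 0.04, "买": 0.04, "方便": 0.01, "面": 0.02, "方便面": 0.03}

def segment(s, dictionary=DICT):
    """Max-probability segmentation: for each position keep the best
    'left neighbor' word, then trace back from the tail word."""
    n = len(s)
    best = [-math.inf] * (n + 1)  # best cumulative log-probability ending at i
    back = [0] * (n + 1)          # start index of the best word ending at i
    best[0] = 0.0
    for i in range(n):
        if best[i] == -math.inf:
            continue
        for j in range(i + 1, n + 1):
            w = s[i:j]
            if w in dictionary:
                score = best[i] + math.log(dictionary[w])
                if score > best[j]:
                    best[j], back[j] = score, i
    # Output the best left neighbors from right to left, then reverse.
    words, i = [], n
    while i > 0:
        words.append(s[back[i]:i])
        i = back[i]
    return list(reversed(words))
```

Here "方便面" (P = 0.03) beats the two-word split "方便" + "面" (P = 0.01 x 0.02), so the whole product name survives as one token.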
Similarly, the general corpus is segmented by the probability-based statistical word segmentation model to obtain a second word segmentation result corresponding to the general corpus; the second word segmentation result is input into a second initial N-gram model for training to obtain a second N-gram model, which has a higher sentence recognition accuracy in ordinary everyday scenarios (that is, a higher recognition rate for sentences that do not belong to any specialised domain).
The voice recognition unit 120 is configured to receive a voice to be recognized, and recognize the voice to be recognized through the N-gram model to obtain a recognition result.
When the speech to be recognized is recognized through the N-gram model, a whole sentence is obtained, such as "I want to buy XX brand instant noodles"; the N-gram model effectively recognizes the speech to be recognized and outputs the sentence with the largest recognition probability as the recognition result.
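Selecting the candidate sentence with the largest model probability can be illustrated as follows. This sketch assumes a bigram model stored as a dict of conditional probabilities (as in the counting sketch earlier); the floor value for unseen bigrams is an assumption, standing in for proper smoothing.

```python
import math

def sentence_logprob(words, model, floor=1e-8):
    """Score a candidate sentence with a bigram model; unseen bigrams
    receive a small floor probability instead of zero."""
    padded = ["<s>"] + words + ["</s>"]
    return sum(math.log(model.get((a, b), floor))
               for a, b in zip(padded, padded[1:]))

def best_candidate(candidates, model):
    """Return the candidate transcription the model scores highest."""
    return max(candidates, key=lambda c: sentence_logprob(c, model))
```

Given several acoustic hypotheses for the same utterance, `best_candidate` returns the one the fused language model considers most probable.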
And the recognition result word segmentation unit 130 is configured to segment the recognition result to obtain a sentence word segmentation result corresponding to the recognition result.
In an embodiment, the recognition result word segmentation unit 130 is further configured to:
And word segmentation is carried out on the recognition result based on a probability statistics word segmentation model, so that a sentence word segmentation result corresponding to the recognition result is obtained.
In this embodiment, when the recognition result word segmentation unit 130 segments the recognition result, the specific procedure of word segmentation with the probability-based statistical word segmentation model may refer to that of the word segmentation unit 1111. After the recognition result is segmented, part-of-speech analysis can then be performed.
The part-of-speech analysis unit 140 is configured to perform lexical analysis according to the sentence segmentation result, so as to obtain a noun part-of-speech keyword corresponding to the sentence segmentation result.
In an embodiment, the part-of-speech analysis unit 140 is further configured to:
And taking the sentence segmentation result as input of a pre-trained joint lexical analysis model to obtain noun part-of-speech keywords in the sentence segmentation result.
In this embodiment, the process of lexical analysis by the joint lexical analysis model is as follows:
The input of the lexical analysis task is a string (hereinafter referred to as a "sentence"), and the output is the word boundaries, parts of speech, and entity classes in the sentence. Sequence tagging is a classical modeling approach to lexical analysis. A joint lexical analysis model (that is, a LAC model) is constructed: a network structure based on GRUs (gated recurrent units) learns the features, and the learned features are passed to a CRF (conditional random field) decoding layer to complete the sequence tagging. The CRF decoding layer essentially replaces the linear model of a traditional CRF with a nonlinear neural network, and its sentence-level likelihood better mitigates the label bias problem.
The input of the joint lexical analysis model is represented in one-hot form: each character is mapped to an id through a vocabulary, and the one-hot sequence is converted into a character vector sequence of real-valued vectors. The character vector sequence serves as the input of a bidirectional GRU, which learns a feature representation of the input sequence and yields a new feature representation sequence; two layers of bidirectional GRUs are stacked to increase the learning capacity. The CRF takes the features learned by the GRU as input and the tag sequence as the supervision signal, producing the part-of-speech tag of each word in the sentence segmentation result. In the smart supermarket scenario, a noun part-of-speech keyword is more likely to be a product brand or product name, so the noun part-of-speech keywords corresponding to the sentence segmentation result are selected as the screening result for subsequent product retrieval.
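Once the GRU-CRF tagger has labelled each token, selecting the noun-class tokens is a simple filter. This is a sketch; the tag names follow common Chinese POS tagging conventions ("n" for noun, "nz" for other proper noun, "nt" for organization) and are assumptions, not the patent's own tag set.

```python
def noun_keywords(tagged, noun_tags=frozenset({"n", "nz", "nt"})):
    """Keep tokens whose predicted tag is a noun class; in the supermarket
    setting these are likely brand or product names."""
    return [word for word, tag in tagged if tag in noun_tags]
```

For a tagged sentence like [("我", "r"), ("想", "v"), ("买", "v"), ("方便面", "n")], only the product name survives as a retrieval keyword.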
The searching unit 150 is configured to search, in a pre-stored recommended corpus, a corpus with similarity to the noun keyword exceeding a preset similarity threshold, so as to obtain a search result.
In this embodiment, once the noun part-of-speech keywords are obtained, each noun part-of-speech keyword is searched in the preset recommended corpus to obtain the words with higher similarity to it, which serve as the search result. Specifically, the word vector corresponding to each noun part-of-speech keyword is obtained with a Word2Vec model (Word2Vec is an efficient tool for representing words as real-valued vectors), and its similarity to the word vector of each corpus entry in the pre-stored recommended corpus is computed, where the similarity between two vectors is derived from their Euclidean distance. If corpus entries whose similarity exceeds the preset similarity threshold exist in the pre-stored recommended corpus, those entries are taken as part of the search result; that is, all the corpus entries whose similarity exceeds the preset similarity threshold together form the search result.
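The distance-based retrieval step can be sketched as follows. The patent only states that Euclidean distance between word vectors is used; the mapping from distance to a bounded similarity score, the threshold value, and the toy vectors below are assumptions for illustration.

```python
import math

def euclidean_similarity(u, v):
    """Map Euclidean distance to a (0, 1] similarity score.
    The exact mapping is an assumption; smaller distance -> higher score."""
    d = math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
    return 1.0 / (1.0 + d)

def retrieve(keyword_vec, corpus_vecs, threshold=0.5):
    """Return every corpus entry whose similarity to the keyword vector
    exceeds the preset similarity threshold."""
    return [name for name, vec in corpus_vecs.items()
            if euclidean_similarity(keyword_vec, vec) > threshold]
```

In practice `keyword_vec` would come from a trained Word2Vec model (e.g. a keyed-vector lookup for the noun keyword) and `corpus_vecs` would hold the precomputed vectors of the recommended corpus.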
The device adopts a voice recognition technology, and the noun part-of-speech keywords are obtained after lexical analysis is carried out on the voice recognition result, so that the retrieval result can be obtained more accurately in the recommended corpus according to the noun part-of-speech keywords.
The above-described speech retrieval means may be implemented in the form of a computer program which is executable on a computer device as shown in fig. 8.
Referring to fig. 8, fig. 8 is a schematic block diagram of a computer device according to an embodiment of the present invention. The computer device 500 is a server, and the server may be a stand-alone server or a server cluster formed by a plurality of servers.
With reference to FIG. 8, the computer device 500 includes a processor 502, memory, and a network interface 505 connected by a system bus 501, where the memory may include a non-volatile storage medium 503 and an internal memory 504.
The non-volatile storage medium 503 may store an operating system 5031 and a computer program 5032. The computer program 5032, when executed, may cause the processor 502 to perform a speech retrieval method.
The processor 502 is used to provide computing and control capabilities to support the operation of the overall computer device 500.
The internal memory 504 provides an environment for the execution of a computer program 5032 in the non-volatile storage medium 503, which computer program 5032, when executed by the processor 502, causes the processor 502 to perform a speech retrieval method.
The network interface 505 is used for network communication, such as transmitting data information. It will be appreciated by those skilled in the art that the architecture shown in fig. 8 is merely a block diagram of part of the architecture relevant to the present solution and does not limit the computer device 500 on which the present solution may be implemented; a particular computer device 500 may include more or fewer components than shown, combine certain components, or arrange the components differently.
Wherein the processor 502 is configured to execute a computer program 5032 stored in a memory to perform the following functions: receiving a training set corpus, and inputting the training set corpus into an initial N-gram model for training to obtain the N-gram model; wherein the N-gram model is an N-gram model; receiving voice to be recognized, and recognizing the voice to be recognized through the N-gram model to obtain a recognition result; word segmentation is carried out on the identification result, and a sentence word segmentation result corresponding to the identification result is obtained; performing lexical analysis according to the sentence segmentation result to obtain noun part-of-speech keywords corresponding to the sentence segmentation result; searching the corpus with similarity with the noun keywords exceeding a preset similarity threshold value in a pre-stored recommended corpus to obtain a retrieval result.
In one embodiment, the processor 502 performs the following operations when executing the step of receiving the training set corpus, inputting the training set corpus into the initial N-gram model for training, and obtaining the N-gram model: obtaining consumer product corpus, inputting the consumer product corpus into a first initial N-gram model for training to obtain a first N-gram model; acquiring a general corpus, and inputting the general corpus into a second initial N-gram model for training to obtain a second N-gram model; and fusing the first N-gram model and the second N-gram model according to the set model fusion proportion to obtain an N-gram model.
In one embodiment, the processor 502 performs the following operations when performing the step of inputting the consumer product corpus into the first initial N-gram model for training to obtain the first N-gram model: performing word segmentation on the consumer product corpus based on a probability statistical word segmentation model to obtain a first word segmentation result corresponding to the consumer product corpus; and inputting the first word segmentation result into a first initial N-gram model for training to obtain a first N-gram model.
In one embodiment, when executing the step of word segmentation on the recognition result to obtain a sentence word segmentation result corresponding to the recognition result, the processor 502 executes the following operations: and word segmentation is carried out on the recognition result based on a probability statistics word segmentation model, so that a sentence word segmentation result corresponding to the recognition result is obtained.
In one embodiment, when executing the step of performing lexical analysis according to the sentence segmentation result to obtain the noun part-of-speech keyword corresponding to the sentence segmentation result, the processor 502 performs the following operations: and taking the sentence segmentation result as input of a pre-trained joint lexical analysis model to obtain noun part-of-speech keywords in the sentence segmentation result.
Those skilled in the art will appreciate that the embodiment of the computer device shown in fig. 8 is not limiting of the specific construction of the computer device, and in other embodiments, the computer device may include more or less components than those shown, or certain components may be combined, or a different arrangement of components. For example, in some embodiments, the computer device may include only a memory and a processor, and in such embodiments, the structure and function of the memory and the processor are consistent with the embodiment shown in fig. 8, and will not be described again.
It should be appreciated that in embodiments of the present invention, the processor 502 may be a central processing unit (CPU); the processor 502 may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor, or any conventional processor.
In another embodiment of the invention, a computer-readable storage medium is provided. The computer readable storage medium may be a non-volatile computer readable storage medium. The computer readable storage medium stores a computer program, wherein the computer program when executed by a processor performs the steps of: receiving a training set corpus, and inputting the training set corpus into an initial N-gram model for training to obtain the N-gram model; wherein the N-gram model is an N-gram model; receiving voice to be recognized, and recognizing the voice to be recognized through the N-gram model to obtain a recognition result; word segmentation is carried out on the identification result, and a sentence word segmentation result corresponding to the identification result is obtained; performing lexical analysis according to the sentence segmentation result to obtain noun part-of-speech keywords corresponding to the sentence segmentation result; searching the corpus with similarity with the noun keywords exceeding a preset similarity threshold value in a pre-stored recommended corpus to obtain a retrieval result.
In an embodiment, the receiving a training set corpus, inputting the training set corpus into an initial N-gram model for training, and obtaining the N-gram model includes: obtaining consumer product corpus, inputting the consumer product corpus into a first initial N-gram model for training to obtain a first N-gram model; acquiring a general corpus, and inputting the general corpus into a second initial N-gram model for training to obtain a second N-gram model; and fusing the first N-gram model and the second N-gram model according to the set model fusion proportion to obtain an N-gram model.
In an embodiment, the inputting the consumer corpus into the first initial N-gram model for training to obtain the first N-gram model includes: performing word segmentation on the consumer product corpus based on a probability statistical word segmentation model to obtain a first word segmentation result corresponding to the consumer product corpus; and inputting the first word segmentation result into a first initial N-gram model for training to obtain a first N-gram model.
In an embodiment, the word segmentation of the recognition result to obtain a sentence word segmentation result corresponding to the recognition result includes: and word segmentation is carried out on the recognition result based on a probability statistics word segmentation model, so that a sentence word segmentation result corresponding to the recognition result is obtained.
In an embodiment, the performing lexical analysis according to the sentence word segmentation result to obtain a noun part-of-speech keyword corresponding to the sentence word segmentation result includes: and taking the sentence segmentation result as input of a pre-trained joint lexical analysis model to obtain noun part-of-speech keywords in the sentence segmentation result.
It will be clearly understood by those skilled in the art that, for convenience and brevity of description, specific working procedures of the apparatus, device and unit described above may refer to corresponding procedures in the foregoing method embodiments, which are not repeated herein. Those of ordinary skill in the art will appreciate that the elements and algorithm steps described in connection with the embodiments disclosed herein may be embodied in electronic hardware, in computer software, or in a combination of the two, and that the elements and steps of the examples have been generally described in terms of function in the foregoing description to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the several embodiments provided by the present invention, it should be understood that the disclosed apparatus, device and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, and for example, the division of the units is merely a logical function division, there may be another division manner in actual implementation, or units having the same function may be integrated into one unit, for example, multiple units or components may be combined or may be integrated into another system, or some features may be omitted, or not performed. In addition, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices, or elements, or may be an electrical, mechanical, or other form of connection.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the embodiment of the present invention.
In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units may be stored in a storage medium if implemented in the form of software functional units and sold or used as stand-alone products. Based on such understanding, the technical solution of the present invention is essentially or a part contributing to the prior art, or all or part of the technical solution may be embodied in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-only memory (ROM), a magnetic disk, an optical disk, or other various media capable of storing program codes.
While the invention has been described with reference to certain preferred embodiments, it will be understood by those skilled in the art that various changes and substitutions of equivalents may be made and equivalents will be apparent to those skilled in the art without departing from the scope of the invention. Therefore, the protection scope of the invention is subject to the protection scope of the claims.
Claims (6)
1. A method of speech retrieval, comprising:
receiving a training set corpus, and inputting the training set corpus into an initial N-gram model for training to obtain the N-gram model; wherein the N-gram model is an N-gram model;
Receiving voice to be recognized, and recognizing the voice to be recognized through the N-gram model to obtain a recognition result;
word segmentation is carried out on the recognition result to obtain a sentence word segmentation result corresponding to the recognition result;
Performing lexical analysis according to the sentence segmentation result to obtain noun part-of-speech keywords corresponding to the sentence segmentation result; and
Searching a corpus with similarity with the noun part-of-speech keywords exceeding a preset similarity threshold value in a pre-stored recommended corpus to obtain a search result; the recommended corpus comprises a plurality of corpora, and each corpus comprises one or more keywords with noun parts of speech;
wherein the training set corpus is a mixed corpus of general corpus and consumer goods corpus; the N-gram model is a probability-based discrimination model;
The receiving training set corpus, inputting the training set corpus into an initial N-gram model for training, and obtaining an N-gram model, comprising:
Obtaining consumer product corpus, inputting the consumer product corpus into a first initial N-gram model for training to obtain a first N-gram model;
acquiring a general corpus, and inputting the general corpus into a second initial N-gram model for training to obtain a second N-gram model;
According to the set model fusion proportion, fusing the first N-gram model and the second N-gram model to obtain an N-gram model;
Inputting the consumer product corpus into a first initial N-gram model for training to obtain a first N-gram model, wherein the method comprises the following steps of:
performing word segmentation on the consumer product corpus based on a probability statistical word segmentation model to obtain a first word segmentation result corresponding to the consumer product corpus;
inputting the first word segmentation result into a first initial N-gram model for training to obtain a first N-gram model;
the step of inputting the general corpus into a second initial N-gram model for training to obtain a second N-gram model, which comprises the following steps:
performing word segmentation on the general corpus based on a probability statistical word segmentation model to obtain a second word segmentation result corresponding to the general corpus;
And inputting the second word segmentation result into a second initial N-gram model for training to obtain a second N-gram model.
2. The method for voice search according to claim 1, wherein said word segmentation of the recognition result to obtain a sentence word segmentation result corresponding to the recognition result comprises:
And word segmentation is carried out on the recognition result based on a probability statistics word segmentation model, so that a sentence word segmentation result corresponding to the recognition result is obtained.
3. The method for voice search according to claim 1, wherein said performing lexical analysis according to the sentence segmentation result to obtain noun part-of-speech keywords corresponding to the sentence segmentation result comprises:
And taking the sentence segmentation result as input of a pre-trained joint lexical analysis model to obtain noun part-of-speech keywords in the sentence segmentation result.
4. A voice retrieval apparatus, comprising:
The model training unit is used for receiving a training set corpus and inputting the training set corpus into an initial N-gram model for training to obtain an N-gram model; wherein the N-gram model is an N-gram model;
The voice recognition unit is used for receiving the voice to be recognized, and recognizing the voice to be recognized through the N-gram model to obtain a recognition result;
The recognition result word segmentation unit is used for segmenting the recognition result to obtain a sentence word segmentation result corresponding to the recognition result;
the part-of-speech analysis unit is used for performing lexical analysis according to the sentence segmentation result to obtain noun part-of-speech keywords corresponding to the sentence segmentation result; and
The retrieval unit is used for searching the corpus with similarity with the noun part-of-speech keywords exceeding a preset similarity threshold value in a pre-stored recommended corpus to obtain a retrieval result;
wherein the training set corpus is a mixed corpus of general corpus and consumer goods corpus; the N-gram model is a probability-based discrimination model;
The model training unit includes:
the first training unit is used for obtaining consumer product corpus, inputting the consumer product corpus into a first initial N-gram model for training to obtain a first N-gram model;
The second training unit is used for acquiring general corpus, inputting the general corpus into a second initial N-gram model for training to obtain a second N-gram model;
The model fusion unit is used for fusing the first N-gram model and the second N-gram model according to the set model fusion proportion to obtain an N-gram model;
The first training unit includes:
The word segmentation unit is used for segmenting the consumer product corpus based on a probability statistical word segmentation model to obtain a first word segmentation result corresponding to the consumer product corpus;
The word segmentation training unit is used for inputting the first word segmentation result into a first initial N-gram model for training to obtain a first N-gram model;
the second training unit includes:
performing word segmentation on the general corpus based on a probability statistical word segmentation model to obtain a second word segmentation result corresponding to the general corpus;
And inputting the second word segmentation result into a second initial N-gram model for training to obtain a second N-gram model.
5. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the speech retrieval method of any one of claims 1 to 3 when the computer program is executed by the processor.
6. A computer readable storage medium, characterized in that the computer readable storage medium stores a computer program which, when executed by a processor, causes the processor to perform the speech retrieval method according to any one of claims 1 to 3.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910492599.8A CN110349568B (en) | 2019-06-06 | 2019-06-06 | Voice retrieval method, device, computer equipment and storage medium |
PCT/CN2019/117872 WO2020244150A1 (en) | 2019-06-06 | 2019-11-13 | Speech retrieval method and apparatus, computer device, and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910492599.8A CN110349568B (en) | 2019-06-06 | 2019-06-06 | Voice retrieval method, device, computer equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110349568A CN110349568A (en) | 2019-10-18 |
CN110349568B true CN110349568B (en) | 2024-05-31 |
Family
ID=68181598
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910492599.8A Active CN110349568B (en) | 2019-06-06 | 2019-06-06 | Voice retrieval method, device, computer equipment and storage medium |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN110349568B (en) |
WO (1) | WO2020244150A1 (en) |
Families Citing this family (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110349568B (en) * | 2019-06-06 | 2024-05-31 | 平安科技(深圳)有限公司 | Voice retrieval method, device, computer equipment and storage medium |
CN110825844A (en) * | 2019-10-21 | 2020-02-21 | 拉扎斯网络科技(上海)有限公司 | Voice retrieval method and device, readable storage medium and electronic equipment |
CN111291195B (en) * | 2020-01-21 | 2021-08-10 | 腾讯科技(深圳)有限公司 | Data processing method, device, terminal and readable storage medium |
CN111460257B (en) * | 2020-03-27 | 2023-10-31 | 北京百度网讯科技有限公司 | Thematic generation method, apparatus, electronic device and storage medium |
CN113642329B (en) * | 2020-04-27 | 2024-10-29 | 阿里巴巴集团控股有限公司 | Method and device for establishing term identification model, and method and device for term identification |
CN113569128A (en) * | 2020-04-29 | 2021-10-29 | 北京金山云网络技术有限公司 | Data retrieval method and device and electronic equipment |
CN111862970A (en) * | 2020-06-05 | 2020-10-30 | 珠海高凌信息科技股份有限公司 | False propaganda treatment application method and device based on intelligent voice robot |
CN111783424B (en) * | 2020-06-17 | 2024-02-13 | 泰康保险集团股份有限公司 | Text sentence dividing method and device |
CN112183114B (en) * | 2020-08-10 | 2024-05-14 | 招联消费金融股份有限公司 | Model training and semantic integrity recognition method and device |
CN112381038B (en) * | 2020-11-26 | 2024-04-19 | 中国船舶工业系统工程研究院 | Text recognition method, system and medium based on image |
CN112735413B (en) * | 2020-12-25 | 2024-05-31 | 浙江大华技术股份有限公司 | Instruction analysis method based on camera device, electronic equipment and storage medium |
CN112905869B (en) * | 2021-03-26 | 2024-07-26 | 深圳好学多智能科技有限公司 | Self-adaptive training method, device, storage medium and equipment for language model |
CN113256379A (en) * | 2021-05-24 | 2021-08-13 | 北京小米移动软件有限公司 | Method for correlating shopping demands for commodities |
CN113256378A (en) * | 2021-05-24 | 2021-08-13 | 北京小米移动软件有限公司 | Method for determining shopping demand of user |
CN114329225B (en) * | 2022-01-24 | 2024-04-23 | 平安国际智慧城市科技股份有限公司 | Search method, device, equipment and storage medium based on search statement |
CN115563394B (en) * | 2022-11-24 | 2023-03-28 | 腾讯科技(深圳)有限公司 | Search recall method, recall model training method, device and computer equipment |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107154260A (en) * | 2017-04-11 | 2017-09-12 | 北京智能管家科技有限公司 | A kind of domain-adaptive audio recognition method and device |
CN107204184A (en) * | 2017-05-10 | 2017-09-26 | 平安科技(深圳)有限公司 | Audio recognition method and system |
CN108538286A (en) * | 2017-03-02 | 2018-09-14 | 腾讯科技(深圳)有限公司 | A kind of method and computer of speech recognition |
CN108804414A (en) * | 2018-05-04 | 2018-11-13 | 科沃斯商用机器人有限公司 | Text modification method, device, smart machine and readable storage medium storing program for executing |
CN109388743A (en) * | 2017-08-11 | 2019-02-26 | 阿里巴巴集团控股有限公司 | The determination method and apparatus of language model |
CN109817217A (en) * | 2019-01-17 | 2019-05-28 | 深圳壹账通智能科技有限公司 | Self-service based on speech recognition peddles method, apparatus, equipment and medium |
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105139239A (en) * | 2014-05-27 | 2015-12-09 | 无锡韩光电器有限公司 | Supermarket shopping system with voice query function |
JP6353408B2 (en) * | 2015-06-11 | 2018-07-04 | 日本電信電話株式会社 | Language model adaptation device, language model adaptation method, and program |
CN106875941B (en) * | 2017-04-01 | 2020-02-18 | 彭楚奥 | Voice semantic recognition method of service robot |
CN107247759A (en) * | 2017-05-31 | 2017-10-13 | 深圳正品创想科技有限公司 | Commodity recommendation method and device |
CN109344830B (en) * | 2018-08-17 | 2024-06-28 | 平安科技(深圳)有限公司 | Sentence output and model training method and device, computer device and storage medium |
CN109840323A (en) * | 2018-12-14 | 2019-06-04 | 深圳壹账通智能科技有限公司 | Speech recognition processing method and server for insurance products |
CN110349568B (en) * | 2019-06-06 | 2024-05-31 | 平安科技(深圳)有限公司 | Voice retrieval method, device, computer equipment and storage medium |
2019
- 2019-06-06 CN CN201910492599.8A patent/CN110349568B/en active Active
- 2019-11-13 WO PCT/CN2019/117872 patent/WO2020244150A1/en active Application Filing
Also Published As
Publication number | Publication date |
---|---|
CN110349568A (en) | 2019-10-18 |
WO2020244150A1 (en) | 2020-12-10 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110349568B (en) | Voice retrieval method, device, computer equipment and storage medium | |
CN109885660B (en) | Knowledge-graph-empowered question-answering system and method based on information retrieval | |
CN109840287B (en) | Cross-modal information retrieval method and device based on neural network | |
CN112164391B (en) | Statement processing method, device, electronic equipment and storage medium | |
US9977778B1 (en) | Probabilistic matching for dialog state tracking with limited training data | |
Zhang et al. | Joint word segmentation and POS tagging using a single perceptron | |
WO2020244073A1 (en) | Speech-based user classification method and device, computer apparatus, and storage medium | |
CN107480143B (en) | Method and system for segmenting conversation topics based on context correlation | |
US7493251B2 (en) | Using source-channel models for word segmentation | |
CN112800170A (en) | Question matching method and device and question reply method and device | |
CN112069298A (en) | Human-computer interaction method, device and medium based on semantic web and intention recognition | |
US20070100814A1 (en) | Apparatus and method for detecting named entity | |
CN110121706A (en) | Providing responses in a session | |
US20060020448A1 (en) | Method and apparatus for capitalizing text using maximum entropy | |
EP1619620A1 (en) | Adaptation of Exponential Models | |
CN114580382A (en) | Text error correction method and device | |
CN110096572B (en) | Sample generation method, device and computer readable medium | |
CN113672708A (en) | Language model training method, question and answer pair generation method, device and equipment | |
CN114154487A (en) | Text automatic error correction method and device, electronic equipment and storage medium | |
CN109948140B (en) | Word vector embedding method and device | |
CN113326702B (en) | Semantic recognition method, semantic recognition device, electronic equipment and storage medium | |
CN110874408B (en) | Model training method, text recognition device and computing equipment | |
CN113806510B (en) | Legal provision retrieval method, terminal equipment and computer storage medium | |
CN115274086A (en) | Intelligent diagnosis guiding method and system | |
CN116955579B (en) | Chat reply generation method and device based on keyword knowledge retrieval |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||