CN110349568B - Voice retrieval method, device, computer equipment and storage medium - Google Patents
- Publication number: CN110349568B (application number CN201910492599.8A)
- Authority
- CN
- China
- Prior art keywords
- corpus
- gram model
- model
- word segmentation
- training
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G06F16/63 — Querying (information retrieval of audio data)
- G06F16/68 — Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F40/253 — Grammatical analysis; Style critique
- G10L15/063 — Training (creation of reference templates; training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice)
- G10L15/19 — Grammatical context, e.g. disambiguation of the recognition hypotheses based on word sequence rules
- G10L15/22 — Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/225 — Feedback of the input speech
Abstract
The invention discloses a voice retrieval method and apparatus, a computer device, and a storage medium. The method comprises the following steps: receiving a training set corpus and inputting it into an initial N-gram model for training to obtain the N-gram model; receiving speech to be recognized and recognizing it through the N-gram model to obtain a recognition result; performing word segmentation on the recognition result to obtain a sentence word segmentation result corresponding to the recognition result; performing lexical analysis on the sentence word segmentation result to obtain the noun part-of-speech keywords corresponding to it; and searching a pre-stored recommended corpus for entries whose similarity to the noun keywords exceeds a preset similarity threshold, so as to obtain a retrieval result. The method adopts speech recognition technology and obtains noun part-of-speech keywords by lexical analysis of the speech recognition result, so that the retrieval result is obtained from the recommended corpus more accurately.
Description
Technical Field
The present invention relates to the field of speech recognition technologies, and in particular, to a speech retrieval method, a speech retrieval device, a computer device, and a storage medium.
Background
At present, smart supermarkets search for goods through voice recognition, generally matching goods by fuzzy query. The voice recognition result therefore needs to be analyzed so that the name of the goods the user wants to purchase can be obtained automatically. In practice, users often speak a whole sentence, for example "I want to buy XXX" or "I want to eat XXX", and current speech recognition systems cannot accurately judge the purchase intent expressed in such sentences.
Disclosure of Invention
The embodiments of the invention provide a voice retrieval method and apparatus, a computer device, and a storage medium, aiming to solve the problem in the prior art that a speech recognition system has low recognition accuracy in the supermarket scenario, which leads to inaccurate recognition results.
In a first aspect, an embodiment of the present invention provides a voice retrieval method, including:
receiving a training set corpus, and inputting the training set corpus into an initial N-gram model for training to obtain the N-gram model; wherein the N-gram model is an N-element language model;
Receiving voice to be recognized, and recognizing the voice to be recognized through the N-gram model to obtain a recognition result;
word segmentation is carried out on the recognition result, and a sentence word segmentation result corresponding to the recognition result is obtained;
Performing lexical analysis according to the sentence segmentation result to obtain noun part-of-speech keywords corresponding to the sentence segmentation result; and
Searching a pre-stored recommended corpus for entries whose similarity to the noun keywords exceeds a preset similarity threshold, so as to obtain a retrieval result.
In a second aspect, an embodiment of the present invention provides a voice retrieval apparatus, including:
The model training unit is used for receiving a training set corpus and inputting the training set corpus into an initial N-gram model for training to obtain an N-gram model; wherein the N-gram model is an N-element language model;
The voice recognition unit is used for receiving the voice to be recognized, and recognizing the voice to be recognized through the N-gram model to obtain a recognition result;
the word segmentation unit is used for segmenting the recognition result to obtain a sentence word segmentation result corresponding to the recognition result;
the part-of-speech analysis unit is used for performing lexical analysis according to the sentence segmentation result to obtain noun part-of-speech keywords corresponding to the sentence segmentation result; and
And the retrieval unit is used for searching a pre-stored recommended corpus for entries whose similarity to the noun keywords exceeds a preset similarity threshold, so as to obtain a retrieval result.
In a third aspect, an embodiment of the present invention further provides a computer device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the voice retrieval method according to the first aspect.
In a fourth aspect, an embodiment of the present invention further provides a computer readable storage medium, where the computer readable storage medium stores a computer program, where the computer program when executed by a processor causes the processor to perform the speech retrieval method according to the first aspect.
The embodiments of the invention provide a voice retrieval method and apparatus, a computer device, and a storage medium. The method comprises: receiving a training set corpus and inputting it into an initial N-gram model for training to obtain the N-gram model, the N-gram model being an N-element language model; receiving speech to be recognized and recognizing it through the N-gram model to obtain a recognition result; performing word segmentation on the recognition result to obtain a sentence word segmentation result corresponding to the recognition result; performing lexical analysis on the sentence word segmentation result to obtain the noun part-of-speech keywords corresponding to it; and searching a pre-stored recommended corpus for entries whose similarity to the noun keywords exceeds a preset similarity threshold, so as to obtain a retrieval result. The method adopts speech recognition technology and accurately captures user requirements by performing lexical analysis on the speech recognition result.
Drawings
In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings required for describing the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and a person skilled in the art can obtain other drawings from them without inventive effort.
Fig. 1 is a schematic diagram of an application scenario of a voice retrieval method according to an embodiment of the present invention;
FIG. 2 is a flow chart of a voice retrieval method according to an embodiment of the present invention;
FIG. 3 is a schematic sub-flowchart of a voice retrieval method according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of another sub-flowchart of a voice retrieval method according to an embodiment of the present invention;
FIG. 5 is a schematic block diagram of a voice retrieval apparatus provided by an embodiment of the present invention;
FIG. 6 is a schematic block diagram of a subunit of a speech retrieval apparatus according to an embodiment of the present invention;
FIG. 7 is a schematic block diagram of another subunit of a speech retrieval apparatus according to an embodiment of the present invention;
Fig. 8 is a schematic block diagram of a computer device according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
It should be understood that the terms "comprises" and "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in this specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in the present specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.
Referring to fig. 1 and fig. 2, fig. 1 is a schematic diagram of an application scenario of a voice retrieval method according to an embodiment of the present invention, and fig. 2 is a schematic flow diagram of the voice retrieval method. The voice retrieval method is applied to a server and is executed by application software installed in the server.
As shown in fig. 2, the method includes steps S110 to S150.
S110, receiving a training set corpus, and inputting the training set corpus into an initial N-gram model for training to obtain the N-gram model; wherein the N-gram model is an N-element language model.
In this embodiment, the technical solution is described from the perspective of the server. The server can receive the training set corpus and train it to obtain an N-gram model, and the N-gram model is used to recognize the speech to be recognized that a front-end voice acquisition terminal deployed in the smart supermarket uploads to the server.
In this embodiment, the training set corpus is a mixture of a general corpus and a consumer-product corpus. The consumer-product corpus is a corpus containing a large number of commodity names (such as commodity brands, commodity names, etc.); the general corpus differs from the consumer-product corpus in that its vocabulary is not biased toward a particular domain. The training set corpus is input into the initial N-gram model for training, and the N-gram model used for speech recognition is obtained.
In one embodiment, as shown in fig. 3, step S110 includes:
s111, obtaining consumer product corpus, and inputting the consumer product corpus into a first initial N-gram model for training to obtain a first N-gram model;
S112, acquiring a general corpus, and inputting the general corpus into a second initial N-gram model for training to obtain a second N-gram model;
s113, fusing the first N-gram model and the second N-gram model according to the set model fusion proportion to obtain an N-gram model.
In this embodiment, the consumer-product corpus is a corpus containing a large number of commodity names, and the general corpus differs from the consumer-product corpus in that its vocabulary is not biased toward a specific domain but covers vocabulary from every domain.
The N-gram model is a language model (Language Model, LM): a probability-based model whose input is a sentence (an ordered sequence of words) and whose output is the probability of that sentence, i.e., the joint probability of its words (joint probability).
Assuming that sentence T is composed of the word sequence w1, w2, w3, ..., wn, the N-Gram language model is formulated as follows:
P(T)=P(w1w2w3...wn)
=p(w1)*p(w2|w1)*p(w3|w1w2)*…*p(wn|w1w2w3...wn-1)
The commonly used N-Gram models are Bi-Gram and Tri-Gram. Their formulas are respectively:
Bi-Gram:
P(T)=p(w1|begin)*p(w2|w1)*p(w3|w2)*…*p(wn|wn-1)
Tri-Gram:
P(T)=p(w1|begin1,begin2)*p(w2|w1,begin1)*p(w3|w2,w1)*…*p(wn|wn-1,wn-2)
It can be seen that the conditional probability of each word occurring in sentence T can be derived by counting in the corpus. For an n-gram:
p(wi|wi-n+1,…,wi-1)=C(wi-n+1,…,wi)/C(wi-n+1,…,wi-1)
where C(wi-n+1,…,wi) denotes the number of times the word string wi-n+1,…,wi occurs in the corpus.
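As an illustrative sketch of the counting formula above, Bi-Gram conditional probabilities can be estimated directly from unigram and bigram counts. The toy corpus here is an assumption for demonstration, not one from the invention:

```python
from collections import Counter

# Toy corpus (hypothetical): two short "purchase intent" sentences.
corpus = [
    ["i", "want", "to", "buy", "noodles"],
    ["i", "want", "to", "eat", "noodles"],
]

# Count unigrams and adjacent word pairs.
unigrams = Counter(w for sent in corpus for w in sent)
bigrams = Counter(
    (sent[i], sent[i + 1]) for sent in corpus for i in range(len(sent) - 1)
)

def p_bigram(prev, word):
    """P(word | prev) = C(prev, word) / C(prev), as in the formula above."""
    return bigrams[(prev, word)] / unigrams[prev]

def p_sentence(sent):
    """P(T) under the Bi-Gram factorisation (begin symbol omitted)."""
    p = 1.0
    for prev, word in zip(sent, sent[1:]):
        p *= p_bigram(prev, word)
    return p

print(p_bigram("want", "to"))  # 1.0: "want" is always followed by "to"
print(p_bigram("to", "buy"))   # 0.5: "to" is followed by "buy" half the time
```

A real training run would add begin/end symbols and smoothing for unseen n-grams; this sketch only illustrates the counting relation itself.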
The first N-gram model and the second N-gram model are fused according to the set model fusion ratio. For example, if the ratio of the consumer-product corpus to the general corpus is set to 2:8, the fusion ratio of the first N-gram model to the second N-gram model is likewise set to 2:8, and the two models are fused to finally obtain the N-gram model used for speech recognition. Because the ratio of the consumer-product corpus to the general corpus is set in advance, the accuracy of speech recognition by the fused N-gram model in the smart-supermarket scenario is effectively improved.
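One common way to realise such a fusion is linear interpolation of the two models' probabilities at the set 2:8 ratio. The sketch below is a hedged illustration with made-up probabilities, standing in for the consumer-product and general models; the patent does not specify the exact fusion mechanism:

```python
def fuse(p_consumer, p_general, ratio=(2, 8)):
    """Interpolate two models' conditional probabilities at the given ratio."""
    a, b = ratio
    lam = a / (a + b)  # weight of the consumer-product model, here 0.2
    return lam * p_consumer + (1 - lam) * p_general

# A word common in the goods corpus but rare in the general corpus
# keeps a usable probability after fusion:
print(fuse(0.30, 0.01))  # 0.2*0.30 + 0.8*0.01 = 0.068
```

Interpolation preserves the property that probabilities over a vocabulary still sum to one, which is why it is a standard choice for mixing language models.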
In one embodiment, as shown in fig. 4, step S111 includes:
s1111, performing word segmentation on the consumer product corpus based on a probability statistical word segmentation model to obtain a first word segmentation result corresponding to the consumer product corpus;
s1112, inputting the first word segmentation result into a first initial N-gram model for training to obtain a first N-gram model.
In this embodiment, each sentence in the consumer-product corpus is segmented by the probability-based statistical word segmentation model as follows:
For example, let C = c1c2...cm be the Chinese character string to be segmented, let W = w1w2...wn be a candidate segmentation result, and let Wa, Wb, ..., Wk be all possible segmentation schemes of C. The probability-based statistical word segmentation model finds the target word string W satisfying P(W|C) = max(P(Wa|C), P(Wb|C), ..., P(Wk|C)); that is, the word string W produced by the model is the one with the maximum estimated probability. Concretely:
For the substring S to be segmented, all candidate words w1, w2, ..., wi, ..., wn are taken out from left to right; the probability value P(wi) of each candidate word is looked up in the dictionary, and all left-neighbour words of each candidate word are recorded; the cumulative probability of each candidate word is calculated, and the best left-neighbour word of each candidate word is obtained by comparison; if the current word wn is the tail word of the string S and its cumulative probability P(wn) is the maximum, then wn is the end word of S; starting from wn, the best left-neighbour word of each word is output in turn from right to left, which gives the segmentation result of S. The first word segmentation result is input into the first initial N-gram model for training to obtain the first N-gram model, which achieves higher sentence recognition accuracy in the smart-supermarket scenario.
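The cumulative-probability procedure above can be sketched as a dynamic program over candidate words: each position records its best score and best left neighbour, and the segmentation is traced back from the tail. The dictionary probabilities below are invented for illustration and `segment` is a hypothetical helper, not the patent's implementation:

```python
import math

def segment(s, word_prob):
    """Maximum-probability segmentation: best[i] holds (log-probability,
    start index of the best left-neighbour word) for the prefix s[:i]."""
    best = {0: (0.0, None)}
    for end in range(1, len(s) + 1):
        for start in range(end):
            word = s[start:end]
            if word in word_prob and start in best:
                score = best[start][0] + math.log(word_prob[word])
                if end not in best or score > best[end][0]:
                    best[end] = (score, start)
    # Trace the best left neighbours back from the end of the string.
    out, i = [], len(s)
    while i > 0:
        start = best[i][1]
        out.append(s[start:i])
        i = start
    return out[::-1]

# Hypothetical dictionary probabilities for "I want to buy instant noodles".
word_prob = {"我": 0.1, "想": 0.1, "买": 0.1, "方便": 0.02, "面": 0.05,
             "方便面": 0.04, "我想": 0.005}
print(segment("我想买方便面", word_prob))
```

With these numbers the whole-word reading "方便面" (instant noodles) outscores the split "方便/面", matching the intuition behind maximum-probability segmentation.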
Similarly, the general corpus is segmented by the probability-based statistical word segmentation model to obtain a second word segmentation result corresponding to the general corpus, and the second word segmentation result is input into the second initial N-gram model for training to obtain the second N-gram model; the second N-gram model achieves higher sentence recognition accuracy in ordinary daily-life scenarios (that is, a higher recognition rate for sentences not tied to a particular domain).
S120, receiving the voice to be recognized, and recognizing the voice to be recognized through the N-gram model to obtain a recognition result.
When the speech to be recognized is recognized through the N-gram model, a whole sentence, such as "I want to buy XX-brand instant noodles", can be recognized effectively, and the sentence with the largest recognition probability is taken as the recognition result.
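The choice of the sentence with the largest recognition probability can be sketched as follows; `best_hypothesis` and the toy scores are hypothetical stand-ins for candidate transcriptions scored by the trained N-gram model:

```python
def best_hypothesis(candidates, p_sentence):
    """Return the candidate sentence the language model scores highest."""
    return max(candidates, key=p_sentence)

# Hypothetical scores: two acoustically similar hypotheses, one of which
# the language model finds far more probable.
toy_scores = {
    "i want to buy xx brand instant noodles": 3e-7,
    "i want to buy xx bland instant noodles": 4e-9,
}
candidates = list(toy_scores)
print(best_hypothesis(candidates, toy_scores.get))
```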
S130, word segmentation is carried out on the recognition result, and a sentence word segmentation result corresponding to the recognition result is obtained.
In one embodiment, step S130 includes:
And word segmentation is carried out on the recognition result based on a probability statistics word segmentation model, so that a sentence word segmentation result corresponding to the recognition result is obtained.
In this embodiment, for the specific process of segmenting the recognition result with the probability-based statistical word segmentation model in step S130, refer to step S1111. After the recognition result has been segmented, part-of-speech analysis can be performed.
And S140, performing lexical analysis according to the sentence segmentation result to obtain noun part-of-speech keywords corresponding to the sentence segmentation result.
In one embodiment, step S140 includes:
And taking the sentence segmentation result as input of a pre-trained joint lexical analysis model to obtain noun part-of-speech keywords in the sentence segmentation result.
In this embodiment, the process of lexical analysis by the joint lexical analysis model is as follows:
The input of the lexical analysis task is a string (hereinafter referred to as a "sentence"), and the output is the word boundaries, parts of speech, and entity classes in the sentence. Sequence tagging is a classical modelling approach to lexical analysis. A joint lexical analysis model (i.e., a LAC model) is constructed: a network structure based on GRUs (gated recurrent units) learns the features, and the learned features are fed into a CRF (conditional random field) decoding layer to complete the sequence labelling. The CRF decoding layer essentially replaces the linear model of the traditional CRF with a nonlinear neural network, and its sentence-level likelihood probability better alleviates the label-bias problem.
The input of the joint lexical analysis model is represented in one-hot form: each word is mapped through a vocabulary to an id, and the one-hot sequence is converted into a word vector sequence of real-valued vectors. The word vector sequence serves as the input of a bidirectional GRU, which learns a feature representation of the input sequence and produces a new feature sequence; two layers of bidirectional GRUs are stacked to increase the learning capacity. The CRF then takes the features learned by the GRU as input and the labelled sequence as the supervision signal, realising part-of-speech tagging of each word in the sentence word segmentation result. In the smart-supermarket scenario, a noun part-of-speech keyword is most likely to be a commodity brand or commodity name, so the noun part-of-speech keywords corresponding to the sentence word segmentation result are selected as the screening result for further commodity retrieval.
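A minimal sketch of that final screening step, assuming the lexical analysis model returns (word, tag) pairs; the tag set (following common Chinese POS conventions, 'n' noun, 'nz' other proper noun, 'v' verb, 'r' pronoun) and the sample tagging are assumptions, not taken from the patent:

```python
# Noun-class tags to keep as retrieval keywords (assumed tag set).
NOUN_TAGS = {"n", "nz", "ns", "nt", "nw"}

def noun_keywords(tagged):
    """Keep only noun-class words from (word, tag) pairs."""
    return [word for word, tag in tagged if tag in NOUN_TAGS]

# Hypothetical tagging of "I / want / to buy / XX-brand / instant noodles".
tagged = [("我", "r"), ("想", "v"), ("买", "v"),
          ("XX牌", "nz"), ("方便面", "n")]
print(noun_keywords(tagged))  # ['XX牌', '方便面']
```

Only the brand and product nouns survive the filter, which is exactly the screening result the retrieval step consumes.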
And S150, searching a pre-stored recommended corpus for entries whose similarity to the noun keywords exceeds a preset similarity threshold, so as to obtain a retrieval result.
In this embodiment, when the noun part-of-speech keywords are obtained, each noun part-of-speech keyword is searched for in the preset recommended corpus, and the words with higher similarity to it are taken as the retrieval result. Specifically, the word vector corresponding to each noun part-of-speech keyword is obtained from a Word2Vec model (Word2Vec is an efficient tool that represents a word as a real-valued vector), and its similarity to the word vector of each entry in the pre-stored recommended corpus is then calculated, the similarity between two vectors being computed from their Euclidean distance. If entries whose similarity exceeds the preset similarity threshold exist in the pre-stored recommended corpus, each such entry is taken as one of the retrieval results; that is, all the entries whose similarity exceeds the preset similarity threshold together form the retrieval result.
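The retrieval step can be sketched as follows, with toy stand-in vectors in place of real Word2Vec embeddings and a hypothetical mapping from Euclidean distance to a similarity score in (0, 1]; the entry names and threshold are likewise illustrative:

```python
import math

# Toy recommended corpus: entry name -> stand-in embedding vector.
recommended = {
    "instant noodles A": [0.9, 0.1, 0.0],
    "instant noodles B": [0.8, 0.2, 0.1],
    "shampoo":           [0.0, 0.1, 0.9],
}

def similarity(u, v):
    """Turn Euclidean distance into a similarity score in (0, 1]."""
    return 1.0 / (1.0 + math.dist(u, v))

def retrieve(keyword_vec, corpus, threshold=0.7):
    """Keep every entry whose similarity to the keyword exceeds the threshold."""
    return [name for name, vec in corpus.items()
            if similarity(keyword_vec, vec) > threshold]

query = [0.85, 0.15, 0.05]  # stand-in vector for a noun keyword
print(retrieve(query, recommended))
```

Both noodle entries clear the threshold and together form the retrieval result, while the unrelated entry is filtered out.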
The method adopts speech recognition technology and obtains noun part-of-speech keywords through lexical analysis of the speech recognition result, so that the retrieval result is obtained from the recommended corpus more accurately according to the noun part-of-speech keywords.
The embodiment of the invention also provides a voice retrieval device which is used for executing any embodiment of the voice retrieval method. Specifically, referring to fig. 5, fig. 5 is a schematic block diagram of a voice retrieval device according to an embodiment of the present invention. The voice retrieval apparatus 100 may be configured in a server.
As shown in fig. 5, the speech search device 100 includes a model training unit 110, a speech recognition unit 120, a word segmentation unit 130, a part-of-speech analysis unit 140, and a search unit 150.
The model training unit 110 is configured to receive a training set corpus, input the training set corpus into an initial N-gram model for training, and obtain an N-gram model; wherein the N-gram model is an N-element language model.
In this embodiment, the technical solution is described from the perspective of the server. The server can receive the training set corpus and train it to obtain an N-gram model, and the N-gram model is used to recognize the speech to be recognized that a front-end voice acquisition terminal deployed in the smart supermarket uploads to the server.
In this embodiment, the training set corpus is a mixture of a general corpus and a consumer-product corpus. The consumer-product corpus is a corpus containing a large number of commodity names (such as commodity brands, commodity names, etc.); the general corpus differs from the consumer-product corpus in that its vocabulary is not biased toward a particular domain. The training set corpus is input into the initial N-gram model for training, and the N-gram model used for speech recognition is obtained.
In one embodiment, as shown in fig. 6, the model training unit 110 includes:
The first training unit 111 is configured to obtain a consumer product corpus, input the consumer product corpus to a first initial N-gram model, and perform training to obtain a first N-gram model;
The second training unit 112 is configured to obtain a generic corpus, input the generic corpus to a second initial N-gram model, and perform training to obtain a second N-gram model;
And the model fusion unit 113 is configured to fuse the first N-gram model and the second N-gram model according to the set model fusion ratio, so as to obtain an N-gram model.
In this embodiment, the consumer product corpus is a corpus including a large number of commodity names, and the general corpus is different from the consumer product corpus in that the vocabulary in the general corpus is not biased to a specific field.
The N-gram model is a language model (Language Model, LM): a probability-based model whose input is a sentence (an ordered sequence of words) and whose output is the probability of that sentence, i.e., the joint probability of its words (joint probability).
Assuming that sentence T is composed of the word sequence w1, w2, w3, ..., wn, the N-Gram language model is formulated as follows:
P(T)=P(w1w2w3...wn)
=p(w1)*p(w2|w1)*p(w3|w1w2)*…*p(wn|w1w2w3...wn-1)
The commonly used N-Gram models are Bi-Gram and Tri-Gram. Their formulas are respectively:
Bi-Gram:
P(T)=p(w1|begin)*p(w2|w1)*p(w3|w2)*…*p(wn|wn-1)
Tri-Gram:
P(T)=p(w1|begin1,begin2)*p(w2|w1,begin1)*p(w3|w2,w1)*…*p(wn|wn-1,wn-2)
It can be seen that the conditional probability of each word occurring in sentence T can be derived by counting in the corpus. For an n-gram:
p(wi|wi-n+1,…,wi-1)=C(wi-n+1,…,wi)/C(wi-n+1,…,wi-1)
where C(wi-n+1,…,wi) denotes the number of times the word string wi-n+1,…,wi occurs in the corpus.
The first N-gram model and the second N-gram model are fused according to the set model fusion ratio. For example, if the ratio of the consumer-product corpus to the general corpus is set to 2:8, the fusion ratio of the first N-gram model to the second N-gram model is likewise set to 2:8, and the two models are fused to finally obtain the N-gram model used for speech recognition. Because the ratio of the consumer-product corpus to the general corpus is set in advance, the accuracy of speech recognition by the fused N-gram model in the smart-supermarket scenario is effectively improved.
In one embodiment, as shown in fig. 7, the first training unit 111 includes:
the word segmentation unit 1111 is configured to segment the consumer product corpus based on a probability statistical word segmentation model to obtain a first word segmentation result corresponding to the consumer product corpus;
The word segmentation training unit 1112 is configured to input the first word segmentation result to a first initial N-gram model for training, so as to obtain a first N-gram model.
In this embodiment, each sentence in the consumer product corpus is segmented by the probability-based statistical word segmentation model as follows:
For example, let C = c1c2...cm be the Chinese character string to be segmented, let W = w1w2...wn be a segmentation result, and let Wa, Wb, ..., Wk be all possible segmentation schemes of C. The probability-based statistical word segmentation model then finds the target word string W such that W satisfies: P(W|C) = max(P(Wa|C), P(Wb|C), ..., P(Wk|C)); that is, the word string W output by the segmentation model is the one with the maximum estimated probability. Concretely:
For a substring S to be segmented, all candidate words w1, w2, ..., wi, ..., wn are taken out in left-to-right order. The probability value P(wi) of each candidate word is looked up in the dictionary, and all left-neighbor words of each candidate word are recorded. The cumulative probability of each candidate word is then computed, and comparison yields the best left-neighbor word of each candidate word. If the current word wn is the tail word of the string S and its cumulative probability P(wn) is the largest, wn is taken as the end word of S. Starting from wn, the best left-neighbor word of each word is output in turn from right to left, which gives the word segmentation result of S. The first word segmentation result is then input into the first initial N-gram model for training to obtain the first N-gram model, which has a higher sentence recognition accuracy in the smart supermarket scenario.
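The best-left-neighbor procedure above can be sketched as a dynamic program over cumulative log-probabilities. The toy dictionary and its probability values are hypothetical, chosen only to mirror the instant-noodles example in this description.

```python
import math

# Toy dictionary of word probabilities (hypothetical values for illustration).
DICT = {"我": 0.05, "想": 0.04, "买": 0.04, "方便": 0.01, "面": 0.02, "方便面": 0.03}

def segment(s, dictionary=DICT):
    """Max-probability segmentation: for each position keep the best
    'left neighbor' word, then trace back from the tail word."""
    n = len(s)
    best = [-math.inf] * (n + 1)  # best cumulative log-probability ending at i
    back = [0] * (n + 1)          # start index of the best word ending at i
    best[0] = 0.0
    for i in range(n):
        if best[i] == -math.inf:
            continue
        for j in range(i + 1, n + 1):
            w = s[i:j]
            if w in dictionary:
                score = best[i] + math.log(dictionary[w])
                if score > best[j]:
                    best[j], back[j] = score, i
    # Output the best left neighbors from right to left, then reverse.
    words, i = [], n
    while i > 0:
        words.append(s[back[i]:i])
        i = back[i]
    return list(reversed(words))
```

Here "方便面" (P = 0.03) beats the two-word split "方便" + "面" (P = 0.01 x 0.02), so the whole product name survives as one token.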
Similarly, the general corpus is segmented by the probability-based statistical word segmentation model to obtain a second word segmentation result corresponding to the general corpus; the second word segmentation result is input into a second initial N-gram model for training to obtain a second N-gram model, which has a higher sentence recognition accuracy in ordinary everyday scenarios (that is, a higher recognition rate for sentences that do not belong to any specialised domain).
The voice recognition unit 120 is configured to receive a voice to be recognized, and recognize the voice to be recognized through the N-gram model to obtain a recognition result.
When the speech to be recognized is recognized through the N-gram model, a whole sentence is obtained, such as "I want to buy XX brand instant noodles"; the N-gram model effectively recognizes the speech to be recognized and outputs the sentence with the largest recognition probability as the recognition result.
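Selecting the candidate sentence with the largest model probability can be illustrated as follows. This sketch assumes a bigram model stored as a dict of conditional probabilities (as in the counting sketch earlier); the floor value for unseen bigrams is an assumption, standing in for proper smoothing.

```python
import math

def sentence_logprob(words, model, floor=1e-8):
    """Score a candidate sentence with a bigram model; unseen bigrams
    receive a small floor probability instead of zero."""
    padded = ["<s>"] + words + ["</s>"]
    return sum(math.log(model.get((a, b), floor))
               for a, b in zip(padded, padded[1:]))

def best_candidate(candidates, model):
    """Return the candidate transcription the model scores highest."""
    return max(candidates, key=lambda c: sentence_logprob(c, model))
```

Given several acoustic hypotheses for the same utterance, `best_candidate` returns the one the fused language model considers most probable.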
And the recognition result word segmentation unit 130 is configured to segment the recognition result to obtain a sentence word segmentation result corresponding to the recognition result.
In an embodiment, the recognition result word segmentation unit 130 is further configured to:
And word segmentation is carried out on the recognition result based on a probability statistics word segmentation model, so that a sentence word segmentation result corresponding to the recognition result is obtained.
In this embodiment, when the recognition result word segmentation unit 130 segments the recognition result, the specific procedure of word segmentation with the probability-based statistical word segmentation model may refer to that of the word segmentation unit 1111. After the recognition result is segmented, part-of-speech analysis can then be performed.
The part-of-speech analysis unit 140 is configured to perform lexical analysis according to the sentence segmentation result, so as to obtain a noun part-of-speech keyword corresponding to the sentence segmentation result.
In an embodiment, the part-of-speech analysis unit 140 is further configured to:
And taking the sentence segmentation result as input of a pre-trained joint lexical analysis model to obtain noun part-of-speech keywords in the sentence segmentation result.
In this embodiment, the process of lexical analysis by the joint lexical analysis model is as follows:
The input of the lexical analysis task is a string (hereinafter referred to as a "sentence"), and the output is the word boundaries, parts of speech, and entity classes in the sentence. Sequence tagging is a classical modeling approach to lexical analysis. A joint lexical analysis model (that is, a LAC model) is constructed: a network structure based on GRUs (gated recurrent units) learns the features, and the learned features are passed to a CRF (conditional random field) decoding layer to complete the sequence tagging. The CRF decoding layer essentially replaces the linear model of a traditional CRF with a nonlinear neural network, and its sentence-level likelihood better mitigates the label bias problem.
The input of the joint lexical analysis model is represented in one-hot form: each character is mapped to an id through a vocabulary, and the one-hot sequence is converted into a character vector sequence of real-valued vectors. The character vector sequence serves as the input of a bidirectional GRU, which learns a feature representation of the input sequence and yields a new feature representation sequence; two layers of bidirectional GRUs are stacked to increase the learning capacity. The CRF takes the features learned by the GRU as input and the tag sequence as the supervision signal, producing the part-of-speech tag of each word in the sentence segmentation result. In the smart supermarket scenario, a noun part-of-speech keyword is more likely to be a product brand or product name, so the noun part-of-speech keywords corresponding to the sentence segmentation result are selected as the screening result for subsequent product retrieval.
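Once the GRU-CRF tagger has labelled each token, selecting the noun-class tokens is a simple filter. This is a sketch; the tag names follow common Chinese POS tagging conventions ("n" for noun, "nz" for other proper noun, "nt" for organization) and are assumptions, not the patent's own tag set.

```python
def noun_keywords(tagged, noun_tags=frozenset({"n", "nz", "nt"})):
    """Keep tokens whose predicted tag is a noun class; in the supermarket
    setting these are likely brand or product names."""
    return [word for word, tag in tagged if tag in noun_tags]
```

For a tagged sentence like [("我", "r"), ("想", "v"), ("买", "v"), ("方便面", "n")], only the product name survives as a retrieval keyword.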
The searching unit 150 is configured to search, in a pre-stored recommended corpus, a corpus with similarity to the noun keyword exceeding a preset similarity threshold, so as to obtain a search result.
In this embodiment, once the noun part-of-speech keywords are obtained, each noun part-of-speech keyword is searched in the preset recommended corpus to obtain the words with higher similarity to it, which serve as the search result. Specifically, the word vector corresponding to each noun part-of-speech keyword is obtained with a Word2Vec model (Word2Vec is an efficient tool for representing words as real-valued vectors), and its similarity to the word vector of each corpus entry in the pre-stored recommended corpus is computed, where the similarity between two vectors is derived from their Euclidean distance. If corpus entries whose similarity exceeds the preset similarity threshold exist in the pre-stored recommended corpus, those entries are taken as part of the search result; that is, all the corpus entries whose similarity exceeds the preset similarity threshold together form the search result.
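The distance-based retrieval step can be sketched as follows. The patent only states that Euclidean distance between word vectors is used; the mapping from distance to a bounded similarity score, the threshold value, and the toy vectors below are assumptions for illustration.

```python
import math

def euclidean_similarity(u, v):
    """Map Euclidean distance to a (0, 1] similarity score.
    The exact mapping is an assumption; smaller distance -> higher score."""
    d = math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
    return 1.0 / (1.0 + d)

def retrieve(keyword_vec, corpus_vecs, threshold=0.5):
    """Return every corpus entry whose similarity to the keyword vector
    exceeds the preset similarity threshold."""
    return [name for name, vec in corpus_vecs.items()
            if euclidean_similarity(keyword_vec, vec) > threshold]
```

In practice `keyword_vec` would come from a trained Word2Vec model (e.g. a keyed-vector lookup for the noun keyword) and `corpus_vecs` would hold the precomputed vectors of the recommended corpus.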
The device adopts a voice recognition technology, and the noun part-of-speech keywords are obtained after lexical analysis is carried out on the voice recognition result, so that the retrieval result can be obtained more accurately in the recommended corpus according to the noun part-of-speech keywords.
The above-described speech retrieval means may be implemented in the form of a computer program which is executable on a computer device as shown in fig. 8.
Referring to fig. 8, fig. 8 is a schematic block diagram of a computer device according to an embodiment of the present invention. The computer device 500 is a server, and the server may be a stand-alone server or a server cluster formed by a plurality of servers.
With reference to FIG. 8, the computer device 500 includes a processor 502, memory, and a network interface 505 connected by a system bus 501, where the memory may include a non-volatile storage medium 503 and an internal memory 504.
The non-volatile storage medium 503 may store an operating system 5031 and a computer program 5032. The computer program 5032, when executed, may cause the processor 502 to perform a speech retrieval method.
The processor 502 is used to provide computing and control capabilities to support the operation of the overall computer device 500.
The internal memory 504 provides an environment for the execution of a computer program 5032 in the non-volatile storage medium 503, which computer program 5032, when executed by the processor 502, causes the processor 502 to perform a speech retrieval method.
The network interface 505 is used for network communication, such as transmitting data information. It will be appreciated by those skilled in the art that the architecture shown in fig. 8 is merely a block diagram of part of the architecture relevant to the present solution and does not limit the computer device 500 on which the present solution may be implemented; a particular computer device 500 may include more or fewer components than shown, combine certain components, or arrange the components differently.
Wherein the processor 502 is configured to execute a computer program 5032 stored in a memory to perform the following functions: receiving a training set corpus, and inputting the training set corpus into an initial N-gram model for training to obtain the N-gram model; wherein the N-gram model is an N-gram model; receiving voice to be recognized, and recognizing the voice to be recognized through the N-gram model to obtain a recognition result; word segmentation is carried out on the identification result, and a sentence word segmentation result corresponding to the identification result is obtained; performing lexical analysis according to the sentence segmentation result to obtain noun part-of-speech keywords corresponding to the sentence segmentation result; searching the corpus with similarity with the noun keywords exceeding a preset similarity threshold value in a pre-stored recommended corpus to obtain a retrieval result.
In one embodiment, the processor 502 performs the following operations when executing the step of receiving the training set corpus, inputting the training set corpus into the initial N-gram model for training, and obtaining the N-gram model: obtaining consumer product corpus, inputting the consumer product corpus into a first initial N-gram model for training to obtain a first N-gram model; acquiring a general corpus, and inputting the general corpus into a second initial N-gram model for training to obtain a second N-gram model; and fusing the first N-gram model and the second N-gram model according to the set model fusion proportion to obtain an N-gram model.
In one embodiment, the processor 502 performs the following operations when performing the step of inputting the consumer product corpus into the first initial N-gram model for training to obtain the first N-gram model: performing word segmentation on the consumer product corpus based on a probability statistical word segmentation model to obtain a first word segmentation result corresponding to the consumer product corpus; and inputting the first word segmentation result into a first initial N-gram model for training to obtain a first N-gram model.
In one embodiment, when executing the step of word segmentation on the recognition result to obtain a sentence word segmentation result corresponding to the recognition result, the processor 502 executes the following operations: and word segmentation is carried out on the recognition result based on a probability statistics word segmentation model, so that a sentence word segmentation result corresponding to the recognition result is obtained.
In one embodiment, when executing the step of performing lexical analysis according to the sentence segmentation result to obtain the noun part-of-speech keyword corresponding to the sentence segmentation result, the processor 502 performs the following operations: and taking the sentence segmentation result as input of a pre-trained joint lexical analysis model to obtain noun part-of-speech keywords in the sentence segmentation result.
Those skilled in the art will appreciate that the embodiment of the computer device shown in fig. 8 is not limiting of the specific construction of the computer device, and in other embodiments, the computer device may include more or less components than those shown, or certain components may be combined, or a different arrangement of components. For example, in some embodiments, the computer device may include only a memory and a processor, and in such embodiments, the structure and function of the memory and the processor are consistent with the embodiment shown in fig. 8, and will not be described again.
It should be appreciated that in embodiments of the present invention, the processor 502 may be a central processing unit (CPU); the processor 502 may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor, or any conventional processor.
In another embodiment of the invention, a computer-readable storage medium is provided. The computer readable storage medium may be a non-volatile computer readable storage medium. The computer readable storage medium stores a computer program, wherein the computer program when executed by a processor performs the steps of: receiving a training set corpus, and inputting the training set corpus into an initial N-gram model for training to obtain the N-gram model; wherein the N-gram model is an N-gram model; receiving voice to be recognized, and recognizing the voice to be recognized through the N-gram model to obtain a recognition result; word segmentation is carried out on the identification result, and a sentence word segmentation result corresponding to the identification result is obtained; performing lexical analysis according to the sentence segmentation result to obtain noun part-of-speech keywords corresponding to the sentence segmentation result; searching the corpus with similarity with the noun keywords exceeding a preset similarity threshold value in a pre-stored recommended corpus to obtain a retrieval result.
In an embodiment, the receiving a training set corpus, inputting the training set corpus into an initial N-gram model for training, and obtaining the N-gram model includes: obtaining consumer product corpus, inputting the consumer product corpus into a first initial N-gram model for training to obtain a first N-gram model; acquiring a general corpus, and inputting the general corpus into a second initial N-gram model for training to obtain a second N-gram model; and fusing the first N-gram model and the second N-gram model according to the set model fusion proportion to obtain an N-gram model.
In an embodiment, the inputting the consumer corpus into the first initial N-gram model for training to obtain the first N-gram model includes: performing word segmentation on the consumer product corpus based on a probability statistical word segmentation model to obtain a first word segmentation result corresponding to the consumer product corpus; and inputting the first word segmentation result into a first initial N-gram model for training to obtain a first N-gram model.
In an embodiment, the word segmentation of the recognition result to obtain a sentence word segmentation result corresponding to the recognition result includes: and word segmentation is carried out on the recognition result based on a probability statistics word segmentation model, so that a sentence word segmentation result corresponding to the recognition result is obtained.
In an embodiment, the performing lexical analysis according to the sentence word segmentation result to obtain a noun part-of-speech keyword corresponding to the sentence word segmentation result includes: and taking the sentence segmentation result as input of a pre-trained joint lexical analysis model to obtain noun part-of-speech keywords in the sentence segmentation result.
It will be clearly understood by those skilled in the art that, for convenience and brevity of description, specific working procedures of the apparatus, device and unit described above may refer to corresponding procedures in the foregoing method embodiments, which are not repeated herein. Those of ordinary skill in the art will appreciate that the elements and algorithm steps described in connection with the embodiments disclosed herein may be embodied in electronic hardware, in computer software, or in a combination of the two, and that the elements and steps of the examples have been generally described in terms of function in the foregoing description to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the several embodiments provided by the present invention, it should be understood that the disclosed apparatus, device and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, and for example, the division of the units is merely a logical function division, there may be another division manner in actual implementation, or units having the same function may be integrated into one unit, for example, multiple units or components may be combined or may be integrated into another system, or some features may be omitted, or not performed. In addition, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices, or elements, or may be an electrical, mechanical, or other form of connection.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the embodiment of the present invention.
In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units may be stored in a storage medium if implemented in the form of software functional units and sold or used as stand-alone products. Based on such understanding, the technical solution of the present invention is essentially or a part contributing to the prior art, or all or part of the technical solution may be embodied in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-only memory (ROM), a magnetic disk, an optical disk, or other various media capable of storing program codes.
While the invention has been described with reference to certain preferred embodiments, it will be understood by those skilled in the art that various changes and substitutions of equivalents may be made and equivalents will be apparent to those skilled in the art without departing from the scope of the invention. Therefore, the protection scope of the invention is subject to the protection scope of the claims.
Claims (6)
1. A method of speech retrieval, comprising:
receiving a training set corpus, and inputting the training set corpus into an initial N-gram model for training to obtain the N-gram model; wherein the N-gram model is an N-gram model;
Receiving voice to be recognized, and recognizing the voice to be recognized through the N-gram model to obtain a recognition result;
word segmentation is carried out on the recognition result to obtain a sentence word segmentation result corresponding to the recognition result;
Performing lexical analysis according to the sentence segmentation result to obtain noun part-of-speech keywords corresponding to the sentence segmentation result; and
Searching a corpus with similarity with the noun part-of-speech keywords exceeding a preset similarity threshold value in a pre-stored recommended corpus to obtain a search result; the recommended corpus comprises a plurality of corpora, and each corpus comprises one or more keywords with noun parts of speech;
wherein the training set corpus is a mixed corpus of general corpus and consumer goods corpus; the N-gram model is a probability-based discrimination model;
The receiving training set corpus, inputting the training set corpus into an initial N-gram model for training, and obtaining an N-gram model, comprising:
Obtaining consumer product corpus, inputting the consumer product corpus into a first initial N-gram model for training to obtain a first N-gram model;
acquiring a general corpus, and inputting the general corpus into a second initial N-gram model for training to obtain a second N-gram model;
According to the set model fusion proportion, fusing the first N-gram model and the second N-gram model to obtain an N-gram model;
Inputting the consumer product corpus into a first initial N-gram model for training to obtain a first N-gram model, wherein the method comprises the following steps of:
performing word segmentation on the consumer product corpus based on a probability statistical word segmentation model to obtain a first word segmentation result corresponding to the consumer product corpus;
inputting the first word segmentation result into a first initial N-gram model for training to obtain a first N-gram model;
the step of inputting the general corpus into a second initial N-gram model for training to obtain a second N-gram model, which comprises the following steps:
performing word segmentation on the general corpus based on a probability statistical word segmentation model to obtain a second word segmentation result corresponding to the general corpus;
And inputting the second word segmentation result into a second initial N-gram model for training to obtain a second N-gram model.
2. The method for voice search according to claim 1, wherein said word segmentation of the recognition result to obtain a sentence word segmentation result corresponding to the recognition result comprises:
And word segmentation is carried out on the recognition result based on a probability statistics word segmentation model, so that a sentence word segmentation result corresponding to the recognition result is obtained.
3. The method for voice search according to claim 1, wherein said performing lexical analysis according to the sentence segmentation result to obtain noun part-of-speech keywords corresponding to the sentence segmentation result comprises:
And taking the sentence segmentation result as input of a pre-trained joint lexical analysis model to obtain noun part-of-speech keywords in the sentence segmentation result.
4. A voice retrieval apparatus, comprising:
The model training unit is used for receiving a training set corpus and inputting the training set corpus into an initial N-gram model for training to obtain an N-gram model; wherein the N-gram model is an N-gram model;
The voice recognition unit is used for receiving the voice to be recognized, and recognizing the voice to be recognized through the N-gram model to obtain a recognition result;
The recognition result word segmentation unit is used for segmenting the recognition result to obtain a sentence word segmentation result corresponding to the recognition result;
the part-of-speech analysis unit is used for performing lexical analysis according to the sentence segmentation result to obtain noun part-of-speech keywords corresponding to the sentence segmentation result; and
The retrieval unit is used for searching the corpus with similarity with the noun part-of-speech keywords exceeding a preset similarity threshold value in a pre-stored recommended corpus to obtain a retrieval result;
wherein the training set corpus is a mixed corpus of general corpus and consumer goods corpus; the N-gram model is a probability-based discrimination model;
The model training unit includes:
the first training unit is used for obtaining consumer product corpus, inputting the consumer product corpus into a first initial N-gram model for training to obtain a first N-gram model;
The second training unit is used for acquiring general corpus, inputting the general corpus into a second initial N-gram model for training to obtain a second N-gram model;
The model fusion unit is used for fusing the first N-gram model and the second N-gram model according to the set model fusion proportion to obtain an N-gram model;
The first training unit includes:
The word segmentation unit is used for segmenting the consumer product corpus based on a probability statistical word segmentation model to obtain a first word segmentation result corresponding to the consumer product corpus;
The word segmentation training unit is used for inputting the first word segmentation result into a first initial N-gram model for training to obtain a first N-gram model;
the second training unit includes:
performing word segmentation on the general corpus based on a probability statistical word segmentation model to obtain a second word segmentation result corresponding to the general corpus;
And inputting the second word segmentation result into a second initial N-gram model for training to obtain a second N-gram model.
5. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the speech retrieval method of any one of claims 1 to 3 when the computer program is executed by the processor.
6. A computer readable storage medium, characterized in that the computer readable storage medium stores a computer program which, when executed by a processor, causes the processor to perform the speech retrieval method according to any one of claims 1 to 3.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910492599.8A CN110349568B (en) | 2019-06-06 | 2019-06-06 | Voice retrieval method, device, computer equipment and storage medium |
PCT/CN2019/117872 WO2020244150A1 (en) | 2019-06-06 | 2019-11-13 | Speech retrieval method and apparatus, computer device, and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910492599.8A CN110349568B (en) | 2019-06-06 | 2019-06-06 | Voice retrieval method, device, computer equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110349568A CN110349568A (en) | 2019-10-18 |
CN110349568B true CN110349568B (en) | 2024-05-31 |
Family
ID=68181598
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910492599.8A Active CN110349568B (en) | 2019-06-06 | 2019-06-06 | Voice retrieval method, device, computer equipment and storage medium |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN110349568B (en) |
WO (1) | WO2020244150A1 (en) |
Families Citing this family (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110349568B (en) * | 2019-06-06 | 2024-05-31 | 平安科技(深圳)有限公司 | Voice retrieval method, device, computer equipment and storage medium |
CN110825844A (en) * | 2019-10-21 | 2020-02-21 | 拉扎斯网络科技(上海)有限公司 | Voice retrieval method and device, readable storage medium and electronic equipment |
CN111291195B (en) * | 2020-01-21 | 2021-08-10 | 腾讯科技(深圳)有限公司 | Data processing method, device, terminal and readable storage medium |
CN111460257B (en) * | 2020-03-27 | 2023-10-31 | 北京百度网讯科技有限公司 | Thematic generation method, apparatus, electronic device and storage medium |
CN113642329B (en) * | 2020-04-27 | 2024-10-29 | 阿里巴巴集团控股有限公司 | Method and device for establishing term identification model, and method and device for term identification |
CN113569128A (en) * | 2020-04-29 | 2021-10-29 | 北京金山云网络技术有限公司 | Data retrieval method and device and electronic equipment |
CN111862970A (en) * | 2020-06-05 | 2020-10-30 | 珠海高凌信息科技股份有限公司 | False propaganda treatment application method and device based on intelligent voice robot |
CN111783424B (en) * | 2020-06-17 | 2024-02-13 | 泰康保险集团股份有限公司 | Text sentence dividing method and device |
CN112183114B (en) * | 2020-08-10 | 2024-05-14 | 招联消费金融股份有限公司 | Model training and semantic integrity recognition method and device |
CN112381038B (en) * | 2020-11-26 | 2024-04-19 | 中国船舶工业系统工程研究院 | Text recognition method, system and medium based on image |
CN112735413B (en) * | 2020-12-25 | 2024-05-31 | 浙江大华技术股份有限公司 | Instruction analysis method based on camera device, electronic equipment and storage medium |
CN112905869B (en) * | 2021-03-26 | 2024-07-26 | 深圳好学多智能科技有限公司 | Self-adaptive training method, device, storage medium and equipment for language model |
CN113256379A (en) * | 2021-05-24 | 2021-08-13 | 北京小米移动软件有限公司 | Method for correlating shopping demands for commodities |
CN113256378A (en) * | 2021-05-24 | 2021-08-13 | 北京小米移动软件有限公司 | Method for determining shopping demand of user |
CN114329225B (en) * | 2022-01-24 | 2024-04-23 | 平安国际智慧城市科技股份有限公司 | Search method, device, equipment and storage medium based on search statement |
CN115563394B (en) * | 2022-11-24 | 2023-03-28 | 腾讯科技(深圳)有限公司 | Search recall method, recall model training method, device and computer equipment |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107154260A (en) * | 2017-04-11 | 2017-09-12 | 北京智能管家科技有限公司 | A kind of domain-adaptive audio recognition method and device |
CN107204184A (en) * | 2017-05-10 | 2017-09-26 | 平安科技(深圳)有限公司 | Audio recognition method and system |
CN108538286A (en) * | 2017-03-02 | 2018-09-14 | 腾讯科技(深圳)有限公司 | A kind of method and computer of speech recognition |
CN108804414A (en) * | 2018-05-04 | 2018-11-13 | 科沃斯商用机器人有限公司 | Text modification method, device, smart machine and readable storage medium storing program for executing |
CN109388743A (en) * | 2017-08-11 | 2019-02-26 | 阿里巴巴集团控股有限公司 | The determination method and apparatus of language model |
CN109817217A (en) * | 2019-01-17 | 2019-05-28 | 深圳壹账通智能科技有限公司 | Self-service based on speech recognition peddles method, apparatus, equipment and medium |
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105139239A (en) * | 2014-05-27 | 2015-12-09 | 无锡韩光电器有限公司 | Supermarket shopping system with voice query function |
JP6353408B2 (en) * | 2015-06-11 | 2018-07-04 | 日本電信電話株式会社 | Language model adaptation device, language model adaptation method, and program |
CN106875941B (en) * | 2017-04-01 | 2020-02-18 | 彭楚奥 | Voice semantic recognition method of service robot |
CN107247759A (en) * | 2017-05-31 | 2017-10-13 | 深圳正品创想科技有限公司 | Commodity recommendation method and device |
CN109344830B (en) * | 2018-08-17 | 2024-06-28 | 平安科技(深圳)有限公司 | Sentence output and model training method and device, computer device and storage medium |
CN109840323A (en) * | 2018-12-14 | 2019-06-04 | 深圳壹账通智能科技有限公司 | Speech recognition processing method and server for insurance products |
CN110349568B (en) * | 2019-06-06 | 2024-05-31 | 平安科技(深圳)有限公司 | Voice retrieval method, device, computer equipment and storage medium |
2019
- 2019-06-06 CN CN201910492599.8A patent/CN110349568B/en active Active
- 2019-11-13 WO PCT/CN2019/117872 patent/WO2020244150A1/en active Application Filing
Also Published As
Publication number | Publication date |
---|---|
CN110349568A (en) | 2019-10-18 |
WO2020244150A1 (en) | 2020-12-10 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110349568B (en) | Voice retrieval method, device, computer equipment and storage medium | |
CN109885660B (en) | Knowledge-graph-empowered question-answering system and method based on information retrieval | |
CN109840287B (en) | Cross-modal information retrieval method and device based on neural network | |
CN112164391B (en) | Statement processing method, device, electronic equipment and storage medium | |
US9977778B1 (en) | Probabilistic matching for dialog state tracking with limited training data | |
Zhang et al. | Joint word segmentation and POS tagging using a single perceptron | |
WO2020244073A1 (en) | Speech-based user classification method and device, computer apparatus, and storage medium | |
CN107480143B (en) | Method and system for segmenting conversation topics based on context correlation | |
US7493251B2 (en) | Using source-channel models for word segmentation | |
CN112800170A (en) | Question matching method and device and question reply method and device | |
CN112069298A (en) | Human-computer interaction method, device and medium based on semantic web and intention recognition | |
US20070100814A1 (en) | Apparatus and method for detecting named entity | |
CN110121706A (en) | Providing responses in a session | |
US20060020448A1 (en) | Method and apparatus for capitalizing text using maximum entropy | |
EP1619620A1 (en) | Adaptation of Exponential Models | |
CN114580382A (en) | Text error correction method and device | |
CN110096572B (en) | Sample generation method, device and computer readable medium | |
CN113672708A (en) | Language model training method, question and answer pair generation method, device and equipment | |
CN114154487A (en) | Text automatic error correction method and device, electronic equipment and storage medium | |
CN109948140B (en) | Word vector embedding method and device | |
CN113326702B (en) | Semantic recognition method, semantic recognition device, electronic equipment and storage medium | |
CN110874408B (en) | Model training method, text recognition device and computing equipment | |
CN113806510B (en) | Legal provision retrieval method, terminal equipment and computer storage medium | |
CN115274086A (en) | Intelligent diagnosis guiding method and system | |
CN116955579B (en) | Chat reply generation method and device based on keyword knowledge retrieval |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||