
CN110349568B - Voice retrieval method, device, computer equipment and storage medium - Google Patents


Info

Publication number
CN110349568B
CN110349568B (application CN201910492599.8A)
Authority
CN
China
Prior art keywords
corpus
gram model
model
word segmentation
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910492599.8A
Other languages
Chinese (zh)
Other versions
CN110349568A (en)
Inventor
黄锦伦
陈磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN201910492599.8A priority Critical patent/CN110349568B/en
Publication of CN110349568A publication Critical patent/CN110349568A/en
Priority to PCT/CN2019/117872 priority patent/WO2020244150A1/en
Application granted granted Critical
Publication of CN110349568B publication Critical patent/CN110349568B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/63Querying
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/68Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/253Grammatical analysis; Style critique
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/06Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063Training
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/08Speech classification or search
    • G10L15/18Speech classification or search using natural language modelling
    • G10L15/183Speech classification or search using natural language modelling using context dependencies, e.g. language models
    • G10L15/19Grammatical context, e.g. disambiguation of the recognition hypotheses based on word sequence rules
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/22Procedures used during a speech recognition process, e.g. man-machine dialogue
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/22Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/225Feedback of the input speech

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Acoustics & Sound (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Library & Information Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a voice retrieval method, a voice retrieval device, computer equipment and a storage medium. The method comprises the following steps: receiving a training set corpus and inputting it into an initial N-gram model for training to obtain an N-gram model; receiving the voice to be recognized and recognizing it through the N-gram model to obtain a recognition result; performing word segmentation on the recognition result to obtain a sentence word segmentation result corresponding to the recognition result; performing lexical analysis on the sentence word segmentation result to obtain the noun part-of-speech keywords corresponding to it; and searching a pre-stored recommended corpus for corpora whose similarity to the noun part-of-speech keywords exceeds a preset similarity threshold, so as to obtain a retrieval result. The method adopts voice recognition technology and obtains noun part-of-speech keywords through lexical analysis of the voice recognition result, so that the retrieval result is obtained more accurately from the recommended corpus according to those keywords.

Description

Voice retrieval method, device, computer equipment and storage medium
Technical Field
The present invention relates to the field of speech recognition technologies, and in particular, to a speech retrieval method, a speech retrieval device, a computer device, and a storage medium.
Background
At present, smart supermarkets search for products through voice recognition and generally match products by fuzzy query, so the voice recognition result needs to be analyzed in order to intelligently obtain the name of the product the user wants to buy. In practice, users often speak an entire sentence, for example: "I want to buy XXX", "I want to eat XXX", etc., and current speech recognition systems cannot accurately judge the purchase intent behind such sentences.
Disclosure of Invention
The embodiment of the invention provides a voice retrieval method, a voice retrieval device, computer equipment and a storage medium, aiming to solve the problem in the prior art that voice recognition systems have low recognition accuracy in the supermarket scenario, which leads to inaccurate recognition results.
In a first aspect, an embodiment of the present invention provides a voice retrieval method, including:
receiving a training set corpus, and inputting the training set corpus into an initial N-gram model for training to obtain the N-gram model, wherein the N-gram model is an N-element statistical language model;
receiving the voice to be recognized, and recognizing it through the N-gram model to obtain a recognition result;
performing word segmentation on the recognition result to obtain a sentence word segmentation result corresponding to the recognition result;
performing lexical analysis on the sentence word segmentation result to obtain the noun part-of-speech keywords corresponding to it; and
searching a pre-stored recommended corpus for corpora whose similarity to the noun part-of-speech keywords exceeds a preset similarity threshold, so as to obtain a retrieval result.
In a second aspect, an embodiment of the present invention provides a voice retrieval apparatus, including:
the model training unit is used for receiving a training set corpus and inputting it into an initial N-gram model for training to obtain an N-gram model, wherein the N-gram model is an N-element statistical language model;
the voice recognition unit is used for receiving the voice to be recognized and recognizing it through the N-gram model to obtain a recognition result;
the word segmentation unit is used for segmenting the recognition result to obtain a sentence word segmentation result corresponding to the recognition result;
the part-of-speech analysis unit is used for performing lexical analysis on the sentence word segmentation result to obtain the noun part-of-speech keywords corresponding to it; and
the retrieval unit is used for searching a pre-stored recommended corpus for corpora whose similarity to the noun part-of-speech keywords exceeds a preset similarity threshold, so as to obtain a retrieval result.
In a third aspect, an embodiment of the present invention further provides a computer device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the voice retrieval method according to the first aspect.
In a fourth aspect, an embodiment of the present invention further provides a computer readable storage medium, where the computer readable storage medium stores a computer program, where the computer program when executed by a processor causes the processor to perform the speech retrieval method according to the first aspect.
The embodiment of the invention provides a voice retrieval method, a voice retrieval device, computer equipment and a storage medium. The method comprises: receiving a training set corpus and inputting it into an initial N-gram model for training to obtain the N-gram model, wherein the N-gram model is an N-element statistical language model; receiving the voice to be recognized and recognizing it through the N-gram model to obtain a recognition result; performing word segmentation on the recognition result to obtain a sentence word segmentation result corresponding to the recognition result; performing lexical analysis on the sentence word segmentation result to obtain the noun part-of-speech keywords corresponding to it; and searching a pre-stored recommended corpus for corpora whose similarity to the noun part-of-speech keywords exceeds a preset similarity threshold, so as to obtain a retrieval result. The method adopts voice recognition technology and performs lexical analysis on the voice recognition result, thereby accurately capturing the user's requirements.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic diagram of an application scenario of a voice retrieval method according to an embodiment of the present invention;
FIG. 2 is a flow chart of a voice retrieval method according to an embodiment of the present invention;
FIG. 3 is a schematic sub-flowchart of a voice retrieval method according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of another sub-flowchart of a voice retrieval method according to an embodiment of the present invention;
FIG. 5 is a schematic block diagram of a voice retrieval apparatus provided by an embodiment of the present invention;
FIG. 6 is a schematic block diagram of a subunit of a speech retrieval apparatus according to an embodiment of the present invention;
FIG. 7 is a schematic block diagram of another subunit of a speech retrieval apparatus according to an embodiment of the present invention;
Fig. 8 is a schematic block diagram of a computer device according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
It should be understood that the terms "comprises" and "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in this specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in the present specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.
Referring to fig. 1 and fig. 2, fig. 1 is a schematic application scenario diagram of a voice search method according to an embodiment of the present invention, and fig. 2 is a schematic flow diagram of a voice search method according to an embodiment of the present invention, where the voice search method is applied to a server, and the method is executed by application software installed in the server.
As shown in fig. 2, the method includes steps S110 to S150.
S110, receiving a training set corpus, and inputting the training set corpus into an initial N-gram model for training to obtain the N-gram model; wherein the N-gram model is an N-element statistical language model.
In this embodiment, the technical solution is described from the perspective of the server. The server receives the training set corpus and trains on it to obtain an N-gram model, which is then used to recognize the voice to be recognized uploaded to the server by the front-end voice acquisition terminals deployed in the smart supermarket.
In this embodiment, the training set corpus is a mixture of a general corpus and a consumer product corpus. The consumer product corpus is a corpus containing a large number of product names (such as product brands, product titles, etc.); the general corpus differs from the consumer product corpus in that its vocabulary is not biased towards a particular domain. The training set corpus is input into the initial N-gram model for training, yielding the N-gram model used for voice recognition.
In one embodiment, as shown in fig. 3, step S110 includes:
s111, obtaining consumer product corpus, and inputting the consumer product corpus into a first initial N-gram model for training to obtain a first N-gram model;
S112, acquiring a general corpus, and inputting the general corpus into a second initial N-gram model for training to obtain a second N-gram model;
s113, fusing the first N-gram model and the second N-gram model according to the set model fusion proportion to obtain an N-gram model.
In this embodiment, the consumer product corpus is a corpus containing a large number of product names, while the general corpus differs from it in that its vocabulary is not biased towards a specific domain but instead covers vocabulary from every domain.
The N-gram model is a language model (Language Model, LM), a probability-based model whose input is a sentence (an ordered sequence of words) and whose output is the probability of that sentence, i.e., the joint probability of its words.
Assuming that sentence T is composed of the word sequence w1, w2, w3, ..., wn, the N-gram language model factorizes its probability by the chain rule:

P(T) = p(w1) * p(w2|w1) * p(w3|w1w2) * ... * p(wn|w1w2...wn-1)

The commonly used N-gram models are the Bi-gram and the Tri-gram, with the respective formulas:

Bi-gram:
P(T) = p(w1|begin) * p(w2|w1) * p(w3|w2) * ... * p(wn|wn-1)

Tri-gram:
P(T) = p(w1|begin1,begin2) * p(w2|w1,begin1) * p(w3|w2,w1) * ... * p(wn|wn-1,wn-2)

It can be seen that the conditional probability of each word in sentence T can be obtained by counting in the corpus. For the n-gram:

p(wi|wi-n+1...wi-1) = C(wi-n+1, ..., wi) / C(wi-n+1, ..., wi-1)

where C(wi-n+1, ..., wi) denotes the number of times the word string wi-n+1, ..., wi occurs in the corpus.
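As a hedged illustration of the counting formula above, here is a minimal bigram sketch in Python; the toy corpus, the `<begin>` padding token, and the absence of smoothing are this example's simplifying assumptions, not the patent's actual training procedure:

```python
from collections import Counter

def train_bigram(sentences):
    """Count unigrams and bigrams over a tokenized corpus."""
    unigrams, bigrams = Counter(), Counter()
    for words in sentences:
        padded = ["<begin>"] + words
        unigrams.update(padded)
        bigrams.update(zip(padded, padded[1:]))
    return unigrams, bigrams

def sentence_prob(words, unigrams, bigrams):
    """P(T) = p(w1|begin) * p(w2|w1) * ... * p(wn|wn-1),
    each factor estimated as C(w_{i-1}, w_i) / C(w_{i-1})."""
    padded = ["<begin>"] + words
    prob = 1.0
    for prev, cur in zip(padded, padded[1:]):
        if unigrams[prev] == 0:
            return 0.0
        prob *= bigrams[(prev, cur)] / unigrams[prev]
    return prob

corpus = [["i", "want", "to", "buy", "noodles"],
          ["i", "want", "to", "eat", "noodles"]]
uni, bi = train_bigram(corpus)
print(sentence_prob(["i", "want", "to", "buy", "noodles"], uni, bi))
```

A real system would add smoothing for unseen n-grams, which this sketch omits.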
According to the set model fusion ratio, for example, if the ratio of the consumer product corpus to the general corpus is set to 2:8, the fusion ratio of the first N-gram model to the second N-gram model is likewise set to 2:8, and the two models are fused to finally obtain the N-gram model used for voice recognition. Because the ratio of consumer product corpus to general corpus is set in advance, the accuracy of voice recognition in the smart supermarket scenario by the finally fused N-gram model is effectively improved.
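One common way to realize the 2:8 fusion described above is linear interpolation of the two models' sentence probabilities. The sketch below assumes this interpolation scheme; the patent does not specify the exact fusion mechanism, and the probability functions here are toy stand-ins:

```python
def fuse_models(p_consumer, p_general, ratio=(2, 8)):
    """Linearly interpolate two language models: the fused probability of a
    sentence is a weighted sum of the two models' probabilities."""
    total = ratio[0] + ratio[1]
    w1 = ratio[0] / total  # 0.2 weight for the consumer-product model
    w2 = ratio[1] / total  # 0.8 weight for the general model

    def p_fused(sentence):
        return w1 * p_consumer(sentence) + w2 * p_general(sentence)

    return p_fused

# Toy stand-ins for the first and second N-gram models:
p1 = lambda s: 0.5  # consumer-product model's probability for the sentence
p2 = lambda s: 0.1  # general model's probability for the sentence
p = fuse_models(p1, p2)
# Fused value: 0.2 * 0.5 + 0.8 * 0.1 = 0.18
print(p(["i", "want", "noodles"]))
```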
In one embodiment, as shown in fig. 4, step S111 includes:
s1111, performing word segmentation on the consumer product corpus based on a probability statistical word segmentation model to obtain a first word segmentation result corresponding to the consumer product corpus;
s1112, inputting the first word segmentation result into a first initial N-gram model for training to obtain a first N-gram model.
In this embodiment, the word segmentation process for each sentence in the consumer product corpus using the probability-based statistical word segmentation model is as follows:
Let C = c1c2...cm be the Chinese character string to be segmented, let W = w1w2...wn be a segmentation result, and let Wa, Wb, ..., Wk be all possible segmentation schemes of C. The probability-based statistical word segmentation model finds the target word string W satisfying: P(W|C) = max(P(Wa|C), P(Wb|C), ..., P(Wk|C)); that is, the word string W output by the word segmentation model is the one with the maximum estimated probability. Concretely:
For the substring S to be segmented, all candidate words w1, w2, ..., wi, ..., wn are taken out in left-to-right order; the probability value P(wi) of each candidate word is looked up in the dictionary, and all left-neighbor words of each candidate word are recorded; the cumulative probability of each candidate word is computed, and the best left-neighbor word of each candidate word is obtained by comparison; if the current word wn is the tail word of the string S and its cumulative probability P(wn) is the maximum, then wn is the end word of S; starting from wn, the best left-neighbor word of each word is output in turn from right to left, giving the word segmentation result of S. The first word segmentation result is then input into the first initial N-gram model for training to obtain the first N-gram model, which has higher sentence recognition accuracy in the smart supermarket scenario.
Similarly, the general corpus is segmented by the probability-based statistical word segmentation model to obtain a second word segmentation result corresponding to the general corpus; the second word segmentation result is input into the second initial N-gram model for training to obtain the second N-gram model, which has higher sentence recognition accuracy in ordinary daily-life scenarios (i.e., a higher recognition rate for sentences not tied to any particular scenario).
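The accumulate-and-backtrack procedure above is, in effect, a dynamic program over candidate words. A minimal sketch follows, assuming an invented probability dictionary (not the patent's actual dictionary) and independence of word probabilities:

```python
def max_prob_segment(s, word_probs):
    """Segment string s into the word sequence maximizing the product of
    dictionary probabilities P(w): accumulate left to right, then backtrack
    from the tail word, as described in the text."""
    n = len(s)
    # best[i] = (best cumulative probability of s[:i],
    #            start index of the word ending at position i)
    best = [(0.0, -1)] * (n + 1)
    best[0] = (1.0, -1)
    for end in range(1, n + 1):
        for start in range(end):
            word = s[start:end]
            if word in word_probs and best[start][0] > 0:
                p = best[start][0] * word_probs[word]
                if p > best[end][0]:
                    best[end] = (p, start)  # record the best left neighbor
    if best[n][0] == 0:
        return None  # no segmentation covers the whole string
    # Backtrack from the end word, outputting words right to left
    words, i = [], n
    while i > 0:
        start = best[i][1]
        words.append(s[start:i])
        i = start
    return list(reversed(words))

# Invented dictionary: "instant noodles" (方便面) outranks "convenient" + "noodle"
probs = {"我": 0.2, "想": 0.2, "买": 0.2, "方便面": 0.1, "方便": 0.05, "面": 0.1}
print(max_prob_segment("我想买方便面", probs))
```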
S120, receiving the voice to be recognized, and recognizing the voice to be recognized through the N-gram model to obtain a recognition result.
When the voice to be recognized is recognized through the N-gram model, a whole sentence is obtained, such as "I want to buy XX brand instant noodles". The N-gram model can effectively recognize the voice to be recognized, and the sentence with the largest recognition probability is taken as the recognition result.
S130, performing word segmentation on the recognition result to obtain a sentence word segmentation result corresponding to the recognition result.
In one embodiment, step S130 includes:
Word segmentation is performed on the recognition result based on the probability-based statistical word segmentation model, so as to obtain a sentence word segmentation result corresponding to the recognition result.
In this embodiment, for the specific process of word segmentation using the probability-based statistical word segmentation model in step S130, reference may be made to step S1111. After the recognition result is segmented, part-of-speech analysis can be further performed.
And S140, performing lexical analysis according to the sentence segmentation result to obtain noun part-of-speech keywords corresponding to the sentence segmentation result.
In one embodiment, step S140 includes:
The sentence word segmentation result is taken as the input of a pre-trained joint lexical analysis model to obtain the noun part-of-speech keywords in the sentence word segmentation result.
In this embodiment, the process of lexical analysis by the joint lexical analysis model is as follows:
The input of the lexical analysis task is a string (hereinafter referred to as a "sentence"), and the output is the word boundaries, parts of speech, and entity classes in the sentence. Sequence tagging is a classical modeling approach to lexical analysis. A joint lexical analysis model (i.e., a LAC model) is constructed: features are learned with a network structure based on GRUs (gated recurrent units), and the learned features are fed into a CRF decoding layer (CRF: conditional random field) to complete the sequence tagging. The CRF decoding layer essentially replaces the linear model of a traditional CRF with a nonlinear neural network and, being based on sentence-level likelihood, can better alleviate the label bias problem.
The input of the joint lexical analysis model is represented in one-hot form: each word is mapped through a vocabulary to an id, and the one-hot sequence is converted into a word vector sequence of real-valued vectors. The word vector sequence serves as the input of a bidirectional GRU, which learns a feature representation of the input sequence and produces a new feature representation sequence; two layers of bidirectional GRUs are stacked to increase the learning capacity. The CRF takes the features learned by the GRU as input and the tag sequence as the supervision signal, realizing the part-of-speech tagging of each word in the sentence word segmentation result. In the smart supermarket scenario, a noun part-of-speech keyword is most likely to be a product brand or product name, so the noun part-of-speech keywords corresponding to the sentence word segmentation result are selected as the screening result for further product retrieval.
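The final filtering step, keeping only noun part-of-speech words as retrieval keywords, can be sketched as below. The tag set and the tagged input are invented for illustration; in practice the (word, tag) pairs would come from the GRU-CRF tagger described above:

```python
# Assumed noun-like tags (hypothetical; a real tagger defines its own tag set)
NOUN_TAGS = {"n", "nz", "nt", "brand"}

def extract_noun_keywords(tagged_words):
    """Keep the words whose part-of-speech tag marks them as nouns;
    these become the keywords used for product retrieval."""
    return [word for word, tag in tagged_words if tag in NOUN_TAGS]

# Hypothetical tagger output for "I want to buy XX brand instant noodles"
tagged = [("I", "r"), ("want", "v"), ("to", "u"), ("buy", "v"),
          ("XX", "brand"), ("instant noodles", "n")]
print(extract_noun_keywords(tagged))  # ['XX', 'instant noodles']
```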
S150, searching a pre-stored recommended corpus for corpora whose similarity to the noun part-of-speech keywords exceeds a preset similarity threshold, so as to obtain a retrieval result.
In this embodiment, once the noun part-of-speech keywords are obtained, each keyword is searched in the preset recommended corpus to obtain the words with the highest similarity to it, and these words are taken as the retrieval result. Specifically, the word vector corresponding to each noun part-of-speech keyword is obtained with a Word2Vec model (Word2Vec is an efficient tool for representing a word as a real-valued vector), and its similarity to the word vector corresponding to each entry in the pre-stored recommended corpus is then calculated, where the similarity between two vectors is computed from their Euclidean distance. If corpora whose similarity exceeds the preset similarity threshold exist in the pre-stored recommended corpus, the corresponding corpora are taken as part of the retrieval result; that is, all corpora whose similarity exceeds the preset similarity threshold together form the retrieval result.
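The similarity search just described can be sketched as follows. The word vectors here are toy arrays standing in for Word2Vec embeddings, and converting Euclidean distance into a similarity score via 1/(1+d) is this example's assumption, since the patent only states that similarity is computed from the distance:

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def retrieve(keyword_vec, corpus_vecs, threshold):
    """Return every corpus entry whose similarity to the keyword vector
    exceeds the threshold, where similarity = 1 / (1 + Euclidean distance)."""
    results = []
    for name, vec in corpus_vecs.items():
        sim = 1.0 / (1.0 + euclidean(keyword_vec, vec))
        if sim > threshold:
            results.append(name)
    return results

# Toy embeddings standing in for Word2Vec vectors of corpus entries
corpus = {"instant noodles A": [1.0, 0.0],
          "instant noodles B": [0.9, 0.1],
          "soap": [5.0, 5.0]}
print(retrieve([1.0, 0.0], corpus, threshold=0.5))
```

All entries passing the threshold are kept, matching the text's point that the retrieval result is the set of sufficiently similar corpora, not a single best match.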
The method adopts a voice recognition technology, and the noun part-of-speech keywords are obtained after lexical analysis is carried out on the voice recognition result, so that the retrieval result is more accurately obtained in a recommended corpus according to the noun part-of-speech keywords.
The embodiment of the invention also provides a voice retrieval device which is used for executing any embodiment of the voice retrieval method. Specifically, referring to fig. 5, fig. 5 is a schematic block diagram of a voice retrieval device according to an embodiment of the present invention. The voice retrieval apparatus 100 may be configured in a server.
As shown in fig. 5, the speech search device 100 includes a model training unit 110, a speech recognition unit 120, a word segmentation unit 130, a part-of-speech analysis unit 140, and a search unit 150.
The model training unit 110 is configured to receive a training set corpus and input it into an initial N-gram model for training to obtain an N-gram model; wherein the N-gram model is an N-element statistical language model.
In this embodiment, the technical solution is described from the perspective of the server. The server receives the training set corpus and trains on it to obtain an N-gram model, which is then used to recognize the voice to be recognized uploaded to the server by the front-end voice acquisition terminals deployed in the smart supermarket.
In this embodiment, the training set corpus is a mixture of a general corpus and a consumer product corpus. The consumer product corpus is a corpus containing a large number of product names (such as product brands, product titles, etc.); the general corpus differs from the consumer product corpus in that its vocabulary is not biased towards a particular domain. The training set corpus is input into the initial N-gram model for training, yielding the N-gram model used for voice recognition.
In one embodiment, as shown in fig. 6, the model training unit 110 includes:
The first training unit 111 is configured to obtain a consumer product corpus, input the consumer product corpus to a first initial N-gram model, and perform training to obtain a first N-gram model;
The second training unit 112 is configured to obtain a generic corpus, input the generic corpus to a second initial N-gram model, and perform training to obtain a second N-gram model;
And the model fusion unit 113 is configured to fuse the first N-gram model and the second N-gram model according to the set model fusion ratio, so as to obtain an N-gram model.
In this embodiment, the consumer product corpus is a corpus containing a large number of product names, while the general corpus differs from it in that its vocabulary is not biased towards a specific domain.
The N-gram model is a language model (Language Model, LM), a probability-based model whose input is a sentence (an ordered sequence of words) and whose output is the probability of that sentence, i.e., the joint probability of its words.
Assuming that sentence T is composed of the word sequence w1, w2, w3, ..., wn, the N-gram language model factorizes its probability by the chain rule:

P(T) = p(w1) * p(w2|w1) * p(w3|w1w2) * ... * p(wn|w1w2...wn-1)

The commonly used N-gram models are the Bi-gram and the Tri-gram, with the respective formulas:

Bi-gram:
P(T) = p(w1|begin) * p(w2|w1) * p(w3|w2) * ... * p(wn|wn-1)

Tri-gram:
P(T) = p(w1|begin1,begin2) * p(w2|w1,begin1) * p(w3|w2,w1) * ... * p(wn|wn-1,wn-2)

It can be seen that the conditional probability of each word in sentence T can be obtained by counting in the corpus. For the n-gram:

p(wi|wi-n+1...wi-1) = C(wi-n+1, ..., wi) / C(wi-n+1, ..., wi-1)

where C(wi-n+1, ..., wi) denotes the number of times the word string wi-n+1, ..., wi occurs in the corpus.
According to the set model fusion ratio, for example, if the ratio of the consumer product corpus to the general corpus is set to 2:8, the fusion ratio of the first N-gram model to the second N-gram model is likewise set to 2:8, and the two models are fused to finally obtain the N-gram model used for voice recognition. Because the ratio of consumer product corpus to general corpus is set in advance, the accuracy of voice recognition in the smart supermarket scenario by the finally fused N-gram model is effectively improved.
In one embodiment, as shown in fig. 7, the first training unit 111 includes:
the word segmentation unit 1111 is configured to segment the consumer product corpus based on a probability statistical word segmentation model to obtain a first word segmentation result corresponding to the consumer product corpus;
The word segmentation training unit 1112 is configured to input the first word segmentation result to a first initial N-gram model for training, so as to obtain a first N-gram model.
In this embodiment, the word segmentation process for each sentence in the consumer product corpus using the probability-based statistical word segmentation model is as follows:
Let C = c1c2...cm be the Chinese character string to be segmented, let W = w1w2...wn be a segmentation result, and let Wa, Wb, ..., Wk be all possible segmentation schemes of C. The probability-based statistical word segmentation model finds the target word string W satisfying: P(W|C) = max(P(Wa|C), P(Wb|C), ..., P(Wk|C)); that is, the word string W output by the word segmentation model is the one with the maximum estimated probability. Concretely:
For the substring S to be segmented, all candidate words w1, w2, ..., wi, ..., wn are taken out in left-to-right order; the probability value P(wi) of each candidate word is looked up in the dictionary, and all left-neighbor words of each candidate word are recorded; the cumulative probability of each candidate word is computed, and the best left-neighbor word of each candidate word is obtained by comparison; if the current word wn is the tail word of the string S and its cumulative probability P(wn) is the maximum, then wn is the end word of S; starting from wn, the best left-neighbor word of each word is output in turn from right to left, giving the word segmentation result of S. The first word segmentation result is then input into the first initial N-gram model for training to obtain the first N-gram model, which has higher sentence recognition accuracy in the smart supermarket scenario.
Similarly, the general corpus is segmented with the probability-based statistical word segmentation model to obtain a second word segmentation result corresponding to the general corpus; the second word segmentation result is input into a second initial N-gram model for training to obtain a second N-gram model, which has higher sentence recognition accuracy in ordinary life scenarios (i.e., a higher recognition rate for sentences not tied to a particular scenario).
The voice recognition unit 120 is configured to receive a voice to be recognized, and recognize the voice to be recognized through the N-gram model to obtain a recognition result.
When the voice to be recognized is recognized through the N-gram model, a whole sentence is obtained, such as "I want to buy XX brand instant noodles"; the N-gram model can effectively recognize the voice, and the sentence with the largest recognition probability is taken as the recognition result.
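As a hedged sketch of how an N-gram model picks the sentence with the largest probability among candidate transcriptions, the following uses an add-one-smoothed bigram model; the toy corpus and the smoothing choice are assumptions for illustration, not the embodiment's actual training data:

```python
import math
from collections import Counter


def train_bigram(corpus):
    """Count unigrams and bigrams over tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sent in corpus:
        prev = "<s>"
        unigrams[prev] += 1          # count <s> as a conditioning context
        for w in sent + ["</s>"]:
            unigrams[w] += 1
            bigrams[(prev, w)] += 1
            prev = w
    return unigrams, bigrams


def sentence_logprob(words, unigrams, bigrams, vocab_size):
    """Add-one-smoothed bigram log-probability of a word sequence."""
    score, prev = 0.0, "<s>"
    for w in words + ["</s>"]:
        score += math.log((bigrams[(prev, w)] + 1) / (unigrams[prev] + vocab_size))
        prev = w
    return score


def best_candidate(candidates, unigrams, bigrams, vocab_size):
    """Return the candidate sentence the model scores highest."""
    return max(candidates,
               key=lambda c: sentence_logprob(c, unigrams, bigrams, vocab_size))
```

Trained on a corpus where "i want noodles" appears three times and "i want tea" once, the model prefers the more frequent continuation, which is exactly the maximum-probability selection described above.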
And the recognition result word segmentation unit 130 is configured to segment the recognition result to obtain a sentence word segmentation result corresponding to the recognition result.
In an embodiment, the recognition result word segmentation unit 130 is further configured to:
The recognition result is segmented based on the probability-based statistical word segmentation model, so as to obtain a sentence word segmentation result corresponding to the recognition result.
In this embodiment, when the recognition result word segmentation unit 130 segments the recognition result, it may follow the specific word segmentation process with the probability-based statistical word segmentation model described for the word segmentation unit 1111. After the recognition result is segmented, part-of-speech analysis can be performed.
The part-of-speech analysis unit 140 is configured to perform lexical analysis according to the sentence segmentation result, so as to obtain a noun part-of-speech keyword corresponding to the sentence segmentation result.
In an embodiment, the part-of-speech analysis unit 140 is further configured to:
The sentence word segmentation result is taken as the input of a pre-trained joint lexical analysis model to obtain the noun part-of-speech keywords in the sentence word segmentation result.
In this embodiment, the process of lexical analysis by the joint lexical analysis model is as follows:
The input of the lexical analysis task is a string (hereinafter referred to as a "sentence"), and the output is the word boundaries, parts of speech, and entity classes in the sentence. Sequence tagging is a classical modeling approach to lexical analysis. A joint lexical analysis model (i.e., a LAC model) is constructed: features are learned by a network structure based on GRUs (gated recurrent units), and the learned features are fed into a CRF (conditional random field) decoding layer to complete the sequence labeling. The CRF decoding layer essentially replaces the linear model of a traditional CRF with a nonlinear neural network and, being based on sentence-level likelihood, better avoids the label bias problem.
The input of the joint lexical analysis model is represented in one-hot form: each character is mapped to an id through a vocabulary, and the one-hot sequence is converted into a character vector sequence of real-valued vectors. The character vector sequence is used as the input of a bidirectional GRU to learn a feature representation of the input sequence, yielding a new feature representation sequence; two bidirectional GRU layers are stacked to increase learning capacity. The CRF takes the features learned by the GRU as input and the label sequence as the supervision signal, tagging the part of speech of each word in the sentence word segmentation result. In the intelligent supermarket scenario, a noun part-of-speech keyword is more likely to be a commodity brand or commodity name, so the noun part-of-speech keywords corresponding to the sentence word segmentation result are selected as the screening result for the subsequent commodity retrieval.
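Once the tagger has labeled each word, selecting the noun part-of-speech keywords reduces to a simple filter over (word, tag) pairs. The tag set and the example tagging below are illustrative assumptions; a real system would obtain the pairs from the trained GRU + CRF (LAC-style) model:

```python
# Noun-like tags as used in common Chinese part-of-speech tag sets (assumed).
NOUN_TAGS = {"n", "nr", "ns", "nt", "nz"}


def noun_keywords(tagged):
    """Keep only the tokens whose part-of-speech tag marks a noun."""
    return [word for word, tag in tagged if tag in NOUN_TAGS]


# Hypothetical tagging of "I want to buy XX brand instant noodles".
tagged = [("我", "r"), ("想", "v"), ("买", "v"), ("XX", "nz"), ("方便面", "n")]
print(noun_keywords(tagged))  # -> ['XX', '方便面']
```

Only the brand ("XX", tagged nz) and the product name ("方便面", tagged n) survive the filter, matching the screening described above.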
The searching unit 150 is configured to search, in a pre-stored recommended corpus, for corpora whose similarity to the noun part-of-speech keywords exceeds a preset similarity threshold, so as to obtain a search result.
In this embodiment, once the noun part-of-speech keywords are obtained, each of them is searched in the preset recommended corpus to obtain the words with higher similarity to the keyword, which are used as the search result. Specifically, a word vector corresponding to each noun part-of-speech keyword is obtained with a Word2Vec model (Word2Vec is an efficient tool for representing a word as a real-valued vector), and its similarity to the word vector of each corpus in the pre-stored recommended corpus is calculated, where the similarity between two vectors is computed from the Euclidean distance between them. If corpora whose similarity exceeds the preset similarity threshold exist in the pre-stored recommended corpus, each such corpus is taken as one of the search results; that is, all the corpora whose similarity exceeds the preset similarity threshold together form the search result.
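A minimal sketch of this threshold-based retrieval step, assuming the word vectors already come from a Word2Vec model; mapping the Euclidean distance d into a similarity 1/(1 + d) is an assumption here, since the embodiment only states that the similarity is computed from the Euclidean distance:

```python
import math


def euclidean_similarity(u, v):
    """Similarity in (0, 1] derived from Euclidean distance: identical vectors score 1."""
    return 1.0 / (1.0 + math.dist(u, v))


def retrieve(keyword_vec, corpus_vecs, threshold):
    """Indices of corpus entries whose similarity to the keyword exceeds the threshold."""
    return [i for i, v in enumerate(corpus_vecs)
            if euclidean_similarity(keyword_vec, v) > threshold]
```

For example, with a keyword vector `[1.0, 0.0]` and corpus vectors `[[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]`, a threshold of 0.5 keeps the first and third entries.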
The device adopts voice recognition technology; after lexical analysis of the voice recognition result yields the noun part-of-speech keywords, the retrieval result can be obtained more accurately from the recommended corpus according to those keywords.
The above-described speech retrieval means may be implemented in the form of a computer program which is executable on a computer device as shown in fig. 8.
Referring to fig. 8, fig. 8 is a schematic block diagram of a computer device according to an embodiment of the present invention. The computer device 500 is a server, and the server may be a stand-alone server or a server cluster formed by a plurality of servers.
With reference to FIG. 8, the computer device 500 includes a processor 502, memory, and a network interface 505 connected by a system bus 501, where the memory may include a non-volatile storage medium 503 and an internal memory 504.
The non-volatile storage medium 503 may store an operating system 5031 and a computer program 5032. The computer program 5032, when executed, may cause the processor 502 to perform a speech retrieval method.
The processor 502 is used to provide computing and control capabilities to support the operation of the overall computer device 500.
The internal memory 504 provides an environment for the execution of a computer program 5032 in the non-volatile storage medium 503, which computer program 5032, when executed by the processor 502, causes the processor 502 to perform a speech retrieval method.
The network interface 505 is used for network communication, such as the transmission of data information. It will be appreciated by those skilled in the art that the architecture shown in fig. 8 is merely a block diagram of part of the architecture relevant to the present solution and does not limit the computer device 500 to which the present solution is applied; a particular computer device 500 may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
The processor 502 is configured to execute the computer program 5032 stored in the memory to perform the following functions: receiving a training set corpus, and inputting the training set corpus into an initial N-gram model for training to obtain an N-gram model, wherein the N-gram model is an N-gram language model; receiving a voice to be recognized, and recognizing the voice to be recognized through the N-gram model to obtain a recognition result; performing word segmentation on the recognition result to obtain a sentence word segmentation result corresponding to the recognition result; performing lexical analysis according to the sentence word segmentation result to obtain noun part-of-speech keywords corresponding to the sentence word segmentation result; and searching, in a pre-stored recommended corpus, for corpora whose similarity to the noun part-of-speech keywords exceeds a preset similarity threshold to obtain a retrieval result.
In one embodiment, the processor 502 performs the following operations when executing the step of receiving the training set corpus, inputting the training set corpus into the initial N-gram model for training, and obtaining the N-gram model: obtaining consumer product corpus, inputting the consumer product corpus into a first initial N-gram model for training to obtain a first N-gram model; acquiring a general corpus, and inputting the general corpus into a second initial N-gram model for training to obtain a second N-gram model; and fusing the first N-gram model and the second N-gram model according to the set model fusion proportion to obtain an N-gram model.
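One standard way to realize the fusion of the two models by a set proportion is linear interpolation of their probabilities; the embodiment does not specify the exact fusion formula, so the interpolation form and the weight value below are assumptions:

```python
def fused_prob(p_consumer, p_general, lam=0.7):
    """P_fused(w | h) = lam * P_consumer(w | h) + (1 - lam) * P_general(w | h).

    lam is the model fusion proportion: the weight given to the consumer-product
    (in-domain) N-gram model; the remainder goes to the general-corpus model.
    The value 0.7 is an assumed default, not one stated in the embodiment.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("fusion proportion must lie in [0, 1]")
    return lam * p_consumer + (1.0 - lam) * p_general
```

With lam = 0.7, a word the consumer-product model assigns probability 0.4 and the general model 0.1 receives 0.7 × 0.4 + 0.3 × 0.1 = 0.31 under the fused model, so in-domain sentences are favoured while everyday sentences remain recognizable.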
In one embodiment, the processor 502 performs the following operations when performing the step of inputting the consumer product corpus into the first initial N-gram model for training to obtain the first N-gram model: performing word segmentation on the consumer product corpus based on a probability statistical word segmentation model to obtain a first word segmentation result corresponding to the consumer product corpus; and inputting the first word segmentation result into a first initial N-gram model for training to obtain a first N-gram model.
In one embodiment, when executing the step of word segmentation on the recognition result to obtain a sentence word segmentation result corresponding to the recognition result, the processor 502 executes the following operations: and word segmentation is carried out on the recognition result based on a probability statistics word segmentation model, so that a sentence word segmentation result corresponding to the recognition result is obtained.
In one embodiment, when executing the step of performing lexical analysis according to the sentence segmentation result to obtain the noun part-of-speech keyword corresponding to the sentence segmentation result, the processor 502 performs the following operations: and taking the sentence segmentation result as input of a pre-trained joint lexical analysis model to obtain noun part-of-speech keywords in the sentence segmentation result.
Those skilled in the art will appreciate that the embodiment of the computer device shown in fig. 8 does not limit the specific construction of the computer device; in other embodiments, the computer device may include more or fewer components than shown, combine certain components, or arrange the components differently. For example, in some embodiments the computer device may include only a memory and a processor; in such embodiments the structure and function of the memory and the processor are consistent with the embodiment shown in fig. 8 and will not be described again.
It should be appreciated that in embodiments of the present invention, the processor 502 may be a central processing unit (Central Processing Unit, CPU); the processor 502 may also be another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor, or the like.
In another embodiment of the invention, a computer-readable storage medium is provided. The computer-readable storage medium may be a non-volatile computer-readable storage medium. The computer-readable storage medium stores a computer program which, when executed by a processor, implements the following steps: receiving a training set corpus, and inputting the training set corpus into an initial N-gram model for training to obtain an N-gram model, wherein the N-gram model is an N-gram language model; receiving a voice to be recognized, and recognizing the voice to be recognized through the N-gram model to obtain a recognition result; performing word segmentation on the recognition result to obtain a sentence word segmentation result corresponding to the recognition result; performing lexical analysis according to the sentence word segmentation result to obtain noun part-of-speech keywords corresponding to the sentence word segmentation result; and searching, in a pre-stored recommended corpus, for corpora whose similarity to the noun part-of-speech keywords exceeds a preset similarity threshold to obtain a retrieval result.
In an embodiment, the receiving a training set corpus, inputting the training set corpus into an initial N-gram model for training, and obtaining the N-gram model includes: obtaining consumer product corpus, inputting the consumer product corpus into a first initial N-gram model for training to obtain a first N-gram model; acquiring a general corpus, and inputting the general corpus into a second initial N-gram model for training to obtain a second N-gram model; and fusing the first N-gram model and the second N-gram model according to the set model fusion proportion to obtain an N-gram model.
In an embodiment, the inputting the consumer corpus into the first initial N-gram model for training to obtain the first N-gram model includes: performing word segmentation on the consumer product corpus based on a probability statistical word segmentation model to obtain a first word segmentation result corresponding to the consumer product corpus; and inputting the first word segmentation result into a first initial N-gram model for training to obtain a first N-gram model.
In an embodiment, the word segmentation of the recognition result to obtain a sentence word segmentation result corresponding to the recognition result includes: and word segmentation is carried out on the recognition result based on a probability statistics word segmentation model, so that a sentence word segmentation result corresponding to the recognition result is obtained.
In an embodiment, the performing lexical analysis according to the sentence word segmentation result to obtain a noun part-of-speech keyword corresponding to the sentence word segmentation result includes: and taking the sentence segmentation result as input of a pre-trained joint lexical analysis model to obtain noun part-of-speech keywords in the sentence segmentation result.
It will be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working procedures of the apparatus, device, and units described above may refer to the corresponding procedures in the foregoing method embodiments, and are not repeated herein. Those of ordinary skill in the art will appreciate that the units and algorithm steps described in connection with the embodiments disclosed herein may be implemented in electronic hardware, in computer software, or in a combination of the two; to clearly illustrate the interchangeability of hardware and software, the composition and steps of the examples have been described above generally in terms of function. Whether such functions are implemented in hardware or software depends on the particular application and the design constraints of the technical solution. Skilled artisans may implement the described functionality in different ways for each particular application, but such implementations should not be considered beyond the scope of the present invention.
In the several embodiments provided by the present invention, it should be understood that the disclosed apparatus, device, and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative. The division of the units is merely a logical functional division; in actual implementation there may be other division manners: units having the same function may be integrated into one unit, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections via interfaces, devices, or units, and may be electrical, mechanical, or in other forms.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the embodiment of the present invention.
In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units may be stored in a storage medium if implemented in the form of software functional units and sold or used as stand-alone products. Based on such understanding, the technical solution of the present invention is essentially or a part contributing to the prior art, or all or part of the technical solution may be embodied in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-only memory (ROM), a magnetic disk, an optical disk, or other various media capable of storing program codes.
While the invention has been described with reference to certain preferred embodiments, it will be understood by those skilled in the art that various changes and equivalent substitutions may readily be made without departing from the scope of the invention. Therefore, the protection scope of the invention shall be subject to the protection scope of the claims.

Claims (6)

1. A method of speech retrieval, comprising:
receiving a training set corpus, and inputting the training set corpus into an initial N-gram model for training to obtain the N-gram model; wherein the N-gram model is an N-gram language model;
Receiving voice to be recognized, and recognizing the voice to be recognized through the N-gram model to obtain a recognition result;
word segmentation is carried out on the recognition result, and a sentence word segmentation result corresponding to the recognition result is obtained;
Performing lexical analysis according to the sentence segmentation result to obtain noun part-of-speech keywords corresponding to the sentence segmentation result; and
Searching a corpus with similarity with the noun part-of-speech keywords exceeding a preset similarity threshold value in a pre-stored recommended corpus to obtain a search result; the recommended corpus comprises a plurality of corpora, and each corpus comprises one or more keywords with noun parts of speech;
wherein the training set corpus is a mixed corpus of general corpus and consumer goods corpus; the N-gram model is a probability-based discrimination model;
The receiving training set corpus, inputting the training set corpus into an initial N-gram model for training, and obtaining an N-gram model, comprising:
Obtaining consumer product corpus, inputting the consumer product corpus into a first initial N-gram model for training to obtain a first N-gram model;
acquiring a general corpus, and inputting the general corpus into a second initial N-gram model for training to obtain a second N-gram model;
According to the set model fusion proportion, fusing the first N-gram model and the second N-gram model to obtain an N-gram model;
Inputting the consumer product corpus into a first initial N-gram model for training to obtain a first N-gram model, wherein the method comprises the following steps of:
performing word segmentation on the consumer product corpus based on a probability statistical word segmentation model to obtain a first word segmentation result corresponding to the consumer product corpus;
inputting the first word segmentation result into a first initial N-gram model for training to obtain a first N-gram model;
the step of inputting the general corpus into a second initial N-gram model for training to obtain a second N-gram model, which comprises the following steps:
performing word segmentation on the general corpus based on a probability statistical word segmentation model to obtain a second word segmentation result corresponding to the general corpus;
And inputting the second word segmentation result into a second initial N-gram model for training to obtain a second N-gram model.
2. The method for voice search according to claim 1, wherein said word segmentation of the recognition result to obtain a sentence word segmentation result corresponding to the recognition result comprises:
And word segmentation is carried out on the recognition result based on a probability statistics word segmentation model, so that a sentence word segmentation result corresponding to the recognition result is obtained.
3. The method for voice search according to claim 1, wherein said performing lexical analysis according to the sentence segmentation result to obtain noun part-of-speech keywords corresponding to the sentence segmentation result comprises:
And taking the sentence segmentation result as input of a pre-trained joint lexical analysis model to obtain noun part-of-speech keywords in the sentence segmentation result.
4. A voice retrieval apparatus, comprising:
The model training unit is used for receiving a training set corpus and inputting the training set corpus into an initial N-gram model for training to obtain an N-gram model; wherein the N-gram model is an N-gram language model;
The voice recognition unit is used for receiving the voice to be recognized, and recognizing the voice to be recognized through the N-gram model to obtain a recognition result;
The recognition result word segmentation unit is used for segmenting the recognition result to obtain a sentence word segmentation result corresponding to the recognition result;
the part-of-speech analysis unit is used for performing lexical analysis according to the sentence segmentation result to obtain noun part-of-speech keywords corresponding to the sentence segmentation result; and
The retrieval unit is used for searching the corpus with similarity with the noun part-of-speech keywords exceeding a preset similarity threshold value in a pre-stored recommended corpus to obtain a retrieval result;
wherein the training set corpus is a mixed corpus of general corpus and consumer goods corpus; the N-gram model is a probability-based discrimination model;
The model training unit includes:
the first training unit is used for obtaining consumer product corpus, inputting the consumer product corpus into a first initial N-gram model for training to obtain a first N-gram model;
The second training unit is used for acquiring general corpus, inputting the general corpus into a second initial N-gram model for training to obtain a second N-gram model;
The model fusion unit is used for fusing the first N-gram model and the second N-gram model according to the set model fusion proportion to obtain an N-gram model;
The first training unit includes:
The word segmentation unit is used for segmenting the consumer product corpus based on a probability statistical word segmentation model to obtain a first word segmentation result corresponding to the consumer product corpus;
The word segmentation training unit is used for inputting the first word segmentation result into a first initial N-gram model for training to obtain a first N-gram model;
the second training unit is configured to:
perform word segmentation on the general corpus based on a probability statistical word segmentation model to obtain a second word segmentation result corresponding to the general corpus; and
input the second word segmentation result into a second initial N-gram model for training to obtain a second N-gram model.
5. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the speech retrieval method of any one of claims 1 to 3 when the computer program is executed by the processor.
6. A computer readable storage medium, characterized in that the computer readable storage medium stores a computer program which, when executed by a processor, causes the processor to perform the speech retrieval method according to any one of claims 1 to 3.
CN201910492599.8A 2019-06-06 2019-06-06 Voice retrieval method, device, computer equipment and storage medium Active CN110349568B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201910492599.8A CN110349568B (en) 2019-06-06 2019-06-06 Voice retrieval method, device, computer equipment and storage medium
PCT/CN2019/117872 WO2020244150A1 (en) 2019-06-06 2019-11-13 Speech retrieval method and apparatus, computer device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910492599.8A CN110349568B (en) 2019-06-06 2019-06-06 Voice retrieval method, device, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN110349568A CN110349568A (en) 2019-10-18
CN110349568B true CN110349568B (en) 2024-05-31

Family

ID=68181598

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910492599.8A Active CN110349568B (en) 2019-06-06 2019-06-06 Voice retrieval method, device, computer equipment and storage medium

Country Status (2)

Country Link
CN (1) CN110349568B (en)
WO (1) WO2020244150A1 (en)

Families Citing this family (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110349568B (en) * 2019-06-06 2024-05-31 平安科技(深圳)有限公司 Voice retrieval method, device, computer equipment and storage medium
CN110825844A (en) * 2019-10-21 2020-02-21 拉扎斯网络科技(上海)有限公司 Voice retrieval method and device, readable storage medium and electronic equipment
CN111291195B (en) * 2020-01-21 2021-08-10 腾讯科技(深圳)有限公司 Data processing method, device, terminal and readable storage medium
CN111460257B (en) * 2020-03-27 2023-10-31 北京百度网讯科技有限公司 Thematic generation method, apparatus, electronic device and storage medium
CN113642329B (en) * 2020-04-27 2024-10-29 阿里巴巴集团控股有限公司 Method and device for establishing term identification model, and method and device for term identification
CN113569128A (en) * 2020-04-29 2021-10-29 北京金山云网络技术有限公司 Data retrieval method and device and electronic equipment
CN111862970A (en) * 2020-06-05 2020-10-30 珠海高凌信息科技股份有限公司 False propaganda treatment application method and device based on intelligent voice robot
CN111783424B (en) * 2020-06-17 2024-02-13 泰康保险集团股份有限公司 Text sentence dividing method and device
CN112183114B (en) * 2020-08-10 2024-05-14 招联消费金融股份有限公司 Model training and semantic integrity recognition method and device
CN112381038B (en) * 2020-11-26 2024-04-19 中国船舶工业系统工程研究院 Text recognition method, system and medium based on image
CN112735413B (en) * 2020-12-25 2024-05-31 浙江大华技术股份有限公司 Instruction analysis method based on camera device, electronic equipment and storage medium
CN112905869B (en) * 2021-03-26 2024-07-26 深圳好学多智能科技有限公司 Self-adaptive training method, device, storage medium and equipment for language model
CN113256379A (en) * 2021-05-24 2021-08-13 北京小米移动软件有限公司 Method for correlating shopping demands for commodities
CN113256378A (en) * 2021-05-24 2021-08-13 北京小米移动软件有限公司 Method for determining shopping demand of user
CN114329225B (en) * 2022-01-24 2024-04-23 平安国际智慧城市科技股份有限公司 Search method, device, equipment and storage medium based on search statement
CN115563394B (en) * 2022-11-24 2023-03-28 腾讯科技(深圳)有限公司 Search recall method, recall model training method, device and computer equipment

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107154260A (en) * 2017-04-11 2017-09-12 北京智能管家科技有限公司 A kind of domain-adaptive audio recognition method and device
CN107204184A (en) * 2017-05-10 2017-09-26 平安科技(深圳)有限公司 Audio recognition method and system
CN108538286A (en) * 2017-03-02 2018-09-14 腾讯科技(深圳)有限公司 A kind of method and computer of speech recognition
CN108804414A (en) * 2018-05-04 2018-11-13 科沃斯商用机器人有限公司 Text modification method, device, smart machine and readable storage medium storing program for executing
CN109388743A (en) * 2017-08-11 2019-02-26 阿里巴巴集团控股有限公司 The determination method and apparatus of language model
CN109817217A (en) * 2019-01-17 2019-05-28 深圳壹账通智能科技有限公司 Self-service based on speech recognition peddles method, apparatus, equipment and medium

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105139239A (en) * 2014-05-27 2015-12-09 无锡韩光电器有限公司 Supermarket shopping system with voice query function
JP6353408B2 (en) * 2015-06-11 2018-07-04 日本電信電話株式会社 Language model adaptation device, language model adaptation method, and program
CN106875941B (en) * 2017-04-01 2020-02-18 彭楚奥 Voice semantic recognition method of service robot
CN107247759A (en) * 2017-05-31 2017-10-13 深圳正品创想科技有限公司 A kind of Method of Commodity Recommendation and device
CN109344830B (en) * 2018-08-17 2024-06-28 平安科技(深圳)有限公司 Sentence output and model training method and device computer device and storage medium
CN109840323A (en) * 2018-12-14 2019-06-04 深圳壹账通智能科技有限公司 The voice recognition processing method and server of insurance products
CN110349568B (en) * 2019-06-06 2024-05-31 平安科技(深圳)有限公司 Voice retrieval method, device, computer equipment and storage medium

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108538286A (en) * 2017-03-02 2018-09-14 腾讯科技(深圳)有限公司 A kind of method and computer of speech recognition
CN107154260A (en) * 2017-04-11 2017-09-12 北京智能管家科技有限公司 A kind of domain-adaptive audio recognition method and device
CN107204184A (en) * 2017-05-10 2017-09-26 平安科技(深圳)有限公司 Audio recognition method and system
CN109388743A (en) * 2017-08-11 2019-02-26 阿里巴巴集团控股有限公司 The determination method and apparatus of language model
CN108804414A (en) * 2018-05-04 2018-11-13 科沃斯商用机器人有限公司 Text modification method, device, smart machine and readable storage medium storing program for executing
CN109817217A (en) * 2019-01-17 2019-05-28 深圳壹账通智能科技有限公司 Self-service based on speech recognition peddles method, apparatus, equipment and medium

Also Published As

Publication number Publication date
CN110349568A (en) 2019-10-18
WO2020244150A1 (en) 2020-12-10

Similar Documents

Publication Publication Date Title
CN110349568B (en) Voice retrieval method, device, computer equipment and storage medium
CN109885660B (en) Knowledge graph energizing question-answering system and method based on information retrieval
CN109840287B (en) Cross-modal information retrieval method and device based on neural network
CN112164391B (en) Statement processing method, device, electronic equipment and storage medium
US9977778B1 (en) Probabilistic matching for dialog state tracking with limited training data
Zhang et al. Joint word segmentation and POS tagging using a single perceptron
WO2020244073A1 (en) Speech-based user classification method and device, computer apparatus, and storage medium
CN107480143B (en) Method and system for segmenting conversation topics based on context correlation
US7493251B2 (en) Using source-channel models for word segmentation
CN112800170A (en) Question matching method and device and question reply method and device
CN112069298A (en) Human-computer interaction method, device and medium based on semantic web and intention recognition
US20070100814A1 (en) Apparatus and method for detecting named entity
CN110121706A (en) Response in session is provided
US20060020448A1 (en) Method and apparatus for capitalizing text using maximum entropy
EP1619620A1 (en) Adaptation of Exponential Models
CN114580382A (en) Text error correction method and device
CN110096572B (en) Sample generation method, device and computer readable medium
CN113672708A (en) Language model training method, question and answer pair generation method, device and equipment
CN114154487A (en) Text automatic error correction method and device, electronic equipment and storage medium
CN109948140B (en) Word vector embedding method and device
CN113326702B (en) Semantic recognition method, semantic recognition device, electronic equipment and storage medium
CN110874408B (en) Model training method, text recognition device and computing equipment
CN113806510B (en) Legal provision retrieval method, terminal equipment and computer storage medium
CN115274086A (en) Intelligent diagnosis guiding method and system
CN116955579B (en) Chat reply generation method and device based on keyword knowledge retrieval

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant